Updated: Sep 19, 2023
This repository was originally forked from https://github.com/sheila12345/biastools
- samtools=v1.11
- bcftools=v1.9
- bedtools=v2.30.0
- gzip=v1.9
- tabix=v1.9
- bowtie2=v2.4.2
- bwa=v0.7.17
- mason_simulator=v2.0.9 (only for biastools --simulate)
- SeqAn=v2.4.0 (only for biastools --simulate)
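Before installing, it can help to confirm that the external tools are reachable. The following is a minimal sketch, not part of biastools: the helper name `find_missing` is hypothetical, and it only checks that each executable is on `PATH`, not that its version matches the list above.

```python
import shutil

# Executable names taken from the dependency list above. SeqAn is a library,
# so it has no executable to look up and is left out of this check.
REQUIRED_TOOLS = ["samtools", "bcftools", "bedtools", "gzip", "tabix", "bowtie2", "bwa"]
SIMULATION_TOOLS = ["mason_simulator"]  # only needed for biastools --simulate


def find_missing(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    groups = (("required", REQUIRED_TOOLS), ("simulation-only", SIMULATION_TOOLS))
    for label, tools in groups:
        missing = find_missing(tools)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```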
pip install biastools
git clone https://github.com/maojanlin/biastools.git
cd biastools
Though optional, it is good practice to create a virtual environment to manage the dependencies:
python -m venv venv
source venv/bin/activate
Now a virtual environment (named venv) is activated. Install biastools:
python setup.py install
$ biastools --simulate --align --analyze -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_id>
With the example command, biastools:
- Simulates reads based on <ref.fa> and <vcf>, generating paired-end .fq.gz files for both haplotypes (work_dir/sample_name.hap{A,B}_{1,2}.fq.gz).
- Aligns the reads to the reference <ref.fa>, generating a BAM file with phasing information (work_dir/sample_name.run_id.sorted.bam).
- Analyzes the BAM file with the context-aware assignment method, generating bias reports and plots.
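The file naming described above can be spelled out with a short sketch. This is an illustration only: `expected_outputs` is a hypothetical helper, and the paths are built purely from the naming patterns quoted in this README, so it may not cover every file biastools writes.

```python
from pathlib import Path


def expected_outputs(work_dir, sample_name, run_id):
    """List the simulated FASTQ paths and the sorted BAM path for one run."""
    work = Path(work_dir)
    # work_dir/sample_name.hap{A,B}_{1,2}.fq.gz
    fastqs = [
        work / f"{sample_name}.hap{hap}_{mate}.fq.gz"
        for hap in ("A", "B")
        for mate in (1, 2)
    ]
    # work_dir/sample_name.run_id.sorted.bam
    bam = work / f"{sample_name}.{run_id}.sorted.bam"
    return fastqs + [bam]


if __name__ == "__main__":
    # Placeholder values; substitute the -o, -s and -r arguments of your run.
    for path in expected_outputs("work_dir", "sample_name", "run_id"):
        print(f"{path}: {'found' if path.exists() else 'missing'}")
```

Running it after the simulate and align steps gives a quick check that both stages produced their files before starting the analysis.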
Biastools supports the Bowtie 2 and BWA-MEM aligners. BAM files from other aligners (named work_dir/sample_name.run_id.sorted.bam and tagged with haplotype information) can be analyzed with:
$ biastools --analyze -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_id>
Biastools can also analyze real sequence data with the -R option using the context-aware assignment algorithm. The resulting plot (sample_id.real.indel_balance.pdf) does not include simulation information.
$ biastools --analyze -R -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_id> \
--bam <path_to_target.bam>
Multiple analysis results can be combined into a single indel-balance plot.
$ biastools --analyze -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_id> \
-lr file1.bias file2.bias file3.bias... \
-ld run_id1 run_id2 run_id3...
The output file sample_name.combine.sim.indel_balance.pdf merges the indel information of the bias reports listed after the -lr option with the simulated balance information. To combine real-data bias reports only, add the -R option to generate the combined plot sample_name.combine.real.indel_balance.pdf without the simulated balance.
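When many reports are combined, the two lists are easy to get out of step. The sketch below assembles the command shown above from Python and checks that each report has a matching run id. The helper `build_combine_command` is hypothetical, and the one-label-per-report requirement is an assumption inferred from the example command rather than documented behaviour. Only flags that appear in this README are used.

```python
import shlex


def build_combine_command(work_dir, ref, vcf, sample_name, run_id,
                          reports, labels, real=False):
    """Build the argument list for combining several bias reports."""
    # Assumption: each report passed to -lr needs exactly one label in -ld.
    if len(reports) != len(labels):
        raise ValueError("each bias report needs exactly one run id label")
    cmd = ["biastools", "--analyze"]
    if real:
        cmd.append("-R")  # real-data reports only, no simulated balance
    cmd += ["-o", work_dir, "-g", ref, "-v", vcf, "-s", sample_name, "-r", run_id]
    cmd += ["-lr", *reports]
    cmd += ["-ld", *labels]
    return cmd


if __name__ == "__main__":
    # File names and labels here are placeholders, not real outputs.
    cmd = build_combine_command(
        "work_dir", "ref.fa", "sample.vcf", "sample_name", "run_id",
        reports=["file1.bias", "file2.bias"],
        labels=["run_id1", "run_id2"],
    )
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run to execute it
```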
Biastools can predict whether a variant is biased with:
$ biastools --predict -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_pd_id> -pr <path_to_bias_report>
With the example command, biastools generates two files: sample_name.real.pd_id_bias.tsv and sample_name.real.pd_id_suspicious.tsv. The bias.tsv report contains all sites predicted to be biased by the model. The suspicious.tsv file contains sites suspected of lacking sufficient information in the VCF file. In other words, the reads aligned to such a site show a different pattern from the haplotypes indicated by the VCF file.
$ biastools --predict -o <work_dir> -g <ref.fa> -v <vcf> -s <sample_name> -r <run_pd_id> \
-pr <path_to_bias_report> \
-ps <path_to_simulated_bias_report>
If a bias report for the same sample based on simulated data is provided, biastools can generate a cross-prediction experiment result. In this experiment, the ground-truth biased sites are derived from the simulated data.
$ biastools_scan --scan -o <work_dir> -g <ref.fa> -s <sample_name> -r <run_id> -i <path_to_target.bam>
Biastools transforms <path_to_target.bam> into the mpileup format and generates biased and suspicious regions (sample_name.run_id.bias.bed and sample_name.run_id.suspicious.bed).
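To get a quick summary of the scan output, the BED files can be read with a few lines of Python. This is a minimal sketch that assumes the files follow the standard BED layout (chromosome, 0-based start, end in the first three whitespace-separated columns). The helper names `read_bed` and `total_length` are hypothetical and not part of biastools.

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping headers."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def total_length(regions):
    """Sum the region lengths (BED intervals are half-open, so end - start)."""
    return sum(end - start for _, start, end in regions)


if __name__ == "__main__":
    import sys

    # Usage: python summarize_bed.py <path_to_bed_file>
    with open(sys.argv[1]) as handle:
        regions = read_bed(handle)
    print(f"{len(regions)} regions, {total_length(regions)} bp in total")
```

Note that the total counts overlapping regions more than once; merge the intervals first (for example with bedtools merge) if the file may contain overlaps.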
$ biastools_scan --compare_bam -o <work_dir> -g <ref.fa> -s <sample_name> -r <run_id> \
-i <path_to_target.bam> \
-i2 <path_to_second.bam> \
-m <path_to_target.mpileup> \
-m2 <path_to_second.mpileup>
Biastools generates a common baseline from path_to_target.bam and path_to_second.bam, and uses the new common baseline to recalculate the bias regions based on the two mpileup files. The mpileup files can be generated by running the scan first, or by running bcftools consensus directly.
Users can also compare the bias reports without a common baseline (not recommended):
$ biastools_scan --compare_rpt -o <work_dir> -s <sample_name> -r <run_id> \
-b1 <path_to_target_bias.bed> \
-b2 <path_to_improved_bias.bed> \
-l2 <path_to_improved_lowRd.bed>