This is a refactored version of SV2, tested with Python 3.9.7. We recommend creating a separate conda/miniconda (or mamba/micromamba) environment with this version of Python, and then using that environment's pip
command to install the requirements, i.e. pip install -r requirements.txt
Example run command:
python run_sv2.py --alignment_file /expanse/projects/sebat1/genomicsdataanalysis/resources/NA12878/illumina_platinum_pedigree/NA12878.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram \
--reference_fasta /expanse/lustre/projects/ddp195/j3guevar/resources/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa \
--snv_vcf_file /home/j3guevar/tests/SV2_nim/src/SV2_nimpkg/refactored_sv2/input_files/NA12878.vcf.gz \
--exclude_regions_bed data/excluded_regions_bed_files/hg38_excluded.bed.gz \
--sv_bed_file data/test_data/file.nbl.test2.bed \
--sex male \
--sample_name NA12878
As of now, we're targeting only reference genome hg38. We can also make it usable for hg19 and other reference types later.
- Test on HG002, NA12878, and some of the REACH cohort
- Run from different compute environments/folders
- Once ROC curves are generated and analyzed, we have to decide on FILTER flag criteria
- There might be bugs, subtle or otherwise, when generating QUAL scores or even genotype likelihoods; need to catch those
- Add SV2 commands to VCF
- Default output files should include sample name
- Automatically find the regions_bed and gc_reference_table (reduce number of command line arguments)
- Make exclude bed file optional?
- Put requirements.txt file in here
- Test when run_sv2.py is a symbolic link
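
For the symbolic-link and automatic file lookup items above, one possible approach is to resolve the real location of run_sv2.py and look up bundled files relative to it. The sketch below is only an illustration, not the current SV2 code: the helper names package_dir and default_resource are hypothetical, and it assumes the bundled files live under the same directory as the real script.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def package_dir(script_path: PathLike) -> Path:
    """Return the directory holding the real script file.

    Path.resolve() follows symbolic links, so this still points at the
    repository folder when the script is invoked through a symlink.
    """
    return Path(script_path).resolve().parent


def default_resource(
    script_path: PathLike,
    relative_path: PathLike,
    override: Optional[PathLike] = None,
) -> Path:
    """Hypothetical helper: use an explicit path if the user gave one,
    otherwise fall back to a file relative to the real script location."""
    if override is not None:
        return Path(override)
    return package_dir(script_path) / relative_path
```

Inside run_sv2.py this could be called as default_resource(__file__, "data/excluded_regions_bed_files/hg38_excluded.bed.gz", args.exclude_regions_bed), where args is assumed to be the parsed command line. An explicit argument still wins, so the lookup only removes the need to pass the path every time.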