VCFpytools

Assortment of VCF processing scripts

Requirements

Ensure you have Python3 installed on your system
Clone or download the script from this repository

git clone https://github.com/juliawiggeshoff/VCFpytools.git

Merge VCF Headers

This script allows you to merge headers from multiple VCF (Variant Call Format) files into a single output header file. It's useful when you have VCF files with similar headers and you want to consolidate them. This can then be used after or before running bcftools concat --naive[-force], where --naive-force ignores the differences in headers, for example, to modify the concatenated VCF files with all header information from both files, if you opt to include the merged headers only after concatenating your VCF files with bcftools (other concatenation tools).

Input can be gzipped or not. Output filename is optional, but recommended. If not given, will create merged_headers.txt in the working directory.

python merge_vcf_headers.py -vcfs file1.vcf.gz file2.vcf.gz [...] -o output_merged_header.txt

Optionally, you can include headers other than the mandatory ##fileformat, ##contig, ##FILTER, ##INFO, and ##FORMAT. These are usually commands used to process the VCF files (e.g. bcftools, vcflib, etc), dates of file creation, version of tool, etc.

python merge_vcf_headers.py -vcfs file1.vcf.gz file2.vcf.gz -o output_merged_header.txt -fullheader

Example usage with bcftools concat

Re-head VCF files before concatenation:

python merge_vcf_headers.py -vcfs file1.vcf.gz file2.vcf.gz -o output_merged_header.txt
(cat output_merged_header.txt; zcat file1.vcf.gz | grep -v '##') | bgzip -f -c > reheaded_file1.vcf.gz
(cat output_merged_header.txt; zcat file2.vcf.gz | grep -v '##') | bgzip -f -c > reheaded_file2.vcf.gz
tabix -f -p vcf file1.vcf.gz 
tabix -f -p vcf file2.vcf.gz 
bcftools concat --naive file1.vcf.gz file2.vcf.gz  > reheaded_concatenated.bcf

Re-head VCF file after concatenation

bcftools concat --naive-force file1.vcf.gz  file2.vcf.gz > concatenated.bcf
bcftools view concatenated.bcf -Oz -o concatenated.vcf.gz
python merge_vcf_headers.py -vcfs file1.vcf.gz file2.vcf.gz -o output_merged_header.txt -fullheader
(cat output_merged_header.txt; zcat concatenated.vcf.gz | grep -v "##") | bgzip -f -c > reheaded_concatenated.vcf.gz
tabix -f -p vcf reheaded_concatenated.vcf.gz

Parse and Calculate VAF from Strelka2 somatic VCF files

This script parses Strelka2 Variant Call Format (VCF) files and calculates Variant Allele Frequency (VAF) for INDELs or SNVs. It follows the recommendations from the developers.

Input can be gzipped or not. Output will be a VCF (not gzipped) and setting the filename is mandatory, as well as setting the type of variant based on the somatic output files from Strelka2. Information found here

python add_vaf_strelka2.py --input somatic.snvs.vcf.gz --output somatic.snvs.VAF.vcf --variant snv
python add_vaf_strelka2.py --input somatic.indels.vcf.gz --output somatic.indels.VAF.vcf --variant indel

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
LICENSE		LICENSE
README.md		README.md
add_vaf_strelka2.py		add_vaf_strelka2.py
merge_vcf_headers.py		merge_vcf_headers.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VCFpytools

Requirements

Merge VCF Headers

Example usage with bcftools concat

Parse and Calculate VAF from Strelka2 somatic VCF files

About

Releases

Packages

Languages

License

juliawiggeshoff/VCFpytools

Folders and files

Latest commit

History

Repository files navigation

VCFpytools

Requirements

Merge VCF Headers

Example usage with bcftools concat

Parse and Calculate VAF from Strelka2 somatic VCF files

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages