Exomiser v14.0.0 is not recognizing unstructured key meta-information as a valid header line. #559
Comments
That's weird. Did it run OK on earlier versions? Under the hood Exomiser uses HTSJDK, so support for whatever version of VCF is entirely down to that. I think it only supports up to 4.2.
Shruti, I'm pretty sure that the issue lies with the ##META lines: see VCF 4.3 (https://samtools.github.io/hts-specs/VCFv4.3.pdf), under the changes section 7.2, page 37.
This is shown on page 7 section 1.4.8
So, this would mean that the line should have the structured form given in that section.
I think HTSJDK effectively supports VCFv4.3 reading and VCFv4.2 writing, which would explain why the error is happening. It would be more useful if they could precisely support the version stated in the header, or throw an error about the type not matching the version they do fully support. What they actually support isn't clearly defined outside of checking that the file starts with the header
Thanks Jules.
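To see which header lines would trip the parser before running Exomiser, something like the following could be used. It is only a sketch: the function name and the regular expression are hypothetical, and "structured" is assumed here to mean a value wrapped in angle brackets that starts with an ID field, which is a simplification rather than the exact check HTSJDK performs.

```python
import re

# Assumption: a structured ##META line looks like ##META=<ID=...,...>.
# This is a simplified pattern, not the rule HTSJDK itself applies.
STRUCTURED_META = re.compile(r"^##META=<ID=[^,>]+(,.*)?>$")


def find_unstructured_meta(header_lines):
    """Return (line_number, line) pairs for ##META lines not in <ID=...> form."""
    offenders = []
    for number, raw in enumerate(header_lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        if line.startswith("##META=") and not STRUCTURED_META.match(line):
            offenders.append((number, line))
    return offenders


example_header = [
    "##fileformat=VCFv4.0\n",
    "##META='Cassandra_version=15.4.29'\n",
    "##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
offending = find_unstructured_meta(example_header)
```

With the example header above, only the quoted `##META='Cassandra_version=15.4.29'` line is reported.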
Hi there,
I have some old VCFs with 16 header rows that start with “##META” and are unstructured meta-information lines. However, this seems to be allowed in VCFv4.4 (page 5, section 1.4).
When I run this VCF through Exomiser, I get the following error:
htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Invalid VCFSimpleHeaderLine: key=META name=null, for input source: file:///oak/stanford/groups/euan/UDN/gateway/data/UDN644400/WES/FromSequencingCore/WES_blood_hg19/Processed/UDN644400_family_merged.vcf.gz
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97) ~[htsjdk-3.0.5.jar:3.0.5]
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82) ~[htsjdk-3.0.5.jar:3.0.5]
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117) ~[htsjdk-3.0.5.jar:3.0.5]
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81) ~[htsjdk-3.0.5.jar:3.0.5] ……..
……………………….
##fileformat=VCFv4.0
Lines with ##META in the header of my VCF:
##META='Cassandra_version=15.4.29'
##META='Pileup_File=/stornext/snfswgl/next-gen/Illumina/Instruments/D00143/170130_D00143_0967_BHF7KHBCXY/Results/Project_170130_D00143_0967_BHF7KHBCXY/Sample_HF7KHBCXY-2-ID10/SNP/7
##META='Annovar-refGene(hg19).Version=2013-08-23'
##META='Annovar-knownGene(hg19).Version=2013-08-23'
##META='Annovar-ensgene(hg19).Version=2013-08-23'
##META='Annovar-ensgene(GRCh37_MT).Version=2013-08-23'
##META='DbNSFP.Description=The dbNSFP is an integrated database of functional annotations from multiple sources for the comprehensive collection of human non-synonymous SNPs. v2.5.
##META='Hgmd.Database_version=null.Description=HGMD_PRO_2016.1.Downloaded=2016-07-8'
##META='1000 Genomes Phase 1.Description=SNPs Indels and SVs friom 1000 Genomes.Downloaded=2014-03-04'
##META='DbSNP.Description=NCBIs SNP database. v141 (GRCh37).Downloaded=2014-07-16'
##META='ARIC.Description=Allele freq from Aric cohort.Downloaded=2014-7-16'
##META='Mappability.Description=Encode 100bp alignability track. v1.Downloaded=2014-03-04'
##META='CgMaf.Description=Complete genomics variations from the reference genome identified across 54-genome subset of the 69 CG public genomes. Version 2.Downloaded=2014-03-04'
##META='ESP.Description=ESP5400 taken from 5400 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data. Version 1.Downloaded=2014-03-04'
##META='Encode.Description=Reglatory features from Encode. Taken from ensembl release 75.Downloaded=2014-03-04'
##META='Swissprot.Description=Uniprot gene annotation. Version 2014_02.Downloaded=2014-07-16'
##INFO=<ID=ReqIncl,Number=.,Type=String,Description="Site was required to be included in the VCF">
If I delete the rows with “##META” from the header of my VCF file, I can successfully run Exomiser. However, I have several such VCFs and do not want to create new VCFs with modified headers. Is there a way to mitigate this?
Thanks,
Shruti
Shruti Marwaha, PhD.
Research Engineer,
Stanford Center for Undiagnosed Diseases
GREGoR (Genomics Research to Elucidate the Genetics of Rare disease) Stanford Site
Stanford University
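If rewriting the files turns out to be unavoidable, the manual workaround described in the post (deleting the quoted ##META rows) could be scripted along these lines. This is a minimal sketch, not an Exomiser or HTSJDK feature: `drop_unstructured_meta` and `clean_vcf` are hypothetical names, and the filter assumes the offending lines all start with `##META='` as in the header shown above.

```python
import gzip


def drop_unstructured_meta(lines):
    """Yield VCF lines, skipping header lines that start with ##META='."""
    in_header = True
    for line in lines:
        if in_header:
            if not line.startswith("##"):
                in_header = False  # reached #CHROM; stop filtering from here on
            elif line.startswith("##META='"):
                continue  # drop the unstructured META line
        yield line


def clean_vcf(src_path, dst_path):
    """Copy src_path to dst_path as plain text without the quoted ##META lines."""
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt") as src, open(dst_path, "w") as dst:
        dst.writelines(drop_unstructured_meta(src))


sample = [
    "##fileformat=VCFv4.0\n",
    "##META='Cassandra_version=15.4.29'\n",
    "##INFO=<ID=ReqIncl,Number=.,Type=String,Description=\"Site was required to be included in the VCF\">\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",
]
cleaned = list(drop_unstructured_meta(sample))
```

The output is written as uncompressed text. Whether it then needs to be block-compressed and indexed again depends on how the file is passed to Exomiser, which this sketch does not cover.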