Exomiser v14.0.0 is not recognizing unstructured key meta-information as a valid header line. #559

Open
ShrutiMarwaha opened this issue May 9, 2024 · 3 comments

@ShrutiMarwaha

Hi there,

  1. Does Exomiser only support VCF files at or above a certain format version?
  2. Exomiser v14.0.0 does not recognize an unstructured meta-information line with the key “##META” as a valid header line.
    I have some old VCFs whose headers contain 16 lines that start with “##META” and are unstructured meta-information lines. This seems to be allowed in VCFv4.4 (page 5, section 1.4).
    When I run this VCF through Exomiser, I get the following error:
    htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Invalid VCFSimpleHeaderLine: key=META name=null, for input source: file:https:///oak/stanford/groups/euan/UDN/gateway/data/UDN644400/WES/FromSequencingCore/WES_blood_hg19/Processed/UDN644400_family_merged.vcf.gz
    at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97) ~[htsjdk-3.0.5.jar:3.0.5]
    at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82) ~[htsjdk-3.0.5.jar:3.0.5]
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117) ~[htsjdk-3.0.5.jar:3.0.5]
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81) ~[htsjdk-3.0.5.jar:3.0.5] ……..
    ……………………….
  • ##fileformat=VCFv4.0

  • Lines with ##META in the header of my VCF:
    ##META='Cassandra_version=15.4.29'
    ##META='Pileup_File=/stornext/snfswgl/next-gen/Illumina/Instruments/D00143/170130_D00143_0967_BHF7KHBCXY/Results/Project_170130_D00143_0967_BHF7KHBCXY/Sample_HF7KHBCXY-2-ID10/SNP/7
    ##META='Annovar-refGene(hg19).Version=2013-08-23'
    ##META='Annovar-knownGene(hg19).Version=2013-08-23'
    ##META='Annovar-ensgene(hg19).Version=2013-08-23'
    ##META='Annovar-ensgene(GRCh37_MT).Version=2013-08-23'
    ##META='DbNSFP.Description=The dbNSFP is an integrated database of functional annotations from multiple sources for the comprehensive collection of human non-synonymous SNPs. v2.5.
    ##META='Hgmd.Database_version=null.Description=HGMD_PRO_2016.1.Downloaded=2016-07-8'
    ##META='1000 Genomes Phase 1.Description=SNPs Indels and SVs friom 1000 Genomes.Downloaded=2014-03-04'
    ##META='DbSNP.Description=NCBIs SNP database. v141 (GRCh37).Downloaded=2014-07-16'
    ##META='ARIC.Description=Allele freq from Aric cohort.Downloaded=2014-7-16'
    ##META='Mappability.Description=Encode 100bp alignability track. v1.Downloaded=2014-03-04'
    ##META='CgMaf.Description=Complete genomics variations from the reference genome identified across 54-genome subset of the 69 CG public genomes. Version 2.Downloaded=2014-03-04'
    ##META='ESP.Description=ESP5400 taken from 5400 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data. Version 1.Downloaded=2014-03-04'
    ##META='Encode.Description=Reglatory features from Encode. Taken from ensembl release 75.Downloaded=2014-03-04'
    ##META='Swissprot.Description=Uniprot gene annotation. Version 2014_02.Downloaded=2014-07-16'
    ##INFO=<ID=ReqIncl,Number=.,Type=String,Description="Site was required to be included in the VCF">

  • If I delete the “##META” lines from the header of my VCF file, Exomiser runs successfully. However, I have several such VCFs and would rather not create new copies with modified headers. Is there a way to work around this?

Thanks,
Shruti

Shruti Marwaha, PhD.
Research Engineer,
Stanford Center for Undiagnosed Diseases
GREGoR (Genomics Research to Elucidate the Genetics of Rare disease) Stanford Site
Stanford University

@julesjacobsen
Contributor

That's weird. Did it run OK on earlier versions? Under the hood Exomiser uses the HTSJDK, so support for whatever version of VCF is entirely down to that. I think it only supports up to 4.2.

@julesjacobsen
Contributor

Shruti, I'm pretty sure the issue is that META is a defined header key and therefore requires the more structured ##META=<...> format.

In VCF 4.3 (https://samtools.github.io/hts-specs/VCFv4.3.pdf) under the changes section 7.2, page 37

Introduced ##META header lines for defining phenotype metadata

This is shown on page 7 section 1.4.8

1.4.8 Sample field format
It is possible to define sample to genome mappings as shown below:
##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>
##META=<ID=Disease,Type=String,Number=.,Values=[None, Cancer]>
##META=<ID=Ethnicity,Type=String,Number=.,Values=[AFR, CEU, ASN, MEX]>
##META=<ID=Tissue,Type=String,Number=.,Values=[Blood, Breast, Colon, Lung, ?]>
##SAMPLE=<ID=Sample1,Assay=WholeGenome,Ethnicity=AFR,Disease=None,Description="Patient germline genome from unaffected",DOI=url>
##SAMPLE=<ID=Sample2,Assay=Exome,Ethnicity=CEU,Disease=Cancer,Tissue=Breast,Description="European patient exome from breast cancer">

So the line should have the form ##META=<ID....>, but that requirement comes from VCFv4.3. Your files declare ##fileformat=VCFv4.0, so the unstructured ##META lines should be considered legal for that version.
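To make the distinction concrete, here is a minimal Python sketch that flags header lines using the META key without the structured `<...>` form. It is only an illustration of the rule described above, not HTSJDK's actual parsing logic, and the function name `unstructured_meta_lines` is made up for this example.

```python
def unstructured_meta_lines(header_lines):
    """Return ##META lines whose value is not wrapped in <...>."""
    prefix = "##META="
    flagged = []
    for line in header_lines:
        line = line.rstrip("\r\n")
        if not line.startswith(prefix):
            continue
        value = line[len(prefix):]
        # A structured line looks like ##META=<ID=...,Type=...>
        if not (value.startswith("<") and value.endswith(">")):
            flagged.append(line)
    return flagged


header = [
    "##fileformat=VCFv4.0",
    "##META='Cassandra_version=15.4.29'",
    "##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>",
]
print(unstructured_meta_lines(header))
```

Run against the header lines quoted in this issue, every `##META='...'` line would be flagged, while the v4.3-style examples from the spec would not.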

I think HTSJDK effectively supports reading VCFv4.3 and writing VCFv4.2, which would explain the error: it appears to apply the v4.3 rules for ##META regardless of the declared version. It would be more useful if it either honoured exactly the version stated in the header or threw an error saying that the declared version doesn't match the versions it fully supports. What it actually supports isn't clearly defined beyond checking that the file starts with the header "##fileformat=VCFv4".
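Until that changes upstream, one possible workaround is to strip the unstructured ##META lines before handing the file to Exomiser, which is the manual fix that was reported to work. The Python sketch below assumes that approach; the names `drop_unstructured_meta` and `rewrite_vcf` are hypothetical helpers, not part of Exomiser or HTSJDK.

```python
import gzip


def drop_unstructured_meta(lines):
    """Yield VCF lines, skipping header lines like ##META='...' (no <...>)."""
    prefix = "##META="
    in_header = True
    for line in lines:
        if in_header:
            if line.startswith(prefix) and not line[len(prefix):].startswith("<"):
                continue  # unstructured META line: drop it
            if line.startswith("#CHROM"):
                in_header = False  # column header line ends the header
        yield line


def rewrite_vcf(src_path, dst_path):
    """Copy a gzip-compressed VCF, omitting unstructured ##META header lines."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(drop_unstructured_meta(src))
```

Note that Python's `gzip` module writes ordinary gzip rather than BGZF, so if a tabix index is needed the output would have to be recompressed with bgzip and re-indexed. This does still create new files, so it only helps if a preprocessing step is acceptable.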

@ShrutiMarwaha
Author

Thanks Jules.
