Generating proteogenomic database for Pseudomonas with VCF called from WGS (or exome seq) data #185

animesh · 2020-08-23T15:36:49Z

I am wondering how I can add a species like Pseudomonas aeruginosa?

The FASTA file for the reference proteome is available at https://www.uniprot.org/proteomes/UP000002438. Any ideas on how to proceed would be appreciated :)

trishorts · 2020-08-23T17:24:11Z

You want to do Spritz for Pseudomonas?

acesnik · 2020-08-23T18:37:07Z

Spritz is currently built to call variants from eukaryotes with RNA-Seq data, so this would take a new workflow.

What type of sequencing data do you have for the sample (e.g. exome, genome)?

Here's the Ensembl genome for Pseudomonas: http://bacteria.ensembl.org/Pseudomonas_aeruginosa_pao1/Info/Index. There's no reference VCF like the one we're using for human in GATK.

acesnik · 2020-08-23T18:42:44Z

We would also need to implement support for other codon tables for this feature (#164).
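As a rough illustration of why the codon table matters for bacteria (a sketch only, not Spritz code): NCBI translation table 11 uses the same amino acid assignments as the standard code but accepts additional initiation codons such as GTG and TTG, which are read as methionine at the start of a coding sequence. The function name `translate` and the `start_codons` parameter are hypothetical names chosen for this example.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard genetic code layout).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Initiation codons listed for NCBI translation table 11 (bacterial/archaeal).
BACTERIAL_STARTS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate(cds, start_codons):
    """Translate a coding sequence, reading any listed start codon as Met."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if i == 0 and codon in start_codons:
            aa = "M"  # alternative start codons still initiate with methionine
        else:
            aa = CODON_TO_AA[codon]
        if aa == "*":  # stop at the first stop codon
            break
        protein.append(aa)
    return "".join(protein)


# The same GTG-initiated CDS yields a different first residue depending on
# whether GTG is accepted as a start codon.
bacterial = translate("GTGAAATAA", BACTERIAL_STARTS)
atg_only = translate("GTGAAATAA", {"ATG"})
```

Here `bacterial` starts with M while `atg_only` starts with V, which is the kind of difference that would end up in the protein database.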

animesh · 2020-08-24T09:31:22Z

I have WGS data for this bacterium, which seems to have diverged from the reference based on the assembly, so using the canonical proteome is clearly suboptimal. I see that a GFF is available at ftp://ftp.ensemblgenomes.org/pub/bacteria/current/gff3/bacteria_67_collection/pseudomonas_aeruginosa/, so perhaps one could use it to call the variants and create a strain-specific VCF?
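One possible way to get such a strain-specific VCF, sketched under assumptions: align paired-end WGS reads with BWA-MEM, sort with samtools, and call haploid variants with bcftools. None of this is part of Spritz; the tool choice, file names, and the helper `variant_calling_commands` are hypothetical, and the commands are only assembled here, not executed.

```python
import shlex


def variant_calling_commands(reference_fasta, reads_1, reads_2, sample, threads=4):
    """Return shell command strings for a minimal haploid variant-calling run.

    Assumes bwa, samtools, and bcftools are installed and on PATH.
    """
    q = shlex.quote  # protect paths containing spaces or shell metacharacters
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    return [
        # Build the BWA index for the reference genome.
        f"bwa index {q(reference_fasta)}",
        # Align paired-end reads and write a coordinate-sorted BAM.
        f"bwa mem -t {threads} {q(reference_fasta)} {q(reads_1)} {q(reads_2)}"
        f" | samtools sort -o {q(bam)} -",
        # Index the sorted BAM.
        f"samtools index {q(bam)}",
        # Pile up reads and call variants, treating the genome as haploid.
        f"bcftools mpileup -f {q(reference_fasta)} {q(bam)} -Ou"
        f" | bcftools call -mv --ploidy 1 -Ov -o {q(vcf)}",
    ]


commands = variant_calling_commands("reference.fa", "reads_1.fq", "reads_2.fq", "strain")
for command in commands:
    print(command)
```

The contig names in the resulting VCF need to match those of whatever annotation database is used downstream, so the reference FASTA should come from the same source as the annotation.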

acesnik · 2020-08-27T22:00:02Z

This is definitely a good direction to take Spritz. It's also good that the GFF file is available. I know @rmmiller22 was working on vervet monkey samples, which had that situation, i.e. no reference VCF available.

I unfortunately don't have the bandwidth to add this feature to Spritz right now, but we'll keep you posted as we work towards this goal.

By the way, what tool do you typically use to align WGS reads to bacterial genomes? Bowtie/BWA?

acesnik · 2020-08-27T22:23:21Z

Oh, an option in the meantime: you could generate a VCF file for your sample by other means and run it through the custom SnpEff fork that is part of Spritz with the options `-fastaProt {file}` and `-xmlProt {file}` specified. This should generate FASTA and XML files that can be used in MetaMorpheus or other search software. SnpEff has ~270 different Pseudomonas references, which is a lot. For example, one of them is Pseudomonas_aeruginosa, which you could use for this analysis with `java -Xmx16G -jar snpEff.jar -v -stats {output.html} -fastaProt {output.protfa} -xmlProt {output.protxml} Pseudomonas_aeruginosa {input.vcf} > {output.vcf}`, where the bracketed bits are replaced with your desired input/output files.
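If it helps, the command above can be wrapped in a small script. This is a minimal sketch, assuming the Spritz SnpEff fork's `snpEff.jar` is in the working directory; the helper names `snpeff_command` and `run_snpeff`, the default heap size, and the example file names are all assumptions made for illustration.

```python
import subprocess


def snpeff_command(input_vcf, output_html, output_protfa, output_protxml,
                   genome="Pseudomonas_aeruginosa", jar="snpEff.jar", heap="16g"):
    """Build the argument list for the SnpEff call described in the comment above."""
    return [
        "java", f"-Xmx{heap}", "-jar", jar, "-v",
        "-stats", output_html,
        "-fastaProt", output_protfa,   # protein FASTA output
        "-xmlProt", output_protxml,    # protein XML output (Spritz fork option)
        genome, input_vcf,
    ]


def run_snpeff(command, output_vcf):
    """Run SnpEff, redirecting the annotated VCF from stdout to a file."""
    with open(output_vcf, "w") as out:
        subprocess.run(command, stdout=out, check=True)


command = snpeff_command("strain.vcf", "strain.html", "strain.protein.fasta",
                         "strain.protein.xml")
print(" ".join(command))
# run_snpeff(command, "strain.annotated.vcf")  # uncomment once Java and the jar are available
```

Passing the arguments as a list avoids shell quoting problems, and the annotated VCF is captured from stdout the same way the `>` redirect does in the one-line command.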

acesnik added the New Feature label Aug 23, 2020

acesnik changed the title ~~how to add species for variant call~~ Generating proteogenomic database for Pseudomonas with VCF called from WGS data Mar 24, 2021

acesnik added Question and removed New Feature labels Mar 24, 2021

acesnik changed the title ~~Generating proteogenomic database for Pseudomonas with VCF called from WGS data~~ Generating proteogenomic database for Pseudomonas with VCF called from WGS (or exome seq) data Nov 29, 2022
