exotic-dna-ngs

WIP & POC: Count Viral and other non-human genome signatures in Human NGS data

VERSION: 0.1-alpha
Author: Dr Stephen J Newhouse; [email protected]

Aims

Build a simple pipeline to detect Viral and other non-human genome signatures in WGS NGS Data.

💡 The idea is to use standard open source tools to detect and quantify Viral and other non-human genome signatures in WGS NGS Data.

The basic workflow consists of aligning short read data (using bwa and/or hisat2) to a genome index containing the human b38 reference sequence and all viral and other non-human genome fasta files. samtools idxstats will then be used to generate statistics on the number of mapped reads per human chromosome and per viral/other genome. We will use this as a naive measure of the presence or absence of viral/other DNA in the first instance. The number of reads mapped could be treated as count data and used to produce some measure of quantification akin to RNA-seq based quantification methods, e.g. TPM.
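As a rough illustration of the counting step, here is a minimal Python sketch that parses `samtools idxstats` output (tab-separated: reference name, sequence length, mapped reads, unmapped reads) and turns the mapped-read counts into TPM-like values. It is not part of the pipeline. The function names, the `min_reads` threshold, the assumption that human contigs are named with a `chr` prefix, and the example data are all hypothetical and only there to show the idea.

```python
import csv
import io

# Made-up idxstats-style text for illustration only (not real results).
# Columns: reference name, sequence length, mapped reads, unmapped reads.
EXAMPLE = "chr1\t1000\t100\t0\nvirusA\t100\t10\t0\n*\t0\t0\t5\n"


def parse_idxstats(text):
    """Return a list of (name, length, mapped) tuples from idxstats text."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, length, mapped = fields[0], int(fields[1]), int(fields[2])
        if name == "*" or length == 0:
            continue  # the "*" line holds reads with no reference assigned
        rows.append((name, length, mapped))
    return rows


def tpm(rows):
    """TPM-like value per reference: length-normalised rate, scaled to 1e6."""
    rates = {name: mapped / length for name, length, mapped in rows}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


def detected(rows, min_reads=10, human_prefix="chr"):
    """Naive presence call: non-human references with >= min_reads mapped.

    Assumes human contigs start with `human_prefix`; both this prefix and
    the `min_reads` cut-off are hypothetical choices for this sketch.
    """
    return [
        name
        for name, _length, mapped in rows
        if not name.startswith(human_prefix) and mapped >= min_reads
    ]


if __name__ == "__main__":
    rows = parse_idxstats(EXAMPLE)
    print(tpm(rows))
    print(detected(rows))
```

In practice the input text would come from running `samtools idxstats` on a sorted, indexed BAM file. Normalising by reference length matters here because viral genomes are orders of magnitude shorter than human chromosomes, so raw read counts alone are not comparable.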

The motivation: We are not alone. Things live in and on us, and increasing evidence points towards the importance of our microbiome in disease and health. Plus, DNA extraction is not perfect: unless specifically designed to enrich for human DNA, most DNA extraction methods will capture everything (within limits), i.e. it's not just human DNA in your sample prep!

The basic questions:

Can we detect other DNA in WGS NGS data?
What is the origin of this DNA (Viral, bacterial, fungal, other...)?
Can we quantify this in any way?
If so, are there any meaningful differences between:
- labs (contamination, batch processing etc.)
- disease groups (case v control)
- geography
- ...any other groups you can think of?

What about Kraken and other awesome things like sourmash!? I hear you say. Well, they are awesome, but compute and memory intensive... I have yet to play with sourmash, but I will. I want to provide something a little less "hungry" and try to kill two birds with one stone, i.e. gain the increase in accuracy in read alignment and variant calling that comes from having "decoy" genomes in the mix, and query the metagenome, all at the same time. Here the decoy genomes are the viral+everything else. Hah - but it's NGS data, and will be compute intensive, and requires downloading the world of genomes and re-inventing the wheel and blah blah blah - I hear you say again... let's see how this all turns out.

The Docs

WIP: see Docs
bwa mapping pipeline (nextflow): map-bwa.nf

NOT TESTED!

Core Contributors

Stephen J Newhouse | [email protected] | git:snewhouse | twitter:@s_j_newhouse

Contributors

Full list at Contributors

Contributing

Here’s how we suggest you go about proposing a change to this project:

Fork this project to your account.
Create a branch for the change you intend to make.
Make your changes to your fork.
Send a pull request from your fork’s branch to our master branch.

Using the web-based interface to make changes is fine too, and will help you by automatically forking the project and prompting to send a pull request too.

Licence

MIT

Development funded as part of:
NIHR Maudsley Biomedical Research Centre (BRC), King's College London and the
Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London
PHI Data Lab:
Institute of Psychiatry, Psychology & Neuroscience, King's College London.
