
Nextflow pipeline using containers for an Outbreak detection challenge on the OpenEBench platform

This repository is intended to be a Nextflow + container implementation of the OpenEBench workflow for an Outbreak detection challenge.

It was originally developed both at https://github.com/BU-ISCIII/openebench_gmi.git and https://github.com/javi-gv94/openebench_gmi.git.

How to use it

git clone https://github.com/inab/openebench_gmi.git
cd openebench_gmi
git submodule init
git submodule update

# The next command is needed to build the containers
./materialize-containers.bash

# Next command launches a test execution
nextflow run main.nf -profile docker 

Parameters available:

nextflow run main.nf --help
Usage:
nextflow run inab/openebench_gmi --tree_test {test.newick.file} --goldstandard_dir {golden.folder.path} --assess_dir {assessment.path} --public_ref_dir {path.to.info.ref.dataset} --challenges_ids {event.id}

Mandatory arguments:
  --tree_test                   Path to input data (must be surrounded with quotes).
  --goldstandard_dir            Path to reference data. Golden datasets.
  --public_ref_dir              Path where public dataset info is stored for validation.
  --assess_dir                  Path where benchmark data is stored.
  --challenges_ids              Event identifier.
  --participant_id              Participant identifier.
  --tree_format                 Tree format ["nexus","newick"].

Other options:
  --outdir                      The output directory where the results will be saved

Datasets

First of all, the needed datasets have been collected in the datasets folder:

  1. Input dataset: FASTQ input data obtained from the GMI WGS standards and benchmarks repository, which provides download instructions.
  2. Gold standard dataset: confirmed phylogeny for the outbreak being investigated.
  3. Input dataset ids: input dataset ids in .txt and .json format.
  4. Test dataset: a test tree to compare against the gold standard result. In this case it is simply the same golden dataset, so the Robinson-Foulds metric must be 0.
  5. benchmark_data: path where benchmark results are stored.

Nextflow pipeline and containers

Second, a pipeline has been developed which is split into three steps following the OpenEBench specifications, using the TCGA benchmarking repo as an example:

Nextflow processes

  1. Validation and data preprocessing:

    1. Check results format:

      • Tree input: the user input tree format is validated; nexus and newick formats are allowed, newick being the canonical format. If the format is valid, the tree is output in the canonical format (.nwk).
      • VCF input:
    2. Get query ids:

      • Tree input: IDs are extracted from the user input tree in newick or nexus format and written to queryids.json.
    3. Validate query ids:

      • Tree input: query IDs are validated against the reference input IDs (see the validation sketch after this list).
  2. Metrics:

    1. Precision/Recall calculation: common (TP), source-only (FP) and reference-only (FN) edges are counted when comparing the reference and test tree topologies. Recall and precision are calculated from these values and stored in a JSON file called {participant_id}_snprecision.json.
    2. Robinson-Foulds metric calculation: the normalized Robinson-Foulds distance is computed between the user tree and every participant tree already analyzed and stored in the benchmark_data folder, in order to compare their topologies. The resulting values are written to the participant_matrix.json file (see the metrics sketch after this list).
  3. Data visualization and consolidation:

    1. A Precision/Recall graph is created, classifying each participant into a quartile.
    2. An all-participants vs all-participants heatmap is created using the normalized Robinson-Foulds matrix (a plotting sketch follows this list).
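
As an illustration of the validation steps above, here is a minimal Python sketch. It assumes the ete3 toolkit is available and that the public reference IDs are stored as a JSON list; the file names used (input.nwk, queryids.json, input_ids.json) are placeholders rather than the workflow's actual paths:

import json
from ete3 import Tree

def validate_and_extract(tree_path, ref_ids_path, out_ids_path="queryids.json"):
    """Check that the input tree parses as Newick, dump its leaf IDs to JSON
    and verify that they all belong to the public reference dataset."""
    # Step 1.1: format check -- ete3 raises an exception on malformed Newick.
    try:
        tree = Tree(tree_path, format=1)
    except Exception as err:
        raise SystemExit(f"Input tree is not valid Newick: {err}")
    # Step 1.2: extract the query IDs and write them to queryids.json.
    query_ids = sorted(tree.get_leaf_names())
    with open(out_ids_path, "w") as handle:
        json.dump(query_ids, handle, indent=2)
    # Step 1.3: validate the query IDs against the reference dataset IDs.
    with open(ref_ids_path) as handle:
        ref_ids = set(json.load(handle))
    missing = set(query_ids) - ref_ids
    if missing:
        raise SystemExit(f"IDs missing from the reference dataset: {sorted(missing)}")
    return tree

if __name__ == "__main__":
    validate_and_extract("input.nwk", "input_ids.json")

Nexus input would need an extra conversion step before this point (for example with Bio.Phylo or dendropy), since ete3 only parses Newick.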
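
The two metrics in step 2 could look roughly like the sketch below, again assuming ete3: compare() reports the normalized Robinson-Foulds distance, while precision and recall are derived here from the clade sets of each tree as a simplified reading of the common/source/reference edge counts described above, not necessarily the workflow's exact implementation:

import json
from ete3 import Tree

def clades(tree):
    """Non-trivial clades of a tree, each encoded as a frozenset of leaf names."""
    return {frozenset(node.get_leaf_names())
            for node in tree.traverse()
            if not node.is_leaf() and not node.is_root()}

def tree_metrics(test_path, gold_path, participant_id):
    test, gold = Tree(test_path), Tree(gold_path)
    # Normalized Robinson-Foulds distance: 0 means identical topologies.
    norm_rf = test.compare(gold, unrooted=True)["norm_rf"]
    # Edge-based precision/recall: TP = clades shared with the gold standard,
    # FP = clades only in the test tree, FN = clades only in the gold tree.
    test_clades, gold_clades = clades(test), clades(gold)
    tp = len(test_clades & gold_clades)
    fp = len(test_clades - gold_clades)
    fn = len(gold_clades - test_clades)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    with open(f"{participant_id}_snprecision.json", "w") as handle:
        json.dump({"participant_id": participant_id,
                   "precision": precision,
                   "recall": recall}, handle, indent=2)
    return precision, recall, norm_rf

Running this with the test dataset (the golden tree compared against itself) should give precision = recall = 1 and a normalized Robinson-Foulds distance of 0, matching the expectation stated in the Datasets section.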
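
Finally, a possible sketch of the all-participants heatmap from step 3, assuming the consolidated normalized Robinson-Foulds values have been gathered into a square matrix and that numpy and matplotlib are available (the workflow's own plotting code may differ):

import numpy as np
import matplotlib.pyplot as plt

def rf_heatmap(rf_matrix, participants, out_png="rf_heatmap.png"):
    """All-participants vs all-participants heatmap of normalized RF distances."""
    data = np.asarray(rf_matrix, dtype=float)
    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(data, cmap="viridis", vmin=0.0, vmax=1.0)
    ax.set_xticks(range(len(participants)))
    ax.set_yticks(range(len(participants)))
    ax.set_xticklabels(participants, rotation=90)
    ax.set_yticklabels(participants)
    fig.colorbar(image, ax=ax, label="Normalized Robinson-Foulds distance")
    fig.tight_layout()
    fig.savefig(out_png, dpi=150)

# Example with two hypothetical participants; identical topologies give 0.0.
rf_heatmap([[0.0, 0.3], [0.3, 0.0]], ["participant_A", "participant_B"])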

Containers info

Each step runs in its own container. Containers are built using a Dockerfile recipe that makes use of SCI-F recipes for software installation. All SCI-F recipes are available in the scif_app_recipes repository. Singularity recipes are also provided (not yet adapted in the Nextflow pipeline).
