EpiDiverse/snp is a bioinformatics analysis pipeline for calling single nucleotide polymorphism (SNP) variants from bisulfite sequencing data, and/or for clustering samples (e.g. environmental plant samples) according to their methylation profiles while masking the genomic variation.
The workflow pre-processes a collection of BAM files from the EpiDiverse/WGBS pipeline using samtools, then masks genomic and/or bisulfite variation relative to the reference using custom scripts. Genomic-masked alignments are extracted into FASTQ format and tested for k-mer diversity using kWIP in order to cluster samples into groups. Bisulfite-masked alignments are taken forward for variant calling with Freebayes, followed by post-call filtering with bcftools.
See the output documentation for more details of the results.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Install nextflow
- Install one of docker, singularity or conda
- Download the pipeline and test it on a minimal dataset with a single command
NXF_VER=20.07.1 nextflow run epidiverse/snp -profile test,<docker|singularity|conda>
- Start running your own analysis!
NXF_VER=20.07.1 nextflow run epidiverse/snp -profile <docker|singularity|conda> \
--input /path/to/wgbs/bam --reference /path/to/reference.fa
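If you launch the pipeline from scripts, it can help to check the inputs before handing over to Nextflow. The following is only a minimal sketch: `build_snp_cmd` is a hypothetical helper name, not something shipped with the pipeline, and it assumes only the `-profile`, `--input` and `--reference` options shown above. It validates the profile and paths, then prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EpiDiverse/snp): validate the arguments
# and print the quick-start command. Nothing is executed here.
set -euo pipefail

build_snp_cmd() {
    local profile="$1" input="$2" reference="$3"

    # Only the three profiles named in the quick start are accepted.
    case "$profile" in
        docker|singularity|conda) ;;
        *) echo "error: profile must be docker, singularity or conda" >&2
           return 1 ;;
    esac

    # Fail early if the input path or the reference FASTA is missing.
    [ -e "$input" ] || { echo "error: input not found: $input" >&2; return 1; }
    [ -f "$reference" ] || { echo "error: reference not found: $reference" >&2; return 1; }

    # %q shell-quotes each argument so the printed line can be re-used safely.
    printf 'NXF_VER=20.07.1 nextflow run epidiverse/snp -profile %q --input %q --reference %q\n' \
        "$profile" "$input" "$reference"
}
```

For example, `build_snp_cmd docker /path/to/wgbs/bam /path/to/reference.fa` prints the command for review; once you are happy with it, the printed line can be pasted into a terminal or passed to `eval`.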
See the usage documentation for all of the available options when running the pipeline.
The EpiDiverse/snp pipeline is part of the EpiDiverse Toolkit, a best-practice suite of tools intended for the study of Ecological Plant Epigenetics. Links to general guidelines and pipeline-specific documentation can be found below:
- Installation
- Pipeline configuration
- Running the pipeline
- Understanding the results
- Troubleshooting
These scripts were originally written for use by the EpiDiverse European Training Network, by Adam Nunn (@bio15anu).
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 764965.
If you use EpiDiverse/snp for your analysis, please cite it using the following DOI: