snakeSV is an integrated Snakemake pipeline for complete structural variant (SV) analysis. It includes pre- and post-processing steps designed for large-scale studies. The pipeline takes as input a BAM file for each sample, a reference genome file (FASTA), and a configuration file in YAML format. Optionally, users can also supply custom annotation files in BED format for SV interpretation, and VCF files with structural variants to be genotyped in addition to the discovery set.
The easiest way to install snakeSV is through Bioconda.
Install it by creating a separate environment (named "snakesv_env") with the command:
conda create -n snakesv_env -c conda-forge -c bioconda snakesv
conda activate snakesv_env # Command to activate the environment. To deactivate use "conda deactivate"
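A quick way to confirm that the environment is active before going further is to check that the launcher is on the PATH. The snippet below is only a sketch: `check_tool` is a hypothetical helper written for this example (it is not part of snakeSV), and it relies only on the standard `command -v` shell built-in.

```shell
# Hypothetical helper (not part of snakeSV): report whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# With snakesv_env active, the snakeSV launcher should be visible.
check_tool snakeSV || echo "Activate the environment first: conda activate snakesv_env"
```

If the command is reported as missing, the environment is most likely not activated in the current shell.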
After installing, you can check that everything works by running the pipeline on the included example data set.
# First create a folder to run the test
mkdir snakesv_test
cd snakesv_test
# Run snakeSV on the example data.
snakeSV --test_run
You can also try the installation and small test runs in Google Cloud Shell.
For detailed configuration and input instructions, see the wiki pages. We also provide two case studies that illustrate the use of customized annotations and genotyping with a panel of SVs discovered using long reads.
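When moving from the example data to your own samples, it can help to confirm that the input files exist before launching a long run. The following is a minimal sketch, not part of snakeSV: `check_inputs` is a hypothetical helper, and the file names in the example call are placeholders for a BAM file, a reference FASTA, and a YAML configuration file.

```shell
# Hypothetical pre-flight check (not part of snakeSV):
# confirm that each input file exists and is non-empty.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call with placeholder paths; replace them with your own files.
check_inputs sample1.bam reference.fasta config.yaml \
    || echo "Fix the inputs listed above before running the pipeline."
```

The function returns a non-zero status if any file is missing, so it can also be used as a guard in a wrapper script.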
Vialle, R.A., Raj, T. (2022). snakeSV: Flexible Framework for Large-Scale SV Discovery. In: Proukakis, C. (eds) Genomic Structural Variants in Nervous System Disorders. Neuromethods, vol 182. Humana, New York, NY.