cna-pipeline

A Snakemake workflow for analyzing somatic copy number alterations in whole-exome sequencing data using Sequenza and CNVkit.

Sample File Format

samples.csv should be a CSV file with the following columns:

patient - patient identifier
sample_type - either normal or tumor
bam - filepath for BAM for that sample

All patients are required to have both normal and tumor BAMs.
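
The rules above can be checked before launching the workflow. The following is a minimal sketch, not part of the pipeline: the function name validate_samples and the wording of its messages are hypothetical, and it only assumes the three columns and the normal/tumor requirement described in this section.

```python
import csv
from collections import defaultdict

REQUIRED_COLUMNS = {"patient", "sample_type", "bam"}
REQUIRED_TYPES = {"normal", "tumor"}


def validate_samples(path):
    """Return a list of human-readable problems found in a samples CSV.

    Hypothetical helper: checks the header and that every patient has
    both a normal and a tumor row. It does not check that BAM files exist.
    """
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {', '.join(sorted(missing))}"]
        types_by_patient = defaultdict(set)
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            patient = (row["patient"] or "").strip()
            sample_type = (row["sample_type"] or "").strip()
            if sample_type not in REQUIRED_TYPES:
                problems.append(
                    f"line {line_no}: sample_type must be normal or tumor, "
                    f"got {sample_type!r}"
                )
            types_by_patient[patient].add(sample_type)
    # Report every patient that lacks one of the two required sample types.
    for patient, seen in sorted(types_by_patient.items()):
        for absent in sorted(REQUIRED_TYPES - seen):
            problems.append(f"patient {patient}: no {absent} BAM listed")
    return problems
```

Calling validate_samples("samples.csv") from the repository directory returns an empty list when the file passes these checks.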

Installation

Install Snakemake
Install Singularity, or install all dependencies manually (see the Dependencies section)
Clone the repository
Create a samples.csv file with your sample information (see the Sample File Format section)
Modify config.yaml
If using a SLURM cluster for execution, edit cluster.json and run_pipeline.sh

Dependencies

Singularity is recommended because it ensures reproducibility and avoids manual installation of Sequenza, CNVkit, and R. If you prefer not to use Singularity, make sure the following programs are installed and accessible from your command line:

sequenza-utils command line tool
CNVkit
R

The following R libraries are also needed:

sequenza
readr
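
If you are not using Singularity, a quick check that the command line tools are on your PATH can save a failed run. The sketch below is an assumption-laden illustration: the helper missing_tools is hypothetical, and the executable names cnvkit.py and Rscript are assumed to be how CNVkit and R are exposed on your system. It does not check the R libraries.

```python
import shutil

# Executable names are assumptions: CNVkit is normally invoked as cnvkit.py,
# and Rscript ships with a standard R installation.
REQUIRED_TOOLS = ["sequenza-utils", "cnvkit.py", "Rscript"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty list from missing_tools() means all three executables were found.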

Running

NOTE: This workflow saves all intermediate and output files to the directory where Snakefile is located. Make sure that directory has enough free storage before running.
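
One way to check available space is sketched below. The helper names and the 100 GiB default are placeholders, not a requirement stated by this workflow; the space actually needed depends on the number and size of your BAM files.

```python
import shutil


def free_gib(directory="."):
    """Return free space, in GiB, on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free / 1024**3


def has_enough_space(directory=".", required_gib=100.0):
    """Return True if at least `required_gib` GiB are free.

    The default threshold is an arbitrary placeholder; choose your own.
    """
    return free_gib(directory) >= required_gib
```

Run has_enough_space() from the directory that contains Snakefile, passing a threshold suited to your data.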

If running locally:

snakemake -j [# of cores] [--use-singularity]

If running on a SLURM cluster (after completing cluster.json and run_pipeline.sh):

# This needs to be run from the directory where Snakefile is located
sbatch run_pipeline.sh

Output

results/sequenza_info.csv - Sequenza purity and ploidy estimates for each tumor
cnvkit-results/ - CNVkit segmentation files and call files
sequenza/[patient] - results/ - Sequenza segmentation files for that [patient]
access.5kb.hg38.bed - accessible regions of the reference FASTA in 5 kb windows

Most users will be interested mainly in sequenza_info.csv and cnvkit-results/. I primarily use Sequenza for purity/ploidy estimation and CNVkit for copy number segmentation.
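
To inspect the purity and ploidy table programmatically, something like the following may be enough. It is only a sketch: load_rows is a hypothetical helper, and because the column names of sequenza_info.csv are not documented here, it reads whatever header the file has rather than assuming specific fields.

```python
import csv


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, load_rows("results/sequenza_info.csv") returns one dictionary per tumor after the workflow has finished.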

See this paper for a discussion of copy number caller performance
