https://hydra-genetics-qc.readthedocs.io
The module contains rules to run different quality control tools. Most of the output can be compiled into a single `.html` report using MultiQC.
- FastQC checks the raw `.fastq` files.
- Mosdepth, samtools stats and Picard's `Collect*Metrics` tools generate reports on the resulting `.bam` files.
- Peddy performs sex checks.
- RSeQC reports on different RNA-specific QC measures.
In order to use this module, the following dependencies are required:
Note! Releases of qc <= v0.1.0 need `tabulate<0.9.0` added to `requirements.txt`.
Input data should be added to `samples.tsv` and `units.tsv`. The following information needs to be added to these files:
| Column Id | Description |
|---|---|
| **`samples.tsv`** | |
| sample | unique sample/patient id, one per row |
| tumor_content | ratio of tumor cells to total cells |
| **`units.tsv`** | |
| sample | same sample/patient id as in `samples.tsv` |
| type | data type identifier (one letter), can be one of **T**umor, **N**ormal, **R**NA |
| platform | type of sequencing platform, e.g. `NovaSeq` |
| machine | specific machine id, e.g. NovaSeq instruments have `@Axxxxx` |
| flowcell | identifier of the flowcell used |
| lane | flowcell lane number |
| barcode | sequence library barcode/index, connect forward and reverse indices by `+`, e.g. `ATGC+ATGC` |
| fastq1/2 | absolute path to forward and reverse reads |
| adapter | adapter sequences to be trimmed, separated by comma |
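Mistakes in these two files usually surface only once the workflow is running. A minimal pre-flight check could look like the sketch below. It is not part of the module: the function names are hypothetical, the one-letter types are assumed to be `T`, `N` and `R`, and barcodes are assumed to be dual indices joined by `+`, as described in the table.

```python
import csv
import io
import os

SAMPLES_COLUMNS = ["sample", "tumor_content"]
UNITS_COLUMNS = [
    "sample", "type", "platform", "machine", "flowcell",
    "lane", "barcode", "fastq1", "fastq2", "adapter",
]
VALID_TYPES = {"T", "N", "R"}  # assumed codes: Tumor, Normal, RNA


def read_tsv(tsv_text, required):
    """Parse TSV text into a list of dicts, failing if a required column is missing."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = set(required) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def check_units(samples_text, units_text):
    """Return human-readable problems found in units.tsv (empty list if none)."""
    sample_ids = {row["sample"] for row in read_tsv(samples_text, SAMPLES_COLUMNS)}
    problems = []
    # start=2 because line 1 of the file is the header
    for line_no, row in enumerate(read_tsv(units_text, UNITS_COLUMNS), start=2):
        if row["sample"] not in sample_ids:
            problems.append(f"line {line_no}: sample {row['sample']!r} not in samples.tsv")
        if row["type"] not in VALID_TYPES:
            problems.append(f"line {line_no}: type {row['type']!r} is not one of T, N, R")
        # Hypothetical rule: forward and reverse indices joined by '+', e.g. ATGC+ATGC.
        if len(row["barcode"].split("+")) != 2:
            problems.append(f"line {line_no}: barcode {row['barcode']!r} is not FWD+REV")
        # The table asks for absolute paths to the reads.
        for col in ("fastq1", "fastq2"):
            if not os.path.isabs(row[col]):
                problems.append(f"line {line_no}: {col} is not an absolute path")
    return problems


# Hypothetical example files
samples_tsv = "sample\ttumor_content\nS1\t0.4\n"
units_tsv = (
    "sample\ttype\tplatform\tmachine\tflowcell\tlane\tbarcode\tfastq1\tfastq2\tadapter\n"
    "S1\tT\tNovaSeq\tA00001\tHABCDEFXX\t1\tATGC+ATGC\t/data/S1_R1.fastq.gz\t"
    "/data/S1_R2.fastq.gz\tAGATCGGAAGAGC,AGATCGGAAGAGC\n"
)
print(check_units(samples_tsv, units_tsv))  # []
```

The checker returns a list rather than stopping at the first error, so all problems in a sheet can be fixed in one pass.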
A reference `.fasta` file should be specified in `config.yaml`, in the section `reference` under the key `fasta`. Picard also requires interval files: `wgs_intervals` for WGS metrics and `design_intervals` for HS metrics. Mosdepth, samtools, and RSeQC require a `.bed` file specifying the regions to be analysed.
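Since several tools read regions from the `.bed` file, a malformed file can fail late in the workflow. The following is a minimal sanity check, assuming a standard tab-separated BED file with 0-based, half-open coordinates. Only the first three columns are inspected. The function names are hypothetical helpers, not part of the module, and rejecting zero-length intervals is an assumption that fits "regions to be analysed".

```python
def read_bed_regions(bed_text):
    """Parse BED text into (chrom, start, end) tuples.

    BED coordinates are 0-based and half-open, so a usable interval needs
    0 <= start < end. Blank, comment ('#'), 'track' and 'browser' lines are skipped.
    """
    regions = []
    for line_no, line in enumerate(bed_text.splitlines(), start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if not 0 <= start < end:
            raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
        regions.append((chrom, start, end))
    return regions


def total_bases(regions):
    """Count covered bases, merging overlapping intervals per chromosome."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend current block
                cur_end = max(cur_end, end)
            else:  # gap: close current block and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical design: two overlapping regions on chr1, one on chr2
bed = "chr1\t100\t200\tgeneA\nchr1\t150\t300\tgeneB\nchr2\t0\t50\n"
regions = read_bed_regions(bed)
print(total_bases(regions))  # 250
```

Comparing the merged size against the expected panel or capture size is a quick way to spot a wrong or truncated `.bed` file before running the workflow.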
The files to be included in the MultiQC report are specified in the config file (see the example config). The report can be split into several reports, for example by sample type or by RNA/DNA. A report configuration file for each MultiQC report can also be specified with the `config` tag in the config file; it is forwarded to MultiQC.
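To illustrate what splitting by sample type means, the sketch below groups the units from `units.tsv` into one sample list per report. It is a hypothetical illustration, not how the module implements it: the report names `DNA` and `RNA` and the type-to-report mapping are assumptions. In practice, the grouping is defined in `config.yaml`.

```python
from collections import defaultdict

# Hypothetical mapping from the units.tsv "type" column to a report name.
REPORT_FOR_TYPE = {"T": "DNA", "N": "DNA", "R": "RNA"}


def group_samples_by_report(units):
    """Map each report name to the sorted sample ids it should include.

    `units` is an iterable of (sample, type) pairs, e.g. read from units.tsv.
    A sample with several units of the same kind appears only once per report.
    """
    reports = defaultdict(set)
    for sample, unit_type in units:
        reports[REPORT_FOR_TYPE[unit_type]].add(sample)
    return {name: sorted(samples) for name, samples in reports.items()}


units = [("S1", "T"), ("S1", "N"), ("S2", "R")]
print(group_samples_by_report(units))  # {'DNA': ['S1'], 'RNA': ['S2']}
```

An unknown type raises a `KeyError` here, which makes a missing mapping explicit rather than silently leaving samples out of every report.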
The workflow repository contains a small test dataset in `.tests/integration`, which can be run like so:
```bash
$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --configfile config.yaml --use-singularity
```
To use this module in your workflow, follow the description in the Snakemake docs. Add the module to your `Snakefile` like so:
```python
module qc:
    snakefile:
        github(
            "hydra-genetics/qc",
            path="workflow/Snakefile",
            tag="v0.1.0",
        )
    config:
        config


use rule * from qc as qc_*
```
Latest compatible module versions:

- alignment:v0.4.0
- prealignment:v1.1.0

See the `COMPATIBLITY.md` file for a complete list of module compatibility.
The following output files should be targeted via another rule:

| File | Description |
|---|---|
| `qc/multiqc/multiqc.html` | `.html` report from MultiQC |
| `qc/multiqc/multiqc_data` | directory holding MultiQC data |