hydra-genetics/qc

https://hydra-genetics-qc.readthedocs.io

hydra-genetics/qc

Snakemake module containing steps for quality control of sequencing data

💬 Introduction

The module contains rules to run different quality control tools. Most output can be compiled to one .html report using multiqc.

Fastqc is checking the raw .fastq files.
Mosdepth, samtools stats and picard's collect metrics generates reports on the resulting .bam files.
Peddy generates sex checks
Rseqc reports on different RNA-specific QC measures

❗ Dependencies

In order to use this module, the following dependencies are required:

Note! Releases of qc <= v0.1.0 needs tabulate<0.9.0 added in requirements.txt

🎒 Preparations

Sample and unit data

Input data should be added to samples.tsv and units.tsv. The following information need to be added to these files:

Column Id	Description
`samples.tsv`
sample	unique sample/patient id, one per row
tumor_content	ratio of tumor cells to total cells
`units.tsv`
sample	same sample/patient id as in `samples.tsv`
type	data type identifier (one letter), can be one of Tumor, Normal, RNA
platform	type of sequencing platform, e.g. `NovaSeq`
machine	specific machine id, e.g. NovaSeq instruments have `@Axxxxx`
flowcell	identifer of flowcell used
lane	flowcell lane number
barcode	sequence library barcode/index, connect forward and reverse indices by `+`, e.g. `ATGC+ATGC`
fastq1/2	absolute path to forward and reverse reads
adapter	adapter sequences to be trimmed, separated by comma

Reference data

A reference .fasta-file should be specified in config.yaml in the section reference and fasta. Picard requires also interval files for WGS, wgs_intervals, and for HS metrics, design_intervals. Mosdepth, samtools, and Rseqc require a .bed file with specified regions to be analysed.

MultiQC

Files to be included in the MultiQC report is specified in the config file, see example config. The report can be split into several reports based on for example sample type or rna/dna. A report configuration file for each MultiQC report can also be specified by the config tag in the config file which is forwarded to MultiQC.

✅ Testing

The workflow repository contains a small test dataset .tests/integration which can be run like so:

$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --configfile config.yaml --use-singularity

🚀 Usage

To use this module in your workflow, follow the description in the snakemake docs. Add the module to your Snakefile like so:

module qc:
    snakefile:
        github(
            "hydra-genetics/qc",
            path="workflow/Snakefile",
            tag="v0.1.0",
        )
    config:
        config


use rule * from qc as qc_*

Compatibility

Latest:

alignment:v0.4.0
prealignment:v1.1.0

See COMPATIBLITY.md file for a complete list of module compatibility.

Output files

The following output files should be targeted via another rule:

File	Description
`qc/multiqc/multiqc.html`	`.html` report from multiqc
`qc/multiqc/multiqc_data`	directory holding multiqc data

Name		Name	Last commit message	Last commit date
Latest commit History 294 Commits
.github		.github
.tests		.tests
config		config
docs		docs
images		images
workflow		workflow
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
CHANGELOG.md		CHANGELOG.md
COMPATIBLITY.md		COMPATIBLITY.md
LICENSE.md		LICENSE.md
README.md		README.md
mkdocs.yml		mkdocs.yml
requirements.test.txt		requirements.test.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

hydra-genetics/qc

Snakemake module containing steps for quality control of sequencing data

💬 Introduction

❗ Dependencies

🎒 Preparations

Sample and unit data

Reference data

MultiQC

✅ Testing

🚀 Usage

Compatibility

Output files

🧑‍⚖️ Rule Graph

About

Releases 6

Packages

Contributors 8

Languages

License

hydra-genetics/qc

Folders and files

Latest commit

History

Repository files navigation

hydra-genetics/qc

Snakemake module containing steps for quality control of sequencing data

💬 Introduction

❗ Dependencies

🎒 Preparations

Sample and unit data

Reference data

MultiQC

✅ Testing

🚀 Usage

Compatibility

Output files

🧑‍⚖️ Rule Graph

About

Resources

License

Stars

Watchers

Forks

Releases 6

Packages 0

Contributors 8

Languages

Packages