
Roslin Results v2.4

Cyriac Kandoth edited this page Jan 30, 2019 · 2 revisions

Description of results from Roslin Variant v2.4

Results delivered to investigators consist of the following folders at the main project level. $ROOT refers to the path to the top of the output hierarchy. Within that folder, files and folders are organized as follows:

  • $ROOT/results: Primary results directory, most users should look here first.
  • $ROOT/docs: Documentation, pipeline parameters, settings, and versions.
  • $ROOT/bams: BAM files with indices
  • $ROOT/output: Detailed output of the pipeline

$ROOT/docs

This folder contains documentation about the output and the pipeline, as well as the parameters used in this specific run of the pipeline. The input arguments/parameters are in the inputs folder. A PDF copy of this file is in this folder as manifest.pdf. This version of the documentation matches the pipeline version that was run.

The docs folder has a subfolder qc containing a PDF of some of the core QC metrics (the full metric output is in the output folder). The file is ${projNo}_QC_Report.pdf.

The inputs folder has the following files:

  • inputs.yaml - All inputs self-contained in a format that Roslin likes, including paths to reference genomes, assay target/bait sets, known somatic hotspots, etc.

  • settings - Version information of Roslin pipeline and Roslin core

  • ${projNo}_request.txt - Relevant information from the original iLabs request

  • ${projNo}_sample_mapping.txt - Maps samples to the folders containing their FASTQs

  • ${projNo}_sample_pairing.txt - Pairs tumors to normals for somatic variant calling

  • ${projNo}_sample_grouping.txt - Groups samples that belong to the same patient

  • ${projNo}_sample_data_clinical.txt - Patient/sample data from the investigator

$ROOT/results

The files in this folder all start with a prefix that is the project number (e.g., Proj_01234), denoted here as $projNo. Three primary result types are provided: mutations, copy number (using the FACETS algorithm), and fusions. Each result type is in its own folder: variants, copyNumber, rearrangements.

$ROOT/results/variants

There are two primary MAF files in the $ROOT/results/variants folder: the portal (DMP) MAF and the analysis MAF. The portal MAF contains a subset of the events that are in the analysis MAF. The analysis MAF is created as follows:

- Merge all sample level MAFs (which are in `$ROOT/output/maf`)

- Remove events that are tagged as false positives and any that come from cmo_fillout.

- Remove splice region variants in non-coding genes, or those that are >3bp into introns.
    - For indels, use the closest distance to the nearby splice junction.

- Remove all non-coding events except interesting ones like TERT promoter mutations.

The portal MAF then applies the following additional filters.

- Remove silent aka synonymous mutations

- Remove genes without Entrez IDs, usually non-coding genes

- Remove intronic splice region mutations i.e. >2bp into introns

- For IMPACT/HemePACT runs, apply MSK-IMPACT cutoffs:
    - Total depth >= 20
    - Allele Depth `>=10` for non-hotspots, `>=8` for hotspots
    - VAF `>=5%` for non-hotspots, VAF `>=2%` for hotspots
    - Length of indel or ONP < 30bp
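The IMPACT/HemePACT cutoffs above can be sketched as a single predicate. This is a minimal illustration, not the pipeline's own code: the function name and its arguments (`t_depth`, `t_alt_count`, `is_hotspot`, `event_length`) are assumptions chosen for the example, and how the pipeline decides that an event is a hotspot is not shown here.

```python
def passes_impact_cutoffs(t_depth, t_alt_count, is_hotspot, event_length=1):
    """Return True if an event meets the cutoffs listed above.

    t_depth      -- total read depth at the site in the tumor (assumed input)
    t_alt_count  -- reads supporting the variant allele (assumed input)
    is_hotspot   -- whether the event is a known hotspot (assumed input)
    event_length -- length in bp of the indel or ONP; 1 for a single-base change
    """
    # Total depth >= 20
    if t_depth < 20:
        return False

    # Allele depth >= 10 for non-hotspots, >= 8 for hotspots
    min_alt_count = 8 if is_hotspot else 10
    if t_alt_count < min_alt_count:
        return False

    # VAF >= 5% for non-hotspots, >= 2% for hotspots.
    # Compared with integers to avoid floating-point rounding at the boundary.
    min_vaf_percent = 2 if is_hotspot else 5
    if t_alt_count * 100 < min_vaf_percent * t_depth:
        return False

    # Length of indel or ONP < 30bp
    return event_length < 30
```

For example, `passes_impact_cutoffs(100, 8, is_hotspot=True)` keeps an event that `passes_impact_cutoffs(100, 8, is_hotspot=False)` would drop, because only the allele depth threshold differs between the two calls.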

Copy Number

Output from the FACETS copy number method.

  • ${projNo}.hisens.gene.cna.txt: Unified (all samples) gene level calls file.

  • ${projNo}.hisens.seg.cna.txt: Unified segmentation file in IGV format.

Facets was run in a two-pass mode: pass (1) obtained purity estimates using coarse-grained parameters to capture large features accurately, and pass (2) used high-sensitivity parameters to increase spatial resolution. The results folder contains only the output from the hisens pass; the output from the purity pass is available in the full output folder.

  • ${projNo}.purity.Parameters.out - The run parameters used for the specific pass
  • ${projNo}.purity.SamplesValues.out - Sample level output from the Facets algorithm.
  • ${projNo}.purity.CNCF.pdf - Plots of the bi-segmentation profiles and integer copy number calls
  • ${projNo}.purity.cncf.txt - Segment level copy number and cell fraction and integer copy number.

Fusions

$ROOT/bams

The fully processed (duplicate-marked, realigned, recalibrated) BAM files for the project, with indices. N.B. the two copies of the index files (.bai and .bam.bai) are identical; both are provided because certain bioinformatics tools will only recognize one or the other.
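If you want to confirm that the two index copies for a BAM really are the same file contents, a byte-for-byte comparison is enough. The sketch below assumes the indices sit next to the BAM and are named by swapping or appending the extension (for example `sample.bai` and `sample.bam.bai`); the helper name `index_copies_match` is made up for this example.

```python
import filecmp
from pathlib import Path


def index_copies_match(bam_path):
    """Return True if both index copies exist and have identical contents."""
    bam = Path(bam_path)
    bai_short = bam.with_suffix(".bai")   # e.g. sample.bai
    bai_long = Path(str(bam) + ".bai")    # e.g. sample.bam.bai

    # Treat a missing copy as a mismatch rather than raising an error.
    if not (bai_short.is_file() and bai_long.is_file()):
        return False

    # shallow=False forces a comparison of file contents, not just metadata.
    return filecmp.cmp(bai_short, bai_long, shallow=False)
```

A call such as `index_copies_match("path/to/sample.bam")` could be run over every `.bam` file in $ROOT/bams.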

$ROOT/output

The output folder contains a comprehensive set of output files from the pipeline; a verbose results folder.

$ROOT/output/maf

  • ${sampleID}.svs.pass.vep.maf - Structural variants (SVs) in MAF format (IMPACT only)

  • ${sampleID}.muts.maf - Small substitutions and indels in MAF format. False positives have a non-PASS tag in the FILTER column, and “fillout” rows (allele counts per event in other samples) are tagged “None” in the Mutation_Status column.
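The tagging described above makes it possible to separate called events from false positives and fillout rows when reading a sample-level MAF. The following is a rough sketch under a few assumptions: the file is tab-delimited with a header row, any leading `#` comment lines can be skipped, and fillout rows are checked before the FILTER column is consulted. The function name and the toy rows are invented for illustration and are not real pipeline output.

```python
import csv
import io


def split_maf_rows(handle):
    """Split MAF rows into (called, false_positives, fillouts)."""
    # Skip '#' comment lines (e.g. a version line) that may precede the header.
    lines = (line for line in handle if not line.startswith("#"))
    called, false_positives, fillouts = [], [], []
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["Mutation_Status"] == "None":
            # "Fillout" row: allele counts for an event called in another sample.
            fillouts.append(row)
        elif row["FILTER"] != "PASS":
            # Non-PASS FILTER marks a false positive.
            false_positives.append(row)
        else:
            called.append(row)
    return called, false_positives, fillouts


# Toy input with made-up rows, only to show the call shape.
toy_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tFILTER\tMutation_Status\n"
    "GENE_A\tPASS\tSomatic\n"
    "GENE_B\tsome_filter\tSomatic\n"
    "GENE_C\tPASS\tNone\n"
)
called, false_positives, fillouts = split_maf_rows(toy_maf)
print(len(called), len(false_positives), len(fillouts))
```

With a real file, the same function can be called as `split_maf_rows(open(path))`, where `path` points at a `${sampleID}.muts.maf` file.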

$ROOT/output/vcf

  • ${sampleID}.vardict.vcf - Substitutions and indels reported by VarDict in VCF format
  • ${sampleID}.mutect.{vcf,txt} - A MuTect VCF plus its more detailed tab-delimited format
  • ${sampleID}.pindel.vcf - Comprehensive VCF of indels reported by Pindel
  • ${sampleID}.svs.vcf - Comprehensive VCF of SVs detected by Delly (IMPACT only)
  • ${sampleID}.svs.pass.vcf - Shortlisted VCF of Delly SVs after some basic filtering

$ROOT/output/facets

  • ${sampleID}_hisens.CNCF.png - A plot for genome-wide integer copy-number (CN) per Facets
  • ${sampleID}_hisens.cncf.txt - Stats per segment with sufficient data for CN estimation
  • ${sampleID}_hisens.seg - Segmented copy-number data listing log-ratios
  • ${sampleID}_purity.* - All the files above, for the run of facets that estimated purity

$ROOT/output/portal

This directory contains the files used for upload to the portal, which contain exactly the events displayed in the portal.

  • data_mutations_extended.txt - The subset of mutations that the portal will display. It starts with the events from the main results maf (${projNo}.muts.maf) and then applies the following filters:

    • Remove silent aka synonymous muts

    • Remove genes without Entrez IDs, usually non-coding genes

    • Remove intronic splice region muts i.e. >2bp into introns

    • For IMPACT/HemePACT runs, apply DMP cutoffs:

      • Allele Depth >=8
      • VAF >=5% for non-hotspots, VAF >=2% for hotspots
  • data_clinical.txt - Clinical data per patient/sample to display in the portal

  • data_CNA.txt - Sample x Gene matrix listing discretized copy number alterations

  • data_fusions.txt - Shortlist of somatic structural variants (IMPACT only)

  • ${PI_UUID}_data_cna_hg19.seg - Segmented copy-number data listing log-ratios. ${PI_UUID} is the Roslin/Portal UUID for the project. N.B. this is different from $projNo.

  • case_lists - Folder containing lists of sample IDs with meta-data for the portal

  • meta_*.txt - Metadata about the study and the files above that the portal needs

$ROOT/output/qc

  • ${projNo}_CutAdaptStats.txt - Stats on paired end reads that needed trimming
  • ${projNo}_DiscordantHomAlleleFractions.txt - Concordance of homozygous SNPs
  • ${projNo}_FingerprintSummary.txt - Concordance of heterozygous SNPs
  • ${projNo}_UnexpectedMatches.txt - Samples from different patients with concordance
  • ${projNo}_UnexpectedMismatches.txt - Samples from same patient with discordance
  • ${projNo}_MajorContamination.txt - Contamination check using heterozygous SNP VAFs
  • ${projNo}_MinorContamination.txt - Contamination check using homozygous SNP VAFs
  • ${projNo}_GcBiasMetrics.txt - Check if coverage varies much by GC content
  • ${projNo}_HsMetrics.txt - Targeted library hybridization metrics per Picard
  • ${projNo}_InsertSizeMetrics_Histograms.txt - Insert size distribution per sample
  • ${projNo}_markDuplicatesMetrics.txt - Fragment duplication rates per Picard
  • ${projNo}_pre_recal_MeanQualityByCycle.txt - quality scores before GATK BQSR
  • ${projNo}_post_recal_MeanQualityByCycle.txt - quality scores after GATK BQSR
  • ${projNo}_ProjectSummary.txt - Table of QC stats that have passed or failed
  • ${projNo}_SampleSummary.txt - Table of successes/failures per sample

$ROOT/output/log

  • cwltoil.log - Records warnings and messages from the Toil workflow manager
  • output-meta.json - Metadata on all files generated and used by Toil
  • run-results.json - Indication of completed or failed steps of Roslin pipeline
  • stderr.log - Records warnings and failures
  • stdout.log - Records the stdout of Roslin pipeline progress