This tutorial is aimed at researchers with little background in bioinformatics who would like
to start learning about genome assembly. Basic knowledge of the Linux command line is required
(for an introduction, see https://linuxcommand.org).
In whole genome sequencing (WGS), researchers are usually interested in the original genomic sequence of their sample. However, because the genome is fragmented while the DNA sequencing library is prepared, the order of the individual fragments is lost. The sequenced fragments (reads) therefore need to be stitched back together in their original order, a process called genome assembly.
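To make the idea concrete, here is a toy sketch in Python (a language chosen for illustration; the tutorial itself does not prescribe one). It cuts a made-up random "genome" into overlapping reads and shuffles them so that their original order is lost. It then stitches them back together with a naive greedy strategy: repeatedly merge the two sequences with the longest suffix-prefix overlap. All function names, the read length, step size and minimum overlap are hypothetical illustration values. The sketch assumes error-free reads and a genome without repeats longer than the minimum overlap. Real assemblers use far more robust methods.

```python
import random


def shotgun(genome: str, read_len: int, step: int, seed: int = 1) -> list[str]:
    """Cut the genome into reads of length read_len starting every `step` bases,
    then shuffle them so that their original order is lost."""
    starts = list(range(0, len(genome) - read_len + 1, step))
    if starts and starts[-1] != len(genome) - read_len:
        starts.append(len(genome) - read_len)  # make sure the genome end is covered
    reads = [genome[i:i + read_len] for i in starts]
    random.Random(seed).shuffle(reads)
    return reads


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap of at least min_len exists."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Naive greedy assembly: repeatedly merge the pair of sequences with the
    longest overlap until no overlap of at least min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # drop reads that are fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge: several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choices("ACGT", k=200))  # random, so long repeats are very unlikely
    reads = shotgun(genome, read_len=30, step=10)
    contigs = greedy_assemble(reads, min_len=15)
    print(f"{len(reads)} reads -> {len(contigs)} contig(s); identical to genome: {contigs == [genome]}")
```

On clean, repeat-free input like this, the longest overlaps are the true ones, so the shuffled reads should collapse back into a single contig. A repeat longer than `min_len` could make the greedy choice join the wrong reads, which is one reason real assembly is much harder.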
This tutorial explains the two main approaches to genome assembly: (1) alignment (or mapping) of reads to a reference sequence and (2) reconstruction of the genomic sequence without a reference (de-novo assembly).
Consider a researcher who observes a known bacterial strain with an unusual phenotype and wants to sequence its genome to identify the responsible genetic change. Assembly by alignment to a reference would allow small alterations in the sequence to be identified, such as single nucleotide polymorphisms (SNPs) and small insertions or deletions. However, larger alterations, such as duplicated regions that are not present in the reference sequence, would be missed. A better approach for detecting such novel structural variants is de-novo assembly. This method requires no prior knowledge of the original sequence and instead attempts to reconstruct the genome from the reads alone. Alignment and assembly each have strengths and weaknesses, and the two are often used together in genome analysis.
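The scenario above can be illustrated with a toy sketch of assembly by alignment in Python. As before, the names and parameters are hypothetical. The reads are assumed error-free, and a brute-force ungapped alignment stands in for a real read aligner. The made-up sample differs from the reference by one SNP and by a tandem duplication of 20 bases. Each read is placed where it has the fewest mismatches, and reads with more than `max_mismatches` are discarded as unmapped. The majority base at each reference position is then compared with the reference.

```python
import random
from collections import Counter


def best_hit(read: str, reference: str) -> tuple[int, int]:
    """Ungapped alignment: slide the read along the reference and return
    (position, mismatches) for the placement with the fewest mismatches."""
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def call_snps(reads: list[str], reference: str, max_mismatches: int = 2):
    """Pile up mapped reads; report (position, ref_base, alt_base) wherever the
    majority base differs from the reference, plus the number of unmapped reads."""
    pileup = [Counter() for _ in reference]
    unmapped = 0
    for read in reads:
        pos, mm = best_hit(read, reference)
        if mm > max_mismatches:
            unmapped += 1  # the read fits nowhere in the reference: discarded
            continue
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    snps = []
    for i, counts in enumerate(pileup):
        if counts:
            majority = counts.most_common(1)[0][0]
            if majority != reference[i]:
                snps.append((i, reference[i], majority))
    return snps, unmapped


if __name__ == "__main__":
    rng = random.Random(7)
    reference = "".join(rng.choices("ACGT", k=120))
    alt = "A" if reference[50] != "A" else "C"
    # hypothetical sample: SNP at position 50 plus a tandem duplication of reference[80:100]
    sample = reference[:50] + alt + reference[51:100] + reference[80:100] + reference[100:]
    reads = [sample[i:i + 25] for i in range(0, len(sample) - 24, 5)]
    snps, unmapped = call_snps(reads, reference)
    print("SNPs:", snps)
    print("unmapped reads:", unmapped)
```

In this setup the SNP is reported, but the duplication leaves no trace in the consensus. Reads covering the second copy and the sequence after it match the reference exactly. Reads spanning the new junction fit nowhere and are silently discarded, so the 20 extra bases never appear. The only hint is a higher read depth over the duplicated region, which this sketch does not examine.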
This tutorial aims to provide:
- a basic understanding of genome assembly
- a workflow for assembly by alignment
- a workflow for de-novo assembly
- Introduction to genome assembly
- Assembly by alignment - workflow example
- De-novo assembly - workflow example
If you do not have your own data, you can use this dataset and this reference sequence.
For a short description of the data formats, see here.
- De novo genome assembly versus mapping to a reference genome
- Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data.
- Great tutorial on next-gen sequencing genome analysis using samtools (also covers SNP detection).
- Celera (Canu) Assembler Terminology (some of it, e.g. mate pairs, is irrelevant to nanopore reads).
- How to score genome assemblies with Mauve
- QUAST: quality assessment tool for genome assemblies
- Some presentations on genome analysis by the Schatz lab.