1.3
Copyright 2012 - Christoph Hahn
Christoph Hahn [email protected]
This document contains instructions on how to use the MITObim pipeline described in the manuscript "Reconstructing mitochondrial genomes directly from genomic next generation sequencing reads - a baiting and iterative mapping approach" by Hahn et al., submitted to NAR methods online. The pipeline is currently intended for use with Illumina data, but can be readily modified for data from other platforms. The latest version of the wrapper script (including the proofreading function) was uploaded on 31.12.2012. The tutorial will be updated accordingly in early 2013.
- GNU utilities
- perl
- A running version of MIRA v3.4.1.1 (https://sourceforge.net/projects/mira-assembler/files/MIRA/stable/) is required. An excellent guide to MIRA v3.4 is available at https://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html.
The MITObim procedure (mitochondrial baiting and iterative mapping) represents a highly efficient approach to assembling novel mitochondrial genomes of non-model organisms directly from NGS reads derived from total genomic DNA. Labor-intensive long-range PCR steps prior to sequencing are no longer required. MITObim is capable of reconstructing mitochondrial genomes without a reference genome of the targeted species, relying solely on (a) mitochondrial genome information of more distantly related taxa or (b) short mitochondrial barcoding sequences (seeds), such as the commonly used cytochrome oxidase subunit 1 (COI), as a starting reference.
The script performs three steps and repeats them iteratively: (i) deriving a reference sequence from the previous mapping assembly, (ii) in silico baiting using the newly derived reference, and (iii) mapping the fished reads to the newly derived reference, which extends the reference sequence. For more details please refer to the manuscript. Detailed examples are demonstrated in the TUTORIALS section below.
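To make the iteration concrete, the three steps can be sketched as a small toy program. This is only an illustration and not the MITObim.pl implementation: it assumes error-free reads, baits by exact shared k-mers, and replaces the MIRA mapping assembly with a simple exact-overlap extension. All function names and the parameters k and max_iterations are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait(reads, reference, k):
    """Step (ii): fish reads sharing at least one exact k-mer with the reference."""
    ref_kmers = kmers(reference, k)
    return [read for read in reads if kmers(read, k) & ref_kmers]


def extend(reference, baited, k):
    """Step (iii), toy version: use exact overlaps of at least k bases
    to find the longest overhang at each end of the reference.
    (The real pipeline uses a MIRA mapping assembly instead.)"""
    left, right = "", ""
    for read in baited:
        # Read prefix overlapping the reference end -> overhang on the right.
        for n in range(len(read) - 1, k - 1, -1):
            if reference.endswith(read[:n]):
                if len(read) - n > len(right):
                    right = read[n:]
                break
        # Read suffix overlapping the reference start -> overhang on the left.
        for n in range(len(read) - 1, k - 1, -1):
            if reference.startswith(read[-n:]):
                if len(read) - n > len(left):
                    left = read[:-n]
                break
    return left + reference + right


def iterate_baiting_and_mapping(seed, reads, k=16, max_iterations=50):
    """Repeat the three steps until the reference stops growing."""
    reference = seed
    for _ in range(max_iterations):
        # Step (i): the reference is the result of the previous iteration.
        baited = bait(reads, reference, k)
        extended = extend(reference, baited, k)
        if extended == reference:
            break  # no further extension possible
        reference = extended
    return reference
```

Calling iterate_baiting_and_mapping(seed, reads) with a short seed and a list of overlapping read strings returns the extended sequence. Real data contain sequencing errors and repeats, which is why the actual pipeline relies on MIRA rather than on exact string matching.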
The following tutorials are designed for users with little Unix and no previous MIRA experience. Tutorials I & II will demonstrate how to recover the complete mitochondrial genome of Thymallus thymallus using the mitochondrial genome of Salvelinus alpinus as a starting reference. Tutorial III achieves the same goal using solely a ~700 bp barcoding sequence as initial seed reference. Tutorial IV (to be finished early 2013) uses a proofreading procedure to specifically reconstruct two mitochondrial genomes from a mixed sample containing genomic reads from two species.
Preparations:
- download the MITObim wrapper script MITObim.pl and make it executable (chmod a+x MITObim.pl)
- download testdata1.tgz and testdata2.tgz and extract the contents (tar xvfz testdata?.tgz)
Test the wrapper script by running:
-bash-4.1$ ~/PATH/TO/MITObim.pl
which should display the usage:
usage: ./MITObim.pl <parameters>
parameters:
-start <int> iteration to start with, default=1
-end <int> iteration to end with, default=1
-strain <string> strainname as used in initial MIRA assembly
-ref <string> referencename as used in initial MIRA assembly
-readpool <PATH> path to readpool in fastq format
-maf <PATH> path to maf file from previous MIRA assembly
optional:
--denovo runs MIRA in denovo mode, default: mapping
--pair finds pairs after baiting, default: no
--quick <PATH> starts process with initial baiting using provided fasta reference
--noshow