MATAM

Mapping-Assisted Targeted-Assembly for Metagenomics

MATAM is software dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. It has been applied to the assembly of 16S rRNA markers and validated on simulated, synthetic and genuine metagenomes.

The article describing this method is available here.

1. Hardware requirements

We recommend running MATAM with at least 10 GB of free RAM. You can try running MATAM with less RAM by setting --max_memory to a lower value (e.g. --max_memory 4000 for 4 GB). However, this could degrade the performance of the software.

Some steps of MATAM are highly parallelized, so you can get a significant speed-up by setting the --cpu option according to your hardware configuration.
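
As a rough illustration of how these two options might be chosen, the sketch below derives a --max_memory value from a RAM budget and queries the number of online CPUs. The functions matam_max_memory and matam_cpu_count are hypothetical helpers, not part of MATAM, and the factor of 1000 is an assumption taken from the example above (--max_memory 4000 for 4 GB).

```shell
# Hypothetical helper (not part of MATAM): convert a RAM budget in GB into a
# --max_memory value, assuming the "4000 for 4 GB" mapping given above.
matam_max_memory() {
    echo $(( $1 * 1000 ))
}

# Hypothetical helper: number of online CPUs, falling back to 1 when
# getconf does not support _NPROCESSORS_ONLN on this system.
matam_cpu_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Print the resulting options for a 4 GB budget.
echo "--cpu $(matam_cpu_count) --max_memory $(matam_max_memory 4)"
```

The printed options can then be appended to a MATAM command that accepts them, such as matam_assembly.py.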

2. Installation

There are three ways to install MATAM: with conda, as a Docker container, or directly from the source code.

2.1 MATAM with conda

Before you begin, you should have installed Miniconda or Anaconda. See https://conda.io/docs/installation.html for more details.
Then you will need to add the following channels:

conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda
conda config --add channels bonsai-team

Finally, matam can be installed with:

conda install matam

All the commands used in this README will be available in your PATH.

2.2 MATAM in Docker

To retrieve the docker image, run the following command:

docker pull bonsaiteam/matam

Then all the commands used in this README will be available as:

docker run -v host_directory:/workdir bonsaiteam/matam CMD

Note that you have to specify a Docker volume to share data between the host and the container, and use this workdir for your analysis. Otherwise, your data will be lost when exiting the container.
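
For instance, the current host directory can be shared as the volume. The wrapper below is only a sketch: matam_docker and the DRY_RUN switch are hypothetical names that are not part of MATAM, and the example paths assume that the analysis files are addressed under /workdir inside the container.

```shell
# Hypothetical wrapper (not part of MATAM): run a command in the container
# with the current host directory mounted at /workdir.
# Set DRY_RUN=1 to print the docker command instead of executing it.
matam_docker() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo docker run -v "$(pwd):/workdir" bonsaiteam/matam "$@"
    else
        docker run -v "$(pwd):/workdir" bonsaiteam/matam "$@"
    fi
}

# Example (needs Docker, so it is left commented out); container-side paths
# are written as absolute paths under /workdir:
# matam_docker index_default_ssu_rrna_db.py -d /workdir/db --max_memory 10000
```

Files written under /workdir in the container then appear in the host directory from which the wrapper was called.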

Finally, if you prefer an interactive session with the container, run:

docker run -it bonsaiteam/matam

2.3 MATAM from source code

2.3.1 Full dependencies list

gcc v4.9.0 or later (full C++11 support, <regex> included, and partial C++14 support)
C++ libraries: rt, pthread, zlib
Samtools v1.x or later
automake, make, cmake v3.1 or later
Python 3
pip
numpy
Apache Ant
Java SE 7 JDK. OpenJDK is fine (openjdk-7-jdk package on Debian)
bzip2
Google sparse hash library (libsparsehash-dev package on Debian)

2.3.2 Install dependencies

To install all of the needed dependencies except samtools, you can run the following commands on Debian-like distributions:

sudo apt-get update && sudo apt-get install curl git gcc g++ python3 python3-pip default-jdk automake make cmake ant libsparsehash-dev zlib1g-dev bzip2
sudo pip3 install numpy

The samtools package available in current Ubuntu-like distributions is usually an outdated version (v0.1.19), so we recommend getting samtools through bioconda (https://bioconda.github.io/).
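
A possible way to follow this recommendation and then confirm that a 1.x release is in use is sketched below. samtools_version_ok is a hypothetical helper, not part of MATAM or samtools, and the usage line assumes that the first line printed by samtools --version has the form "samtools 1.9".

```shell
# Install a recent samtools from the bioconda channel (left commented out
# because it modifies the active conda environment):
# conda install -c bioconda samtools

# Hypothetical helper: succeed if a version string such as "1.9" has a
# major version of at least 1; "0.1.19" or an empty string fails.
samtools_version_ok() {
    major=${1%%.*}
    [ "$major" -ge 1 ] 2>/dev/null
}

# Usage, assuming the first line of `samtools --version` reads "samtools 1.9":
# samtools_version_ok "$(samtools --version | head -n 1 | cut -d' ' -f2)"
```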

2.3.3 Compile MATAM

Clone the MATAM repository

git clone https://github.com/bonsai-team/matam.git && cd matam

Compile MATAM and dependencies

./build.py

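Update your PATH to make MATAM's commands available, where $MATAMDIR is the directory into which MATAM was cloned. Because of the single quotes, $MATAMDIR is written literally to ~/.profile, so it must either be defined there or be replaced with the actual path: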

echo 'export PATH="$MATAMDIR/bin:$PATH"' >> ~/.profile
source ~/.profile
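
To confirm that the PATH update worked, a quick check along these lines may help. It is a sketch only: check_matam_commands is a hypothetical helper, and the list simply contains the command names used in this README.

```shell
# Hypothetical helper (not part of MATAM): report which of the commands
# used in this README can be found in PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_matam_commands() {
    missing=0
    for cmd in index_default_ssu_rrna_db.py matam_db_preprocessing.py \
               matam_assembly.py matam_compare_samples.py; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_matam_commands in a new shell session prints one line per command and fails if any of them is missing.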

3. Run MATAM

3.1 Database preparation (clustering & indexation)

3.1.1 Provided database

By default, MATAM provides an SSU rRNA reference database for which the clustering step has already been done (i.e. the sequences sharing 95% identity have been clustered with Sumaclust).
The FASTA file used for this database comes from the SILVA 128 release.

To use the default SSU rRNA reference database, run the following command:

index_default_ssu_rrna_db.py -d $DBDIR --max_memory 10000

where $DBDIR is the directory used to store the database.

3.1.2 Custom database

If the provided database does not fulfill your needs, you can prepare a custom database of your own by running the following command:

matam_db_preprocessing.py -i ref_db.fasta -d $DBDIR --cpu 4 --max_memory 10000 -v

where $DBDIR is the directory used to store the database.

3.2 De-novo assembly

Once your database is ready, you can reconstruct your markers:

Assembly only

In this mode, MATAM reconstructs the full-length sequences present in the sample.

matam_assembly.py -d $DBDIR/prefix -i reads.fastq --cpu 4 --max_memory 10000 -v

Assembly and taxonomic assignment

In this mode, MATAM additionally provides a taxonomic classification of the sequences found, together with their abundance. Note that the classification is done with the RDP classifier using its default training model, "16srrna", so this mode may not be suitable for other phylogenetic markers.

matam_assembly.py -d $DBDIR/prefix -i reads.fastq --cpu 4 --max_memory 10000 -v --perform_taxonomic_assignment

In both commands, $DBDIR is the database directory and prefix is the common prefix used to name the database files. For example, with the default database, the prefix is SILVA_128_SSURef_NR95.

3.3 Example with default database and provided dataset

Retrieve the example dataset (a simulated dataset of 16 bacterial species)

wget https://raw.githubusercontent.com/bonsai-team/matam/master/examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq

Get and index the default SSU rRNA reference database

index_default_ssu_rrna_db.py -d $DBDIR --max_memory 10000

De-novo assembly

matam_assembly.py -d $DBDIR/SILVA_128_SSURef_NR95 -i 16sp.art_HS25_pe_100bp_50x.fq --cpu 4 --max_memory 10000 -v --perform_taxonomic_assignment

4. Sample comparison

We provide a script to compare the abundances of different samples. It is available only if the --perform_taxonomic_assignment option was used when running MATAM.

matam_compare_samples.py -s samples_to_compare.tsv -t contingency_table.tsv -c comparaison_table.tsv

The samples_to_compare.tsv file is a tab-separated file listing the FASTA and RDP files of each sample to compare (see example below).
The contingency_table.tsv file will report the abundance for each sequence.
The comparaison_table.tsv file will report a comparison of the abundances across samples by "taxonomic path".

Example

sample1 <tab> $WORKDIR/matam_sample1/final_assembly.fa <tab> $WORKDIR/matam_sample1/rdp.tab
sample2 <tab> $WORKDIR/matam_sample2/final_assembly.fa <tab> $WORKDIR/matam_sample2/rdp.tab

The first column is the sample ID, which must be unique within the file.
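
The sample list can also be generated rather than written by hand. The sketch below assumes the directory layout of the example above ($WORKDIR/matam_<sample>/final_assembly.fa and rdp.tab); make_samples_tsv and check_unique_ids are hypothetical helpers, not part of MATAM.

```shell
# Hypothetical helper: print one line per sample in the form
# ID <tab> FASTA path <tab> RDP table path.
# Arguments: the parent directory, followed by the sample IDs.
make_samples_tsv() {
    workdir=$1
    shift
    for sample in "$@"; do
        printf '%s\t%s\t%s\n' "$sample" \
            "$workdir/matam_$sample/final_assembly.fa" \
            "$workdir/matam_$sample/rdp.tab"
    done
}

# Hypothetical helper: succeed only if no ID in the first column of the
# given file appears more than once.
check_unique_ids() {
    [ -z "$(cut -f1 "$1" | sort | uniq -d)" ]
}

# Usage:
# make_samples_tsv "$WORKDIR" sample1 sample2 > samples_to_compare.tsv
# check_unique_ids samples_to_compare.tsv
```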

5. Release versioning

MATAM releases follow the Semantic Versioning 2.0.0 rules described here: https://semver.org/spec/v2.0.0.html
