JohnVollmers edited this page Apr 8, 2022 · 5 revisions

Welcome to the mdmcleaner wiki! It is still under construction, so the information here is NOT complete. Please be patient; we will add more content.

Basic info

MDMcleaner is a reference-DB-contamination-aware pipeline for reliable contig classification of metagenome-assembled genomes (MAGs) and single-cell amplified genomes (SAGs). It is based on the GTDB taxonomic system and uses GTDB representative genomes, SILVA SSU and LSU datasets, and RefSeq eukaryotic and viral datasets as references. Classification follows a "least common ancestor" (LCA) approach, implemented so that it can recognize potential contaminants not only in the analyzed genome but also in the underlying reference datasets. Furthermore, each contig is classified only up to the taxonomic levels that are actually supported by the corresponding alignment identities, which avoids overclassification of organisms that are underrepresented in the reference database.
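To illustrate the general idea of an LCA classification that is capped by alignment identity, here is a minimal Python sketch. It is not MDMcleaner's actual implementation: the function names, the plain taxon names, the choice to use the best hit's identity, and the identity cutoffs in `MIN_IDENTITY` are all hypothetical and only serve to show the principle.

```python
from typing import List, Sequence, Tuple

# Taxonomic ranks from broadest to most specific (GTDB-style rank order).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]

# HYPOTHETICAL minimum alignment identity (%) required to trust each rank.
# These numbers are illustrative only, not MDMcleaner's cutoffs.
MIN_IDENTITY = {
    "domain": 0.0,
    "phylum": 50.0,
    "class": 60.0,
    "order": 70.0,
    "family": 80.0,
    "genus": 90.0,
    "species": 97.0,
}

Hit = Tuple[Sequence[str], float]  # (taxonomic path, alignment identity in %)


def lowest_common_ancestor(paths: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest taxonomic prefix shared by all paths."""
    if not paths:
        return []
    shared: List[str] = []
    for names in zip(*paths):
        if len(set(names)) != 1:
            break  # the paths diverge at this rank
        shared.append(names[0])
    return shared


def cap_by_identity(path: Sequence[str], identity: float) -> List[str]:
    """Truncate a path at the first rank whose identity cutoff is not met."""
    supported: List[str] = []
    for rank, name in zip(RANKS, path):
        if identity < MIN_IDENTITY[rank]:
            break
        supported.append(name)
    return supported


def classify_contig(hits: Sequence[Hit]) -> List[str]:
    """Take the LCA of all hits, then keep only identity-supported ranks."""
    if not hits:
        return []
    lca = lowest_common_ancestor([path for path, _ in hits])
    best_identity = max(identity for _, identity in hits)
    return cap_by_identity(lca, best_identity)
```

In this sketch, two hits that agree down to family level but differ in genus yield a family-level classification, and a single species-level hit with only 65 % identity is cut back to class level under the hypothetical cutoffs.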

MDMcleaner is implemented in Python and is specifically designed for the more fragmented genomes that make up most microbial dark matter MAGs and SAGs.

The main script for running this pipeline is mdmcleaner.py or, if you installed it via pip, simply mdmcleaner. This script runs different commands for different tasks. Each command has its own help function, accessible via the option -h or --help. The available commands are:

  • clean: the major MDMcleaner workflow for assessing and filtering genome contamination
  • set_configs: can be used to change global or local settings. Will modify or create 'mdmcleaner.config'-files
  • show_configs: lists the currently applicable MDMcleaner settings/configurations
  • makedb: downloads and processes reference data into an MDMcleaner reference database. May have a LONG run-time but can be aborted and resumed
  • get_markers: an accessory command for extracting marker gene sequences from input genomes
  • completeness: an accessory command for "quick-and-dirty" assessment of bin completeness based on universally required types of tRNAs
  • refdb_contams: EXPERIMENTAL: evaluates refDBambiguity overviewfiles and adds obvious refDB contaminations to the blacklist
  • acc2taxpath: Get full taxonomic path associated with a specific input accession. Currently only works for MDMcleaner/GTDB accessions, but support for NCBI accession-numbers will follow soon
  • check_dependencies: checks whether all dependencies are met
  • version: show version info and quit
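As a small illustration of the per-command help described above, the following Python sketch builds and optionally runs a help call for any of the listed commands. The helper names `help_invocation` and `show_help` are hypothetical, and the argument order (`mdmcleaner <command> --help`) is an assumption based on the description on this page, not a verified specification of the command-line interface.

```python
import shutil
import subprocess
from typing import List, Optional

# Command names exactly as listed on this page.
COMMANDS = [
    "clean", "set_configs", "show_configs", "makedb", "get_markers",
    "completeness", "refdb_contams", "acc2taxpath", "check_dependencies",
    "version",
]


def help_invocation(command: str, executable: str = "mdmcleaner") -> List[str]:
    """Build the argument list for a command's help call.

    ASSUMPTION: the help option is placed after the command name.
    """
    if command not in COMMANDS:
        raise ValueError(f"unknown MDMcleaner command: {command!r}")
    return [executable, command, "--help"]


def show_help(command: str, executable: str = "mdmcleaner") -> Optional[str]:
    """Run the help call and return its output, or None if not installed."""
    argv = help_invocation(command, executable)
    if shutil.which(executable) is None:
        return None  # executable is not on PATH
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `show_help("clean")` would return the help text of the clean command if mdmcleaner is installed and on the PATH, and `None` otherwise.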