MetaPCbin

MetaPCbinflow that takes a user given sequence and outputs the plasmid probability.

Configuration

Can be installed via conda, pip or manually. However conda is recommended as it installs the third party dependencies automatically.

Conda Installation (Recommended)

All the prerequisites will be installed automatically in the conda environment.

Pip Installation

Pip install will install tqdm, biopython for you. However Seq2Vec, Infernal, Mummer, Blast+ listed in the dependencies should be installed manually.

Manual Installation

Need to install all dependencies manually.

Dependencies

Following tools should be installed before running the software.

Seq2Vec - used for kmer count
Infernal - used for calculate the rRNA availability
Mummer - used to calculate circularity
Blast+ - used for OriT sequence availability and Incompatibility sequence availability
Biopython - used to read fasta files
tqdm - used to display runtime progress

Seq2Vec (kmer count) - github link

In linux,

git clone https://github.com/anuradhawick/seq2vec.git
cd seq2vec
mkdir build; cd build; cmake ..; make -j8

Infernal (rRNA) - github link

wget eddylab.org/infernal/infernal-1.1.4.tar.gz
tar xf infernal-1.1.4.tar.gz  
cd infernal-1.1.4
./configure
make
make install
cd easel
make install
cmscan
chmod 777  '/infernal-1.1.4/src/cmscan.c'

MUMMER (Circularity) - documentation link

wget https://downloads.sourceforge.net/project/mummer/mummer/3.23/MUMmer3.23.tar.gz
tar -xvf MUMmer3.23.tar.gz 
cd MUMmer3.23/
sudo apt-get install csh
make check
make install
./nucmer -h

BLAST+ (OriT and Incomp) - documentation link

wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.7.1/ncbi-blast-2.7.1+-x64-linux.tar.gz
tar zxvpf ncbi-blast-2.7.1+-x64-linux.tar.gz

BIOPYTHON

conda install -c conda-forge biopython

TQDM

conda install -c conda-forge tqdm

[OPTIONAL] Add the paths of the executables to your system path for tools seq2vec, blastn in BLAST+ cmscan in Infernal and nucmer in MUMMER.

Usage

Use the pipeline.py to run the pipeline. The --help tag provides a list of all parameters.

$ python pipeline.py --help

usage: pipeline.py [-h] [-f FILES [FILES ...]] [-o OUT] [-d DIRECTORY] [-c COVERAGE] [-t THREADS] 
                        [--cmscan CMSCAN] [--blastn BLASTN] [--seq2vec SEQ2VEC] [--nucmer NUCMER]

MetaGSC plasmid/chromosome predictor for metagenomic assemblies.

optional arguments:
  -h, --help            show this help message and exit
  -f FILES [FILES ...], --files FILES [FILES ...]
                        list of input fasta files (default: None)
  -o OUT, --out OUT     output file destination (default: None)
  -d DIRECTORY, --directory DIRECTORY
                        directory of input fasta files (default: None)
  -c COVERAGE, --coverage COVERAGE
                        fragment coverage (default: 5)
  -t THREADS, --threads THREADS
                        number of threads (default: 8)
  --cmscan CMSCAN       path to cmscan (default: cmscan)
  --blastn BLASTN       path to blastn (default: blastn)
  --seq2vec SEQ2VEC     path to seq2vec (default: seq2vec)
  --nucmer NUCMER       path to nucmer (default: nucmer)

It is mandatory to specify either -f or -d. If both are specified, the -f takes precedence.
The thread count specified by -t will be used in using tools to generate features.
The coverage parameter specified by -c specifies the fragment coverage used to break sequences into fragments.
If the cmscan, blastn, seq2vec, nucmer executable paths are not added to the system path, specify the paths using the --cmscan, --blastn, --seq2vec and --nucmer parameters.
The progress logs will be displayed the console output and in log.txt file along with a timestamp.
The errors will be displayed the console output and in error.txt file along with a timestamp.
The kmer files required for the neural network is saved to the results/kmers directory.
The biomarker features required for the random forest model is saved to the results/data.csv file.

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
biomarker_dbs		biomarker_dbs
models		models
pipeline		pipeline
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
pyproject.toml		pyproject.toml
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MetaPCbin

Configuration

Conda Installation (Recommended)

Pip Installation

Manual Installation

Dependencies

Seq2Vec (kmer count) - github link

Infernal (rRNA) - github link

MUMMER (Circularity) - documentation link

BLAST+ (OriT and Incomp) - documentation link

BIOPYTHON

TQDM

Usage

About

Releases 2

Packages

Contributors 2

Languages

License

MetaGSC/MetaPCbin

Folders and files

Latest commit

History

Repository files navigation

MetaPCbin

Configuration

Conda Installation (Recommended)

Pip Installation

Manual Installation

Dependencies

Seq2Vec (kmer count) - github link

Infernal (rRNA) - github link

MUMMER (Circularity) - documentation link

BLAST+ (OriT and Incomp) - documentation link

BIOPYTHON

TQDM

Usage

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 2

Languages

Packages