forked from cyones/mirDNN

MicroRNA prediction with convolutional neural networks.


MirDNN

MirDNN is a novel deep learning method specifically designed for pre-miRNA prediction in genome-wide data. The model is a convolutional deep residual neural network that automatically learns suitable features from the raw data, without manual feature engineering. It can learn the intrinsic structural characteristics of miRNA precursors, as well as their context in a sequence. The method has been tested on several animal and plant genomes and compared with state-of-the-art algorithms.

MirDNN is described in detail in the paper "High precision in microRNA prediction: a novel genome-wide approach based on convolutional deep residual networks" (under review in a refereed journal).

Contact: Cristian Yones, sinc(i)

Web server

MirDNN can be used without installation through this web server. The server provides two pre-trained models (animals and plants) and can process either individual sequences or fasta files. When making predictions on individual sequences, the server also generates a nucleotide importance graph. Due to computational limitations, the size of the fasta files that can be uploaded is limited.

Package installation

The latest version of the package can be downloaded from the GitHub repository. The exact version used in the paper is hosted on SourceForge.

To download from GitHub:

git clone --recurse-submodules https://github.com/cyones/mirDNN.git

After downloading the package (from GitHub or SourceForge), install the dependencies:

cd mirDNN
pip install -r requeriments.txt

This installs all the packages needed to run mirDNN. However, to train models or make predictions, the secondary structure of the sequences has to be inferred. The ViennaRNA software should be used for this task. To install it on your OS, see this page.

Usage

To make predictions or train new models, the first step is to predict the secondary structure of the sequences to process. This should be done with the RNAfold software (part of ViennaRNA). For example, given a fasta file named sequences.fa, run:

RNAfold --noPS --infile=sequences.fa --outfile=sequences.fold
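To inspect the resulting file before passing it to mirDNN, the following is a minimal Python sketch of a reader for the usual RNAfold text output: an optional `>` header line, the sequence, and then the dot-bracket structure followed by the minimum free energy in parentheses. The names `FoldRecord` and `parse_fold` are hypothetical helpers, not part of mirDNN, and the sample record in the usage comment is made up for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class FoldRecord(NamedTuple):
    name: str
    sequence: str
    structure: str
    mfe: float  # minimum free energy reported by RNAfold (kcal/mol)


# Dot-bracket structure followed by the energy in parentheses,
# e.g. "(((....))) ( -1.20)".
_STRUCT_RE = re.compile(r"^([().]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_fold(lines: Iterable[str]) -> Iterator[FoldRecord]:
    """Yield one FoldRecord per (header, sequence, structure) group."""
    name = None
    sequence = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, sequence = line[1:], None
        elif sequence is None:
            sequence = line
        else:
            match = _STRUCT_RE.match(line)
            if match is None:
                raise ValueError(f"unexpected structure line: {line!r}")
            yield FoldRecord(name or "", sequence, match.group(1),
                             float(match.group(2)))
            name, sequence = None, None


# Usage (illustrative):
# with open("sequences.fold") as handle:
#     for record in parse_fold(handle):
#         print(record.name, len(record.sequence), record.mfe)
```

A quick check that each structure has the same length as its sequence can catch truncated or interleaved files early.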

Inference

Now that we have the .fold file, to make predictions with the provided pre-trained model for animals, simply run:

# -o: output file
# -m: pre-trained model provided
# -s: maximum sequence length (should be 160 for animals and 320 for plants)
# -d: device to use, e.g. "cpu", "cuda:0", "cuda:1", etc.
mirdnn_eval -i sequences.fold \
            -o predictions.csv \
            -m models/animal.pmt \
            -s 160 \
            -d "cpu"
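When the same call has to be scripted for both animal and plant data, a small wrapper can keep the maximum length consistent with the model type. The sketch below only assembles the argument list from the flags shown above; `build_eval_command` and `MAX_LENGTH` are hypothetical names, and the model path is passed in by the caller because only the animal model path appears in this README.

```python
from typing import List

# Maximum sequence lengths stated above for the pre-trained models.
MAX_LENGTH = {"animal": 160, "plant": 320}


def build_eval_command(fold_file: str, output_csv: str, model_file: str,
                       kingdom: str, device: str = "cpu") -> List[str]:
    """Return the mirdnn_eval argument list for the given model type."""
    if kingdom not in MAX_LENGTH:
        raise ValueError(f"kingdom must be one of {sorted(MAX_LENGTH)}")
    return [
        "mirdnn_eval",
        "-i", fold_file,
        "-o", output_csv,
        "-m", model_file,
        "-s", str(MAX_LENGTH[kingdom]),
        "-d", device,
    ]


# Usage (illustrative; requires mirDNN to be installed):
# import subprocess
# cmd = build_eval_command("sequences.fold", "predictions.csv",
#                          "models/animal.pmt", "animal")
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.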

To calculate nucleotide importance values the command is similar:

mirdnn_explain -i sequences.fold \
               -o importance.csv \
               -m models/animal.pmt \
               -s 160 \
               -d "cpu"

Training new models

To train new models, two .fold files are needed: one with negative examples (non-pre-miRNA sequences) and another with positive examples (well-known pre-miRNAs). For some ideas on how to construct these datasets, see the paper.

Given these datasets, training can be done with:

# -l: log file with training progress
mirdnn_fit.py -i negative_sequences.fold \
              -i positive_sequences.fold \
              -m out_model.pmt \
              -l train.log \
              -d "cuda:0" \
              -s 160

NOTE: training a model is computationally intensive, so using a GPU is recommended.
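Because a training run is expensive, it may be worth checking the two input files first. The sketch below counts the records in a .fold file and how many sequences exceed the length passed with -s. It assumes every record starts with a `>` header line. `summarize_fold` is a hypothetical helper, and how mirDNN itself treats sequences longer than the limit is not described here.

```python
from typing import Iterable, Tuple


def summarize_fold(lines: Iterable[str], max_length: int) -> Tuple[int, int]:
    """Return (number of records, number of sequences longer than max_length)."""
    total = 0
    too_long = 0
    expect_sequence = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A header opens a new record; the next non-empty line is its sequence.
            total += 1
            expect_sequence = True
        elif expect_sequence and line:
            if len(line) > max_length:
                too_long += 1
            expect_sequence = False
    return total, too_long


# Usage (illustrative):
# for path in ("negative_sequences.fold", "positive_sequences.fold"):
#     with open(path) as handle:
#         total, too_long = summarize_fold(handle, max_length=160)
#     print(f"{path}: {total} records, {too_long} longer than 160 nt")
```

Comparing the two record counts also gives a rough idea of the class imbalance between negative and positive examples.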

For more details about the training parameters, run:

mirdnn_fit.py -h

Reproduce experiments

All the experiments presented in the paper can be easily reproduced using the Makefile inside the experiments folder. For example, to generate the PRROC curve obtained for Caenorhabditis elegans, run:

cd experiments
make results/PRROC-cel.pdf

You will be asked to download the sequence files, and then all the necessary commands to train and test the model will be executed automatically.
