DeepSilencer

A deep convolutional neural network for the accurate prediction of silencers

For accurate classification of silencers, we propose a CNN-based model named DeepSilencer. As illustrated in the following figure, DeepSilencer consists of four modules. First, a pre-processing module transforms each DNA sequence into a sequence matrix and a vector of k-mer counts. Second, a CNN module uses a convolutional neural network (CNN) with multiple convolutional and pooling layers to extract features from the sequence matrices. Third, an ANN module learns the characteristics of the k-mer counts. Finally, a joint module integrates the outputs of the CNN and ANN modules to predict the probability that a sequence is a silencer.
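
To illustrate the pre-processing step, the sketch below one-hot encodes a DNA sequence and counts its k-mers. It is a minimal sketch under stated assumptions, not the repository's implementation: the function names `one_hot_encode` and `count_kmers`, the A/C/G/T channel order, and the treatment of ambiguous bases (all-zero rows, and k-mers containing them are skipped) are all hypothetical choices.

```python
from itertools import product

BASES = "ACGT"  # assumed channel order; the repository may use another


def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix; bases outside ACGT become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


def count_kmers(seq, k):
    """Return counts of all 4**k k-mers, listed in lexicographic order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]
```

In this sketch the matrix would feed a CNN-style branch and the count vector a dense branch, mirroring the two inputs described above.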

Installation

Requirements:  
1. Python 3.5 or later (note that TensorFlow 1.15.2 pip wheels are only published up to Python 3.7)  
2. Packages:  
    numpy (>=1.15.1)  
    keras (2.3.1)  
    tensorflow(-gpu) (1.15.2)  
    hickle (>=3.4)
  
Package installation:
  
$ pip install -U numpy  
$ pip install keras==2.3.1  
$ pip install tensorflow-gpu==1.15.2 # CPU only: pip install tensorflow==1.15.2  
$ pip install -U hickle  
$ git clone https://github.com/xy-chen16/DeepSilencer.git   
$ cd DeepSilencer

Method	Version
DeepSilencer	0.1.0
gkmSVM	v1.3
SVM	0.22.1
correlation	0.22.1

Data Preprocessing

Load the genome files:

$ cd data 
$ mkdir -p genome/mm10 && cd genome/mm10
$ nohup wget https://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz
$ tar zvfx chromFa.tar.gz
$ cd ..
$ mkdir hg19 && cd hg19
$ nohup wget https://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
$ tar zvfx chromFa.tar.gz 
$ cd ../../..

Unzip the open region files and result files:

$ tar -xjvf data/open_region.tar.bz2 -C data
$ tar -xjvf result/result.tar.bz2 -C result

Tutorial

We collected the uncharacterized cis-regulatory elements (CREs) in K562 cells with MPRA measurements provided by Jayavelu et al. We then chose the 2000 uncharacterized CRE sequences with the lowest MPRA activity as the positive set and the 2000 with the highest MPRA activity as the negative set. We also downloaded uncharacterized CREs in Homo sapiens and Mus musculus that have no MPRA values, and we use these sequences to find candidate silencers.
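
The selection of the positive and negative sets can be sketched as follows. This is a hedged illustration only: the function name `split_by_activity`, the `(sequence_id, mpra_activity)` record layout, and the toy values are assumptions, not the repository's data format.

```python
def split_by_activity(records, n):
    """records: list of (sequence_id, mpra_activity) tuples.

    Returns (positives, negatives): the n lowest-activity records as
    positives (putative silencers) and the n highest as negatives.
    """
    if n <= 0 or 2 * n > len(records):
        raise ValueError("n must be positive and at most half the record count")
    ranked = sorted(records, key=lambda rec: rec[1])  # ascending activity
    return ranked[:n], ranked[-n:]


# Toy example with made-up activity values (not real MPRA data).
toy = [("cre1", 0.9), ("cre2", -1.2), ("cre3", 0.1), ("cre4", 2.3), ("cre5", -0.4)]
positives, negatives = split_by_activity(toy, n=2)
```

With the real data, `n` would be 2000 as described above.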

self-projection

To demonstrate the classification performance of DeepSilencer, we conducted a self-projection experiment and compared our method with the gapped k-mer SVM (gkmSVM). We randomly selected 80% of the data as the training set and used the remaining 20% to test the two models.

$ python code/run_self_projection.py
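
The random 80/20 split described above can be sketched as below. The function name `train_test_split`, the fixed seed, and the absence of class stratification are assumptions made for illustration; the script in `code/` may split the data differently.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, held_out)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# 4000 indices stand in for the 2000 positive and 2000 negative sequences.
train, held_out = train_test_split(range(4000))
```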

The performance of DeepSilencer is shown in the following figure.

The performance of the two methods is shown in the following table.

Method	AUC	PRC
DeepSilencer	0.827	0.842
gkmSVM	0.81	0.76
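
For readers unfamiliar with the AUC column, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive scores higher than a randomly chosen negative. It is an illustrative pairwise implementation with a hypothetical function name `roc_auc` and toy inputs; it does not cover the PRC column and is not the evaluation code used to produce the table.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, not results from the experiment.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few thousand sequences but slower than a rank-based computation.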

crossdataset-projection

To find candidate silencer elements in Homo sapiens and Mus musculus, we trained the DeepSilencer model on all of the sequences used in the self-projection experiment. First, train the model:

$ python code/train_for_crossdata_projection.py

Predict the candidate silencer elements in Homo sapiens (hg19):

$ python code/run_crossdata_projection_human.py

Predict the candidate silencer elements in Mus musculus (mm10):

$ python code/run_crossdata_projection_mouse.py

You can then check the predictions in the result folder.

Analysis

We obtained candidate silencers from the uncharacterized CREs in human and mouse using the trained DeepSilencer model. We then compared the silencers predicted in the human K562 cell line by the gkmSVM-based model with those predicted by DeepSilencer and found almost no difference between them. The length distributions of silencers from the two models are shown in Figure A. There is no significant difference in the distance between silencers and the nearest coding genes (two-sided Wilcoxon test p-value = 0.5209, Figure B), the GC content (two-sided Wilcoxon test p-value = 0.9299, Figure C), or the chromatin accessibility (two-sided Wilcoxon test p-value = 0.5699, Figure D).
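
One of the compared quantities, GC content, can be computed per sequence as in the sketch below. The function name `gc_content` and the choice to exclude ambiguous bases such as N from the denominator are assumptions; the analysis behind Figure C may define it differently.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0
```

The per-sequence values from the two models could then be passed to a two-sided rank-sum test to compare their distributions.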

Indeed, most of the silencers were predicted by both the gkmSVM-based model and our DeepSilencer model. However, some silencers are unique to DeepSilencer. For example, chr3: 48,997,302-48,997,398 is predicted as a silencer by DeepSilencer but not by the gkmSVM-based model, and it lies within chr3: 48,997,266-48,997,466, a silencer validated in the human K562 cell line.
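
The containment of the predicted region in the validated one can be checked with a few lines of code. The helper names `parse_region` and `covers` are hypothetical and introduced only for this sketch; the coordinates are the ones quoted above.

```python
def parse_region(text):
    """Parse 'chr3: 48,997,302-48,997,398' into (chrom, start, end)."""
    chrom, span = text.split(":")
    start, end = span.replace(",", "").strip().split("-")
    return chrom.strip(), int(start), int(end)


def covers(outer, inner):
    """True if region `outer` fully contains region `inner`."""
    same_chrom = outer[0] == inner[0]
    return same_chrom and outer[1] <= inner[1] and inner[2] <= outer[2]


predicted = parse_region("chr3: 48,997,302-48,997,398")
validated = parse_region("chr3: 48,997,266-48,997,466")
print(covers(validated, predicted))  # True
```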
