DeepBrain Training [version 3]

In this version [V3], we used assays from HumanFC, EpiMap, and TF profiles from ENCODE DCC to train the deep learning model.
As with Training [V2], this is a PyTorch implementation of the DeepBrain project, which aims to predict the functional effects of non-coding variants from sequence data.

Requirements

Python 3.6 or higher
PyTorch 1.0.1
NumPy
SciPy
scikit-learn (sklearn)

Overview

File	Description
`split_and_Numpy_V2.py`	Prepares data for running DeepBrain models; performs the train-validation split based on a single chromosome
`DL_input_Test_HET_Katana.py`	DeepBrain with static ConvNet models (e.g. DeepSEA and DeeperDeepSEA)

Usage

Data Preparation

The preprocessing steps yield a single file that contains both the input DNA sequence and the corresponding labels for all chromosomes combined. Note that each record in this file has at least one signal present (i.e. every row is non-zero). The file header looks like the following:

chr	start	end	dna.seq	id	strand	288 HumanFC features	150 EpiMap features	128 TF features

DeepBrain models take "dna.seq" as the input data and the binary feature values as the known labels. Hence, we need to extract them from the above file and convert them to NumPy ndarrays. Moreover, we wanted to leave the data of one chromosome out of training so that we can test the model on it. Therefore, we do the train-validation split on both the value and label data based on a single chromosome. This process runs on Katana GPUs (for details, see the PBS script: DL_input_Test_HET_Katana.sh).
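The extraction and split described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `split_and_Numpy_V2.py`: the function names `one_hot` and `split_by_chromosome`, the A/C/G/T channel order, the all-zero encoding of other bases, and the assumption that all sequences have equal length are hypothetical choices made for the example.

```python
import numpy as np

# Assumed channel order for the one-hot encoding; the README does not specify it.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (4, len(seq)) float32 array.

    Bases outside A/C/G/T (e.g. N) are left as all-zero columns (an assumption).
    """
    index = {base: row for row, base in enumerate(BASES)}
    encoded = np.zeros((len(BASES), len(seq)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        row = index.get(base)
        if row is not None:
            encoded[row, pos] = 1.0
    return encoded


def _stack(arrays):
    """Stack equally shaped arrays; return an empty array if there are none."""
    return np.stack(arrays) if arrays else np.empty((0,))


def split_by_chromosome(records, held_out="chr1"):
    """Split (chrom, dna_seq, labels) records into training and validation arrays.

    Records whose chromosome equals `held_out` go to the validation set,
    everything else goes to the training set.
    """
    train_x, train_y, valid_x, valid_y = [], [], [], []
    for chrom, seq, labels in records:
        xs, ys = (valid_x, valid_y) if chrom == held_out else (train_x, train_y)
        xs.append(one_hot(seq))
        ys.append(np.asarray(labels, dtype=np.uint8))
    return _stack(train_x), _stack(train_y), _stack(valid_x), _stack(valid_y)
```

The four returned arrays could then be written with `np.save`, giving one value file and one label file each for the training and validation sets.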

Training and Validation

Initially, we are trying DeepSEA-like architectures for training, which may be followed by DARTS or other static ConvNet models with varied parameter settings. The DeeperDeepSEA-like model is trained using DL_model_test.py and deepbrain2_dist.py, which run on a single GPU on Google Colab (DL_input_test_HET_CoLab_notebook.ipynb) and on Raijin GPUs, respectively.

To run on CPU, use the following command:

python3 DL_model_test_HET_Katana.py \
--name deepbrainStaticConvnet \
--DataDir H:\ \
--TrainingDataFile temp_HET_trainingData_chr1_value.npy \
--TrainingLabelFile temp_HET_trainingData_chr1_label_all.npy \
--TestingDataFile temp_HET_validationData_chr1_value.npy \
--TestingLabelFile temp_HET_validationData_chr1_label_all.npy \
--nEpochs 15 \
--BATCH_SIZE 128
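For readers adapting the command, the sketch below shows one way a script could parse these flags and resolve the file names against the data directory. It is an illustration only: the flag names come from the command above, but the use of `argparse`, the defaults, and the helpers `build_parser` and `resolve_paths` are assumptions rather than the actual code of the training script.

```python
import argparse
import os


def build_parser():
    """Build a parser for the flags used in the command above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Static ConvNet training (sketch)")
    parser.add_argument("--name", default="deepbrainStaticConvnet",
                        help="project name, also used as the log subdirectory")
    parser.add_argument("--DataDir", required=True,
                        help="directory holding the .npy files")
    parser.add_argument("--TrainingDataFile", required=True)
    parser.add_argument("--TrainingLabelFile", required=True)
    parser.add_argument("--TestingDataFile", required=True)
    parser.add_argument("--TestingLabelFile", required=True)
    parser.add_argument("--nEpochs", type=int, default=15)
    parser.add_argument("--BATCH_SIZE", type=int, default=128)
    return parser


def resolve_paths(args):
    """Join each data or label file name with the data directory."""
    keys = ("TrainingDataFile", "TrainingLabelFile",
            "TestingDataFile", "TestingLabelFile")
    return {key: os.path.join(args.DataDir, getattr(args, key)) for key in keys}
```

Calling `build_parser().parse_args()` at the top of a script would read the flags from the command line, and each resolved path could then be passed to `np.load`.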

Accuracy measures and Log reporting

For this version, we checked two accuracy measures. In each training/testing iteration (i.e. for each mini-batch), we report the "true prediction ratio" for each feature column and take the median across three feature categories (e.g. acetylation, RNA-seq, and TFs). We also report AUC scores in a similar manner. In addition, progress is written to a log file in a subdirectory of the data directory named after the project (e.g. deepbrainStaticConvnet).
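One possible reading of the "true prediction ratio" is sketched below: the fraction of samples in a mini-batch whose thresholded prediction matches the label, computed per feature column and then summarised by the median over the columns of each category. The 0.5 threshold, the function names, and the column ranges used for the categories are assumptions for illustration, not values taken from the project code.

```python
import numpy as np


def true_prediction_ratio(probs, labels, threshold=0.5):
    """Per-feature fraction of samples whose thresholded prediction equals the label.

    probs and labels are (batch, n_features) arrays; the threshold is an assumption.
    """
    preds = (probs >= threshold).astype(labels.dtype)
    return (preds == labels).mean(axis=0)


def median_per_category(per_feature, categories):
    """Median of a per-feature score over the columns of each category.

    categories maps a category name to a slice of feature columns.
    """
    return {name: float(np.median(per_feature[cols]))
            for name, cols in categories.items()}
```

A per-feature AUC could be summarised with the same `median_per_category` helper, for example by computing `sklearn.metrics.roc_auc_score` column by column for features that have both classes present in the batch.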

Data Description

For data description and preprocessing details, please follow this.
