SCBB-LAB/RBPSpot: Learning on Appropriate Contextual Information for RBP Binding Sites Discovery

Requirements

Python 3
NumPy
pandas
Perl
scikit-learn (sklearn)
XGBoost
bedtools
Keras
TensorFlow
GCC
Download the hg19 genome FASTA file from UCSC and place it in the "Model_building" directory under the name "hg19.fa".

Running script [model building]

RBPSpot contains two modules: model building and scanning.

Model building (this module requires CLIP-seq peak data in BED format)

To run the script:

./RBPSpot_model <bed_file> <window_size>

E.g. ./RBPSpot_model Example.bed 17

To build the model with XGBoost:

python3 xgb_model.py Example.bed_train Example.bed_test

<bed_file> contains the CLIP-seq peak data. The file can have any name, but that name is used as the prefix for all files generated in this step. Some of these files are also required in the next (scanning) step.

<window_size> can range from 17 to 131. For optimal results, try several different window sizes.
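It can help to sanity-check the inputs before running RBPSpot_model. The following minimal Python sketch is not part of RBPSpot. It assumes <bed_file> is a standard tab-separated BED file with chrom, start and end in the first three columns. The only window-size rule it enforces is the 17-131 range stated above. The function names check_window_size and check_bed_lines are hypothetical.

```python
# Window-size bounds stated in the RBPSpot README (inclusive).
MIN_WINDOW, MAX_WINDOW = 17, 131


def check_window_size(window_size: int) -> None:
    """Raise ValueError if window_size is outside the documented range."""
    if not MIN_WINDOW <= window_size <= MAX_WINDOW:
        raise ValueError(
            f"window size {window_size} is outside {MIN_WINDOW}-{MAX_WINDOW}"
        )


def check_bed_lines(lines) -> int:
    """Count BED records, raising ValueError on the first malformed line.

    Assumes tab-separated BED with chrom, start, end in columns 1-3.
    Blank lines and track/browser/# header lines are skipped.
    """
    n_records = 0
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated columns")
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            raise ValueError(f"line {line_no}: start and end must be integers") from None
        if start < 0 or start >= end:
            raise ValueError(f"line {line_no}: expected 0 <= start < end")
        n_records += 1
    return n_records


# Example usage (file name taken from the README example):
# check_window_size(17)
# with open("Example.bed") as handle:
#     print(check_bed_lines(handle), "peak records")
```

These checks catch only format problems. They say nothing about whether a given window size will give good results.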

Output description [model building]

bedfile_model

E.g. Example.bed_model

Folder containing the trained model in .pb format, together with its assets and variables (the TensorFlow SavedModel layout).

Running script [Scanning module]

Scanning sequences with a built model requires five files in the Model_Scan folder:

(i) Input sequences
(ii) bedfile_model folder [generated by the model building step]
(iii) bedfile_penta_Prob_value [generated by the model building step]
(iv) bedfile_penta_Prob_value [generated by the model building step]
(v) bedfile_primary_motif [generated by the model building step]
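The listing above names bedfile_penta_Prob_value twice. The minimal Python sketch below therefore checks only the four distinct entries before you run the scanner. It assumes all inputs sit directly inside Model_Scan and that the derived names use the <bed_file> name as their prefix, as in the example. missing_scan_inputs is a hypothetical helper, not an RBPSpot script.

```python
from pathlib import Path


def missing_scan_inputs(scan_dir, bed_name, input_sequence):
    """Return the names of listed scan inputs not found in scan_dir.

    bed_name is the <bed_file> name used during model building (e.g.
    "Example.bed"); derived names follow the prefix convention in the list.
    """
    scan_dir = Path(scan_dir)
    # (name, True if it must be a directory, False if it must be a regular file)
    required = [
        (input_sequence, False),
        (f"{bed_name}_model", True),
        (f"{bed_name}_penta_Prob_value", False),
        (f"{bed_name}_primary_motif", False),
    ]
    missing = []
    for name, is_dir in required:
        path = scan_dir / name
        found = path.is_dir() if is_dir else path.is_file()
        if not found:
            missing.append(name)
    return missing


# Example usage, run from the repository root:
# print(missing_scan_inputs("Model_Scan", "Example.bed", "Example_sequence.fa"))
```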

To run the script:

./scan <bed_file> <Input_sequence> <window_size>

To run in parallel:

sh parallel.sh <#Processors> <Input_sequence> <bed_file> <window_size>

E.g. move the "Example.bed_model" directory into the Model_Scan directory, along with the Example.bed_penta_Prob_value, Example.bed_penta_Prob_value and Example.bed_primary_motif files. Then run:

./scan Example.bed Example_sequence.fa 17

<bed_file> must be the same name that was used in the model building step. <Input_sequence> must be a single-line FASTA file (each sequence on one line), and each sequence must be at least 160 bases long. <window_size> must be the same value that was used in the model building step. A different window size generates a different number of feature vectors, and the model cannot evaluate them.
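The scanning module expects single-line FASTA with sequences of at least 160 bases, so wrapped FASTA files need to be reformatted first. The following is a minimal Python sketch that assumes standard FASTA input, where header lines start with ">". read_fasta and write_single_line_fasta are hypothetical helpers, not RBPSpot scripts. Sequences shorter than 160 bases are dropped, not padded.

```python
MIN_LENGTH = 160  # minimum length (bases) stated for scanning input


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines.

    Any text before the first ">" header line is ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_single_line_fasta(records, out, min_length=MIN_LENGTH):
    """Write each record as a header line plus one sequence line.

    Records shorter than min_length are skipped; returns how many were skipped.
    """
    skipped = 0
    for header, sequence in records:
        if len(sequence) < min_length:
            skipped += 1
            continue
        out.write(f"{header}\n{sequence}\n")
    return skipped


# Example usage (input file name is hypothetical):
# with open("wrapped.fa") as src, open("Example_sequence.fa", "w") as dst:
#     print(write_single_line_fasta(read_fasta(src), dst), "sequences skipped")
```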

Output description [Scanning module]

Input_sequence_output.tsv

A tab-separated file with three columns: Sequence_name, Start_coordinate, End_coordinate.

E.g. Example_sequence.fa_output.tsv
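The three-column output can be loaded for downstream analysis as in the minimal Python sketch below. It assumes the file is tab-separated, as the .tsv extension suggests. This README does not document whether the file has a header line, or whether coordinates are 0- or 1-based. The sketch therefore skips rows with non-numeric coordinates and returns coordinates unchanged. read_scan_hits is a hypothetical name.

```python
import csv
from collections import defaultdict


def read_scan_hits(lines):
    """Group (start, end) pairs by sequence name.

    Rows with fewer than three columns or non-integer coordinates (such as
    a header line, if present) are skipped. Coordinates are returned as
    reported, without any 0-/1-based conversion.
    """
    hits = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue
        try:
            start, end = int(row[1]), int(row[2])
        except ValueError:
            continue
        hits[row[0]].append((start, end))
    return dict(hits)


# Example usage:
# with open("Example_sequence.fa_output.tsv", newline="") as handle:
#     for name, sites in read_scan_hits(handle).items():
#         print(name, len(sites), "predicted sites")
```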

A web-server version covering 131 RBPs is available at:

https://scbb.ihbt.res.in/RBPSpot/

Citation: Sharma NK, Gupta S, Kumar A, Kumar P, Pradhan UK, Shankar R (2021) RBPSpot: Learning on appropriate contextual information for RBP binding sites discovery. iScience 24(12). https://doi.org/10.1016/j.isci.2021.103381.
