Skip to content

Learning on Appropriate Contextual Information for RBP Binding Sites Discovery

Notifications You must be signed in to change notification settings

SCBB-LAB/RBPSpot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

78 Commits
 
 
 
 
 
 

Repository files navigation

Requirements

  1. Python3
  2. Numpy
  3. Pandas
  4. perl
  5. sklearn
  6. xgboost
  7. bedtools
  8. keras
  9. tensorflow
  10. gcc
  11. Download hg19.fa file from UCSC and put it into "Model_building" directory with "hg19.fa" name.

Running script[model building]

This script contain two modules:

  1. For model building (This module requires CLIP-Seq peak data in bed file format)

To run the script:

./RBPSpot_model <bed_file> <window_size>

E.g. ./RBPSpot_model Example.bed 17

To build the model with XGboost:

python3 xgb_model.py Example.bed_train Example.bed_test

bedfile contains peak data for CLIP-seq data, bedfile can be given with any name, but it's name will be used as prefix for all the files generated in this step. And also some of the file will be used in next step.

window size can vary between 17-131. For optimum result try with variable window size.

Output description [model building]

bedfile_model

E.g. Example.bed_model

Folder contaiining the model file in .pb format and it's assets and variable.

Running script [Scanning module]

  1. For scanning sequences with built model we requires five files in Model_Scan folder: (i). Input sequences (ii). bedfile_model folder [Generated after model building process] (iii). bedfile_penta_Prob_value [Generated after model building process] (iv). bedfile_penta_Prob_value [Generated after model building process] (v). bedfile_primary_motif [Generated after model building process]

To run the script:

./scan <bed_file> <Input_sequence> <window_size>

For parallel: sh parallel.sh <#Processors> <Input_sequence> <bed_file> <window_size>

E.g. Shift "Example.bed_model" directory into Model_Scan directory along with Example.bed_penta_Prob_value, Example.bed_penta_Prob_value and Example.bed_primary_motif files. Then run: ./scan Example.bed Example_sequence.fa 17

bedfile name must be the same name used in last step at the time of Model_building step. Input_sequence file must be in single line fasta and sequence length must be >=160 bases. window_size must be the same number used in Model_building step. As different window size will generate different number of feature vector, hence model will not be able to test any feature vector.

Output description [Scanning module]

Input_sequence_output.tsv File contain 3 columns: Seuqence_name Start_coordinate End_coordinate

E.g. Example_sequence.fa_output.tsv

Web-server version for 131 RBPs available at:

https://scbb.ihbt.res.in/RBPSpot/

Citation: Sharma NK, Gupta S, Kumar A, Kumar P, Pradhan UK, Shankar R (2021) RBPSpot: Learning on appropriate contextual information for RBP binding sites discovery. iScience 24(12). https://doi.org/10.1016/j.isci.2021.103381.

About

Learning on Appropriate Contextual Information for RBP Binding Sites Discovery

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published