🧬 Riboclette: Interpretable condition-aware transformer for predicting ribosome densities in nutrient-deprivation conditions

Riboclette is a transformer-based deep learning model that predicts ribosome densities for genes in the mouse genome. It makes these predictions in the Control condition (no nutrient availability constraints) and under specific amino acid deprivations: Leucine, Isoleucine, Valine, and combinations of these. This repository provides links to the raw data and the preprocessing code to prepare it, the code to train the Riboclette model and the baseline models, and the predictions of the Riboclette model along with individual codon importance plots.

Dataset Preparation 🐁

Download the raw data files from "link" and run the Ribo-DT pipeline to pre-process them.

- RiboDT

Convert the pre-processed data into a format the machine learning models can use:

cd /riboclette/preprocessing
papermill processing.ipynb

Riboclette Model Training 💻

Combined Single-Head Variant

cd /riboclette/models/xlnet/csh
papermill xlnet_csh_train.ipynb

Double-Head Variant

cd /riboclette/models/xlnet/dh
papermill xlnet_dh_train.ipynb

Pseudolabeling ➕

To perform pseudolabeling, first train 5 seed models of Riboclette DH:

cd /riboclette/models/xlnet/dh
papermill xlnet_dh_train.ipynb -p seed_val 1
papermill xlnet_dh_train.ipynb -p seed_val 2
papermill xlnet_dh_train.ipynb -p seed_val 3
papermill xlnet_dh_train.ipynb -p seed_val 4
papermill xlnet_dh_train.ipynb -p seed_val 42
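The five seed runs above differ only in the value of `seed_val`, so they can be driven by a loop. The following is a minimal sketch, not part of the repository: the function name `train_seed_models` and the `DRY_RUN` switch are hypothetical helpers introduced here for illustration, and the seed list is copied from the commands above.

```shell
# Hypothetical helper: run the training notebook once per seed.
# Usage: train_seed_models <training-notebook.ipynb>
# Set DRY_RUN=1 to print the papermill commands instead of executing them.
train_seed_models() {
    notebook="$1"
    # Seed values copied from the commands above.
    for seed in 1 2 3 4 42; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "papermill $notebook -p seed_val $seed"
        else
            # Stop at the first failing run so a broken seed is not silently skipped.
            papermill "$notebook" -p seed_val "$seed" || return 1
        fi
    done
}
```

Run it from the model directory used above, passing the same training notebook as in those commands; with `DRY_RUN=1` it only lists the five commands it would execute.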

Once all seed models are trained, generate the pseudolabeling dataset:

cd /riboclette/models/xlnet/pseudolabeling
papermill plabel_dataset_gen.ipynb

After the pseudolabeling datasets have been generated, the pseudolabeling-based models can be trained. There are two pseudolabeling experiment types ("exp1" and "exp2"), selected with the "experiment_type" parameter.

cd /riboclette/models/xlnet/pseudolabeling
papermill xlnet_plabel_train.ipynb -p experiment_type exp1   # or exp2
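Because "experiment_type" accepts only "exp1" or "exp2", a small wrapper can reject a mistyped value before a long training run starts. This is a sketch under assumptions: `run_plabel_experiment` and the `DRY_RUN` switch are hypothetical names introduced here, not part of the repository; only the notebook name and the parameter come from the command above.

```shell
# Hypothetical wrapper: validate experiment_type, then launch the notebook.
# Usage: run_plabel_experiment exp1|exp2
# Set DRY_RUN=1 to print the papermill command instead of executing it.
run_plabel_experiment() {
    experiment_type="$1"
    case "$experiment_type" in
        exp1|exp2) ;;   # the two experiment types named in the text above
        *)
            echo "experiment_type must be exp1 or exp2, got: '$experiment_type'" >&2
            return 2
            ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "papermill xlnet_plabel_train.ipynb -p experiment_type $experiment_type"
    else
        papermill xlnet_plabel_train.ipynb -p experiment_type "$experiment_type"
    fi
}
```

Call it from the pseudolabeling directory, for example `run_plabel_experiment exp1`; any other value returns a non-zero status without invoking papermill.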

Interpretability

To generate codon-level interpretations for all sequences in the test set, run:

cd /riboclette/models/xlnet/pseudolabeling
papermill interpret_plabel.ipynb

Baseline Models 🛸

BiLSTM Model Training - Combined Single-Head Variant

cd /riboclette/models/bilstm/csh
papermill bilstm_csh_train.ipynb

BiLSTM Model Training - Double-Head Variant

cd /riboclette/models/bilstm/dh
papermill bilstm_dh_train.ipynb
