In silico RNA isoform screening to identify potential cancer driver exons with therapeutic applications
This repository reproduces the analyses carried out in our manuscript, from data download to figures.
The article describes how we repurposed gene-level knockdown screens across many cancer cell lines and their intrinsic heterogeneity to build spotter, a computational framework to identify potential cancer-driver exons whose inclusion is highly predictive of cancer cell proliferation.
The computational framework created to perform in silico RNA isoform screening is available as a Python package called target_spotter and can also be accessed interactively at https://spotter.crg.eu.
- data: will contain the raw and preprocessed data generated by running the workflows.
- images: contains images for this repository.
- results: will contain the results (files and figures) of the analyses.
- support: contains a set of relatively small tables and files, either handmade or downloaded manually, that are essential to run all workflows.
- workflows:
  - obtain_data: generates the data/raw directory. Note that some raw datasets require special permissions and/or need to be downloaded manually; see the corresponding workflows/obtain_data/README.md file.
  - preprocess_data: generates the data/prep directory by cleaning the raw datasets for analysis.
  - model_splicing_dependency: generates the results/model_splicing_dependency directory. Performs model fitting and validation using existing datasets.
  - streamlined_therapy_dev: generates the results/streamlined_therapy_dev directory. Prioritizes potential cancer-driver exons with therapeutic potential from patient data.
  - experimental_validation: generates the results/experimental_validation directory. Analyzes our in-house experimental results.
  - exon_drug_interactions: generates the results/exon_drug_interactions directory. Fits drug-exon association models using the predicted splicing dependencies of cancer cell lines and explores how potential cancer-driver exons may mediate a drug's mechanism of action.
  - treatment_recommendation: generates the results/treatment_recommendation directory. Case study of a dataset in which ~40 patients were treated with chemotherapies, where we try to predict their outcomes with our statistical models.
  - prepare_submission: generates the results/prepare_submission directory. Collects the tables from all workflows and organizes them to prepare the submission.
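To check which workflows have already produced their outputs, the directory mapping above can be inspected programmatically. The following is a minimal Python sketch, not part of the repository; the helper workflow_status is hypothetical, and a non-empty output directory only suggests that a workflow ran, not that it completed successfully.

```python
from pathlib import Path

# Output directory generated by each workflow, as listed above.
WORKFLOW_OUTPUTS = {
    "obtain_data": "data/raw",
    "preprocess_data": "data/prep",
    "model_splicing_dependency": "results/model_splicing_dependency",
    "streamlined_therapy_dev": "results/streamlined_therapy_dev",
    "experimental_validation": "results/experimental_validation",
    "exon_drug_interactions": "results/exon_drug_interactions",
    "treatment_recommendation": "results/treatment_recommendation",
    "prepare_submission": "results/prepare_submission",
}


def workflow_status(repo_root="."):
    """Return {workflow: bool}: True if its output directory exists and is non-empty."""
    root = Path(repo_root)
    status = {}
    for workflow, out_dir in WORKFLOW_OUTPUTS.items():
        path = root / out_dir
        # any() over the directory listing is False for an empty directory.
        status[workflow] = path.is_dir() and any(path.iterdir())
    return status
```

Calling workflow_status() from the repository root returns one entry per workflow, which is a quick way to see where an interrupted run stopped.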
After running all workflows (see below), the figures used in the manuscript and their accompanying data will be in the results/*/figures subdirectories.
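A minimal sketch for gathering the manuscript figures once the workflows have run. It assumes only the results/*/figures layout described above; list_figures is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def list_figures(repo_root="."):
    """Group files found under results/*/figures (recursively) by the workflow that produced them."""
    root = Path(repo_root)
    figures = {}
    # "**" matches the figures directory itself and all of its subdirectories.
    for fig in sorted(root.glob("results/*/figures/**/*")):
        if fig.is_file():
            workflow = fig.relative_to(root / "results").parts[0]
            figures.setdefault(workflow, []).append(fig)
    return figures
```

For example, list_figures()["model_splicing_dependency"] would list the figure files produced by that workflow.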
We list the versions we used; however, you will probably be able to run the full pipeline with other versions.
command-line tools:
- conda==23.3.1
- snakemake-minimal==6.10.0
- vast-tools==2.5.1
- csvkit==1.0.5
- fastq-dump==2.9.1
- gdc-client==1.6.1
- ascp==4.1.1.73
- axel==2.17.10
- bowtie==1.3.0
Python==3.8.3:
- joblib==1.1.0
- tqdm==4.65.0
- pandas==1.3.0
- networkx==2.8.4
- numpy==1.20.3
- target_spotter==0.1.1
R==4.1.0:
- Biostrings==2.64.1
- BSgenome.Hsapiens.UCSC.hg38
- cluster==2.1.2
- clusterProfiler==4.0.5
- ComplexHeatmap==2.9.3
- cowplot==1.1.1
- doParallel==1.0.16
- extrafont==0.17
- ensembldb==2.16.4
- EnsDb.Hsapiens.v86
- foreach==1.5.1
- ggnewscale==0.4.5
- ggplotify==0.1.0
- ggpubr==0.4.0
- ggraph==2.0.5
- ggrepel==0.9.1
- ggtext==0.1.1
- graphlayouts==0.8.0
- gridExtra==2.3
- gtools==3.9.2
- here==1.0.1
- igraph==1.3.1
- optparse==1.7.1
- pROC==1.18.0
- readxl==1.3.1
- scattermore==0.7
- statebins==2.0.0
- SummarizedExperiment==1.22.0
- TCGAbiolinks==2.25.3
- tidytext==0.3.2
- tidyverse==1.3.1
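A quick way to compare a Python environment against the pins above, sketched with the standard library's importlib.metadata (available from Python 3.8). check_versions is a hypothetical helper, and the sketch assumes target_spotter is installed under that distribution name. Since other versions will probably work, a mismatch is a hint to look at rather than an error.

```python
from importlib.metadata import PackageNotFoundError, version

# Python package pins listed above.
PINNED = {
    "joblib": "1.1.0",
    "tqdm": "4.65.0",
    "pandas": "1.3.0",
    "networkx": "2.8.4",
    "numpy": "1.20.3",
    "target_spotter": "0.1.1",
}


def check_versions(pins=PINNED, get_version=version):
    """Return {package: (pinned, installed)} for packages that differ from their pin.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for pkg, pinned in pins.items():
        try:
            installed = get_version(pkg)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[pkg] = (pinned, installed)
    return mismatches
```

Running print(check_versions()) in the environment used for the pipeline shows only the packages that deviate from the versions above.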
Refer to the README.md file in each workflows/*/ directory for instructions on running that workflow. Note that obtain_data and preprocess_data are the most computationally demanding workflows.
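The workflows could be chained from the repository root with a small driver script. This sketch makes several assumptions that are not confirmed here: each workflows/<name>/ directory contains a Snakefile meant to be run from inside that directory, the order in which the workflows are listed above respects their dependencies, and any manual downloads required by obtain_data have already been done. snakemake_plan and run_plan are hypothetical helpers; --cores and --dry-run are standard snakemake options. Where a workflow's README.md says otherwise, follow the README.

```python
import subprocess
from pathlib import Path

# Hypothetical run order: the workflows in the order they are listed above.
WORKFLOW_ORDER = [
    "obtain_data",
    "preprocess_data",
    "model_splicing_dependency",
    "streamlined_therapy_dev",
    "experimental_validation",
    "exon_drug_interactions",
    "treatment_recommendation",
    "prepare_submission",
]


def snakemake_plan(repo_root=".", cores=1, dry_run=True):
    """Return (working_directory, argv) pairs, one per workflow.

    Assumes each workflows/<name>/ holds a Snakefile and is run from inside
    that directory.
    """
    plan = []
    for name in WORKFLOW_ORDER:
        argv = ["snakemake", "--cores", str(cores)]
        if dry_run:
            argv.append("--dry-run")  # only list the jobs that would run
        plan.append((Path(repo_root) / "workflows" / name, argv))
    return plan


def run_plan(plan):
    """Run each workflow in order; check=True stops at the first failure."""
    for workdir, argv in plan:
        subprocess.run(argv, cwd=workdir, check=True)
```

For example, run_plan(snakemake_plan(cores=8)) performs a dry run of every workflow; pass dry_run=False to actually execute them. Starting with a dry run is a cheap way to spot missing inputs before launching the expensive obtain_data and preprocess_data steps.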