Update tutorials

Fixes minor spelling typos, updates the installation page with conda installation instructions, tweaks some images and adds authors to the license.
cnio-bu · Jan 25, 2021 · 380e7a5
1 parent 6df1914
commit 380e7a5
Showing 8 changed files with 117 additions and 39 deletions.
diff --git a/.img/drug_signatures.png b/.img/drug_signatures.png
diff --git a/.img/workflow_landscape.png b/.img/workflow_landscape.png
diff --git a/.img/workflow_tutorial.png b/.img/workflow_tutorial.png
diff --git a/LICENSE b/LICENSE
@@ -6,7 +6,9 @@ National Cancer Research Center (CNIO), www.cnio.es.
 
 ACADEMIC PUBLIC LICENSE
 
-Copyright (C) 2021 Coral Fustero-Torre, María José Jiménez-Santos, Santiago García-Martín, Carlos Carretero-Puche, Luis García-Jimeno, Tomás Di Domenico, Gonzalo Gómez-López and Fátima Al-Shahrour
+Copyright (C) 2021 Coral Fustero-Torre, María José Jiménez-Santos, 
+Santiago García-Martín, Carlos Carretero-Puche, Luis García-Jimeno, 
+Tomás Di Domenico, Gonzalo Gómez-López and Fátima Al-Shahrour
 
 
 Preamble

diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@
 
 ## Workflow overview
 
-**Beyondcell workflow.** Given two inputs, the scRNA-seq expression matrix and a collection of drug signatures, the methodology calculates a beyondcell score (BCS) for each drug-cell pair. The BCS ranges from 0 to 1 and measures the susceptibility of each cell to a given drug. The resulting BCS matrix can be used to determine the sample’s therapeutic clusters. Furthermore, drugs are prioritized in a table and each individual drug score can be visualized in a UMAP.
+**Beyondcell workflow.** Given two inputs, the scRNA-seq expression matrix and a collection of drug signatures, the methodology calculates a Beyondcell score (BCS) for each drug-cell pair. The BCS ranges from 0 to 1 and measures the susceptibility of each cell to a given drug. The resulting BCS matrix can be used to determine the sample’s therapeutic clusters. Furthermore, drugs are prioritized in a table and each individual drug score can be visualized in a UMAP.
 
 ![Beyondcell workflow](./.img/workflow_tutorial.png)
 
@@ -22,18 +22,19 @@ Depending on the evaluated signatures, the BCS represents the cell perturbation
  * If time points are available, identify the changes in drug tolerance of your samples
  * Identify mechanisms of resistance
 
-## Installing beyondcell
-The **Beyondcell** algorithm is implemented in R (v. 4.0.0 or greater). We recommend running the installation via gitlab using devtools:
+## Installing Beyondcell
+The Beyondcell algorithm is implemented in R (v. 4.0.0 or greater). We recommend running the installation via conda: 
 
 ```r
-library("devtools")
-devtools::install_gitlab("bu_cnio/Beyondcell")
+# Create a conda environment
+conda create -n beyondcell 
+# Install Beyondcell package and dependencies
+conda install -n beyondcell -c bu_cnio beyondcell
 ```
 
-See the DESCRIPTION file for a complete list of R dependencies. If the R dependencies are already installed, installation should finish promptly.
 
 ## Results
-We have validated Beyondcell in a population of MCF7-AA cells exposed to 500nM of bortezomib and collected at different time points: t0 (before treatment), t12, t48 and t96 (72h treatment followed by drug wash and 24h of recovery) obtained from *Ben-David U, et al., Nature, 2018*. We integrated all four conditions using the Seurat pipeline (left). After calculating the beyondcell scores (BCS) for each cell, a clustering analysis was applied. **Beyondcell** was able to cluster the cells based on their treatment time point, to separate untreated cells from treated cells (center) and to recapitulate the changes arisen by the treatment with bortezomib (right). 
+We have validated Beyondcell in a population of MCF7-AA cells exposed to 500nM of bortezomib and collected at different time points: t0 (before treatment), t12, t48 and t96 (72h treatment followed by drug wash and 24h of recovery) obtained from *Ben-David U, et al., Nature, 2018*. We integrated all four conditions using the Seurat pipeline (left). After calculating the BCS for each cell, a clustering analysis was applied. Beyondcell was able to cluster the cells based on their treatment time point, to separate untreated cells from treated cells (center) and to recapitulate the changes induced by the bortezomib treatment (right).
 
 ![results_golub](./.img/integrated_bendavid.png)
 
@@ -45,17 +46,16 @@ For general instructions on running Beyondcell, check out the [analysis workflow
 ## Authors
 
  * Coral Fustero-Torre
- * María José Jiménez
+ * María José Jiménez-Santos
  * Santiago García-Martín
  * Carlos Carretero-Puche
- * Luis G. Jimeno
+ * Luis García-Jimeno
  * Tomás Di Domenico
  * Gonzalo Gómez-López
  * Fátima Al-Shahrour
 
 
-
-## References
+## Citation
 
 ## Support
-If you have any question regarding the use of **Beyoncell**, feel free to submit an [issue](https://gitlab.com/bu_cnio/Beyondcell/issues).
+If you have any questions regarding the use of Beyondcell, feel free to submit an [issue](https://gitlab.com/bu_cnio/Beyondcell/issues).
diff --git a/tutorial/GenerateGenesets/README.md b/tutorial/GenerateGenesets/README.md
@@ -0,0 +1,73 @@
+# GenerateGenesets function
+
+
+
+By default, `GenerateGenesets` returns a `geneset` with the `250` most upregulated and downregulated genes in each drug signature. You can change this behaviour by providing new values to `n.genes` and `mode`. Moreover, a small collection of functional pathways will be included in your `geneset` object. These pathways are related to the regulation of the epithelial-mesenchymal transition (EMT), cell cycle, proliferation, senescence and apoptosis. Note that the `n.genes` and `mode` arguments do not affect the functional pathways.
+
+```r
+# Generate geneset object with one of the ready to use signature collections.
+gset <- GenerateGenesets(PSc)
+# Retrieve only the top 100 most upregulated genes in drug signatures (functional pathways remain unchanged)
+up100 <- GenerateGenesets(PSc, n.genes = 100, mode = "up")
+# You can deactivate the functional pathways option if you are not interested in evaluating them
+nopath <- GenerateGenesets(PSc, include.pathways = FALSE)
+```
+
+Additionally, you can compute a `geneset` from a pre-loaded PSc subset called DSS.
+
+```r
+# Generate geneset object with one of the ready to use signature collections
+dss <- GenerateGenesets(DSS, include.pathways = FALSE)
+```
+
+Also, you can filter PSc, SSc and DSS objects by several fields (case insensitive):
+
+ * `drugs`: Drug name of interest (e.g. sirolimus).
+ * `IDs`: `sig_id` of the signature(s) of interest.
+ * `MoA`: Mechanism of action of interest (e.g. MTOR INHIBITOR).
+ * `targets`: Target gene of interest (e.g. MTOR).
+ * `source`: `"LINCS"` (for PSc) or `"GDSC"`, `"CCLE"` and/or `"CTRP"` (for SSc).
+
+```r
+# Return a `geneset` with all sirolimus signatures, as well as signatures of sirolimus synonyms such as 
+# rapamycin or BRD-K84937637
+sirolimus <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(drugs = "sirolimus"))
+# Return just a subset of sirolimus signatures
+my_sigs <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(IDs = c("sig_2349", "sig_7409")))
+# Return all MTOR INHIBITORS
+MTORi <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(MoA = "MTOR INHIBITOR"))
+# Return all drugs targeting MTOR
+mtor_targets <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(targets = "MTOR"))
+# Return only signatures derived from GDSC and CCLE
+my_sources <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(source = c("GDSC", "CCLE")))
+```
+
+By calling the `ListFilters` function, you can retrieve all the available values for a given field. When several filters are supplied, the signatures that pass **ANY** of them are included in the final `geneset`.
+
+```r
+# Values for targets
+ListFilters(entry = "targets")
+# Geneset with all drugs that target MTOR and sirolimus signatures
+filter_combination <- GenerateGenesets(SSc, include.pathways = FALSE, 
+ filters = list(drugs = "sirolimus", targets = "MTOR"))
+```
+You can check information about the pre-loaded signatures by inspecting the `drugInfo` object. Also, each `geneset` object obtained using pre-loaded matrices contains a subset of `drugInfo` for the selected drugs.
+
+```r
+# drugInfo of the signatures of interest
+gset@info
+```
+
+Finally, Beyondcell allows the user to input a GMT file containing the functional pathways/signatures of interest or a numeric matrix (containing a ranking criterion such as the t-statistic or logFoldChange).
+
+ * **In case your input is a GMT file:** You must supply the path to the file. Take into account that the names of each gene set must end in `"_UP"` or `"_DOWN"` to specify its mode. In this case, `n.genes` and `mode` are deprecated.
+ * **In case your input is a numeric matrix:** Make sure that rows correspond to genes and columns to signatures.
+
+In both cases, the `filters` argument is deprecated, but you must indicate whether the `comparison` that yielded your input was `"treated_vs_control"` or `"sensitive_vs_resistant"`.
+
+```r
+# Mock numeric matrix
+m <- matrix(rnorm(500 * 25), ncol = 25, dimnames = list(rownames(PSc[[1]])[1:500], colnames(PSc[[1]])[1:25]))
+num_matrix <- GenerateGenesets(m, n.genes = 100, mode = c("up", "down"), 
+ comparison = "treated_vs_control", include.pathways = TRUE)
+```
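
As a rough illustration of what `n.genes` and `mode` mean for a numeric matrix input like the mock one above, the following base R sketch picks the highest and lowest ranked genes of each signature. It is not Beyondcell's implementation: `top_genes` and the mock gene and signature names are hypothetical, and the sketch only assumes that rows are genes and columns are signatures.

```r
# Hypothetical helper, base R only: NOT Beyondcell's internal code.
# For each column (signature) of a ranking matrix, return the n.genes
# highest-ranked ("up") and/or lowest-ranked ("down") genes.
top_genes <- function(m, n.genes = 250, mode = c("up", "down")) {
  stopifnot(is.matrix(m), !is.null(rownames(m)), n.genes <= nrow(m))
  mode <- match.arg(mode, several.ok = TRUE)
  sets <- lapply(colnames(m), function(sig) {
    # Gene names ordered from the highest to the lowest ranking value
    ranked <- names(sort(m[, sig], decreasing = TRUE))
    out <- list()
    if ("up" %in% mode) out$up <- head(ranked, n.genes)
    if ("down" %in% mode) out$down <- tail(ranked, n.genes)
    out
  })
  names(sets) <- colnames(m)
  sets
}

# Mock ranking matrix: rows are genes, columns are signatures
set.seed(1)
m <- matrix(rnorm(20 * 3), ncol = 3,
            dimnames = list(paste0("gene", 1:20), paste0("sig", 1:3)))
sets <- top_genes(m, n.genes = 5, mode = c("up", "down"))
str(sets$sig1)
```

Calling the helper with `mode = "up"` would return only the `up` element for each signature, which is loosely analogous to the `up100` call shown earlier in the tutorial.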
diff --git a/tutorial/analysis_workflow/README.md b/tutorial/analysis_workflow/README.md
@@ -9,53 +9,56 @@ We have validated Beyondcell in a population of MCF7-AA cells exposed to 500nM o
 ## Using Beyondcell
 For a correct analysis with **Beyondcell**, users should follow these steps: 
 
- 1. Read single cell expression matrix
- 2. Compute Beyondcell scores
- 3. Compute Therapeutic clusters
+ 1. Read a single-cell expression object
+ 2. Compute the Beyondcell scores (BCS)
+ 3. Compute the Therapeutic Clusters (TCs)
  * Check clustering and look for unwanted sources of variation
  * Regress out unwanted sources of variation
- * Recompute UMAP
+ * Recompute UMAP reduction
  4. Compute ranks
  5. [**Visualize**](https://gitlab.com/bu_cnio/Beyondcell/-/tree/master/tutorial/visualization) the results
 
 
-### 1. Read single cell expression object
-In order to correctly compute the scores, the transcriptomic data needs to be pres-processed. This means that proper cell-based quality control filters, as well as normalization, scaling and clustering of the data, should be applied prior to the analysis with **Beyondcell**. 
+### 1. Read a single-cell expression object
+Beyondcell can accept either a single-cell matrix or a Seurat object. In order to correctly compute the scores, the transcriptomics data needs to be pre-processed. This means that proper cell-based quality control filters, as well as normalization and scaling of the data, should be applied prior to the analysis with Beyondcell.
+
+> Note: We recommend using a Seurat object.
 
 ```r
-library("Beyondcell")
+library("beyondcell")
 library("Seurat")
 # Read single cell experiment
 sc = readRDS(path_to_sc)
 ```
 
-### 2. Compute BCS
-The `bcCompute` function allows you to input either a pre-processed seurat object or a single cell matrix. Have in mind, that when a seurat object is used as an input, the `DefaultAssay` must be specified, both `SCT` and `RNA` assays are accepted.
+Note that if you are using a Seurat object, the `DefaultAssay` must be specified. Both `SCT` and `RNA` assays are accepted.
 
 ```r
 # Set Assay
 DefaultAssay(sc) <- "RNA"
 ```
-**Generate Signatures**\
-In order to compute the BCS, we also need a **gene signatures object** containing the drug or functional signatures we are interested in evaluating. To create this object, the `GenerateGenesets` function needs to be called. **Beyondcell** includes two drug signature collections that are ready to use:
 
- * The drug Perturbation Signatures collection (PSc): captures the transcriptional changes induced by a drug.
- * The drug Sensitivity Signatures collection (SSc): captures the drug sensitivity to a given drug.
+### 2. Compute the BCS
+We need to perform two steps:
+
+#### Get a geneset object with signatures of interest
+In order to compute the BCS, we also need a `geneset` object containing the drug or functional signatures we are interested in evaluating. To create this object, the `GenerateGenesets` function needs to be called. Beyondcell includes two drug signature collections that are ready to use:
 
-A small collection of functional pathways will be included by default in your gene signatures object. These pathways are related to the regulation of the epithelial-mesenchymal transition (EMT), cell cycle, proliferation, senescence and apoptosis. 
+ * **Drug Perturbation Signatures collection (PSc):** Captures the transcriptional changes induced by a drug.
+ * **Drug Sensitivity Signatures collection (SSc):** Captures the sensitivity to a given drug.
+
+A small collection of functional pathways will be included by default in your gene signatures object. These pathways are related to the regulation of the epithelial-mesenchymal transition (EMT), cell cycle, proliferation, senescence and apoptosis.
 
 ```r
-# Generate gene signatures object with one of the ready to use signature collections
-gs <- GenerateGenesets(PSc, include.pathways = TRUE)
+# Generate geneset object with one of the ready to use signature collections
+gset <- GenerateGenesets(PSc)
 # You can deactivate the functional pathways option if you are not interested in evaluating them
-gs <- GenerateGenesets(PSc, include.pathways = FALSE)
+nopath <- GenerateGenesets(PSc, include.pathways = FALSE)
 ```
 
-Furthermore, **Beyondcell** allows the user to input a .GMT file containing the functional pathways/signatures of interest, or a numeric matrix (containing a ranking criteria such as the t-statistic or logFoldChange).
-
-You can check out the structure of the obtained gene set object, information on the drug signatures, mode of action and target genes can be found at `gs@info` or by using the `FindDrugs` function. 
+PSc and SSc signatures can also be filtered according to several values. Moreover, Beyondcell allows the user to input a GMT file containing the functional pathways/signatures of interest, or a numeric matrix (containing a ranking criterion such as the t-statistic or logFoldChange). For further information please check the [GenerateGenesets](https://gitlab.com/bu_cnio/Beyondcell/-/tree/master/tutorial/GenerateGenesets) tutorial.
 
-**Compute BCS**
+#### Compute the BCS
 ```r
 # Compute score for the PSc. This might take a few minutes depending on the size of your dataset.
 bc <- bcScore(sc, gset, expr.thres = 0.1)
@@ -85,13 +88,13 @@ It is important to check whether any unwanted source of variation is guiding the
 
 ```r
 # Visualize whether cells are clustered based on the number of genes detected in each cell
-bcClusters(bc, UMAP = "Beyondcell", idents = "nFeature_RNA", factor.col = FALSE)
+bcClusters(bc, UMAP = "beyondcell", idents = "nFeature_RNA", factor.col = FALSE)
 ```
 <img src=".img/nFeature_variation.png" width="500">
 
 ```r
 # Visualize whether cells are clustered based on their cell cycle status
-bcClusters(bc, UMAP = "Beyondcell", idents = "Phase", factor.col = TRUE)
+bcClusters(bc, UMAP = "beyondcell", idents = "Phase", factor.col = TRUE)
 ```
 <img src=".img/Phase_variation.png" width="500">
 
@@ -112,9 +115,9 @@ Once corrected, you will need to recompute the dimensionality reduction and clus
 # Recompute UMAP
 bc <- bcUMAP(bc, pc = 5, res = 0.2, add.DSS = FALSE, k.neighbors = 20) 
 # Visualize UMAP
-bcClusters(bc, UMAP = "Beyondcell", idents = "nFeature_RNA", factor.col = FALSE, pt.size = 1)
+bcClusters(bc, UMAP = "beyondcell", idents = "nFeature_RNA", factor.col = FALSE, pt.size = 1)
 # Visualize Therapeutic clusters
-bcClusters(bc, UMAP = "Beyondcell", idents = "bc_clusters_res.0.2", pt.size = 1)
+bcClusters(bc, UMAP = "beyondcell", idents = "bc_clusters_res.0.2", pt.size = 1)
 ```
 
 <p float="left">

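To make the idea of regressing out an unwanted source of variation more concrete, here is a minimal base R sketch that fits a linear model of each signature's scores against a covariate and keeps the residuals. It is only an illustration under stated assumptions, not how Beyondcell performs the correction internally: `regress_out`, the mock score matrix and the `n_features` covariate are all hypothetical.

```r
# Hypothetical helper, base R only: NOT Beyondcell's internal code.
# scores: numeric matrix with signatures in rows and cells in columns.
# covariate: numeric vector with one value per cell (e.g. detected genes).
regress_out <- function(scores, covariate) {
  stopifnot(is.matrix(scores), ncol(scores) == length(covariate))
  # Fit score ~ covariate per signature and keep what the covariate
  # does not explain (the residuals)
  corrected <- t(apply(scores, 1, function(y) residuals(lm(y ~ covariate))))
  dimnames(corrected) <- dimnames(scores)
  corrected
}

set.seed(1)
n_features <- runif(100, min = 500, max = 5000)
# Mock scores: sigA depends linearly on the covariate, sigB does not
scores <- rbind(sigA = 0.0001 * n_features + rnorm(100, sd = 0.05),
                sigB = rnorm(100, sd = 0.05))
corrected <- regress_out(scores, n_features)
# Correlation with the covariate before and after the correction
cor(scores["sigA", ], n_features)
cor(corrected["sigA", ], n_features)
```

After the correction the residuals are uncorrelated with the covariate by construction, which is why a dimensionality reduction computed on them would no longer be driven by that variable.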
diff --git a/tutorial/visualization/README.md b/tutorial/visualization/README.md
@@ -69,7 +69,7 @@ bcSignatures(bc, UMAP = "beyondcell", genes = list(values = "PSMA5"), pt.size =
 <img src=".img/psma5_expr.png" width="500">
 
 ## Ranking visualization
-We can summarize the ranking results using the `bc4Squares` function. This function summarizes the top hits obtained for each of the specified condition levels. The residuals are represented in the x axis, the switch point is represented in the y axis. The top-left and bottom-right corners contain the drugs to which all selected cells are most/least sensistive, respectively. The centre quadrants show the drugs with an heterogeneous response. In this case, we can clearly see how the tool predicts an heterogeneous response to bortezomib. 
+We can summarize the ranking results using the `bc4Squares` function. This function summarizes the top hits obtained for each of the specified condition levels. The residuals are represented on the x axis and the switch point on the y axis. The top-left and bottom-right corners contain the drugs to which all selected cells are least/most sensitive, respectively. The centre quadrants show the drugs with a heterogeneous response. In this case, we can clearly see how the tool predicts a heterogeneous response to bortezomib.
 
 ```r
 bc4Squares(bc, idents = "condition", lvl = "t0", top = 5)