-
Notifications
You must be signed in to change notification settings - Fork 70
/
CSIDE_celltocell_interactions.Rmd
155 lines (126 loc) · 8.18 KB
/
CSIDE_celltocell_interactions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
---
title: "Applying CSIDE to Spatial Transcriptomics Data: Cell-to-cell interactions"
author: "Dylan Cable"
date: "December 19th, 2021"
output:
html_document:
keep_md: yes
pdf_document: default
rmarkdown::html_vignette:
keep_md: yes
vignette: |
%\VignetteIndexEntry{CSIDE-celltocell-interactions}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
---
```{r setup}
library(spacexr)
library(Matrix)
library(doParallel)
library(ggplot2)
datadir <- system.file("extdata",'SpatialRNA/Vignette',package = 'spacexr') # directory for sample Slide-seq dataset
if(!dir.exists(datadir))
dir.create(datadir)
savedir <- 'RCTD_results'
if(!dir.exists(savedir))
dir.create(savedir)
```
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
cache = TRUE,
out.width = "100%"
)
```
## Introduction
Cell type-Specific Inference of Differential Expression, or CSIDE, is part of the spacexr R package for learning cell type-specific differential expression from spatial transcriptomics data. In this Vignette, we will use CSIDE to test for cell-to-cell interactions in a toy cerebellum Slide-seq dataset. First, we will first use RCTD to assign cell types to a cerebellum Slide-seq dataset. We will define cell type profiles using an annotated single nucleus RNA-sequencing (snRNA-seq) cerebellum dataset. We will test for differential expression as a function of colocalization with a particular cell type.
## Data Preprocessing and running RCTD
First, we run RCTD on the data to annotated cell types. Please note that this follows exactly the content of the spatial transcriptomics RCTD vignette (doublet mode). Please refer to the [spatial transcriptomics vignette](https://raw.githack.com/dmcable/spacexr/master/vignettes/spatial-transcriptomics.html) for more explanation on the RCTD algorithm.
```{r RCTD}
### Load in/preprocess your data, this might vary based on your file type
if(!file.exists(file.path(savedir,'myRCTD.rds'))) {
counts <- read.csv(file.path(datadir,"MappedDGEForR.csv")) # load in counts matrix
coords <- read.csv(file.path(datadir,"BeadLocationsForR.csv"))
rownames(counts) <- counts[,1]; counts[,1] <- NULL # Move first column to rownames
rownames(coords) <- coords$barcodes; coords$barcodes <- NULL # Move barcodes to rownames
nUMI <- colSums(counts) # In this case, total counts per pixel is nUMI
puck <- SpatialRNA(coords, counts, nUMI)
barcodes <- colnames(puck@counts) # pixels to be used (a list of barcode names).
plot_puck_continuous(puck, barcodes, puck@nUMI, ylimit = c(0,round(quantile(puck@nUMI,0.9))),
title ='plot of nUMI')
refdir <- system.file("extdata",'Reference/Vignette',package = 'spacexr') # directory for the reference
counts <- read.csv(file.path(refdir,"dge.csv")) # load in counts matrix
rownames(counts) <- counts[,1]; counts[,1] <- NULL # Move first column to rownames
meta_data <- read.csv(file.path(refdir,"meta_data.csv")) # load in meta_data (barcodes, clusters, and nUMI)
cell_types <- meta_data$cluster; names(cell_types) <- meta_data$barcode # create cell_types named list
cell_types <- as.factor(cell_types) # convert to factor data type
nUMI <- meta_data$nUMI; names(nUMI) <- meta_data$barcode # create nUMI named list
reference <- Reference(counts, cell_types, nUMI)
myRCTD <- create.RCTD(puck, reference, max_cores = 2)
myRCTD <- run.RCTD(myRCTD, doublet_mode = 'doublet')
saveRDS(myRCTD,file.path(savedir,'myRCTD.rds'))
}
```
## Create explanatory variable / covariate
Now that we have successfully run RCTD, we can create a explanatory variable (i.e. covariate) used for predicting differential expression in CSIDE. In general one should set the explanatory variable to biologically relevant predictors of gene expression such as spatial position.
The explanatory variable itself is a vector of values, constrained between 0 and 1, with names matching the pixel names of the `myRCTD` object. In this case, we will use the explanatory variable
as the density of cell type '6'. We will calculate this explanatory variable using the `exvar.celltocell.interactions`, which calculates cell type density using an exponential filter.
Here, we also artifically upregulate the expression of the Malat1 gene (in regions of high explanatory variable) to see whether CSIDE can detect this differentially expressed gene.
```{r SpatialRNA, results = 'hide', fig.height = 6, fig.width = 6}
### Create SpatialRNA object
myRCTD <- readRDS(file.path(savedir,'myRCTD.rds'))
barcodes <- colnames(myRCTD@spatialRNA@counts)
cell_type_interacting <- '6'
explanatory.variable <- exvar.celltocell.interactions(myRCTD, barcodes, cell_type_interacting, radius = 500)
print(head(explanatory.variable))
hist(explanatory.variable)
# Differentially upregulate one gene
change_gene <- 'Malat1'
high_barc <- names(explanatory.variable[explanatory.variable > 0.5])
low_barc <- names(explanatory.variable[explanatory.variable < 0.5])
myRCTD@originalSpatialRNA@counts[change_gene, high_barc] <- myRCTD@spatialRNA@counts[change_gene, high_barc] * 2
plot_puck_continuous(myRCTD@spatialRNA, names(explanatory.variable), explanatory.variable, ylimit = c(0,1), title ='plot of explanatory variable')
```
## Running CSIDE
After creating the explanatory variable, we are now ready to run CSIDE using the `run.CSIDE.single` function. We will use two cores, and a false discovery rate of 0.25. Next, we will set a gene threshold (i.e. minimum gene expression) of 0.01, and we will set a cell_type_threshold (minimum instances per cell type) of 3.
**Warning**: On this toy dataset, we have made several choices of parameters that are not recommended for regular use. On real datasets, we recommend first consulting the CSIDE default parameters. This includes `gene_threshold` (default 5e-5), `cell_type_threshold` (default 125), and `fdr` (default 0.01). Please see `?run.CSIDE.single` for more information on these parameters.
```{r DEgenes}
#de
myRCTD@config$max_cores <- 2
myRCTD <- run.CSIDE.single(myRCTD, explanatory.variable, gene_threshold = .01,
cell_type_threshold = 3, fdr = 0.25)
saveRDS(myRCTD,file.path(savedir,'myRCTDde_celltocell.rds'))
```
Equivalently to using the `run.CSIDE.single` function, those who want to have more precise control can build the design matrix directly. In this case, we use the `build.designmatrix.single` function, which
constructs the design matrix for a single explanatory variable. After constucting the design matrix, one can use the `run.CSIDE` function to run CSIDE with this general design matrix.
```{r design matrix}
X <- build.designmatrix.single(myRCTD, explanatory.variable)
print(head(X))
barcodes <- names(explanatory.variable)
cell_types <- c('10', '18', '6')
myRCTD_identical <- run.CSIDE(myRCTD, X, barcodes, cell_types, gene_threshold = .01,
cell_type_threshold = 3, fdr = 0.25)
```
## CSIDE results
After running CSIDE, it is time to examine CSIDE results. CSIDE model fits for all genes are stored in `myRCTD@de_results$gene_fits`, and we demonstrate below how to access the point estimates, standard errors, and convergence. We will focus on cell type 10. Furthermore, we will examine the original Malat1 gene, which was differentially expressed in cell type 10.
```{r CSIDE_results, fig.width = 8, fig.height=8}
#print results for cell type '18'
cell_type <- '10'
sig_gene <- change_gene
print(paste("following results hold for", sig_gene))
print("check for covergence of each cell type")
print(myRCTD@de_results$gene_fits$con_mat[sig_gene, ])
print('estimated DE')
print(myRCTD@de_results$gene_fits$mean_val[sig_gene, ])
print('standard errors for non-intercept terms')
print(myRCTD@de_results$gene_fits$s_mat[sig_gene, ])
```
Finally, we will plot CSIDE results in the `savedir` directory!
The following plot shows a spatial visualization of the Malat1 gene, which was determined to be differentially expressed.
The function `make_all_de_plots` will automatically generate several types of plots displaying the CSIDE results across all genes and cell types.
```{r results, fig.width = 8, fig.height=8}
myRCTD <- readRDS(file.path(savedir,'myRCTDde_celltocell.rds'))
plot_gene_two_regions(myRCTD, sig_gene, cell_type, min_UMI = 10, expr.thresh = 25)
make_all_de_plots(myRCTD, savedir)
```