We benchmarked two types of cell-cell communication (CCC) inference methods: one type predicts ligand-receptor (LR) pairs from scRNA-seq data, and the other predicts ligand/receptor-target regulations.
For the first benchmark, we evaluated the accuracy, stability and usability of 18 LR inference methods. In terms of accuracy, paired ST datasets, CAGE expression/proteomics data and sampled scRNA-seq datasets were used to benchmark the 18 methods. First, 11 scRNA-seq datasets were used as input for the methods to predict intercellular communication, and two similarity measures, the similarity index (SI, a modified Jaccard index) and the rank-based similarity index (RSI), were used to compare the LR pairs predicted by each pair of methods. Second, we benchmarked the 18 methods using 11 paired ST datasets, under the hypothesis that the mutual information (MI) of LR pairs is greater in the close group than in the distant group. Third, three PBMC datasets from the 10X Genomics website were used as input for the methods to predict LR pairs, and CAGE expression/proteomics data were used as pseudo gold standards to benchmark the 18 methods. In terms of stability, we randomly sampled different ratios of cells from all the scRNA-seq datasets, yielding 70 sampled datasets that, together with the 14 original datasets, served as input for the methods. We calculated the Jaccard index between the LR pairs predicted from the sampled datasets and those predicted from the original datasets, and defined a stability value to test the robustness of the methods to the sampling rate of scRNA-seq data. In terms of usability, we recorded the running time and maximum memory usage of the methods on all 84 scRNA-seq datasets.
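The Jaccard comparison in the stability test can be sketched as follows. This is a minimal illustration in Python, not the repository's R implementation: the function name `jaccard_index`, the toy LR pairs and the sampling ratios are all assumptions made for the example, and the stability value defined by the authors is not reproduced here because its formula is not given in this document.

```python
def jaccard_index(pairs_a, pairs_b):
    """Size of the intersection divided by size of the union of two LR pair sets."""
    a, b = set(pairs_a), set(pairs_b)
    union = a | b
    if not union:
        # Assumed convention: two empty predictions count as identical.
        return 1.0
    return len(a & b) / len(union)


# Toy predictions (illustrative only, not taken from any benchmarked dataset).
original_pairs = {("CXCL12", "CXCR4"), ("JAG1", "NOTCH1"), ("TGFB1", "TGFBR2")}
sampled_pairs = {
    0.9: {("CXCL12", "CXCR4"), ("JAG1", "NOTCH1"), ("TGFB1", "TGFBR2")},
    0.5: {("CXCL12", "CXCR4"), ("JAG1", "NOTCH1"), ("IGF1", "IGF1R")},
}

# One Jaccard value per (hypothetical) sampling ratio.
jaccard_by_ratio = {
    ratio: jaccard_index(pairs, original_pairs)
    for ratio, pairs in sampled_pairs.items()
}
print(jaccard_by_ratio)  # {0.9: 1.0, 0.5: 0.5}
```

A method whose Jaccard index stays close to 1 as the sampling ratio drops would be considered more robust to down-sampling under this comparison.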
For the second benchmark, 8 ST datasets were used as input for 5 LR-Target inference tools to predict ligand/receptor-target regulations. Cell line perturbation datasets, covering knockout/mutant conditions for 5 receptors and treatment conditions for 10 ligands, were used for evaluation. The differentially expressed genes (DEGs) in each cell line perturbation dataset were used as the ground truth of ligand/receptor-target regulations. The scores of the ligand/receptor-target regulations predicted by each tool were compared with the differential expression status (DEG or not DEG) of the corresponding targets to calculate AUROC and AUPRC. In addition, we recorded the running time and maximum memory usage of the methods on all the ST datasets.
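The evaluation against DEG status can be sketched as below. This is a hedged Python illustration, not the code in Step9_LRTBench: the function names, the toy scores and the labels are assumptions. AUPRC is approximated here by average precision, which may differ from the estimator the authors used, and tied scores are not given special treatment in the precision-recall part.

```python
def auroc(scores, labels):
    """Probability that a random DEG outscores a random non-DEG (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one DEG and one non-DEG target")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision, used here as an approximation of AUPRC."""
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    hits, total = 0, 0.0
    for rank, (_, is_deg) in enumerate(ranked, start=1):
        if is_deg:
            hits += 1
            total += hits / rank  # precision at this recall step
    if hits == 0:
        raise ValueError("need at least one DEG target")
    return total / hits


# Toy example: predicted regulation scores for five targets of one ligand,
# and whether each target is a DEG in the perturbation dataset (1) or not (0).
target_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
target_is_deg = [1, 0, 1, 0, 0]

print(round(auroc(target_scores, target_is_deg), 4))
print(round(average_precision(target_scores, target_is_deg), 4))
```

The pairwise AUROC is quadratic in the number of targets, which is acceptable for a sketch; a rank-based formulation or an established library routine would be preferable for large target lists.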
- Step0_LRToolsFunction contains the R/Python/Shell scripts that wrap the running code of the 19 methods into functions that take Seurat objects as input.
- Step1_LRPredictionResult contains the R/Shell scripts to run 19 methods for inferring LR pairs from the 14 scRNA-seq datasets.
- Step2_PreSTForLRBench contains the R scripts to extract different ratios (e.g., top 10%, 20%, 30%, 40%) of cell-type-specific close and distant cell pairs in each dataset, in preparation for the benchmark based on mutual information.
- Step3_MIForLRBench contains the R scripts to calculate the MI of the LR interactions predicted by each method in the different ratios of cell-type-specific close and distant groups, and to calculate the DLRC index of each method in each dataset.
- Step4_SIRSIForLRBench contains the R scripts to benchmark the similarity (SI and RSI) of the LR interactions predicted by each pair of methods.
- Step5_BenchBasedCAGEProteomic contains the R scripts to benchmark the 18 LR inference methods using the CAGE expression and proteomics data.
- Step6_LRBenchSampling contains the R/Shell scripts to run the 18 LR inference methods for inferring LR pairs from 70 sampled scRNA-seq datasets.
- Step7_LRBenchSamplingBench contains the R/Shell scripts to calculate Jaccard index between the LR pairs predicted based on the sampled datasets and the original datasets, and record the running time and maximum memory usage of methods in each dataset.
- Step8_LRTToolsFunction contains the R/Python/Shell scripts to run the 5 LR-Target inference methods for predicting ligand/receptor-targets using ST datasets as input.
- Step9_LRTBench contains the R scripts to benchmark the 5 LR-Target inference methods using cell line perturbation datasets for evaluation, and record the running time and maximum memory usage of methods in each dataset.
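The idea behind Step2 and Step3, that an LR pair should carry more mutual information among close cell pairs than among distant ones, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the binarised expression vectors, the function name `mutual_information` and the plug-in estimator. The repository's R scripts may discretise expression and estimate MI differently, and the DLRC index is not reproduced because its definition is not given in this document.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


# Each position is one sender-receiver cell pair; values are binarised
# expression (1 = expressed) of the ligand in the sender and of the receptor
# in the receiver. Toy data, not from any benchmarked dataset.
ligand = [1, 1, 0, 0, 1, 1, 0, 0]
receptor_close = [1, 1, 0, 0, 1, 1, 0, 0]    # co-varies with the ligand
receptor_distant = [1, 0, 1, 0, 1, 0, 1, 0]  # independent of the ligand

mi_close = mutual_information(ligand, receptor_close)
mi_distant = mutual_information(ligand, receptor_distant)
print(mi_close, mi_distant)  # 1.0 0.0
```

Under the benchmark's hypothesis, a well-predicted LR pair would show the pattern above, with higher MI in the close group than in the distant group.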
- scRNA-seq and ST datasets
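Tissue (Disease) | SampleID (scRNA-seq) | SampleID (ST) | Literature PMID | Download URL (scRNA-seq) | Download URL (ST) | Evaluation purpose |
---|---|---|---|---|---|---|
Heart Tissue (Health) | CK357 | control_P7 | 35948637 | URL | URL | LR interactions, LR-Target regulations |
Heart Tissue (Health) | CK358 | control_P8 | 35948637 | URL | URL | LR interactions, LR-Target regulations |
Heart Tissue (ICM) | CK368 | FZ_GT_P19 | 35948637 | URL | URL | LR interactions |
Heart Tissue (ICM) | CK162 | FZ_GT_P4 | 35948637 | URL | URL | LR interactions |
Heart Tissue (ICM) | CK362 | RZ_P11 | 35948637 | URL | URL | LR interactions |
Heart Tissue (AMI) | CK361 | IZ_P10 | 35948637 | URL | URL | LR interactions |
Heart Tissue (AMI) | CK161 | IZ_P3 | 35948637 | URL | URL | LR interactions |
Heart Tissue (AMI) | CK165 | IZ_BZ_P2 | 35948637 | URL | URL | LR interactions |
Tumor Tissue (Breast cancer) | CID44971 | CID44971 | 34493872 | URL | URL | LR interactions, LR-Target regulations |
Tumor Tissue (Breast cancer) | CID4465 | CID4465 | 34493872 | URL | URL | LR interactions, LR-Target regulations |
Mouse embryo | —— | Slide14 | 34210887 | —— | URL | LR interactions |
PBMC | PBMC4K | —— | —— | URL | —— | LR interactions |
PBMC | PBMC6K | —— | —— | URL | —— | LR interactions |
PBMC | PBMC8K | —— | —— | URL | —— | LR interactions |
Tumor Tissue (Gliomas) | —— | UKF243_T_ST | 35700707 | —— | URL | LR-Target interactions |
Tumor Tissue (Gliomas) | —— | UKF260_T_ST | 35700707 | —— | URL | LR-Target interactions |
Tumor Tissue (Gliomas) | —— | UKF266_T_ST | 35700707 | —— | URL | LR-Target interactions |
Tumor Tissue (Gliomas) | —— | UKF334_T_ST | 35700707 | —— | URL | LR-Target interactions |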
- Cell line perturbation datasets
Datasets | Ligand/Receptor | Type | Condition | Cell Line | Disease |
---|---|---|---|---|---|
GSE120268 | AXL | receptor | Knockdown | MDA-MB-231 | Breast Cancer |
GSE157680 | NRP1 | receptor | Knockdown | MDA-MB-231 | Breast Cancer |
GSE15893 | CXCR4 | receptor | Mutant | MDA-MB-231 | Breast Cancer |
GSE15893 | CXCL12 | ligand | Treatment | MDA-MB-231 | Breast Cancer |
GSE160990 | TGFB1 | ligand | Treatment | MDA-MB-231 | Breast Cancer |
GSE36051 | DLL4(1) | ligand | Treatment | MCF7 | Breast Cancer |
GSE36051 | DLL4(2) | ligand | Treatment | MDA-MB-231 | Breast Cancer |
GSE36051 | JAG1 | ligand | Treatment | MDA-MB-231 | Breast Cancer |
GSE65398 | IGF1(1) | ligand | Treatment | MCF7 | Breast Cancer |
GSE7561 | IGF1(2) | ligand | Treatment | MCF7 | Breast Cancer |
GSE69104 | CSF1R | receptor | Inhibit | TAMs | Gliomas |
GSE116414 | FGFR1 | receptor | Inhibit | GSLC | Gliomas |
GSE206947 | EFNB2 | ligand | Treatment | cardiac fibroblasts | Health |
GSE181575 | TGFB1 | ligand | Treatment | cardiac fibroblasts | Health |
GSE123018 | TGFB1 | ligand | Treatment | cardiac fibroblasts | Health |
- CellPhoneDB (Python, version: 3.0.0)
- CellTalker (R, version: 0.0.4.9000)
- Connectome (R, version: 1.0.1)
- NATMI (Python)
- ICELLNET (R, version: 1.0.1)
- scConnect (Python, version: 1.0.3)
- CellChat (R, version: 1.4.0)
- SingleCellSignalR (R, version: 1.2.0)
- CytoTalk (R, version: 0.99.9)
- CellCall (R, version: 0.0.0.9000)
- scSeqComm (R, version: 1.0.0)
- NicheNet (R, version: 1.1.0)
- Domino (R, version: 0.1.1)
- scMLnet (R, version: 0.2.0)
- PyMINEr (Python, version: 0.10.0)
- iTALK (R, version: 0.1.0)
- cell2cell (Python, version: 0.5.10)
- RNAMagnet (R, version: 0.1.0)
- CytoTalk (R, version: 0.99.9)
- NicheNet (R, version: 1.1.0)
- stMLnet (R, version: 0.1.0)
- MISTy (R, version: 1.3.8)
- HoloNet (Python, version: 0.0.5)
Please cite ESICCC as follows:
Luo J, Deng M, Zhang X, Sun X*. ESICCC as a systematic computational framework for evaluation, selection and integration of cell-cell communication inference methods. Genome Research. 2023. doi: 10.1101/gr.278001.123
If you encounter any problems, please contact ([email protected]).