SECANT is a biology-guided SEmi-supervised method for Clustering, classification, and ANnoTation of single-cell multi-omics.
SECANT can be used to analyze CITE-seq data, or jointly analyze CITE-seq and scRNA-seq data. The novelties of SECANT include:
- using confident cell type labels, classified from surface protein data through gating, as guidance for clustering cells with RNA data
- providing a general annotation of confident cell types for each cell cluster
- fully utilizing cells with uncertain or missing cell type labels to increase performance
- accurately predicting, for scRNA-seq data, the confident cell types identified from surface protein data
In general, the inputs to SECANT are:
- ADT confident cell type labels L, where L takes integer values from 0 to C. Each value below C refers to one confident cell type, such as B cells or monocytes; the maximum value C indicates an uncertain cell type (e.g., cells on the boundary between cell types in a gating plot)
- RNA data after dimension reduction (e.g., scVI or PCA)
- Optional (for jointly analyzing CITE-seq and scRNA-seq data): RNA data after dimension reduction and batch effect correction
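As a rough illustration of the first two inputs, the sketch below encodes gated cell types as integers 0..C-1 with C reserved for uncertain cells, and reduces a cells-by-genes count matrix with PCA. The helper names `encode_adt_labels` and `reduce_rna` are hypothetical, the log-normalization to 10,000 counts per cell is an assumed preprocessing choice, and the datasets shipped with SECANT were pre-processed with scVI rather than this PCA route.

```python
import numpy as np
from sklearn.decomposition import PCA


def encode_adt_labels(gated_types, confident_types):
    """Hypothetical helper: map gated cell-type names to 0..C-1; anything else -> C."""
    index = {name: i for i, name in enumerate(confident_types)}
    uncertain = len(confident_types)  # C, the maximum value, marks uncertain cells
    return np.array([index.get(t, uncertain) for t in gated_types], dtype=np.int64)


def reduce_rna(counts, n_components=10, seed=0):
    """Hypothetical helper: log-normalize a cells x genes count matrix, then apply PCA."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty cells
    normalized = np.log1p(counts / totals * 1e4)  # assumed counts-per-10k scaling
    return PCA(n_components=n_components, random_state=seed).fit_transform(normalized)


# Usage: two confident types (B cells = 0, Monocytes = 1); "boundary" becomes C = 2.
labels = encode_adt_labels(["B cells", "Monocytes", "boundary", "B cells"],
                           confident_types=["B cells", "Monocytes"])
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(100, 50))  # toy counts, not real data
embedding = reduce_rna(counts, n_components=10)
```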
Here, we demonstrate this functionality with public human PBMC, bone marrow, and upper lobe lung data. The same pipeline can generally be applied to any CITE-seq dataset.
- PBMC10k: SECANT_GitHub_10X10k_PBMC.ipynb
- Bone marrow: SECANT_GitHub_Bone_marrow.ipynb
Here we demonstrate how to jointly analyze CITE-seq and scRNA-seq datasets with SECANT using two public PBMC CITE-seq datasets from 10x Genomics, 10X10k and 10X5k. We use the entire 10X10k dataset (i.e., both ADT and RNA), while we hold out the ADT data of the 10X5k dataset to mimic scRNA-seq data. We keep the held-out ADT values to validate our results.
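The hold-out design can be sketched as follows: labels from the second dataset are hidden behind a missing-label marker, a model is fit on the labelled cells, and the hidden labels are used only for validation. This is a minimal sketch under stated assumptions: the marker value `-1`, the helper names, and the toy two-dimensional embedding are all hypothetical, the embedding is assumed to be already batch-corrected, and a k-nearest-neighbour classifier stands in for SECANT itself.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

MISSING = -1  # hypothetical marker for cells whose ADT labels are held out


def mask_adt_labels(labels_a, labels_b):
    """Concatenate labels of two datasets, hiding dataset B's labels (kept for validation)."""
    truth_b = np.asarray(labels_b).copy()
    combined = np.concatenate([np.asarray(labels_a), np.full(len(truth_b), MISSING)])
    return combined, truth_b


def predict_missing(embedding, combined):
    """Stand-in classifier (not SECANT): k-NN trained on the labelled cells only."""
    observed = combined != MISSING
    clf = KNeighborsClassifier(n_neighbors=15).fit(embedding[observed], combined[observed])
    return clf.predict(embedding[~observed])


# Toy data: three well-separated cell types in an assumed batch-corrected embedding.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])


def sample(n):
    y = rng.integers(0, 3, size=n)
    return centers[y] + rng.normal(scale=0.7, size=(n, 2)), y


emb_a, y_a = sample(300)  # plays the role of the full CITE-seq dataset
emb_b, y_b = sample(200)  # plays the role of the dataset whose ADT is held out
embedding = np.vstack([emb_a, emb_b])
combined, truth_b = mask_adt_labels(y_a, y_b)
pred_b = predict_missing(embedding, combined)
accuracy = float((pred_b == truth_b).mean())
```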
Because of the computational burden, we suggest running the search for the best configuration in parallel on a server with multiple CPUs or GPUs. An example is given in SECANT_GitHub_Search_Best_Config.ipynb.
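The general pattern of a parallel configuration search, where each candidate is fit independently and the best score wins, might look like the sketch below. It assumes, for illustration only, that the configuration is a number of clusters; a scikit-learn Gaussian mixture scored by BIC stands in for a SECANT fit, and `score_config` and `search_best_config` are hypothetical names. `joblib.Parallel` distributes candidates across workers.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.mixture import GaussianMixture


def score_config(X, n_clusters, seed=0):
    """Fit one candidate (stand-in model) and return (n_clusters, BIC); lower BIC is better."""
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(X)
    return n_clusters, gmm.bic(X)


def search_best_config(X, candidates, n_jobs=-1):
    """Evaluate every candidate with joblib workers and return the best one plus all scores."""
    results = Parallel(n_jobs=n_jobs)(delayed(score_config)(X, k) for k in candidates)
    best_k, _ = min(results, key=lambda r: r[1])
    return best_k, dict(results)


# Toy data with three clusters. n_jobs=1 runs sequentially; on a multi-core
# server, n_jobs=-1 would use all available CPU cores.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in centers])
best_k, bic_by_k = search_best_config(X, candidates=range(1, 6), n_jobs=1)
```

Because each candidate is independent, the search is embarrassingly parallel, which is why spreading it over many CPUs or GPUs shortens it roughly in proportion to the number of workers.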
We provide an example simulation study, covering both how to generate simulated data and how to assess performance. Because of the computational burden, we recommend running the simulation on a server with multiple CPUs or GPUs. To replicate the results using Google Colab, copy all files under simulation_files to Google Drive and mount Google Drive in the Colab session. Example: SECANT_GitHub_simulation.ipynb
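To make the simulate-then-assess loop concrete, here is a toy version: cells are drawn around random cluster centres in a latent space, a random fraction receives the uncertain label C, and a clustering is scored against the ground truth with the adjusted Rand index. This is only a sketch; the generative model, the `simulate` helper, and its parameters are assumptions rather than the simulation design in SECANT_GitHub_simulation.ipynb, and k-means stands in for SECANT.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def simulate(n_cells=500, n_types=4, dim=10, uncertain_frac=0.2, seed=0):
    """Hypothetical generator: latent embedding, noisy ADT-like labels, true types.

    A random fraction of cells gets the uncertain label C = n_types,
    mimicking cells that fall on gating boundaries.
    """
    rng = np.random.default_rng(seed)
    truth = rng.integers(0, n_types, size=n_cells)
    centers = rng.normal(scale=5.0, size=(n_types, dim))
    embedding = centers[truth] + rng.normal(size=(n_cells, dim))
    labels = truth.copy()
    labels[rng.random(n_cells) < uncertain_frac] = n_types  # uncertain cells
    return embedding, labels, truth


embedding, labels, truth = simulate()
# Stand-in clustering; a real study would run SECANT on (embedding, labels).
pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
ari = adjusted_rand_score(truth, pred)  # 1.0 means perfect agreement with the truth
```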
A collection of datasets is available with SECANT. All datasets stored in this repository were pre-processed with scVI.
Dataset | Number of cells | Description | Original data source |
---|---|---|---|
10X10k_PBMC | 7,865 | Human PBMCs (from 10X Genomics) | source |
10X5k_PBMC | 5,527 | Human PBMCs (from 10X Genomics) | source |
Bone_marrow | 30,672 | Human bone marrow | source |
Upper_lobe_lung | 5,451 | Human upper lobe lung (on GEO, use DropletUtils for pre-processing) | source |
In-house data will be available soon.
Download a local copy of SECANT and install it from the directory:

    git clone https://github.com/tarot0410/SECANT.git
    cd SECANT
    pip install .
SECANT requires Torch, sklearn, umap, pandas, numpy, and all of their respective dependencies.
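A quick way to confirm that the dependencies listed above are importable before running the notebooks is sketched below. The helper `check_dependencies` is hypothetical; it only checks import names, so note that `sklearn` is provided by the scikit-learn package and `umap` by umap-learn.

```python
import importlib.util

# Import names of the listed dependencies (pip packages: torch, scikit-learn,
# umap-learn, pandas, numpy).
REQUIRED = ("torch", "sklearn", "umap", "pandas", "numpy")


def check_dependencies(modules=REQUIRED):
    """Return {module: True/False} depending on whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


missing = [name for name, ok in check_dependencies().items() if not ok]
if missing:
    print("Missing dependencies:", ", ".join(missing))
```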
- FLOCK + LDA for PBMC data: AutoGating.html
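The AutoGating example pairs FLOCK with linear discriminant analysis (LDA). The sketch below illustrates only an assumed LDA step: a classifier is trained on ADT profiles of confidently gated cells, the remaining cells are assigned a type, and calls whose posterior falls below a threshold get the uncertain label C. FLOCK is not reproduced, and the helper name, the `min_prob` threshold, and the toy two-marker data are hypothetical; consult AutoGating.html for the actual workflow.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def propagate_gating_labels(adt, gated, n_types, min_prob=0.9):
    """Hypothetical sketch: label ungated cells with LDA trained on gated ones.

    gated: integer labels where 0..n_types-1 are confident types and
    n_types (= C) marks cells the gates could not assign.
    """
    adt = np.asarray(adt, dtype=float)
    gated = np.asarray(gated)
    confident = gated < n_types
    lda = LinearDiscriminantAnalysis().fit(adt[confident], gated[confident])
    proba = lda.predict_proba(adt)
    calls = lda.classes_[proba.argmax(axis=1)]
    # Low-posterior calls become uncertain (C); gated cells keep their gate label.
    labels = np.where(proba.max(axis=1) >= min_prob, calls, n_types)
    labels[confident] = gated[confident]
    return labels


# Toy ADT data with two markers (e.g., a B-cell and a monocyte marker).
rng = np.random.default_rng(0)
b_cells = rng.normal([3.0, 0.0], 0.5, size=(500, 2))
monocytes = rng.normal([0.0, 3.0], 0.5, size=(500, 2))
ungated = np.array([[1.5, 1.5], [3.0, 0.0]])  # one ambiguous cell, one clear B cell
adt = np.vstack([b_cells, monocytes, ungated])
gated = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int), [2, 2]])
labels = propagate_gating_labels(adt, gated, n_types=2)
```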
Here, we give an example of using the clustering uncertainty from SECANT (through posterior probabilities) in downstream analysis. Specifically, as a sensitivity analysis, we remove cells with low-confidence cluster assignments from the trajectory analysis.
- Trajectory analysis of CD8+ T cells: Trajectory.html
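The filtering step described above can be sketched in a few lines: take each cell's largest cluster posterior probability and keep only cells above a cutoff, then rerun the trajectory analysis on the retained cells and compare it with the full-data result. The helper name and the 0.9 threshold are hypothetical choices, and the posterior matrix here is toy input rather than SECANT output.

```python
import numpy as np


def filter_confident_cells(posterior, threshold=0.9):
    """Return hard cluster assignments and a mask of cells whose max posterior >= threshold.

    posterior: (n_cells, n_clusters) array whose rows sum to 1.
    """
    posterior = np.asarray(posterior, dtype=float)
    assignment = posterior.argmax(axis=1)
    keep = posterior.max(axis=1) >= threshold
    return assignment, keep


posterior = np.array([[0.97, 0.02, 0.01],
                      [0.40, 0.35, 0.25],
                      [0.05, 0.93, 0.02],
                      [0.10, 0.15, 0.75]])
assignment, keep = filter_confident_cells(posterior, threshold=0.9)
# assignment -> [0, 0, 1, 2]; keep -> [True, False, True, False]
```

Raising the threshold trades sample size for assignment confidence, so checking whether the inferred trajectory is stable across a few thresholds is a natural way to frame the sensitivity analysis.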
Wang X, Xu Z, Hu H, Zhou X, Zhang Y, Lafyatis R, Chen K, Huang H, Ding Y, Duerr RH, Chen W. SECANT: a biology-guided semi-supervised method for clustering, classification, and annotation of single-cell multi-omics. PNAS Nexus. 2022