tarot0410/SECANT

SECANT (Beta)

SECANT is a biology-guided SEmi-supervised method for Clustering, classification, and ANnoTation of single-cell multi-omics.

SECANT can be used to analyze CITE-seq data, or jointly analyze CITE-seq and scRNA-seq data. The novelties of SECANT include:

    1. using confident cell type labels, classified from surface protein data through gating, as guidance for cell clustering with RNA data
    2. providing general annotation of confident cell types for each cell cluster
    3. fully utilizing cells with uncertain or missing cell type labels to increase performance
    4. accurately predicting, for scRNA-seq data, the confident cell types identified from surface protein data

Figure: SECANT workflow

In general, the inputs to SECANT include:

    1. ADT confident cell type labels L, where L ranges from 0 to C. Each value below C refers to one confident cell type, such as B cells or monocytes. The maximum value C marks cells of uncertain type (e.g., cells on the boundary between cell types in a gating plot)
    2. RNA data after dimension reduction (e.g., scVI or PCA)
    3. Optional (for jointly analyzing CITE-seq and scRNA-seq data): RNA data after dimension reduction and batch effect correction
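
The label encoding described above can be sketched as follows. This is only an illustration, not part of the SECANT package: the helper `encode_adt_labels`, the example gating calls, and the random array standing in for an scVI or PCA embedding are all assumptions made for this example.

```python
import numpy as np

def encode_adt_labels(gated, confident_types):
    """Map gating calls to integers 0..C (hypothetical helper).

    Confident cell types get 0..C-1 in the order given; any other call
    (e.g. "uncertain") is mapped to the maximum value C.
    """
    lookup = {name: i for i, name in enumerate(confident_types)}
    uncertain_label = len(confident_types)  # this is C
    return np.array([lookup.get(g, uncertain_label) for g in gated],
                    dtype=np.int64)

# Made-up gating results for five cells.
gated = ["B cells", "Monocytes", "uncertain", "B cells", "T cells"]
confident_types = ["B cells", "Monocytes", "T cells"]
labels = encode_adt_labels(gated, confident_types)

# Stand-in for RNA data after dimension reduction (one row per cell).
rng = np.random.default_rng(0)
rna_embedding = rng.normal(size=(len(gated), 10))

print(labels)               # one integer label per cell
print(rna_embedding.shape)  # (cells, latent dimensions)
```

The only requirement the sketch tries to capture is that the label vector and the reduced RNA matrix describe the same cells in the same order.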

Get Started

Analyzing CITE-seq data

Here, we demonstrate this functionality with public human PBMC data, bone marrow data and upper lobe lung data. The same pipeline would generally be used to analyze any CITE-seq dataset.

Jointly analyzing CITE-seq and scRNA-seq data

Here we demonstrate how to jointly analyze CITE-seq and scRNA-seq datasets with SECANT using two public PBMC CITE-seq datasets from 10x Genomics, namely 10X10k and 10X5k. We use the entire 10X10k dataset (i.e., both ADT and RNA) and hold out the ADT data of the 10X5k dataset to mimic scRNA-seq. The held-out ADT values are kept aside to validate the results.

SECANT_GitHub_Joint_10X.ipynb
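
The hold-out idea can be illustrated with a minimal sketch. It does not use the SECANT API; the function names, the toy labels, and the toy predictions are hypothetical, and accuracy on confident cells is just one reasonable way to score the held-out set.

```python
import numpy as np

def split_reference_and_query(labels, is_query):
    """Separate labels of the reference cells from the held-out truth."""
    labels = np.asarray(labels)
    is_query = np.asarray(is_query, dtype=bool)
    return labels[~is_query], labels[is_query]

def holdout_accuracy(predicted, truth, uncertain_label):
    """Accuracy over held-out cells whose true label is a confident type."""
    predicted = np.asarray(predicted)
    truth = np.asarray(truth)
    keep = truth != uncertain_label  # uncertain cells have no ground truth
    if not keep.any():
        return float("nan")
    return float(np.mean(predicted[keep] == truth[keep]))

C = 3  # uncertain label in this toy example
labels = [0, 1, 2, 3, 0, 1]                            # made-up ADT labels
is_query = [False, False, False, True, True, True]     # cells mimicking scRNA-seq

reference_labels, query_truth = split_reference_and_query(labels, is_query)
query_predicted = [0, 0, 2]   # made-up predictions for the three query cells
acc = holdout_accuracy(query_predicted, query_truth, uncertain_label=C)
print(acc)
```

Cells whose original label was uncertain are excluded from the score because they carry no confident cell type to compare against.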

Search for the best configuration of concordance matrix in a data-driven manner

Because this step is computationally demanding, we suggest running it in parallel on a server with multiple CPUs or GPUs. An example is provided in SECANT_GitHub_Search_Best_Config.ipynb
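
A rough sketch of what a data-driven search might look like is given below. It assumes, for illustration only, that a configuration is a tuple giving the number of clusters for each confident cell type; this README does not specify the format. The `toy_score` function is a placeholder for fitting the model and scoring a configuration, and all names are hypothetical.

```python
from itertools import product

def candidate_configs(n_types, max_per_type, max_total):
    """Yield one cluster count per confident cell type, bounded in total."""
    for counts in product(range(1, max_per_type + 1), repeat=n_types):
        if sum(counts) <= max_total:
            yield counts

def search_best_config(configs, score_fn):
    """Return the (config, score) pair with the highest score."""
    best = None
    for cfg in configs:
        score = score_fn(cfg)
        if best is None or score > best[1]:
            best = (cfg, score)
    return best

def toy_score(cfg):
    """Placeholder score; a real run would fit a model for each config."""
    target = (2, 1, 3)
    return -sum((a - b) ** 2 for a, b in zip(cfg, target))

configs = list(candidate_configs(n_types=3, max_per_type=3, max_total=7))
best_cfg, best_score = search_best_config(configs, toy_score)
print(len(configs), best_cfg, best_score)
```

Because each configuration is scored independently, the loop in `search_best_config` is the part that would be distributed across CPUs or GPUs in practice.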

Simulation study

We provide an example of a simulation study, covering both how to generate simulated data and how to assess performance. Because of the computational burden, we recommend running the simulation on a server with multiple CPUs or GPUs. To replicate the results using Google Colab, copy all files under simulation_files to Google Drive and mount Google Drive in Google Colab. SECANT_GitHub_simulation.ipynb

Datasets

A collection of datasets is available with SECANT. All datasets stored in this repository are pre-processed by scVI.

Public data:

| Dataset | Number of cells | Description | Original data source |
| --- | --- | --- | --- |
| 10X10k_PBMC | 7,865 | Human PBMCs (from 10X Genomics) | source |
| 10X5k_PBMC | 5,527 | Human PBMCs (from 10X Genomics) | source |
| Bone_marrow | 30,672 | Human bone marrow | source |
| Upper_lobe_lung | 5,451 | Human upper lobe lung (on GEO, use DropletUtils for pre-processing) | source |

In-house data:

In-house data will be available soon.

Installation:

From source

Download a local copy of SECANT and install from the directory:

git clone https://github.com/tarot0410/SECANT.git
cd SECANT
pip install .

Dependencies

PyTorch (torch), scikit-learn (sklearn), umap-learn (umap), pandas, numpy, and all of their respective dependencies.

Other relevant material

Example of using automatic gating tool to classify major cell types with CITE-seq data

Clustering uncertainty used in downstream analysis

Here, we give an example of using clustering uncertainty (the posterior probability) from SECANT in downstream analysis. Specifically, as a sensitivity analysis, we remove cells with low-confidence clustering results before trajectory analysis.
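
The filtering step can be sketched as below, assuming the posterior probabilities are available as a cells-by-clusters array. The function name, the toy posterior matrix, and the 0.9 cutoff are illustrative assumptions, not values taken from SECANT.

```python
import numpy as np

def confident_cells(posterior, threshold=0.9):
    """Flag cells whose largest posterior probability reaches the threshold.

    posterior: array of shape (cells, clusters), each row summing to 1.
    Returns a boolean keep-mask and the most likely cluster per cell.
    """
    posterior = np.asarray(posterior, dtype=float)
    max_prob = posterior.max(axis=1)
    assignment = posterior.argmax(axis=1)
    return max_prob >= threshold, assignment

# Made-up posterior probabilities for four cells and three clusters.
posterior = np.array([
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
    [0.10, 0.10, 0.80],
])
keep, assignment = confident_cells(posterior, threshold=0.9)
filtered_assignment = assignment[keep]  # cells passed on to trajectory analysis
print(keep, filtered_assignment)
```

Repeating the downstream analysis at a few different thresholds shows how sensitive the trajectory is to ambiguously clustered cells.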

Paper

Wang X, Xu Z, Hu H, Zhou X, Zhang Y, Lafyatis R, Chen K, Huang H, Ding Y, Duerr RH, Chen W. SECANT: a biology-guided semi-supervised method for clustering, classification, and annotation of single-cell multi-omics. PNAS Nexus. 2022
