Skip to content

An example Variational AutoEncoder built in pytorch for single-cell data with a Zero Inflated Negative Binomial (zinb) distribution or Negative Binomial (nb) distribution

Notifications You must be signed in to change notification settings

fffamk/ZINB_VAE

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

ZeroInflatedNegativeBinomial_VAE

This is a very fundamental Variational AutoEncoder model with zero inflated negative binomial distribution and negative binomial distribution. It can be widely applied in the single cell RNA (scRNA) or single cell ATAC (scATAC) data to extract compat low dimensionality representations. (Other data distribution might be added to this repo later on.)

For beginers who are inetrested in using VAE for single cell analysis, you can use this as an toy example to play with different parameters like batch_size, learning rate to see how to achieve a better low dimensionality represetnations.

For advanced users or who want to customize their own VAE models on single cell data, you can easily build a customized VAE model by changing the number of layers and number of neurons in the encoder/decoder structure by specifying the key arguments, or adding the novel deep learning components, customizing the loss function, and etc.

Dependencies

  • Python 3
  • pytorch
  • pyro
  • anndata
  • scanpy

Datasets

Example data example.h5ad is provided to testing the toy example only. The cell type label is celltype in the uns attribute.

Usage

git clone https://github.com/Szym29/ZeroInflatedNegativeBinomial_VAE.git
cd ZeroInflatedNegativeBinomial_VAE

Please use python example_zinb_nb_VAE_pytorch.py -h to find the usage of keywords like input data directory, leraning rate, assumption of data distribution, and the name of cell type attributes, plotting latent embedding with umaps or not, and etc.

The --data_dir is required for running the script

Example:

python example_zinb_nb_VAE_pytorch.py --data_dir example.h5ad --distribution zinb --use_cuda True --plot_embedding True --clustering True --lable_name celltype

The VAE model will apply zero-inflated negative binomial distribution and be trained using GPU, and doing leiden clustering on the latent representation level. Plot the latent representation using umap and coloring by cell types labels and clustering results. (Plots can be found at ./figures/)

If your single cell data is not preprocessed, you can extends the preprocess(adata) function and uncomment the adata = preprocess(adata) in the main function to use the preprocessed data. You can also simply pass the directory of preprocessed data into the script.

NOTE: if the preprocessed data or your data contains negative values, the script will stop running because this violate the assumption of both negative binomial and zero-inflated negative binomial distribution. Druing the training, the loss will be NaN. If you still want to use the the data in training a VAE model, please specify --left_trim True when you run the python script. This keyword will force all negative values as 0 and the model will run without bugs. But this strategy will loss some information from data.

Contact

Yumin Zheng

About

An example Variational AutoEncoder built in pytorch for single-cell data with a Zero Inflated Negative Binomial (zinb) distribution or Negative Binomial (nb) distribution

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%