Skip to content

colabobio/sciviewer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Single Cell Interactive Viewer (Sciviewer)

This is an interactive viewer for 2D embeddings such as UMAP or tSNE of high dimensional single-cell RNA-seq data that is run directly out of the Jupyter Notebook environment. The user can select cells in the 2D plane and the viewer will calculate the differential expression between the selected and the unselected cells. Alternatively, the user can select a group of cells and a direction and the viewer will identify the genes with the greatest variation (Pearson correlation) along that direction. See a video of how this works below. Also, see the example tutorial in this repository (a small example of 3000 PBMCs that illustrates all the input options, and a larger one of 50,000 circulating T-cells).

Important note: The code here is an initial proof-of-concept, the development is continuing in this new repo, which will eventually replace this prototype.

Installation

The main requirement for sciviewer is py5 which in turn requires Python 3.8. We recommend using the conda package manager to install the necessary dependencies fo sciviewer. Conda can be installed following the instructions here. Then follow the steps below to install sciviewer.

  1. Prepare and activate the conda environment containing dependencies for py5:
conda env create -n sciviewer -f https://raw.githubusercontent.com/colabobio/sciviewer/master/sciviewer-env.yml
conda activate sciviewer

Alternatively, if you want to append the needed dependencies to an existing conda environment, instead of creating a new one, you can do the following:

conda env update -n your_existing_environment -f https://raw.githubusercontent.com/colabobio/sciviewer/master/sciviewer-env.yml
conda activate your_existing_environment

Note, Sciviewer currently requires Python 3.8 or greater.

  1. Install OpenJDK 1.7. Py5 does not work with OpenJDK 11, which is the one available through Conda at the moment. After creating an activating the sciviewer environment, there are two options:
  • If you already have OpenJDK 17 installed in your system, you can make it available to sciviewer by setting the JAVA_HOME environmental variable. For instance, if you installed Adoptium OpenJDK 17, the home folder should be /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home so you can get the JAVA_HOME as follows:
 export JAVA_HOME=/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home
  • Otherwise, you can install OpenJDK 17 as explained in Py5's Anaconda setup.
  1. Next, install the sciviewer package using pip:
pip install sciviewer

And that is it, the module is now installed and ready to be used.

To uninstall, use:

pip uninstall sciviewer
  1. Now launch jupyter from within the activated conda environment and you are good to go.
jupyter lab

Quick start

Sciviewer is executed from a Jupyter notebook such as in the examples directory. It is run by initializing a SCIViewer object with the 2D embedding (# cells X 2) and the expression data (# cells X # genes) and then running the explore_data method. E.g.

from sciviewer import SCIViewer
svobj = SCIViewer(expr, umap)
svobj.explore_data()

Running the code above will cause the visualizer to appear. The umap and expression data can also now be provided directly as a Scanpy AnnData, see below or tutorials for details.

Click the video link below for a ~3 minute tutorial on how to use the visualizer: Watch the video

Note, if you are running Sciviewer from the Jupyter notebook as in the tutorials, you need to load the py5 magic extension before you can start the visualizer

%load_ext py5

and if you are running it from a jupyter notebook on a mac computer, you need to add an additional magic extension

%gui osx

Some key usage points:

  • Inputs: The expression data can be provided as a Scanpy AnnData, Pandas DataFrame, a Numpy ndarray, or as a scipy sparse csc_matrix.
  • AnnData expression: For AnnData objects, the expression data are accessed from the .X attribute by default. Setting the use_raw argument to True causes it to be accessed from the .raw.X attribute instead. If the data are sparse, sciviewer requires it to be in the csc_matrix format. See the tutorial for how to convert between sparse matrix formats
  • Sparsity: Providing the data as a sparse csc_matrix is recommended for large datasets as it can lead to a considerable (1-2 order or magnitude) performance speedup. See this notebook as an example.
  • Gene/cell names: If the expression data is provided as a Pandas DataFrame, the cell names are inferred from the index and the gene names are inferred from the columns. If it is provided as a Scanpy AnnData, the gene names come from the index of the .var attribute and the cell names come from the index of the .obs attribute. Otherwise, the gene names and cell names can be provided when initializing the SCIViewer class with the gene_names and cell_names arguments, or will be initialized with generic names.
  • Real time updating of python variables The selected_cells attribute of the sciviewer object is updated whenever a new set of cells are selected, regardless of the mode, and contains information about the selected cells. The results_proj_correlation attribute of the sciviewer object is updated whenever a new selection is made in the "directional" mode and contains the Pearson correlation and P-values of all genes for the selected direction and cells. The results_diffexpr attribute is updated when a new selection is made in the "differential" mode and contains the T-statistic and P-value for the differential expression test (simple Welch's T-test). These are updated in real time as the visualizer is in use.

See the tutorial notebooks for more details

Development / debugging

For development purposes, it can be helpful to import sciviewer directly rather than installing the package. See the extras/debugging directory for notebooks with examples of how to do this e.g. debug_example_3K_PBMC.ipynb.

Publications

  • Kotliar D, Colubri A. Sciviewer enables interactive visual interrogation of single-cell RNA-Seq data from the python programming environment. (2021). Bioinformatics. doi: 10.1093/bioinformatics/btab689
  • Kotliar D, Colubri A. Sciviewer enables interactive visual interrogation of single-cell RNA-Seq data from the Python programming environment. (2021). bioRxiv. doi: 10.1101/2021.08.12.455997 (preprint)

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published