Welcome to TMvis - a pipeline for transmembrane protein annotation and 3D visualization.
TMvis combines AlphaFold 2 [1] structures from the AlphaFold DB [2] with predicted transmembrane protein (TMP) annotations into interactive 3D visualizations of protein structures embedded in membranes. The TMPs are predicted by TMbed [3], a method based on the protein language model ProtT5 [4] that provides per-residue predictions of alpha-helical and beta-barrel transmembrane segments. The predicted TMbed topology is then added to the 3D visualization of the respective AlphaFold 2 structure. In addition, TMvis can add membrane embeddings predicted by ANVIL [5] or PPM3 [6].
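To illustrate what "per-residue" predictions mean in practice, the following minimal sketch collapses a per-residue label string into transmembrane segments. The label alphabet (`H`/`h` for helices, `B`/`b` for beta strands, `S` for signal peptide, `i`/`o` for inside/outside) and the function name `tm_segments` are assumptions made for this example; check the TMbed documentation for the output format you actually use.

```python
from itertools import groupby

# Assumed label alphabet (not defined in this README):
# 'H'/'h' = transmembrane alpha helix, 'B'/'b' = transmembrane beta strand,
# 'S' = signal peptide, 'i'/'o' = inside/outside.
TM_KINDS = {"H": "alpha-helix", "h": "alpha-helix",
            "B": "beta-strand", "b": "beta-strand"}


def tm_segments(labels):
    """Return (kind, start, end) tuples using 1-based, inclusive positions."""
    segments = []
    position = 1
    for label, run in groupby(labels):
        length = len(list(run))
        if label in TM_KINDS:
            segments.append((TM_KINDS[label], position, position + length - 1))
        position += length
    return segments


# Made-up label string, for illustration only.
example = "iiiiHHHHHHooooohhhhhhiiii"
print(tm_segments(example))
```

Segments like these are what gets mapped onto residue ranges of the 3D structure.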
As an example, we provide a subset of 496 predicted TMPs. TMbed predicted 4,967 TMPs for the human proteome (20,375 proteins, UniProt [7] version April 2022; excluding TITIN_HUMAN due to its length). We extracted the AlphaFold 2 structures with an average per-residue confidence score (pLDDT) above 90, which led to this subset. Please download the set here.
Once you have the dataset ready, you can use the Jupyter notebook TMvis.ipynb in the TMvis folder to visualize in 3D every protein structure in the dataset together with the predictions of ANVIL, PPM3, and TMbed. Additionally, you can visualize the per-residue confidence scores (pLDDT) of AlphaFold.
git clone https://github.com/Rostlab/TMvis
- Python and Conda
conda env create -n TMvis --file TMvis.yml
conda activate TMvis
- Nextflow
- Docker
See the data/ folder for details.
Make sure that after this step you have a folder data/current/ containing one folder with your AlphaFold 2 structures and a text file with TMbed predictions.
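A quick way to confirm this layout before running the pipeline is a small check like the one below. This is only a sketch: the helper `check_data_folder` is hypothetical and not part of TMvis, and it assumes nothing about file names beyond "one subfolder plus at least one regular file".

```python
from pathlib import Path


def check_data_folder(root="data/current"):
    """Return a list of problems found in the expected input layout."""
    root = Path(root)
    if not root.is_dir():
        return [f"{root} does not exist or is not a folder"]

    # Ignore hidden entries such as .DS_Store.
    entries = [p for p in root.iterdir() if not p.name.startswith(".")]
    folders = [p for p in entries if p.is_dir()]
    files = [p for p in entries if p.is_file()]

    problems = []
    if len(folders) != 1:
        problems.append(f"expected exactly one structure folder, found {len(folders)}")
    elif not any(folders[0].iterdir()):
        problems.append(f"structure folder {folders[0]} is empty")
    if not files:
        problems.append("no file with TMbed predictions found")
    return problems


for problem in check_data_folder():
    print("Problem:", problem)
```

An empty result means the folder matches the description above; it does not validate the contents of the files.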
Run python3 ./TMvis/main.py
main.py will generate a results/db folder, which contains the AlphaFold structures of the alpha-helical and beta-barrel membrane proteins predicted by TMbed. Additionally, db/**/pLDDT90F1 is a subset of db with alpha and beta proteins selected by the following criteria:
- Max. 2,700 residues long (the maximum length of a single AlphaFold 2 PDB file)
- Mean pLDDT per protein is higher than 90 (highly accurate structures). If needed, you can change the threshold.
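The two criteria above can be sketched in a few lines of Python. This is an illustration rather than the code used in main.py: it relies on the fact that AlphaFold DB PDB files store pLDDT in the B-factor column, reads one value per residue from the CA atoms, and uses hypothetical names (`ca_plddt`, `passes_filter`, `fake_ca_line`). The demo records are synthetic.

```python
def ca_plddt(pdb_text):
    """Return {(chain, residue number): pLDDT} read from the CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66 (pLDDT in AlphaFold DB files).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def passes_filter(pdb_text, max_length=2700, min_mean_plddt=90.0):
    """Apply the length and mean-pLDDT criteria to one structure."""
    scores = ca_plddt(pdb_text)
    if not scores:
        return False
    mean_plddt = sum(scores.values()) / len(scores)
    return len(scores) <= max_length and mean_plddt > min_mean_plddt


def fake_ca_line(resnum, plddt):
    """Build one synthetic CA record (illustration only, not real data)."""
    return "ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        resnum, resnum, 0.0, 0.0, 0.0, 1.0, plddt)


confident = "\n".join(fake_ca_line(i, s) for i, s in enumerate([95.0, 92.0, 91.0], 1))
uncertain = "\n".join(fake_ca_line(i, s) for i, s in enumerate([95.0, 80.0, 85.0], 1))
print(passes_filter(confident), passes_filter(uncertain))
```

To try a different threshold, pass another value for `min_mean_plddt`; to check a real structure, pass the text content of its PDB file.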
- Unpack the Docker container. See the docker/ folder for details.
- Run
nextflow run ./PPM3/run_PPM3.nf -c custom.config
Note: to run ANVIL, you need an access key.
- Unpack the Docker container. See the docker/ folder for details on how to do that and where to get the access key.
- Run
nextflow run ./ANVIL/run_anvil.nf -c custom.config
Run jupyter notebook ./TMvis/TMvis.ipynb
- AlphaFold - Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.
- AlphaFold DB - Varadi, Mihaly, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, et al. 2022. “AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models.” Nucleic Acids Research 50 (D1): D439–44.
- TMbed - Bernhofer, Michael, and Burkhard Rost. 2022. “TMbed – Transmembrane Proteins Predicted through Language Model Embeddings.” bioRxiv.
- ProtT5 - A. Elnaggar et al., "ProtTrans: Towards Cracking the Language of Lifes Code Through Self-Supervised Deep Learning and High Performance Computing," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2021.3095381.
- ANVIL - Postic, Guillaume, Yassine Ghouzam, Vincent Guiraud, and Jean-Christophe Gelly. 2016. “Membrane Positioning for High- and Low-Resolution Protein Structures through a Binary Classification Approach.” Protein Engineering, Design & Selection: PEDS 29 (3): 87–91.
- PPM3 - Lomize, Mikhail A., Irina D. Pogozheva, Hyeon Joo, Henry I. Mosberg, and Andrei L. Lomize. 2012. “OPM Database and PPM Web Server: Resources for Positioning of Proteins in Membranes.” Nucleic Acids Research 40 (Database issue): D370–76.
- UniProt - UniProt Consortium (2021). UniProt: the universal protein knowledgebase in 2021. Nucleic acids research, 49(D1), D480–D489.