Sample raw data for results discussed in Predicting Relative Populations of Protein Conformations without a Physics Engine Using AlphaFold2
You can now run a fully self-contained sample of our AlphaFold2 subsampling for conformational ensemble predictions directly from Collab!
The notebook below is designed to:
- Building an MSA for your target protein
- Trying different subsampling conditions
- General analysis of the resulting ensemble with basic heuristics for measuring diversity and quality
- Exporting results
It uses Abl1 as an example, but should work with other systems as well, all you need is the sequence for the protein you're looking to make predictions for.
For monomers:
For multimers:
The approach we describe in the manuscript does not require any specialized software besides the ones generally used to make predictions with AlphaFold 2. Namely, it requires a Multiple Sequence Alignment (MSA) that can be generated through a variety of methods (such as MMSEQS2 [https://github.com/soedinglab/MMseqs2] or jackhmmr [https://github.com/EddyRivasLab/hmmer/tree/master]) and the AlphaFold2 https://github.com/lipan6461188/AlphaFold-StepByStep executable itself.
In our manuscript, we use jackhmmr to generate the MSAs, and run AlphaFold2 via the colabfold_batch wrapper https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/batch/AlphaFold2_batch.ipynb. The following content will pertain to that specific implementation.
RAM: at least 32 GB (16 GB with --db_preset=reduced_dbs). More RAM might be necessary if using very deep MSAs (>100k sequences).
CPU: A modern multicore Intel/AMD CPU.
GPU: Currently only tested in nVidia GPUs (e.g. A100). Should work with CPUs, but significantly slower.
Disk: 3 TB of disk space, ideally an SSD for faster MSA search (1 TB with reduced_dbs) if querying local databases. If using the Google Collab Notebook to build MSAs, the space requirements are much lower (about 10 GBs should be sufficient).
Installation instructions for jackhmmr and/or collabfold_batch can be found at the following repositories:
https://github.com/EddyRivasLab/hmmer/tree/master
https://github.com/lipan6461188/AlphaFold-StepByStep
We strongly recommend the use of our Google Collab notebook for generating MSAs. Instructions for its usage can be found on the notebook itself.
https://colab.research.google.com/drive/1BhOsy9UL41mE0UN5eYiMxwpFXkQAA8iE
- Run the Google Collab notebook with default parameters, exporting the MSA to Google Drive as instructed in the notebook itself.
- Run collabfold_batch or another implementation of AF2 using the MSA generated on 1 as an input, following the instructions on the Google Collab for setting the AF2 prediction parameters.
- Analyze predictions using a software of choice (PyMol, PyTraj, MDAnalysis, MDTraj, etc.).
- Run the Google Collab notebook with default parameters using the sequences shared in sequences, exporting the MSA to Google Drive as instructed in the notebook itself.
- Run collabfold_batch or another implementation of AF2 using the MSA generated on 1 as an input, following the command described in [here](https://github.com/GMdSilva/rel_state_pop_af2_raw_data/blob/main/scripts/colabfold_batch_command.sh Adjust input and output names as needed.