ParallelFold

Author: Bozitao Zhong 📮: [email protected]

🚉 We are adding new functions to ParallelFold; see our Roadmap.

📑 Please cite our arXiv paper if you use ParallelFold (ParaFold) in your research.

Overview

This project is a modified version of DeepMind's AlphaFold2, designed for high-throughput protein structure prediction.

We made the following modifications to the original AlphaFold pipeline:

Separated the CPU part (MSA and template searching) from the GPU part (prediction model)

ParallelFold now supports AlphaFold 2.1.1

How to install

We recommend installing AlphaFold locally rather than using Docker.

Setting up conda environment

Step 1: Create a conda environment for ParallelFold/AlphaFold

# Assumes miniconda is available as a module on your cluster; otherwise, install your own miniconda or anaconda
module load miniconda3
source activate base

# Create a miniconda environment for ParallelFold/AlphaFold
conda create -n alphafold python=3.8
conda activate alphafold

We recommend Python 3.8. With Python versions below 3.7, some packages may be missing.

Step 2: Install cudatoolkit 10.1 and cudnn:

conda install cudatoolkit=10.1 cudnn

Why use cudatoolkit 10.1:

cudatoolkit 10.1 works with TensorFlow 2.3.0, whereas TensorFlow sometimes cannot find the GPU when using cudatoolkit 10.2

cudnn version 7.6.5
For a newer CUDA driver, you can install cudatoolkit 11.2 and TensorFlow 2.5.0 instead

Step 3: Install tensorflow 2.3.0 with pip

pip install tensorflow==2.3.0
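
Since the choice of cudatoolkit version hinges on whether TensorFlow can find the GPU, it can help to check this right after installation. The following is a minimal sketch, not part of ParallelFold; the helper name `visible_gpus` is made up for illustration. It relies only on `tf.config.list_physical_devices`, which is available in TensorFlow 2.3.0.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None if TensorFlow is not installed in the current environment.
    `visible_gpus` is a hypothetical helper, not a ParallelFold function.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow is installed but found no GPU")
    else:
        print("TensorFlow found:", ", ".join(gpus))
```

An empty list here is the symptom described above for cudatoolkit 10.2.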

Step 4: Install other packages with pip and conda

# Using conda
conda install -c conda-forge openmm=7.5.1 pdbfixer=1.7
conda install -c bioconda hmmer=3.3.2 hhsuite=3.3.0 kalign2=2.04
conda install pandas=1.3.4

# Using pip
pip install biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0
pip install --upgrade jax jaxlib==0.1.69+cuda101 -f https://storage.googleapis.com/jax-releases/jax_releases.html
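
To confirm that the environment ended up with the versions pinned in the pip command above, a small check along these lines may be useful. This is a sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the names `PINNED` and `find_mismatches` are illustrative and not part of ParallelFold.

```python
from importlib import metadata

# Versions copied from the pip command above.
PINNED = {
    "biopython": "1.79",
    "chex": "0.0.7",
    "dm-haiku": "0.0.4",
    "dm-tree": "0.1.6",
    "immutabledict": "2.0.0",
    "jax": "0.2.14",
    "ml-collections": "0.1.0",
}


def find_mismatches(pinned=PINNED):
    """Return {package: (expected, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {installed}")
```

Run it inside the activated alphafold environment; no output means every pin matches.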

jax installation reference: https://github.com/google/jax

For CUDA 11.1, 11.2, or 11.3, use cuda111.

For CUDA 11.0, use cuda110.

For CUDA 10.2, use cuda102.

For CUDA 10.1, use cuda101.

Newer versions of jaxlib (after 0.1.70) no longer support CUDA 10.1 and lower, so here we pin jaxlib to 0.1.69.

With cudatoolkit 10.1, use the cuda101 build.
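
The mapping from CUDA version to jaxlib build suffix listed above can be written down as a small lookup. This is an illustrative sketch; the function names are hypothetical. The default jaxlib version 0.1.69 comes from the pip command above, and only the cuda101 combination is taken from this guide, so treat the other combinations as assumptions to verify against the JAX release page.

```python
def jaxlib_cuda_suffix(cuda_version: str) -> str:
    """Map a CUDA version string such as "10.1" or "11.2" to a jaxlib suffix."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if major == 11 and minor in (1, 2, 3):
        return "cuda111"
    if (major, minor) == (11, 0):
        return "cuda110"
    if (major, minor) == (10, 2):
        return "cuda102"
    if (major, minor) == (10, 1):
        return "cuda101"
    raise ValueError(f"No jaxlib build listed for CUDA {cuda_version}")


def jaxlib_requirement(cuda_version: str, jaxlib_version: str = "0.1.69") -> str:
    """Build the requirement string to pass to pip install."""
    return f"jaxlib=={jaxlib_version}+{jaxlib_cuda_suffix(cuda_version)}"


if __name__ == "__main__":
    print(jaxlib_requirement("10.1"))
```

For cudatoolkit 10.1 this yields the same requirement used in the pip command above.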

Clone This Repo

git clone https://github.com/Zuricho/ParallelFold.git
alphafold_path="/path/to/alphafold/git/repo"

Give execute permission to the sh files:

chmod +x run_alphafold.sh

Final Steps

[Optional] Download chemical properties to the common folder

Check whether the stereo_chemical_props.txt file is in the alphafold/alphafold/common/ folder. If it is missing, download it:

wget -q -P alphafold/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
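
The check-then-download step can also be scripted so that it is safe to rerun. The sketch below uses only the standard library; `ensure_stereo_props` is a hypothetical helper name, and the `download` parameter exists only so the function can be exercised without network access.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Same URL as in the wget command above.
STEREO_URL = (
    "https://git.scicore.unibas.ch/schwede/openstructure/-/raw/"
    "7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/"
    "stereo_chemical_props.txt"
)


def ensure_stereo_props(common_dir="alphafold/alphafold/common", download=urlretrieve):
    """Download stereo_chemical_props.txt into common_dir only if it is missing.

    Returns True if a download was performed, False if the file already existed.
    """
    target = Path(common_dir) / "stereo_chemical_props.txt"
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    download(STEREO_URL, str(target))
    return True


if __name__ == "__main__":
    fetched = ensure_stereo_props()
    print("downloaded" if fetched else "already present")
```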

Apply OpenMM patch

# This is the path to your alphafold folder
alphafold_path="/path/to/alphafold/git/repo"
cd ~/.conda/envs/alphafold/lib/python3.8/site-packages/
# Quote the variable so that paths containing spaces still work
patch -p0 < "$alphafold_path/docker/openmm.patch"

Local cuda

Based on our tests, you need to use a local CUDA installation if you installed cudatoolkit=10.1. You can skip this step if you are using cudatoolkit 11.

Some modules might be available, for example: cuda/10.1.243-gcc-8.3.0, cuda/10.2.89-gcc-8.3.0

References

Official version from DeepMind, with Docker.
Non-Docker versions: install AlphaFold without Docker.
My non-Docker guide, adjusted for different CUDA versions (CUDA driver >= 10.1)

Some detail information of modified files

4 files:

run_alphafold.py: a modified version of the original run_alphafold.py with several additional functions, such as skipping the featurization steps when feature.pkl already exists in the output folder
run_alphafold.sh: bash script to run run_alphafold.py
run_figure.py: helps you make figures for your system

How to run

First, use CPUs to generate the features:

./run_alphafold.sh -d data -o output -p monomer_ptm -i input/test.fasta -t 2021-07-27 -m model_1 -f

-f runs only the featurization step, which produces a feature.pkl file, and skips the following steps.

8 CPUs are enough; according to my tests, more CPUs do not improve speed.

The featurization step writes feature.pkl and an MSA folder to your output folder: ./output/FASTA_NAME/

PS: Here we put input files in an input folder to organize files in a better way.
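
Before submitting the GPU job, you may want to confirm that the featurization output is in place. The sketch below is not ParallelFold code; the helper names are hypothetical. It assumes that FASTA_NAME is the input file name without its extension (so input/test.fasta maps to ./output/test/), which you should verify against your own output folder.

```python
from pathlib import Path


def feature_file(output_dir, fasta_path, filename="feature.pkl"):
    """Path where the featurization step is expected to leave its result.

    Assumption: the per-target folder is named after the FASTA file stem.
    """
    fasta_name = Path(fasta_path).stem  # input/test.fasta -> test
    return Path(output_dir) / fasta_name / filename


def features_ready(output_dir, fasta_path):
    """True if the feature file for this FASTA input already exists."""
    return feature_file(output_dir, fasta_path).is_file()


if __name__ == "__main__":
    if features_ready("output", "input/test.fasta"):
        print("feature.pkl found, the GPU step can reuse it")
    else:
        print("feature.pkl missing, run the -f step first")
```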

Second, run run_alphafold.sh on a GPU:

./run_alphafold.sh -d data -o output -m model_1,model_2,model_3,model_4,model_5 -i input/test.fasta -t 2021-07-27

If feature.pkl was generated successfully in the first step, the featurization step of this run finishes very quickly.
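
The two stages above can be driven from a script. The following sketch only reassembles the two command lines shown in this section and passes them to `subprocess.run`; the function names are hypothetical, and the parameter name `template_date` is simply a label for the value given to -t. In practice the two stages would usually be submitted as separate CPU and GPU jobs, so running them back to back is for illustration only.

```python
import subprocess

ALL_MODELS = "model_1,model_2,model_3,model_4,model_5"


def feature_command(fasta, data_dir="data", output_dir="output",
                    template_date="2021-07-27"):
    """CPU stage: the same flags as the first command above, ending with -f."""
    return ["./run_alphafold.sh", "-d", data_dir, "-o", output_dir,
            "-p", "monomer_ptm", "-i", fasta, "-t", template_date,
            "-m", "model_1", "-f"]


def predict_command(fasta, data_dir="data", output_dir="output",
                    template_date="2021-07-27"):
    """GPU stage: the same flags as the second command above."""
    return ["./run_alphafold.sh", "-d", data_dir, "-o", output_dir,
            "-m", ALL_MODELS, "-i", fasta, "-t", template_date]


def run_two_stages(fasta, runner=subprocess.run):
    """Run featurization first, then prediction; stop if either command fails."""
    runner(feature_command(fasta), check=True)
    runner(predict_command(fasta), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    print(" ".join(feature_command("input/test.fasta")))
    print(" ".join(predict_command("input/test.fasta")))
```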

Finally, you can run run_figure.py to visualize your result: [This will be available soon]

python run_figure.py [SystemName]

This Python script creates a figure folder in your output folder.

Note: run_figure.py needs a local conda environment with matplotlib, pymol and numpy.

Functions

You can use some flags to change how ParallelFold runs the prediction model:

-x: Skip AMBER refinement

-b: Use benchmark mode: run the JAX model twice, so that the second run can be used to evaluate the running time

-r: Change the number of cycles in recycling

Some more functions are under development.
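
When launching runs from a script, the optional flags listed above can be appended to the base command. The helper below is a hypothetical sketch. It assumes that -x and -b are plain switches and that -r takes the number of recycling cycles as its argument; the README does not spell out the argument syntax, so check run_alphafold.sh before relying on it.

```python
def optional_flags(skip_relax=False, benchmark=False, recycles=None):
    """Translate keyword options into the command-line flags listed above."""
    flags = []
    if skip_relax:
        flags.append("-x")  # skip AMBER refinement
    if benchmark:
        flags.append("-b")  # benchmark mode
    if recycles is not None:
        # Assumption: -r is followed by the cycle count.
        flags += ["-r", str(recycles)]
    return flags


if __name__ == "__main__":
    base = ["./run_alphafold.sh", "-d", "data", "-o", "output",
            "-m", "model_1", "-i", "input/test.fasta", "-t", "2021-07-27"]
    print(" ".join(base + optional_flags(skip_relax=True, recycles=3)))
```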

What is this for

ParallelFold helps you accelerate AlphaFold when you want to predict multiple sequences. Because the CPU part and the GPU part are separated, the featurization step can be run on multiple processors.

With ParallelFold, you can run AlphaFold 2 to 3 times faster than with DeepMind's procedure.
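
One way to spread the featurization step over several sequences is a small launcher that runs the -f command for each FASTA file concurrently. This is a sketch under assumptions, not the batch scripts shipped with this repository; the function names are hypothetical. Threads are sufficient here because each worker only waits for an external process. Since about 8 CPUs per run are enough, choose `max_workers` so that 8 times that number fits on your node.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def featurize_command(fasta):
    """The featurization-only command from the How to run section."""
    return ["./run_alphafold.sh", "-d", "data", "-o", "output",
            "-p", "monomer_ptm", "-i", fasta, "-t", "2021-07-27",
            "-m", "model_1", "-f"]


def featurize_all(fasta_files, max_workers=2, runner=subprocess.run):
    """Run the featurization step for several FASTA files concurrently.

    Returns a dict mapping each FASTA path to the return code of its run.
    """
    def run_one(fasta):
        completed = runner(featurize_command(fasta))
        return fasta, completed.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, fasta_files))


if __name__ == "__main__":
    # Dry run: echo the commands rather than starting real jobs.
    def echo(cmd):
        return subprocess.run(["echo"] + cmd)

    print(featurize_all(["input/a.fasta", "input/b.fasta"], runner=echo))
```

The prediction step can then be run on the GPU for each target whose features are ready.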

If you have any questions, please open an issue.
