Showing 29 changed files with 4,317 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
# script | ||
*.sh | ||
# *.ipynb | ||
|
||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
pip-wheel-metadata/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
*.py,cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
db.sqlite3-journal | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
.python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow | ||
__pypackages__/ | ||
|
||
# Celery stuff | ||
celerybeat-schedule | ||
celerybeat.pid | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,154 @@ | ||
# GeoDiff | ||
# GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation | ||
|
||
[[OpenReview]](https://openreview.net/forum?id=PzcvxEMzvQC) | ||
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/MinkaiXu/GeoDiff/blob/main/LICENSE) | ||
|
||
[[OpenReview](https://openreview.net/forum?id=PzcvxEMzvQC)] [[arXiv](https://arxiv.org/abs/2203.02923)] [[Code](https://github.com/MinkaiXu/GeoDiff)] | ||
|
||
The official implementation of GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022 **Oral Presentation**) | ||
The official implementation of GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022 **Oral Presentation [54/3391]**). | ||
|
||
The code is coming soon. We have a primary version on [OpenReview](https://openreview.net/forum?id=PzcvxEMzvQC) as the supplementary material, and the link is also copied [here](https://openreview.net/attachment?id=PzcvxEMzvQC&name=supplementary_material). | ||
![cover](assets/geodiff_framework.png) | ||
|
||
## Environments | ||
|
||
### Install via Conda (Recommended) | ||
|
||
```bash | ||
# Clone the environment | ||
conda env create -f env.yml | ||
# Activate the environment | ||
conda activate geodiff | ||
# Install PyG | ||
conda install pytorch-geometric=1.7.2=py37_torch_1.8.0_cu102 -c rusty1s -c conda-forge | ||
``` | ||
|
||
## Dataset | ||
|
||
### Official Dataset | ||
The official raw GEOM dataset is available [[here]](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JNGTDF). | ||
|
||
### Preprocessed dataset | ||
We provide the preprocessed GEOM datasets in this [[google drive folder]](https://drive.google.com/drive/folders/1b0kNBtck9VNrLRZxg6mckyVUpJA5rBHh?usp=sharing). After downloading a dataset, place it at the path specified by the `dataset` variable in the config files `./configs/*.yml`. | ||
|
||
### Prepare your own GEOM dataset from scratch (optional) | ||
|
||
You can also download the original full GEOM dataset and prepare your own data split. A guide is available on the [[github page]](https://github.com/DeepGraphLearning/ConfGF#prepare-your-own-geom-dataset-from-scratch-optional) of the previous work ConfGF. | ||
|
||
## Training | ||
|
||
All hyper-parameters and training details are provided in the config files (`./configs/*.yml`); feel free to tune these parameters. | ||
|
||
You can train the model with the following commands: | ||
|
||
```bash | ||
# Default settings | ||
python train.py ./configs/qm9_default.yml | ||
python train.py ./configs/drugs_default.yml | ||
# An ablation setting with fewer timesteps, as described in Appendix D.2. | ||
python train.py ./configs/drugs_1k_default.yml | ||
``` | ||
|
||
The model checkpoints, the configuration YAML file, and the training log are saved to the directory specified by `--logdir` in `train.py`. | ||
|
||
## Generation | ||
|
||
We provide the checkpoints of two trained models, i.e., `qm9_default` and `drugs_default`, in the [[google drive folder]](https://drive.google.com/drive/folders/1b0kNBtck9VNrLRZxg6mckyVUpJA5rBHh?usp=sharing). Please put the checkpoints `*.pt` into paths like `${log}/${model}/checkpoints/`, and put the corresponding configuration file `*.yml` into the parent directory `${log}/${model}/`. | ||
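
The directory layout described above can be sketched as follows. This is only an illustration, assuming the configuration is located by looking one level above `checkpoints/`; the helper name `find_config`, the checkpoint file name, and the temporary paths are hypothetical and are not taken from the repository.

```python
from pathlib import Path
import tempfile


def find_config(ckpt_path):
    """Return the first *.yml file in the directory above `checkpoints/`.

    Hypothetical helper: it mirrors the layout described in the README,
    not necessarily what test.py does internally.
    """
    model_dir = Path(ckpt_path).resolve().parent.parent
    configs = sorted(model_dir.glob("*.yml"))
    if not configs:
        raise FileNotFoundError(f"no *.yml config found in {model_dir}")
    return configs[0]


# Build a throwaway directory tree that follows the expected layout:
#   <log>/<model>/checkpoints/<iter>.pt
#   <log>/<model>/<config>.yml
with tempfile.TemporaryDirectory() as log_dir:
    model_dir = Path(log_dir) / "drugs_default"
    (model_dir / "checkpoints").mkdir(parents=True)
    ckpt = model_dir / "checkpoints" / "100.pt"  # placeholder file name
    ckpt.touch()
    (model_dir / "drugs_default.yml").touch()
    found_name = find_config(ckpt).name
    print(found_name)
```

If the `*.yml` file is missing from `${log}/${model}/`, a lookup of this kind fails before any model is loaded, so it is worth checking the layout first.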
|
||
You can generate conformations for all or part of the test set with: | ||
|
||
```bash | ||
python test.py ${log}/${model}/checkpoints/${iter}.pt \ | ||
--start_idx 800 --end_idx 1000 | ||
``` | ||
Here `start_idx` and `end_idx` indicate the range of the test set to use. All hyper-parameters related to sampling can be set in `test.py`. Specifically, when testing the qm9 model, you can add the argument `--w_global 0.3`, which empirically gives slightly better results. | ||
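
As a rough illustration of how an index range selects part of a test set, the sketch below slices a list of stand-in molecules. It assumes a half-open range, i.e. `start_idx` is included and `end_idx` is excluded; whether `test.py` treats `end_idx` this way should be confirmed in the script. The function name `select_range` is hypothetical.

```python
def select_range(test_set, start_idx=0, end_idx=None):
    """Return the items with index in [start_idx, end_idx).

    Assumption: the range is half-open, as with Python slices.
    """
    if end_idx is None:
        end_idx = len(test_set)
    if not 0 <= start_idx <= end_idx:
        raise ValueError("expected 0 <= start_idx <= end_idx")
    return test_set[start_idx:end_idx]


molecules = [f"mol_{i}" for i in range(1000)]  # stand-in for the real test set
subset = select_range(molecules, start_idx=800, end_idx=1000)
print(len(subset), subset[0], subset[-1])
```

Splitting the test set into several ranges like this makes it possible to run sampling jobs in parallel and merge the results afterwards.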
|
||
Conformations of some drug-like molecules generated by GeoDiff are provided below. | ||
|
||
<p align="center"> | ||
<img src="assets/exp_drugs.png" /> | ||
</p> | ||
|
||
## Evaluation | ||
|
||
After generating conformations with the above commands, the results of all benchmark tasks can be calculated from the generated data. | ||
|
||
### Task 1. Conformation Generation | ||
|
||
The `COV` and `MAT` scores on the GEOM datasets can be calculated using the following commands: | ||
|
||
```bash | ||
python eval_covmat.py ${log}/${model}/${sample}/sample_all.pkl | ||
``` | ||
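
For readers unfamiliar with the two scores, the sketch below shows one common way they are computed from a matrix of pairwise RMSD values: `COV` is the fraction of reference conformations that have at least one generated conformation within a threshold, and `MAT` is the average of the smallest RMSD per reference conformation. This is not the code in `eval_covmat.py`; the function name `cov_mat`, the threshold, the use of `<=`, and the RMSD values are all made up for illustration.

```python
def cov_mat(rmsd, threshold):
    """Compute (COV, MAT) from an RMSD matrix.

    rmsd[i][j] is the RMSD between reference conformation i and
    generated conformation j (illustrative input, assumed precomputed).
    """
    # Distance from each reference conformation to its closest generated one.
    best = [min(row) for row in rmsd]
    cov = sum(d <= threshold for d in best) / len(best)
    mat = sum(best) / len(best)
    return cov, mat


# Three reference conformations (rows) vs. three generated ones (columns).
rmsd = [
    [0.30, 0.90, 1.40],
    [0.80, 0.60, 1.10],
    [1.50, 1.30, 1.20],
]
cov, mat = cov_mat(rmsd, threshold=0.7)
print(f"COV = {cov:.3f}, MAT = {mat:.3f}")
```

A higher `COV` indicates better coverage of the reference set, while a lower `MAT` indicates that the closest generated conformations are more accurate.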
|
||
|
||
### Task 2. Property Prediction | ||
|
||
For property prediction, we use a small split of qm9 that differs from the one used in the `Conformation Generation` task. This split is also provided in the [[google drive folder]](https://drive.google.com/drive/folders/1b0kNBtck9VNrLRZxg6mckyVUpJA5rBHh?usp=sharing). Generating conformations and evaluating the `mean absolute error (MAE)` metric on this split can be done with the following commands: | ||
|
||
```bash | ||
python test.py ${log}/${model}/checkpoints/${iter}.pt --num_confs 50 \ | ||
--start_idx 0 --test_set data/GEOM/QM9/qm9_property.pkl | ||
python eval_prop.py --generated ${log}/${model}/${sample}/sample_all.pkl | ||
``` | ||
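
The metric itself is simple; a minimal sketch of a mean absolute error between predicted and reference property values is given below. It does not reproduce `eval_prop.py`, and the function name and the numbers are purely illustrative.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired values."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    if not predicted:
        raise ValueError("inputs must not be empty")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


predicted = [0.10, 0.45, 0.80]  # made-up property values from generated conformations
reference = [0.15, 0.40, 1.00]  # made-up ground-truth values
mae = mean_absolute_error(predicted, reference)
print(f"MAE = {mae:.3f}")
```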
|
||
## Visualizing molecules with PyMol | ||
|
||
Here we also provide a guideline for visualizing molecules with PyMol. The guideline is borrowed from previous work ConfGF's [[github page]](https://github.com/DeepGraphLearning/ConfGF#prepare-your-own-geom-dataset-from-scratch-optional). | ||
|
||
### Start Setup | ||
|
||
1. `pymol -R` | ||
2. `Display - Background - White` | ||
3. `Display - Color Space - CMYK` | ||
4. `Display - Quality - Maximal Quality` | ||
5. `Display Grid` | ||
1. by object: use `set grid_slot, int, mol_name` to put the molecule into the corresponding slot | ||
2. by state: align all conformations in a single slot | ||
3. by object-state: align all conformations and put them in separate slots. (`grid_slot` does not work!) | ||
6. `Setting - Line and Sticks - Ball and Stick on - Ball and Stick ratio: 1.5` | ||
7. `Setting - Line and Sticks - Stick radius: 0.2 - Stick Hydrogen Scale: 1.0` | ||
|
||
### Show Molecule | ||
|
||
1. To show molecules | ||
|
||
1. `hide everything` | ||
2. `show sticks` | ||
|
||
2. To align molecules: `align name1, name2` | ||
|
||
3. Convert RDKit mol to Pymol | ||
|
||
```python | ||
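from rdkit import Chem | ||
from rdkit.Chem import PyMol | ||
# MolViewer connects to a PyMol instance started with `pymol -R` | ||
v = PyMol.MolViewer() | ||
rdmol = Chem.MolFromSmiles('C') | ||
# Send the molecule to PyMol under the object name 'mol' | ||
v.ShowMol(rdmol, name='mol') | ||
v.SaveFile('mol.pkl') | ||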
``` | ||
|
||
|
||
## Citation | ||
Please consider citing our paper if you find it helpful. Thank you! | ||
``` | ||
@inproceedings{ | ||
xu2022geodiff, | ||
title={GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation}, | ||
author={Minkai Xu and Lantao Yu and Yang Song and Chence Shi and Stefano Ermon and Jian Tang}, | ||
booktitle={International Conference on Learning Representations}, | ||
year={2022}, | ||
url={https://openreview.net/forum?id=PzcvxEMzvQC} | ||
} | ||
``` | ||
|
||
## Acknowledgement | ||
|
||
This repo is built upon the previous work ConfGF's [[codebase]](https://github.com/DeepGraphLearning/ConfGF#prepare-your-own-geom-dataset-from-scratch-optional). Thanks Chence and Shitong! | ||
|
||
## Contact | ||
|
||
If you have any questions, please contact me at [email protected] or [email protected]. | ||
|
||
## Known issues | ||
|
||
1. The current codebase is not compatible with more recent torch-geometric versions. | ||
2. The current processed dataset (with PyG data object) is not compatible with more recent torch-geometric versions. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
model: | ||
type: diffusion # dsm and diffusion | ||
network: dualenc | ||
hidden_dim: 128 | ||
num_convs: 6 | ||
num_convs_local: 4 | ||
cutoff: 10.0 | ||
mlp_act: relu | ||
beta_schedule: sigmoid | ||
beta_start: 1.e-7 | ||
beta_end: 9.e-3 | ||
num_diffusion_timesteps: 1000 | ||
edge_order: 3 | ||
edge_encoder: mlp | ||
smooth_conv: true | ||
|
||
train: | ||
seed: 2021 | ||
batch_size: 32 | ||
val_freq: 5000 | ||
max_iters: 10000000 | ||
max_grad_norm: 30000.0 # Different from QM9 | ||
anneal_power: 2.0 | ||
optimizer: | ||
type: adam | ||
lr: 1.e-3 | ||
weight_decay: 0. | ||
beta1: 0.95 | ||
beta2: 0.999 | ||
scheduler: | ||
type: plateau | ||
factor: 0.6 | ||
patience: 10 | ||
|
||
dataset: | ||
train: ./data/GEOM/Drugs/train_data_40k.pkl | ||
val: ./data/GEOM/Drugs/val_data_5k.pkl | ||
test: ./data/GEOM/Drugs/test_data_1k.pkl |
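
To give a feel for the `beta_schedule`, `beta_start`, `beta_end`, and `num_diffusion_timesteps` entries in the config above, here is a minimal sketch of a sigmoid-shaped noise schedule. The exact form is an assumption: it uses a sigmoid evaluated on evenly spaced points in [-6, 6], which is a common choice in diffusion codebases, and may differ from what this repository implements. The function name is hypothetical.

```python
import math


def sigmoid_beta_schedule(beta_start, beta_end, num_timesteps):
    """Sigmoid-shaped ramp between beta_start and beta_end (assumed form)."""
    betas = []
    for t in range(num_timesteps):
        # Map the timestep linearly onto [-6, 6]; this range is an assumption.
        x = -6.0 + 12.0 * t / (num_timesteps - 1)
        s = 1.0 / (1.0 + math.exp(-x))
        betas.append(beta_start + s * (beta_end - beta_start))
    return betas


# Values taken from the config above.
betas = sigmoid_beta_schedule(beta_start=1e-7, beta_end=9e-3, num_timesteps=1000)

# Cumulative product of (1 - beta_t): the fraction of signal left at the last step.
alpha_bar = 1.0
for b in betas:
    alpha_bar *= 1.0 - b

print(betas[0], betas[-1], alpha_bar)
```

Under this assumed form the noise level stays small for the early timesteps, rises fastest around the middle of the schedule, and flattens out near `beta_end`.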
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
model: | ||
type: diffusion # dsm and diffusion | ||
network: dualenc | ||
hidden_dim: 128 | ||
num_convs: 6 | ||
num_convs_local: 4 | ||
cutoff: 10.0 | ||
mlp_act: relu | ||
beta_schedule: sigmoid | ||
beta_start: 1.e-7 | ||
beta_end: 2.e-3 | ||
num_diffusion_timesteps: 5000 | ||
edge_order: 3 | ||
edge_encoder: mlp | ||
smooth_conv: true | ||
|
||
train: | ||
seed: 2021 | ||
batch_size: 32 | ||
val_freq: 5000 | ||
max_iters: 10000000 | ||
max_grad_norm: 30000.0 # Different from QM9 | ||
anneal_power: 2.0 | ||
optimizer: | ||
type: adam | ||
lr: 1.e-3 | ||
weight_decay: 0. | ||
beta1: 0.95 | ||
beta2: 0.999 | ||
scheduler: | ||
type: plateau | ||
factor: 0.6 | ||
patience: 10 | ||
|
||
dataset: | ||
train: ./data/GEOM/Drugs/train_data_40k.pkl | ||
val: ./data/GEOM/Drugs/val_data_5k.pkl | ||
test: ./data/GEOM/Drugs/test_data_1k.pkl |
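
The `scheduler` block above (`type: plateau`, `factor: 0.6`, `patience: 10`) suggests a reduce-on-plateau learning-rate rule. The sketch below shows the general idea in plain Python, assuming the rate is multiplied by `factor` once the validation loss has failed to improve for more than `patience` consecutive evaluations. The class name is hypothetical, and the training script may instead rely on a library scheduler with additional options.

```python
class PlateauSchedule:
    """Minimal reduce-on-plateau rule (illustrative, not the repository's code)."""

    def __init__(self, lr, factor, patience):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss):
        """Record one validation result and return the current learning rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals > self.patience:
                self.lr *= self.factor
                self.bad_evals = 0
        return self.lr


# Values taken from the config above.
sched = PlateauSchedule(lr=1e-3, factor=0.6, patience=10)
sched.step(1.0)                      # first evaluation sets the best loss
for _ in range(10):
    lr_before = sched.step(1.0)      # ten evaluations without improvement
lr_after = sched.step(1.0)           # the eleventh triggers a reduction
print(lr_before, lr_after)
```

With `val_freq: 5000`, one "evaluation" here would correspond to one validation pass every 5000 training iterations.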