AdsMT is a novel multi-modal transformer that rapidly predicts the global minimum adsorption energy (GMAE) of diverse catalyst/adsorbate combinations from surface graphs and adsorbate feature vectors, without any binding information.
- System requirements: This package requires a standard Linux computer with an NVIDIA GPU (CUDA >= 11) and sufficient RAM (> 2 GB). The code has been tested on NVIDIA RTX 3090, A6000, and A100 GPUs. To run the code on a GPU that does not support CUDA >= 11, modify the PyTorch and CUDA versions in the `env.yml` file.
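For older GPUs/drivers, the change amounts to lowering the PyTorch and CUDA pins in `env.yml`. A sketch of the relevant fragment — the package names and versions below are illustrative assumptions, not the shipped pins; pick a PyTorch/CUDA pairing that your driver actually supports:

```yaml
# env.yml (fragment, illustrative versions only — check your driver's supported CUDA version)
dependencies:
  - pytorch=1.12        # a release that still ships pre-CUDA-11 builds
  - cudatoolkit=10.2    # must match what your driver supports
```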
- We'll use `conda` to install dependencies and set up the environment for an NVIDIA GPU machine. We recommend using the Miniconda installer.
- After installing `conda`, install `mamba` into the base environment. `mamba` is a faster, drop-in replacement for `conda`:

```bash
conda install mamba -n base -c conda-forge
```

- Then create a conda environment and install the dependencies:

```bash
mamba env create -f env.yml
```

- Activate the conda environment with `conda activate adsmt`. It will take about 10 minutes to configure the environment for running the code.
Dataset links: Zenodo and Figshare
We built three GMAE benchmark datasets, named OCD-GMAE, Alloy-GMAE, and FG-GMAE, from the OC20-Dense, Catalysis Hub, and FG-dataset datasets through strict data cleaning. Each data point represents a unique combination of catalyst surface and adsorbate.
Dataset | Combination Num. | Surface Num. | Adsorbate Num. | Range of GMAE (eV) |
---|---|---|---|---|
Alloy-GMAE | 11,260 | 1,916 (37) | 12 (5) | -4.3 |
FG-GMAE | 3,308 | 14 (14) | 202 (5) | -4.0 |
OCD-GMAE | 973 | 967 (54) | 74 (4) | -8.0 |
Note: The values in brackets represent the numbers of element types.
Run `scripts/download_datasets.sh` to download all datasets:

```bash
bash scripts/download_datasets.sh
```
To train an AdsMT model with a given graph encoder on a dataset, use `scripts/train.sh`:

```bash
bash scripts/train.sh [DATASET] [GRAPH_ENCODER]
```
This code repo includes 7 different graph encoders:
- SchNet (`schnet`)
- CGCNN (`cgcnn`)
- DimeNet++ (`dpp`)
- GemNet-OC (`gemnet-oc`)
- TorchMD-NET (`et`)
- eSCN (`escn`)
- AdsGT (`adsgt`, this work)

The log file containing the experiment results can be found at `exp_results/[DATASET]/[GRAPH_ENCODER].log`. One task takes 3-24 hours, depending on the dataset and graph encoder.
We provide scripts for model pretraining on the OC20-LMAE dataset. For example, to pretrain an AdsMT model with a chosen graph encoder, run:

```bash
bash scripts/pretrain_base.sh [GRAPH_ENCODER]
```
The checkpoint file of the pretrained model can be found at the `checkpoint_dir` reported in the log file.
To finetune an AdsMT model on a GMAE dataset, change the `ckpt_path` parameter in the model's configuration file (`configs/[DATASET]/finetune/[GRAPH_ENCODER].yml`) to the checkpoint path of your pretrained model, then run:

```bash
bash scripts/finetune.sh [DATASET] [GRAPH_ENCODER]
```
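The `ckpt_path` edit looks roughly like the following — a sketch only, since the surrounding keys, nesting, and checkpoint filename in the actual config may differ:

```yaml
# configs/[DATASET]/finetune/[GRAPH_ENCODER].yml (fragment, illustrative)
ckpt_path: /path/to/checkpoint_dir/checkpoint.pt  # the checkpoint_dir reported in your pretraining log
```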
Use `scripts/attn4sites.sh` to calculate the cross-attention scores of a trained AdsMT model on a GMAE dataset:

```bash
bash scripts/attn4sites.sh [CONFIG_PATH] [CHECKPOINT_PATH]
```
The output file will be stored at the `results_dir` reported in the log file.
We provide a notebook, `visualize/vis_3D.ipynb`, to visualize cross-attention-score-colored surfaces and compare them with the DFT-optimized adsorption configurations under GMAE.
This work was supported as part of NCCR Catalysis (grant number 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation.
This code repo is based on several existing repositories:
If you find our work useful, please consider citing it:
If you have any questions, feel free to contact me at:
Junwu Chen: [email protected]