AdsMT is a novel multi-modal transformer that rapidly predicts the global minimum adsorption energy (GMAE) of diverse catalyst/adsorbate combinations from surface graphs and adsorbate feature vectors, without any binding information.
- System requirements: This package requires a standard Linux computer with an NVIDIA GPU (CUDA >= 11) and sufficient RAM (> 2 GB). The code has been tested on NVIDIA RTX 3090, A6000, and A100 GPUs. To run the code on a GPU that does not support CUDA >= 11, modify the PyTorch and CUDA versions in the `env.yml` file.
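For older GPUs/drivers, the change amounts to lowering the PyTorch and CUDA pins in `env.yml`. A sketch of the relevant fragment — the package names and versions below are illustrative assumptions, not the shipped pins; pick a PyTorch/CUDA pairing that your driver actually supports:

```yaml
# env.yml (fragment, illustrative versions only — check your driver's supported CUDA version)
dependencies:
  - pytorch=1.12        # a release that still ships pre-CUDA-11 builds
  - cudatoolkit=10.2    # must match what your driver supports
```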
- We'll use `conda` to install dependencies and set up the environment for an NVIDIA GPU machine. We recommend using the Miniconda installer.
- After installing `conda`, install `mamba` into the base environment. `mamba` is a faster, drop-in replacement for `conda`:

```bash
conda install mamba -n base -c conda-forge
```

- Then create a conda environment and install the dependencies:

```bash
mamba env create -f env.yml
```

- Activate the conda environment with `conda activate adsmt`. It will take about 10 minutes to configure the environment for running the code.
Dataset links: Zenodo and Figshare
We built three GMAE benchmark datasets, named OCD-GMAE, Alloy-GMAE, and FG-GMAE, from the OC20-Dense, Catalysis Hub, and FG-dataset datasets through strict data cleaning. Each data point represents a unique combination of catalyst surface and adsorbate.
Dataset | Combination Num. | Surface Num. | Adsorbate Num. | Range of GMAE (eV) |
---|---|---|---|---|
Alloy-GMAE | 11,260 | 1,916 (37) | 12 (5) | -4.3 |
FG-GMAE | 3,308 | 14 (14) | 202 (5) | -4.0 |
OCD-GMAE | 973 | 967 (54) | 74 (4) | -8.0 |
Note: The values in brackets represent the numbers of element types.
Run `scripts/download_datasets.sh` to download all datasets:

```bash
bash scripts/download_datasets.sh
```
To train an AdsMT model with a given graph encoder on a dataset, use `scripts/train.sh`:

```bash
bash scripts/train.sh [DATASET] [GRAPH_ENCODER]
```
This code repo includes 7 different graph encoders:
- SchNet (`schnet`)
- CGCNN (`cgcnn`)
- DimeNet++ (`dpp`)
- GemNet-OC (`gemnet-oc`)
- TorchMD-NET (`et`)
- eSCN (`escn`)
- AdsGT (`adsgt`, this work)

The log file containing the experiment results can be found at `exp_results/[DATASET]/[GRAPH_ENCODER].log`. One task takes 3-24 hours, depending on the dataset and graph encoder.
We provide scripts for model pretraining on the OC20-LMAE dataset. For example, to pretrain an AdsMT model with a chosen graph encoder, run:

```bash
bash scripts/pretrain_base.sh [GRAPH_ENCODER]
```
The checkpoint file of the pretrained model can be found at the `checkpoint_dir` reported in the log file.
To finetune an AdsMT model on a GMAE dataset, change the `ckpt_path` parameter in the model's configuration file (`configs/[DATASET]/finetune/[GRAPH_ENCODER].yml`) to the checkpoint path of your pretrained model, then run:

```bash
bash scripts/finetune.sh [DATASET] [GRAPH_ENCODER]
```
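The `ckpt_path` edit looks roughly like the following — a sketch only, since the surrounding keys, nesting, and checkpoint filename in the actual config may differ:

```yaml
# configs/[DATASET]/finetune/[GRAPH_ENCODER].yml (fragment, illustrative)
ckpt_path: /path/to/checkpoint_dir/checkpoint.pt  # the checkpoint_dir reported in your pretraining log
```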
Use `scripts/attn4sites.sh` to calculate the cross-attention scores of a trained AdsMT model on a GMAE dataset:

```bash
bash scripts/attn4sites.sh [CONFIG_PATH] [CHECKPOINT_PATH]
```
The output file will be stored at the `results_dir` reported in the log file.
We provide a notebook, `visualize/vis_3D.ipynb`, to visualize cross-attention-score-colored surfaces and compare them with the DFT-optimized adsorption configurations under GMAE.
This work was supported as part of NCCR Catalysis (grant number 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation.
This code repo is based on several existing repositories:
If you find our work useful, please consider citing it:
If you have any questions, feel free to contact me at:
Junwu Chen: [email protected]