MolGPT

In this work, we train a small custom GPT on the MOSES and GuacaMol datasets with a next-token prediction task. The model is then used for unconditional and conditional molecular generation. We compare our model with previous approaches on both datasets. Saliency maps for interpretability are obtained using the Ecco library.
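To make the next-token prediction task concrete, here is a minimal sketch of how one SMILES string can be turned into an input/target pair shifted by one position. The regular expression, the `<bos>`/`<eos>` marker strings, and the function names are assumptions made for illustration; the tokenizer actually used in this repository may differ.

```python
import re

# Simplified SMILES tokenizer (an assumption, not the repository's own pattern):
# bracket atoms, two-letter halogens and two-digit ring closures are kept whole,
# everything else is split into single characters.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def next_token_pairs(smiles, bos="<bos>", eos="<eos>"):
    """Return (inputs, targets), where targets[i] is the token that follows inputs[i]."""
    tokens = [bos] + tokenize(smiles) + [eos]
    return tokens[:-1], tokens[1:]


inputs, targets = next_token_pairs("CC(=O)Cl")
for current, following in zip(inputs, targets):
    print(f"{current!r} -> {following!r}")
```

During training, the model sees `inputs` and is penalised for every position where its prediction differs from the corresponding entry in `targets`.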

The processed GuacaMol and MOSES datasets in CSV format can be downloaded from this link:

https://drive.google.com/drive/folders/1LrtGru7Srj_62WMR4Zcfs7xJ3GZr9N4E?usp=sharing

The original GuacaMol dataset can be found here:

https://github.com/BenevolentAI/guacamol

The original MOSES dataset can be found here:

https://github.com/molecularsets/moses

All trained weights can be found here:

https://www.kaggle.com/virajbagal/ligflow-final-weights

To train the model, make sure the dataset CSV files are in the same directory as the code files.
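Before launching a training run, it can help to confirm that the CSV files are where the scripts expect them. The helper below is a small sketch of such a check. The file names `moses.csv` and `guacamol.csv` and the function `missing_datasets` are hypothetical; substitute the names of the files you actually downloaded.

```python
from pathlib import Path

# Hypothetical file names; replace with the names of the downloaded CSV files.
EXPECTED_FILES = ("moses.csv", "guacamol.csv")


def missing_datasets(code_dir, expected=EXPECTED_FILES):
    """Return the expected dataset files that are not present in code_dir."""
    code_dir = Path(code_dir)
    return [name for name in expected if not (code_dir / name).is_file()]
```

Calling `missing_datasets(".")` from the directory that holds the code files returns an empty list when every expected file is present.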

Training

./train_moses.sh

./train_guacamol.sh

Generation

./generate_guacamol_prop.sh

./generate_moses_prop_scaf.sh
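The generation scripts sample molecules from the trained model one token at a time. The sketch below illustrates that autoregressive loop with a toy stand-in for the model. `TOY_MODEL`, `sample_tokens`, and the `<bos>`/`<eos>` markers are hypothetical and are not part of this repository. The toy table looks only at the previous token, whereas a GPT attends to the whole sequence so far, and conditioning is shown here simply as a fixed prefix; how the scripts encode properties and scaffolds is not described in this README.

```python
import random

BOS, EOS = "<bos>", "<eos>"

# Toy stand-in for a trained model: maps the previous token to
# next-token probabilities. A real GPT conditions on all earlier tokens.
TOY_MODEL = {
    BOS: {"C": 0.7, "N": 0.3},
    "C": {"C": 0.4, "O": 0.2, "N": 0.1, EOS: 0.3},
    "N": {"C": 0.6, EOS: 0.4},
    "O": {"C": 0.5, EOS: 0.5},
}


def sample_tokens(model, prefix=(), max_len=20, rng=None):
    """Sample tokens one at a time until EOS is drawn or max_len is reached."""
    rng = rng or random.Random()
    tokens = [BOS, *prefix]
    while len(tokens) - 1 < max_len:
        probs = model[tokens[-1]]
        choices, weights = zip(*probs.items())
        next_token = rng.choices(choices, weights=weights, k=1)[0]
        if next_token == EOS:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the BOS marker


# Unconditional sample, then a sample forced to start with "N".
print("".join(sample_tokens(TOY_MODEL, rng=random.Random(0))))
print("".join(sample_tokens(TOY_MODEL, prefix=("N",), rng=random.Random(0))))
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient when comparing generation settings.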

If you find this work useful, please cite:

Bagal, Viraj; Aggarwal, Rishal; Vinod, P. K.; Priyakumar, U. Deva (2021): MolGPT: Molecular Generation using a Transformer-Decoder Model. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.14561901.v1
