Reinforced Molecular Optimization with Neighborhood-Controlled Grammar

Abstract

A major challenge in the pharmaceutical industry is to design novel molecules with specific desired properties, especially when the property evaluation is costly. Here, we propose MNCE-RL, a graph convolutional policy network for molecular optimization with molecular neighborhood-controlled embedding grammars through reinforcement learning. We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation and design an efficient algorithm to infer grammatical production rules from given molecules. The use of grammars guarantees the validity of the generated molecular structures. By transforming molecular graphs to parse trees with the inferred grammars, the molecular structure generation task is modeled as a Markov decision process where a policy gradient strategy is utilized.

Requirements

Anaconda is recommended to run the project.

conda create -n MNCERL python=3.6 
source activate MNCERL

Install rdkit and Cython:

conda install -c conda-forge rdkit
conda install Cython

Install related packages:

pip install -r requirements.txt
cd MyLib
python setup.py install

Prepare data:

cd Data
ls *.tar.gz|while read line
do
tar -xzvf ${line}
done

Training and evaluations

You can run training and evaluations by:

python main.py -c PATH_TO_CONFIG

For example:

python main.py -c tasks.Optimize_logp_limited.config_seed1

Please refer to config_example.py for the format of the config file. In the "tasks" directory, we have provided the pretrained model, and the config.py and results for all the tasks presented in our paper.

Custom data

To train and evaluations with custom data, the molecules in SMILES format can be parsed by:

python mkdata.py -c PATH_TO_CONFIG

For example:

python mkdata.py -c tasks.Makedata_zinc.config

Please refer to tasks/Makedata_zinc/config.py for the format of the config file. Then the parsed custom data can be used to train models by specifying the "data_path" in the training config file.

Reference

Xu, C., Liu, Q., Huang, M., & Jiang, T. (2020). Reinforced Molecular Optimization with Neighborhood-Controlled Grammars. Advances in Neural Information Processing Systems, 33.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
Data		Data
Eval		Eval
Image		Image
MyLib		MyLib
NCE		NCE
guacamol		guacamol
guacamol_utils		guacamol_utils
model		model
tasks		tasks
utils		utils
README.md		README.md
__init__.py		__init__.py
config_example.py		config_example.py
main.py		main.py
mkdata.py		mkdata.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reinforced Molecular Optimization with Neighborhood-Controlled Grammar

Abstract

Requirements

Training and evaluations

Custom data

Reference

About

Releases

Packages

Languages

Zoesgithub/MNCE-RL

Folders and files

Latest commit

History

Repository files navigation

Reinforced Molecular Optimization with Neighborhood-Controlled Grammar

Abstract

Requirements

Training and evaluations

Custom data

Reference

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages