Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
README_jp.md		README_jp.md
config_gen.multi.json		config_gen.multi.json
config_gen.multi_onlylink.json		config_gen.multi_onlylink.json
config_gen.single.json		config_gen.single.json
config_gen.single_onlylink.json		config_gen.single_onlylink.json
config_vae.multi.json		config_vae.multi.json
config_vae.multi_onlylink.json		config_vae.multi_onlylink.json
config_vae.single.json		config_vae.single.json
config_vae.single_onlylink.json		config_vae.single_onlylink.json
conv_graph.py		conv_graph.py
get_dataset.sh		get_dataset.sh
init.sh		init.sh
init_large.sh		init_large.sh
init_small.sh		init_small.sh
preprocessing.py		preprocessing.py
resampling_smiles.py		resampling_smiles.py
run.multi.sh		run.multi.sh
run.single.sh		run.single.sh
run_gen.multi.sh		run_gen.multi.sh
run_gen.single.sh		run_gen.single.sh
run_test.sh		run_test.sh
test_recons.py		test_recons.py

README.md

sample_chem/generative_model

This repository is a example of a graph generative model

Dataset: ZINC

Building data

Downloading data

Move to GraphCNN/sample_chem/generative_model/ and execute the following command

sh . /get_dataset.sh

The following data will be generated

ZINC/6_p0.smi

If not possible, download from https://drive.google.com/drive/folders/15wRLBPHnu6A8emMRM_gU5EQNAQeKk7q8?usp=sharing

Resampling of data

Similarly, run the following command in the directory GraphCNN/sample_chem/generative_model/.

sh . /init.sh

This script performs the following operations

reduce the number of data sets by resampling a portion of the data set, since using all the data would be too much
create a dataset file (jbl) using preprocessing.py
- graphs without bond information
- multi graphs with bond information

The following two files are generated

dataset.single.jbl
dataset.multi.jbl

By default, init.sh resamples 10000 data. If you want to use more data, simply change the part of init.sh that says 10000.

Learning

To start learning, execute the following commands

In the case of single (without bond information)

sh ./run.single.sh

In the case of multi (with bond information)

sh ./run.multi.sh

The detailed settings of the study are described in the following configuration files, respectively.

config_vae.single.json
config_vae.multi.json

Reconstruction

The following two files are generated when training is performed with the above default settings. This is the dataset file with the reconstructed training and validation data.

recons.train.jbl
recons.valid.jbl

For the external dataset,

kgcn-gen recons --config <config file>

will also generate a reconstructed dataset file

Reconstructed dataset files

The reconstructed dataset file can be read, for example, as follows.

import joblib

o=joblib.load("recons.valid.jbl")
print(o.keys())

The loaded object is a dictionary with two keys.

feature' feature matrix for each atom in the molecule
- number of data x maximum number of atoms in molecule (70) x number of atomic features (75)
'dense_adj' The bond probability matrix of the molecule
- Number of data x type of bond x maximum number of atoms in molecule (70) x maximum number of atoms in molecule (70) Note that the bond type is always 1 in single mode is and 5 in multi mode.

An example of a program to reconstruct a molecule from bond types and atomic features is conv_graph.py

For an example of a program that reconstructs molecules from bond types and atomic features, please refer to conv_graph.py (see below).

The following 5 dimensions are available for bond types

Single, Double, Triple, Aromatic, Other

Atomic features have the following 75 dimensions

44 dimensions for atom types: 'C','N','O','S','F','Si','P','Cl','Br','Mg','Na','Ca','Fe','As','Al','I','B','V','K','Tl','Yb','Sb','Sn ','Ag','Pd','Co','Se','Ti','Zn','H', # H? 'Li','Ge','Cu','Au','Ni','Cd','In','Mn','Zr','Cr','Pt','Hg','Pb','Unknown'
11 dimensions for GetDegree()
7 dimensions for GetImplicitValence()
1 dimension for GetFormalCharge()
1 dimension for GetNumRadicalElectrons()
5 dimensions for GetHybridization(): SP, SP2, SP3, SP3D, SP3D2
1 dimension for GetIsAromatic()
5 dimensions for GetTotalNumHs()

Visualization of molecules from a reconstructed dataset file

Specifying the following in conv_graph.py will generate image files visualizing 10 molecules under the directory specified in output_dir.

python conv_graph.py recons.valid.jbl --num 10 --output_dir images/

In multi mode, use the following command

python conv_graph.py recons.valid.jbl --num 10 --output_dir images/ --multi

If you specify --threshold 0.9 as an option in conv_graph.py, only the bonds with probability greater than or equal to 0.9 will be kept and the numerator will be created.

Generation

To generate a molecule from scratch instead of rebuilding, execute the following commands

single mode

sh run_gen.single.sh

multi mode

sh run_gen.multi.sh

Now, we are passing the same dataset as in training, but instead of actually using it in the program, we generate a new dataset, generate gen.single.test.jbl or gen.multi.test.jbl. The format is the same as the reconstructed data set file and can also be visualized as molecules using conv_graph.py.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

generative_model

generative_model

README.md

sample_chem/generative_model

Building data

Downloading data

Resampling of data

Learning

Reconstruction

Reconstructed dataset files

Visualization of molecules from a reconstructed dataset file

Generation

Files

generative_model

Directory actions

More options

Directory actions

More options

Latest commit

History

generative_model

Folders and files

parent directory

README.md

sample_chem/generative_model

Building data

Downloading data

Resampling of data

Learning

Reconstruction

Reconstructed dataset files

Visualization of molecules from a reconstructed dataset file

Generation