Available files

The generative model was pretrained on a combined dataset built from ChEMBL 33, GuacaMol v1, MOSES, and BindingDB (08-2023). The dataset was processed to exclude every SMILES string that contains more than 133 tokens or contains any token that occurs fewer than 1000 times in the dataset. The combined dataset contains 5 539 765 unique and valid SMILES strings, which are split into:

Training Partition (5 262 776 entries; 95%)
Validation Partition (276 989 entries; 5%)
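
The filtering and split described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset: the regex tokenizer, the function names (`tokenize`, `filter_smiles`, `split_counts`), and the choice to count token frequencies in a single pass before any string is removed are all assumptions. Only the thresholds (133 tokens, 1000 occurrences) and the 95/5 proportions come from the description above.

```python
import re
from collections import Counter

# Assumed tokenizer (hypothetical): bracket atoms, two-letter halogens,
# "@@", two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|@@|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def filter_smiles(smiles_list, max_tokens=133, min_token_count=1000):
    """Drop strings that are too long or that contain a rare token.

    Token frequencies are counted once over the unfiltered input,
    which is an assumption about the original preprocessing.
    """
    tokenized = [tokenize(s) for s in smiles_list]
    counts = Counter(tok for toks in tokenized for tok in toks)
    rare = {tok for tok, n in counts.items() if n < min_token_count}
    return [
        s
        for s, toks in zip(smiles_list, tokenized)
        if len(toks) <= max_tokens and rare.isdisjoint(toks)
    ]


def split_counts(n_total, train_percent=95):
    """Return (train, validation) sizes for an integer-percentage split."""
    n_train = n_total * train_percent // 100
    return n_train, n_total - n_train
```

With these assumptions, `split_counts(5539765)` gives the partition sizes listed above. How entries were assigned to each partition (for example, by random shuffling) is not stated here, so the sketch only computes the sizes.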

To use our pretrained model, download the model weights and the dataset descriptors (internal parameters required for generation). If you use our Jupyter Notebook, a dedicated cell downloads these files and places them in the appropriate folders for you.

Pretrained model weights
Pretrained model descriptors

You can also download the PCA weights fitted on the combined dataset:

(scaler, pca) tuple stored as a pickle
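
A minimal sketch of loading and applying this file might look like the following. It assumes the pickle holds exactly a two-element tuple and that both objects expose a scikit-learn-style `transform` method, applied as scaler first and PCA second. The function names and any file path passed in are hypothetical. Only unpickle files from a source you trust, since `pickle.load` can execute arbitrary code.

```python
import pickle


def load_scaler_pca(path):
    """Unpickle the file and unpack it as a (scaler, pca) tuple."""
    with open(path, "rb") as handle:
        scaler, pca = pickle.load(handle)
    return scaler, pca


def project(descriptors, scaler, pca):
    """Scale the descriptors, then project them with the fitted PCA.

    Assumes both objects provide a scikit-learn-style transform().
    """
    return pca.transform(scaler.transform(descriptors))
```

If the objects are scikit-learn estimators, the matching scikit-learn package must be installed before unpickling, and a version mismatch with the one used for fitting may produce warnings or errors.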
