Datasets

To train the model from scratch, you need to download the preprocessed file from this link and save it into data/single or data/multi.

We use the CrossDocked dataset and reaction-based slicing method in LibINVENT to construct single and multi R-groups datasets. If you want to process the dataset from scratch, you can follow the steps:

Download the dataset archive crossdocked_pocket10.tar.gz and the split file split_by_name.pt from this link.
Extract the TAR archive using the command:

tar -xzvf crossdocked_pocket10.tar.gz

Split raw PL-complexes and convert sdf files into SMILES format:

python split_and_convert.py

Use the reaction-based slicing method in LibINVENT to slice the molecules into scaffolds and R-groups in Lib-INVENT-dataset and replace example_configurations/supporting_files/filter_conditions.json in Lib-INVENT-dataset with filter_conditions.json in this directory. For single R-group dataset, set the value of parameter max_cuts in example_configurations/reaction_based_slicing.json to 1 while for multi R-groups dataset to 4.
Process and prepare datasets:

cd single
python -W ignore process_and_prepare.py

cd multi
python -W ignore process_and_prepare.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Datasets

Files

README.md

Latest commit

History

README.md

File metadata and controls

Datasets