GitHub - valence-labs/mood-experiments: Molecular Out-Of-Distribution

Molecular Out-Of-Distribution Generalization

Close the testing-deployment gap in molecular scoring.

Python repository with all the code that was used for the MOOD paper.

Setup

We recommend using mamba.

mamba env create -n mood -f env.yml 
conda activate mood
pip install -e .

Overview

The repository is set up to make the results easy to reproduce. If you get stuck or would like to learn more, please feel free to open an issue.

CLI

After installation, the MOOD CLI can be used to reproduce the results.

mood --help

Data

All data has been made available in a public GCP bucket: gs://mood-data-public.
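If you would rather not install the Google Cloud SDK, publicly readable objects in a Cloud Storage bucket can usually be downloaded over plain HTTPS from storage.googleapis.com/<bucket>/<object>. The helper below is a minimal sketch that assumes the bucket grants public read access; gs_to_public_url is a hypothetical name, and the object path in the example is made up rather than an actual file in the bucket.

```python
from urllib.parse import quote


def gs_to_public_url(uri: str) -> str:
    """Convert a gs://bucket/object URI into a public HTTPS URL.

    Assumption: the object is publicly readable; otherwise the URL
    will return an access error instead of the file.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a gs:// URI: {uri!r}")
    bucket, _, blob = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in {uri!r}")
    base = f"https://storage.googleapis.com/{bucket}"
    # Percent-encode special characters in the object name but keep "/" separators.
    return f"{base}/{quote(blob, safe='/')}" if blob else base


# Hypothetical object path, for illustration only.
print(gs_to_public_url("gs://mood-data-public/example/file.csv"))
```

The resulting URL can be passed to any HTTP client or, for tabular files, directly to a reader such as pandas.read_csv.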

Code

mood/: This is the main part of the codebase. It contains Python implementations of several reusable components and defines the CLI.
notebooks/: Notebooks were used to visualize and otherwise explore the results. All plots in the paper can be reproduced through these notebooks.
scripts/: Generally messier pieces of code that were used to generate (intermediate) results.

Use the MOOD splitting protocol

One of the main results of the MOOD paper is the MOOD protocol. This protocol helps close the testing-deployment gap in molecular scoring by finding the splitting method that is most representative of the deployment data. To make it easy for others to experiment with the protocol, we kept the interface simple and scikit-learn compatible.

import datamol as dm
import numpy as np
from sklearn.model_selection import ShuffleSplit
from mood.splitter import PerimeterSplit, MaxDissimilaritySplit, PredefinedGroupShuffleSplit, MOODSplitter

# Load your data
data = dm.data.freesolv()
smiles = data["smiles"].values
X = np.stack([dm.to_fp(dm.to_mol(smi)) for smi in smiles])
y = data["expt"].values

# Load your deployment data
X_deployment = np.random.random((100, 2048)).round()

# Set-up your candidate splitting methods
scaffolds = [dm.to_smiles(dm.to_scaffold_murcko(dm.to_mol(smi))) for smi in smiles]
candidate_splitters = {
    "Random": ShuffleSplit(n_splits=5),  # MOOD is Scikit-learn compatible!
    "Scaffold": PredefinedGroupShuffleSplit(groups=scaffolds, n_splits=5),
    "Perimeter": PerimeterSplit(n_splits=5),
    "Maximum Dissimilarity": MaxDissimilaritySplit(n_splits=5),
}

# Set-up the MOOD splitter
mood_splitter = MOODSplitter(candidate_splitters, metric="jaccard")
mood_splitter.fit(X, y, X_deployment=X_deployment)

for train, test in mood_splitter.split(X, y):
    # Work your magic!
    ...
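This README does not spell out how MOODSplitter ranks the candidates. As we understand the paper, the idea is to prefer the splitting method whose test sets lie at a distance from their training sets that resembles the distance between the deployment data and the training data. The sketch below illustrates that idea with numpy only; it is not the library's implementation. It assumes nearest-neighbour Jaccard distances on binary fingerprints, assumes the deployment distances are measured against the full dataset (the model is trained on all of X at deployment), and compares distributions with a quantile-based approximation of the 1-D Wasserstein distance. All function names are hypothetical, and MOODSplitter's actual scoring may differ.

```python
import numpy as np


def nn_jaccard_distance(queries, reference):
    """Jaccard distance from each query fingerprint to its nearest reference fingerprint."""
    q = np.asarray(queries, dtype=bool).astype(np.int64)
    r = np.asarray(reference, dtype=bool).astype(np.int64)
    intersection = q @ r.T  # shared "on" bits for every (query, reference) pair
    union = q.sum(axis=1)[:, None] + r.sum(axis=1)[None, :] - intersection
    # Two all-zero fingerprints are treated as identical (similarity 1).
    similarity = np.divide(intersection, union, out=np.ones(union.shape), where=union > 0)
    return (1.0 - similarity).min(axis=1)


def approx_wasserstein(u, v, n_quantiles=101):
    """Approximate 1-D Wasserstein-1 distance as the mean |difference| of matched quantiles."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return float(np.mean(np.abs(np.quantile(u, qs) - np.quantile(v, qs))))


def select_most_representative(candidate_splits, X, X_deployment):
    """Return the candidate whose test-to-train distances best match deployment-to-train ones.

    candidate_splits maps a name to a list of (train_idx, test_idx) pairs,
    e.g. list(splitter.split(X)) for a scikit-learn style splitter.
    """
    X = np.asarray(X)
    # Assumption: at deployment the model is trained on all of X.
    target = nn_jaccard_distance(X_deployment, X)
    scores = {}
    for name, folds in candidate_splits.items():
        fold_scores = [
            approx_wasserstein(nn_jaccard_distance(X[test_idx], X[train_idx]), target)
            for train_idx, test_idx in folds
        ]
        scores[name] = float(np.mean(fold_scores))  # lower means more representative
    best = min(scores, key=scores.get)
    return best, scores


# Toy usage with random binary "fingerprints" (synthetic data, not from the paper).
rng = np.random.default_rng(0)
X_demo = (rng.random((60, 64)) < 0.3).astype(np.uint8)
X_deploy_demo = (rng.random((20, 64)) < 0.3).astype(np.uint8)
order = rng.permutation(len(X_demo))
demo_splits = {
    "first_48_vs_last_12": [(np.arange(48), np.arange(48, 60))],
    "shuffled_48_vs_12": [(order[:48], order[48:])],
}
best, scores = select_most_representative(demo_splits, X_demo, X_deploy_demo)
print(best, scores)
```

Nearest-neighbour distances are used here because a model's reliability on a molecule is often related to how close that molecule is to anything it was trained on; averaging over folds keeps a single unlucky fold from dominating the choice.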

How to cite

Please cite MOOD if you use it in your research:

Real-World Molecular Out-Of-Distribution: Specification and Investigation
Prudencio Tossou, Cas Wognum, Michael Craig, Hadrien Mary, and Emmanuel Noutahi
Journal of Chemical Information and Modeling 2024 64 (3), 697-711
DOI: 10.1021/acs.jcim.3c01774

@article{doi:10.1021/acs.jcim.3c01774,
  author = {Tossou, Prudencio and Wognum, Cas and Craig, Michael and Mary, Hadrien and Noutahi, Emmanuel},
  title = {Real-World Molecular Out-Of-Distribution: Specification and Investigation},
  journal = {Journal of Chemical Information and Modeling},
  volume = {64},
  number = {3},
  pages = {697-711},
  year = {2024},
  doi = {10.1021/acs.jcim.3c01774},
  note = {PMID: 38300258},
  URL = {https://doi.org/10.1021/acs.jcim.3c01774},
  eprint = {https://doi.org/10.1021/acs.jcim.3c01774}
}
