BioSystemsUM/deepmol_case_studies

Case studies for the DeepMol publication

Installation

  1. Clone the repository and move into the directory:
     git clone https://github.com/BioSystemsUM/deepmol_case_studies.git
     cd deepmol_case_studies
  2. Create a conda environment and activate it:
     conda create -n deepmol_case_studies python=3.10
     conda activate deepmol_case_studies
  3. Install the dependencies:
     pip install -r requirements.txt
     pip install --no-deps "deepmol[all]==1.1.1"
  4. Install the package:
     pip install .
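
As an optional sanity check after installation, the sketch below reads the installed DeepMol version from the environment's package metadata and compares it with the version pinned above. The helper name `installed_version` is hypothetical and not part of this repository; the distribution name `deepmol` is assumed to match the name used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    expected = "1.1.1"  # version pinned in the install step above
    found = installed_version("deepmol")  # assumed distribution name
    if found is None:
        print("deepmol is not installed in this environment")
    elif found != expected:
        print(f"deepmol {found} is installed, expected {expected}")
    else:
        print(f"deepmol {found} is installed as expected")
```

Run it inside the activated conda environment so that it inspects the same interpreter that pip installed into.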

AutoML experiments

The AutoML experiments can be found here.

We ran the experiments with Podman/Docker; the Dockerfile can be found in this repository.

The "run" file can be found here.

Evaluate TDC commons benchmark datasets

To train and evaluate DeepMol models:

from dcs.evaluation import get_results

results = get_results(tdc_dataset_name="Bioavailability_Ma", pipeline="bioavailability_optimal")

The tdc_dataset_name parameter selects which TDC commons benchmark dataset to download. Available datasets:

  • "AMES"
  • "BBB_Martins"
  • "Bioavailability_Ma"
  • "Caco2_Wang"
  • "Clearance_Hepatocyte_AZ"
  • "Clearance_Microsome_AZ"
  • "HIA_Hou"
  • "Pgp_Broccatelli"
  • "Solubility_AqSolDB"
  • "Lipophilicity_AstraZeneca"
  • "VDss_Lombardo"
  • "CYP2C9_Veith"
  • "CYP2D6_Veith"
  • "CYP3A4_Veith"
  • "CYP2C9_Substrate_CarbonMangels"
  • "CYP2D6_Substrate_CarbonMangels"
  • "CYP3A4_Substrate_CarbonMangels"
  • "DILI"
  • "Half_Life_Obach"
  • "hERG"
  • "LD50_Zhu"
  • "PPBR_AZ"

The pipeline parameter selects which internal pipeline to load. The pipelines are named as in the paper and fall into two groups: those created from the first AutoML experiment, and those that were further optimized (the names ending in "_optimal"). Available pipelines:

  • "ames"
  • "bbb"
  • "bioavailability"
  • "bioavailability_optimal"
  • "caco"
  • "clearance_hepatocyte"
  • "clearance_microsome"
  • "hia"
  • "pgp"
  • "solubility"
  • "lipophilicity"
  • "lipophilicity_optimal"
  • "vdss"
  • "cyp2c9"
  • "cyp2d6"
  • "cyp3a4"
  • "cyp2c9_substrate"
  • "cyp2d6_substrate"
  • "cyp3a4_substrate"
  • "dili"
  • "half_life"
  • "herg"
  • "ld50"
  • "ppbr"
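
Because a misspelled name would only surface once the dataset download or pipeline loading starts, it can help to check both arguments beforehand. The sketch below copies the names from the two lists above into sets and validates against them. `check_names` is a hypothetical helper, not part of the `dcs` package, and it assumes the names must match the lists exactly.

```python
# Names copied from the two lists above; keep them in sync with the repository.
TDC_DATASETS = frozenset({
    "AMES", "BBB_Martins", "Bioavailability_Ma", "Caco2_Wang",
    "Clearance_Hepatocyte_AZ", "Clearance_Microsome_AZ", "HIA_Hou",
    "Pgp_Broccatelli", "Solubility_AqSolDB", "Lipophilicity_AstraZeneca",
    "VDss_Lombardo", "CYP2C9_Veith", "CYP2D6_Veith", "CYP3A4_Veith",
    "CYP2C9_Substrate_CarbonMangels", "CYP2D6_Substrate_CarbonMangels",
    "CYP3A4_Substrate_CarbonMangels", "DILI", "Half_Life_Obach", "hERG",
    "LD50_Zhu", "PPBR_AZ",
})

PIPELINES = frozenset({
    "ames", "bbb", "bioavailability", "bioavailability_optimal", "caco",
    "clearance_hepatocyte", "clearance_microsome", "hia", "pgp",
    "solubility", "lipophilicity", "lipophilicity_optimal", "vdss",
    "cyp2c9", "cyp2d6", "cyp3a4", "cyp2c9_substrate", "cyp2d6_substrate",
    "cyp3a4_substrate", "dili", "half_life", "herg", "ld50", "ppbr",
})


def check_names(tdc_dataset_name, pipeline=None):
    """Raise ValueError if a name is not one of the documented options.

    Hypothetical helper: an exact, case-sensitive match is assumed.
    """
    if tdc_dataset_name not in TDC_DATASETS:
        raise ValueError(f"Unknown TDC dataset: {tdc_dataset_name!r}")
    if pipeline is not None and pipeline not in PIPELINES:
        raise ValueError(f"Unknown pipeline: {pipeline!r}")
```

Calling `check_names("Bioavailability_Ma", "bioavailability_optimal")` before `get_results` returns silently, while a typo such as `"bioavailability_optimum"` raises immediately.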

To use the default pipeline for a dataset (any pipeline other than the optimal ones), omit the pipeline argument:

from dcs.evaluation import get_results

results = get_results(tdc_dataset_name="Bioavailability_Ma")
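
To run the default pipeline on several datasets in one go, the call above can be wrapped in a loop. The following is a minimal sketch under stated assumptions: `run_default_pipelines` is a hypothetical helper, not part of `dcs`, and since the structure of the value returned by `get_results` is not described here, each result is stored as-is. The evaluation function is passed in as an argument so the loop can be tried with a stand-in before launching real training runs.

```python
from typing import Any, Callable, Dict, Iterable


def run_default_pipelines(
    evaluate: Callable[..., Any],
    dataset_names: Iterable[str],
) -> Dict[str, Any]:
    """Call evaluate(tdc_dataset_name=name) for each name and collect the outcomes.

    If one dataset fails, the exception is stored under its name and the
    loop continues with the remaining datasets.
    """
    outcomes: Dict[str, Any] = {}
    for name in dataset_names:
        try:
            outcomes[name] = evaluate(tdc_dataset_name=name)
        except Exception as exc:  # keep going so one failure does not stop the batch
            outcomes[name] = exc
    return outcomes
```

With the repository installed, usage would look like `run_default_pipelines(get_results, ["AMES", "hERG"])` after `from dcs.evaluation import get_results`. Each call trains and evaluates a model, so a full sweep over all datasets may take a long time.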
