Case studies for the DeepMol publication

Installation

Clone the repository and move into the directory:

git clone https://github.com/BioSystemsUM/deepmol_case_studies.git
cd deepmol_case_studies

Create a conda environment and activate it:

conda create -n deepmol_case_studies python=3.10
conda activate deepmol_case_studies

Install the dependencies:

pip install -r requirements.txt
pip install --no-deps deepmol[all]==1.1.1

Install the package:

pip install .
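
After installing, a quick sanity check can confirm that the environment matches the pinned versions. The following is a minimal sketch using only the standard library. It assumes the pinned distribution is registered under the name `deepmol`, as the pip command above suggests. The helper names `installed_version` and `check_environment` are illustrative and not part of this repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

EXPECTED_PYTHON = (3, 10)    # taken from the conda command above
EXPECTED_DEEPMOL = "1.1.1"   # taken from the pip command above


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment():
    """Collect human-readable problems; an empty list means everything matches."""
    problems = []
    major, minor = sys.version_info[:2]
    if (major, minor) != EXPECTED_PYTHON:
        problems.append(f"Python {major}.{minor} found, expected 3.10")
    found = installed_version("deepmol")  # assumed distribution name
    if found != EXPECTED_DEEPMOL:
        problems.append(f"deepmol {found} found, expected {EXPECTED_DEEPMOL}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the pinned versions."]:
        print(line)
```

An empty list from `check_environment()` means both the Python version and the DeepMol version match the ones used in the installation steps.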

AutoML experiments

The AutoML experiments can be found here.

We ran the experiments with podman/docker; the Dockerfile is in the root of this repository.

The run file (run.sh) is also in the root of this repository.

Evaluate TDC commons benchmark datasets

To train and evaluate DeepMol models:

from dcs.evaluation import get_results

results = get_results(tdc_dataset_name="Bioavailability_Ma", pipeline="bioavailability_optimal")

The tdc_dataset_name parameter selects which TDC commons benchmark dataset to download. Available datasets:

"AMES"
"BBB_Martins"
"Bioavailability_Ma"
"Caco2_Wang"
"Clearance_Hepatocyte_AZ"
"Clearance_Microsome_AZ"
"HIA_Hou"
"Pgp_Broccatelli"
"Solubility_AqSolDB"
"Lipophilicity_AstraZeneca"
"VDss_Lombardo"
"CYP2C9_Veith"
"CYP2D6_Veith"
"CYP3A4_Veith"
"CYP2C9_Substrate_CarbonMangels"
"CYP2D6_Substrate_CarbonMangels"
"CYP3A4_Substrate_CarbonMangels"
"DILI"
"Half_Life_Obach"
"hERG"
"LD50_Zhu"
"PPBR_AZ"

The pipeline parameter selects which internal pipeline to load. The pipelines are named as in the paper: some were created from the first AutoML experiment, and others were further optimized (the ones ending in _optimal). Available pipelines:

"ames"
"bbb"
"bioavailability"
"bioavailability_optimal"
"caco"
"clearance_hepatocyte"
"clearance_microsome"
"hia"
"pgp"
"solubility"
"lipophilicity"
"lipophilicity_optimal"
"vdss"
"cyp2c9"
"cyp2d6"
"cyp3a4"
"cyp2c9_substrate"
"cyp2d6_substrate"
"cyp3a4_substrate"
"dili"
"half_life"
"herg"
"ld50"
"ppbr"
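
Because a misspelled name would only surface after a download or pipeline load has been attempted, it can help to validate both arguments up front. The following is a minimal sketch that uses only the names listed above. The helper `validate_arguments` is hypothetical and not part of `dcs`, and it assumes the names are case-sensitive exactly as listed.

```python
from difflib import get_close_matches

# Names copied from the two lists above.
TDC_DATASETS = frozenset({
    "AMES", "BBB_Martins", "Bioavailability_Ma", "Caco2_Wang",
    "Clearance_Hepatocyte_AZ", "Clearance_Microsome_AZ", "HIA_Hou",
    "Pgp_Broccatelli", "Solubility_AqSolDB", "Lipophilicity_AstraZeneca",
    "VDss_Lombardo", "CYP2C9_Veith", "CYP2D6_Veith", "CYP3A4_Veith",
    "CYP2C9_Substrate_CarbonMangels", "CYP2D6_Substrate_CarbonMangels",
    "CYP3A4_Substrate_CarbonMangels", "DILI", "Half_Life_Obach", "hERG",
    "LD50_Zhu", "PPBR_AZ",
})
PIPELINES = frozenset({
    "ames", "bbb", "bioavailability", "bioavailability_optimal", "caco",
    "clearance_hepatocyte", "clearance_microsome", "hia", "pgp",
    "solubility", "lipophilicity", "lipophilicity_optimal", "vdss",
    "cyp2c9", "cyp2d6", "cyp3a4", "cyp2c9_substrate", "cyp2d6_substrate",
    "cyp3a4_substrate", "dili", "half_life", "herg", "ld50", "ppbr",
})


def _check(value, allowed, kind):
    """Raise ValueError with close-match suggestions if value is not allowed."""
    if value not in allowed:
        hints = get_close_matches(value, sorted(allowed), n=3)
        suffix = f" Did you mean: {', '.join(hints)}?" if hints else ""
        raise ValueError(f"Unknown {kind} {value!r}.{suffix}")


def validate_arguments(tdc_dataset_name, pipeline=None):
    """Check both names against the lists above; pipeline may be omitted."""
    _check(tdc_dataset_name, TDC_DATASETS, "dataset")
    if pipeline is not None:
        _check(pipeline, PIPELINES, "pipeline")
    return tdc_dataset_name, pipeline
```

Calling `validate_arguments("Bioavailability_Ma", "bioavailability_optimal")` returns the pair unchanged, while an unlisted name raises a `ValueError` before any work starts.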

To use the default pipeline for a dataset (the one created from the first AutoML experiment, not the optimal one), omit the pipeline parameter:

from dcs.evaluation import get_results

results = get_results(tdc_dataset_name="Bioavailability_Ma")
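
To evaluate several datasets with their default pipelines in one go, the call above can be wrapped in a loop. The following is a minimal sketch: `evaluate_many` is a hypothetical helper, not part of `dcs`. Since this README does not describe what `get_results` returns, the sketch treats the return value as opaque and only stores it. The evaluation function is passed in as an argument so that one failing dataset does not abort the whole run.

```python
def evaluate_many(dataset_names, run):
    """Call run(tdc_dataset_name=name) for each name.

    Returns two dicts keyed by dataset name: the stored return values of
    successful calls, and the exceptions raised by failed calls.
    """
    results, failures = {}, {}
    for name in dataset_names:
        try:
            results[name] = run(tdc_dataset_name=name)
        except Exception as exc:  # keep going; report the failure afterwards
            failures[name] = exc
    return results, failures
```

With the package installed, `get_results` imported from `dcs.evaluation` can be passed as `run`, for example `evaluate_many(["AMES", "DILI"], get_results)`.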
