28 Jun 22:46

Notable changes

The mfs argument of MoleculeDatapoint was removed in #876. This argument accepted functions that generated molecular features to use as extra datapoint descriptors. When using chemprop in a notebook, users should now generate their molecule features manually and pass them into the datapoints using x_d, which stands for (extra) datapoint descriptors. This is demonstrated in the extra_features_descriptors.ipynb notebook under examples. CLI users will see no change in behavior, as the CLI still calculates molecule features automatically using user-specified featurizers. However, the --features-generators flag has been deprecated in favor of the more descriptive --molecule-featurizers. Available molecule featurizers are listed in the help text generated by chemprop train -h.

The default aggregation was changed to norm in #946. This was meant to be changed in version 2.0.0 but was missed. Norm aggregation was used in all the benchmarking of version 1, as it performs better than mean aggregation when predicting properties that are extensive in the number of atoms.
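To illustrate why the choice of aggregation matters for extensive properties, here is a minimal pure-Python sketch. It is not chemprop's implementation: the function names are hypothetical, and the normalization constant of 100 is an assumption made for the example.

```python
def mean_aggregate(atom_states):
    """Average each hidden dimension over the atoms of one molecule."""
    n_atoms = len(atom_states)
    return [sum(col) / n_atoms for col in zip(*atom_states)]


def norm_aggregate(atom_states, norm=100.0):
    """Sum each hidden dimension over the atoms, then divide by a fixed constant.

    The value of `norm` is an assumption chosen for this illustration.
    """
    return [sum(col) / norm for col in zip(*atom_states)]


# Two toy molecules whose atoms all share the same 2-dimensional hidden state.
small = [[1.0, 2.0]] * 3  # 3 atoms
large = [[1.0, 2.0]] * 6  # 6 atoms

mean_small, mean_large = mean_aggregate(small), mean_aggregate(large)
norm_small, norm_large = norm_aggregate(small), norm_aggregate(large)

# Mean aggregation yields the same vector for both molecules ...
print(mean_small == mean_large)
# ... while the norm-aggregated vector grows with the number of atoms.
print(norm_large[0] / norm_small[0])
```

Because the norm-aggregated representation keeps information about molecule size, a downstream predictor can more easily fit properties that grow with the number of atoms.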

More documentation for the CLI hpopt and fingerprint commands has been added and can be viewed here and here.

The individual predictions of an ensemble of models are now automatically averaged, and the individual predictions are saved in a separate file (#919).
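The averaging step can be pictured with a small sketch. This is an illustration only, with a hypothetical function name and made-up numbers; it does not reproduce chemprop's file handling or output format.

```python
def average_ensemble(individual_preds):
    """Average predictions over models.

    individual_preds[m][i] is the prediction of model m for molecule i.
    """
    n_models = len(individual_preds)
    return [sum(per_molecule) / n_models for per_molecule in zip(*individual_preds)]


# Made-up predictions from three models for two molecules.
individual = [
    [1.0, 4.0],  # model 0
    [2.0, 5.0],  # model 1
    [3.0, 6.0],  # model 2
]

averaged = average_ensemble(individual)
print(averaged)  # [2.0, 5.0]
```

In this picture, `averaged` corresponds to the main predictions and `individual` to what is written to the separate file.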

What's Changed

Change the installed numpy version in pyproject by @shihchengli in #922
Explicitly double save scalers/criterion by @KnathanM in #898
Add --show-individual-scores CLI flag by @shihchengli in #920
Set Ray Train's trainer resources to 0 by @hwpang in #928
Save individual and average predictions into different files by @shihchengli in #919
Add CLI pages for hpopt and fingerprint by @jonwzheng in #914
Make fingerprint CLI consistent with predict CLI by @hwpang in #927
Fix issue related to target column for fingerprint by @hwpang in #939
build molecule featurizer in parsing by @KnathanM in #875
Remove featurizing from datapoint by @KnathanM in #876
change aggregation default to norm by @KnathanM in #946
Use mol.GetBonds() instead of for loop by @KnathanM in #931

Full Changelog: v2.0.2...v2.0.3

Contributors

jonwzheng, hwpang, and 2 other contributors


14 Jun 22:51

shihchengli

v2.0.2

f64881f

v2.0.2 Adding Document Modules and hpopt Enhancement

In this release, we have included numerous notebooks to document modules. Chemprop may be used in Python scripts, allowing for greater flexibility and control than the CLI. We recommend first looking through some of the worked examples to get an overview of the workflow. Further details about the creation, customization, and use of Chemprop modules can then be found in the module tutorials.

New CLI Features

Improved `--model-path` CLI

Previously, --model-path could take either a single model file or a directory containing model files. Now it can take any combination of checkpoint files (.ckpt), model files (.pt), and directories containing model files. Directories are recursively searched for model files (.pt). Chemprop will use all models given and found to make predictions (#731).
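The path resolution described above can be sketched as follows. This is a hedged illustration rather than chemprop's own code: `collect_model_paths` is a hypothetical helper, and raising an error for other file types is an assumption.

```python
from pathlib import Path


def collect_model_paths(paths):
    """Expand a mix of files and directories into a flat list of model files."""
    found = []
    for p in map(Path, paths):
        if p.is_dir():
            # Directories are searched recursively, but only for .pt files.
            found.extend(sorted(p.rglob("*.pt")))
        elif p.suffix in {".ckpt", ".pt"}:
            # Explicitly listed checkpoint and model files are used as given.
            found.append(p)
        else:
            # Assumption for this sketch: anything else is rejected.
            raise ValueError(f"Unsupported model path: {p}")
    return found
```

For example, `collect_model_paths(["best.ckpt", "ensemble_dir"])` would return `best.ckpt` followed by every `.pt` file found anywhere under `ensemble_dir`.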

Improvements for hpopt CLI

Some flags related to Ray Tune (i.e., --raytune-temp-dir, --raytune-num-cpus, --raytune-num-gpus, and --raytune-max-concurrent-trials) have been added. You can use the CLI to initiate your Ray instance using these flags. (#918)

Bug fix

An incorrect max learning rate was used when writing the config file after hyperparameter optimization. This is now fixed (#913).

What's Changed

Fix typos in docstrings and .rst files that led to rendering errors by @jonwzheng in #901
Add CLI transition guide link to RTD by @kevingreenman in #907
Add meaningful warning for warm up epoch search space by @hwpang in #909
Fixing small bug in hpopt for learning rate by @akshatzalte in #913
Add notebooks to document modules by @KnathanM in #834
V2: consolidate --checkpoint CLI by @hwpang in #731
Improvements for hpopt cli by @hwpang in #918

Full Changelog: v2.0.1...v2.0.2

Contributors

jonwzheng, kevingreenman, and 3 other contributors


06 Jun 17:14

shihchengli

v2.0.1

33df735

v2.0.1 First Patch

New CLI Features

Caching in CLI

MolGraphs are now created (by featurizing molecules) and cached at the beginning of training by default in the CLI. If you wish to disable caching, you can use the --no-cache flag, which will featurize molecules on the fly instead. (#903)
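The trade-off between caching and on-the-fly featurization can be seen in a toy sketch. The classes below are hypothetical stand-ins, not chemprop's dataset or featurizer classes, and the "featurizer" simply returns the length of the SMILES string.

```python
class CountingFeaturizer:
    """Stand-in featurizer that records how many times it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, smi):
        self.calls += 1
        return len(smi)  # placeholder for building a real MolGraph


class ToyDataset:
    """Featurize items once up front (cached) or again on every access."""

    def __init__(self, smiles, featurize, cache=True):
        self.smiles = list(smiles)
        self.featurize = featurize
        # With caching, every item is featurized exactly once, at construction.
        self._cached = [featurize(s) for s in self.smiles] if cache else None

    def __len__(self):
        return len(self.smiles)

    def __getitem__(self, idx):
        if self._cached is not None:
            return self._cached[idx]
        # Without caching, featurization is repeated on each access.
        return self.featurize(self.smiles[idx])


def run_epochs(dataset, n_epochs):
    """Touch every item once per epoch, as a training loop would."""
    for _ in range(n_epochs):
        for i in range(len(dataset)):
            dataset[i]


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]

cached_featurizer = CountingFeaturizer()
run_epochs(ToyDataset(smiles, cached_featurizer, cache=True), n_epochs=5)

lazy_featurizer = CountingFeaturizer()
run_epochs(ToyDataset(smiles, lazy_featurizer, cache=False), n_epochs=5)

print(cached_featurizer.calls, lazy_featurizer.calls)  # 3 15
```

Caching trades memory for speed: the featurizer runs once per molecule instead of once per molecule per epoch, but all featurized items are held in memory.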

Change the default trial scheduler in HPO

We changed the default trial scheduler for HPO from AsyncHyperBand to FIFO, as it is the default in Ray and was used in version 1. You can switch the trial scheduler back to AsyncHyperBand by using --raytune-trial-scheduler AsyncHyperBand if needed. (#896)

Support Optuna in HPO

You can use Optuna as an HPO search algorithm via --raytune-search-algorithm optuna. (#888)

CLI Bug Fixes

HPO-related bugs

In #873, we changed the search space for the initial and final learning rate ratios and max_lr to avoid very small (~10^-10) learning rates and also ensured that some hyperparameters are saved as integers instead of floating-point numbers (e.g., batch_size). In #881, we addressed the bug concerning the incompatibility of the saved config file with the training config. In #836, we shut down Ray processes after HPO completion to avoid zombie processes. For those encountering issues with Ray processes, we suggest you start Ray outside of the Python process.

DDP-related bugs

In #884, we resolved the issue where metrics were not synchronized across processes and disabled the distributed sampler during testing in DDP.

Backwards incompatibility note

In #883, we fixed the bug related to unused parameters in DDP. Models created via the CLI in v2.0.0 without additional atomic descriptors cannot be used via the CLI in v2.0.1. You will need to first remove message_passing.W_d.weight and message_passing.W_d.bias from the model file's state_dict to make it compatible with the current version.
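A minimal sketch of the key removal is shown below. It assumes the loaded model file behaves like a dictionary with a "state_dict" entry, as the note above implies; `strip_unused_w_d` is a hypothetical helper, and the file names in the comments are placeholders.

```python
UNUSED_KEYS = ("message_passing.W_d.weight", "message_passing.W_d.bias")


def strip_unused_w_d(model_file):
    """Return a copy of the loaded model file without the unused W_d parameters."""
    state_dict = {
        key: value
        for key, value in model_file["state_dict"].items()
        if key not in UNUSED_KEYS
    }
    # Everything other than the state_dict is passed through unchanged.
    return {**model_file, "state_dict": state_dict}


# Possible use with PyTorch (file names are placeholders; the arguments to
# torch.load may need adjusting for your PyTorch version):
#   import torch
#   model_file = torch.load("model_v2.0.0.pt", map_location="cpu")
#   torch.save(strip_unused_w_d(model_file), "model_v2.0.1.pt")
```

The helper does not modify the loaded file in place, so the original model can be kept as a backup until the converted file has been checked.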

What's Changed

update v2 installation instructions page in docs by @kevingreenman in #831
Remove Ray zombie processes by @shihchengli in #836
Docker images for v2 by @JacksonBurns in #841
Change Docker sytnax for MyBinder compatibility by @JacksonBurns in #872
[V2] Fix featurizer cli by @hwpang in #865
Fix hyperparameter predictorbase by @c-w-feldmann in #832
V2: Add all notebooks to test by @hwpang in #840
Fix small bugs in hpopt by @akshatzalte in #873
Add pip setup step to environment.yml install instructions by @cjmcgill in #889
Avoid scrambling target column name order by @JacksonBurns in #893
Fix unused parameters issue in DDP by @shihchengli in #883
Fix the inference issue related to the target columns by @shihchengli in #895
Change the default trial scheduler to FIFOScheduler by @shihchengli in #896
Add Optuna support for HPO by @shihchengli in #888
Fix Circular Import with isort by @JacksonBurns in #887
make LookupAction work with ConfigArgParse by @KnathanM in #900
V2: Fix typo in hpopt installation instruction by @hwpang in #897
V2: Make hpopt config compatible with training config by @hwpang in #881
Fix DDP prediction and checkpoint Issues by @shihchengli in #884
Add simple cache to CLI by @KnathanM in #903
V2: Fix small hpopt bugs and add example notebook by @hwpang in #842

New Contributors

@akshatzalte made his first contribution in #873

Full Changelog: v2.0.0...v2.0.1

Contributors

JacksonBurns, kevingreenman, and 6 other contributors


23 Apr 17:41

kevingreenman

v2.0.0

73d8948

v2.0.0 Stable Release

This is the first stable release of Chemprop v2.0.0, with updates since the v2.0.0-rc.1 release candidate in early March.

v2 documentation can be found here.

There are v2 tutorial notebooks in the examples/ directory.

A helpful transition guide from Chemprop v1 to v2 can be found here. This includes a side-by-side comparison of CLI argument options, a list of which arguments will be implemented in later versions of v2, and a list of changes to default hyperparameters.

Note that if you install from source, the primary branch of our repository has been renamed from master to main.

Due to limited development team bandwidth, Chemprop v1 will no longer be actively developed, so that we can focus our efforts on v2. Bug reports and questions about v1 are still welcome to benefit users who haven't yet made the switch to v2, but reported bugs will not be fixed by the development team.

Please let us know of any bugs you find, questions you have, or enhancements you want in Chemprop v2 by opening an issue.


12 Apr 17:47

kevingreenman

v1.7.1

02c7602

Final Patch for Version 1

This is the final release of chemprop v1. All future development will be done on chemprop v2. The development team is still happy to answer questions about v1, but no new feature requests or PRs for v1 will be accepted. Users who identify bugs in v1 are still encouraged to open issues to report them - they will be tagged as v1-wontfix to signify that we won't be publishing fixes for them in official chemprop releases, but the bugs can still be open to community discussion.

We encourage all users to try migrating their workflows over to chemprop v2 (available now as a release candidate, stable version planned to be released within the next week) and let us know of any issues you encounter. All v1 releases will remain available on PyPI, and the v1 source code will remain available in this GitHub organization.

What's Changed

fix the uncal_vars for atom/bond property prediction by @shihchengli in #712
[v1]: Add Docker Image Building Action and Official Images to DockerHub by @JacksonBurns in #718
remove macos and windows from v1 ci by @JacksonBurns in #720
update docker build if to use correct upstream branch name by @JacksonBurns in #723
fix the task names by @shihchengli in #725
Fixed typo in README.md by @willspag in #745

New Contributors

@willspag made their first contribution in #745

Full Changelog: v1.7.0...v1.7.1

Contributors

JacksonBurns, shihchengli, and willspag


04 Mar 22:37

kevingreenman

v2.0.0-rc.1

4a1c1f2

v2.0.0 Release Candidate

Pre-release

This is a release candidate for Chemprop v2.0.0, to be released in April 2024.

The primary objectives of v2.0.0 are making Chemprop more usable from within Python scripts, more modular, easier to maintain and develop, more compute/memory efficient, and usable with PyTorch Lightning. Some features will not be migrated from v1 to v2 (e.g. web, sklearn). Some v1 features will be added in later versions of v2 (v2.1+) (e.g. uncertainty, interpret, atom- and bond-targets); see milestones here. The new version also has substantially faster featurization speeds and much higher unit test coverage, enables training on multiple GPUs, and works on Windows (in addition to Linux and Mac). Finally, the incorporation of a batch normalization layer is expected to result in smoother training and improved predictions. The label as a “release candidate” reflects its availability to be downloaded via PyPI and that only minor changes are expected for the Python API before the final release. We expect most remaining changes before the release of v2.0.0 in April to be focused on additional improvements to the command line interface (CLI), which does not yet have feature parity with v1. We encourage all Chemprop users to try using v2.0.0-rc.1 to see how it can improve their workflows.

The v2 documentation can be found here.

There are tutorial notebooks for v2 in the examples/ directory.

A helpful transition guide from v1 to v2 can be found here. This includes a side-by-side comparison of CLI argument options, a list of which arguments will be implemented in later versions of v2, and a list of changes to default hyperparameters.

You can subscribe to our development status and notes for this version: #517.

Ongoing work for this version is available on the v2/dev branch.

Please let us know of any bugs you find by opening an issue.


04 Mar 21:51

kevingreenman

v1.7.0

76d59ff

Conformal Calibration

What's Changed

new split per molecular weight by @soulios in #456
Specify license for Chemprop logos by @mliu49 in #461
Add todo.md by @davidegraff in #492
Update authors list in license file and alphabetically sort by @cjmcgill in #532
update authors in LICENSE and setup files for v1 by @kevingreenman in #533
Fix Transpose bug in Inequality Regression by @cjmcgill in #308
Add Dirichlet Evidential Uncertainty Quantification by @cjmcgill in #423
New metrics by @soulios in #542
Updating README with ADMET-AI details by @swansonk14 in #554
Improve error message when gilbrat is needed. by @KnathanM in #569
limit chempropv1 python version to 3.7, 3.8 only by @JacksonBurns in #618
Add a CITATIONS.bib by @JacksonBurns in #627
Limit Maximum Allowed flask Version in v1 by @JacksonBurns in #628
move num_unc_tasks definition to ensure always defined by @kevingreenman in #632
Switching np.mean to np.nanmean to handle NaN metrics by @swansonk14 in #453
Fix the dtype for targets of different sizes by @shihchengli in #638
Add setters for atom and bond constraints by @shihchengli in #637
switch v1 readthedocs build from conda to mamba by @kevingreenman in #660
Fix v1 docs theme by @kevingreenman in #669
Conformal Calibration by @danielxu9393 in #304
add note on feature releases and instructions for ssl+ddp by @JacksonBurns in #685
remove unnecessary argument for reshape function by @shihchengli in #671
Fix atom/bond property prediction with atom-mapped SMILES and target classification by @shihchengli in #673
Pass num_workers to MoleculeDataLoader during interpretation by @kevingreenman in #691
conformal quantile prediction bug fix by @shihchengli in #693

New Contributors

@soulios made their first contribution in #456
@danielxu9393 made their first contribution in #304

Full Changelog: v1.6.1...v1.7.0

Contributors

swansonk14, mliu49, and 8 other contributors


01 Aug 20:09

kevingreenman

v1.6.1

f3d1bff

Bug fix for reaction atom mapping

Bug fix

PR #383 unexpectedly broke the atom mapping for reaction mode. The issue is described in Issue #426 and fixed by PR #427.

What's Changed

Fix versioning issues - metadata and dependencies by @kevingreenman in #420
add job to tests action for PyPI package by @JacksonBurns in #422
added chemprop manuscript to readme by @hesther in #425
Keep Support for Python 3.7 and 3.8 when fixing gilbrat Issue by @JacksonBurns in #431
Fix reaction atom mapping by @shihchengli in #427

Full Changelog: v1.6.0...v1.6.1

Contributors

JacksonBurns, kevingreenman, and 2 other contributors


17 Jul 20:43

kevingreenman

v1.6.0

b2d95f3

Atomic/bond targets prediction

Major New Features

Atomic/bond targets prediction by @shihchengli in #280

What's Changed

Replace multiclass mcc with 1-mcc for loss by @cjmcgill in #332
Add chemprop logo by @shihchengli in #339
Add CodeQL workflow for GitHub code scanning by @lgtm-com in #344
Add to the description of evidential regularization by @cjmcgill in #353
Remove deprecated numpy float types by @cjmcgill in #357
Correct a bug in ENCE uncertainty evaluation by @cjmcgill in #360
Hyperopt Parallel Race Conditions and Manual Trial Load by @cjmcgill in #307
Simplified install with PyPI rdkit and git install in setup.py by @JacksonBurns in #364
Allow providing both loaded features and a features generator by @shihchengli in #318
For any multiclass task, make_predictions fails if option --individual_ensemble_predictions is on. by @piotr-semenov in #354
Save loaded molecular features into .npy files by @shihchengli in #337
Ignore invalid atom-mapped SMILES by @shihchengli in #367
Molecule fingerprinting with invalid SMILES in list by @shihchengli in #351
change calibration_features_path from str to List[str] by @ceroth in #358
Change logo style by @shihchengli in #369
Clamp evidential 'v' parameter by @kevingreenman in #371
fix colab demo by @kevingreenman in #368
Avoid OverflowError when setting field size to sys.maxsize by @shihchengli in #373
Set atom and bond constraints when loading model by @shihchengli in #374
Readme updates by @kevingreenman in #385
Remove atom map numbers for scaffold splits by @shihchengli in #383
update bug report template - ask for full stack trace by @kevingreenman in #401
Fix t-SNE script by @kevingreenman in #403
Fixing skipped lines in csv writing when using a windows computer by @cjmcgill in #406

Full Changelog: v1.5.2...v1.6.0

Contributors

piotr-semenov, ceroth, and 4 other contributors


20 Jul 03:18

cjmcgill

v1.5.2

442a160

Flexible hyperparameter search, missing uncertainty target values, evaluation of different magnitude multitask targets, empty test set assignment, and DockerFile updates

Features

Flexible hyperparameter search space

The parameters to be included in hyperparameter optimization can now be selected using the argument --search_parameter_keywords {list-of-keywords}. The parameters supported are: activation, aggregation, aggregation_norm, batch_size, depth, dropout, ffn_hidden_size, ffn_num_layers, final_lr, hidden_size, init_lr, max_lr, warmup_epochs. Some special keywords are also included for groups of keywords or different search behavior: basic, learning_rate, all, linked_hidden_size.
PR #299

Missing targets in uncertainty calibration datasets

Added capabilities to the uncertainty calibration and evaluation methods to allow them to handle missing target values in multitask jobs. This capability was already included in the normal training of models and is now implemented in uncertainty calibration and evaluation as well.
PR #295
Issue #292

Multitask evaluation for tasks of different magnitudes

For evaluation metrics that scale with the magnitude of a task (e.g., rmse), averaging metrics between tasks has been replaced with a geometric mean function. This makes the average metric in multitask regression jobs less dominated by large-magnitude targets. The old behavior was an issue for hyperparameter optimization and for the evaluation of the optimal epoch during model training; the loss used for gradient descent is calculated on scaled targets and was already not scale dependent.
PR #290
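The effect of switching to a geometric mean can be checked with a few lines of Python. The RMSE values are made up for illustration, and the functions are a sketch rather than chemprop's implementation.

```python
import math


def arithmetic_mean(values):
    """Plain average, dominated by the largest values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Made-up RMSE values for two tasks of very different magnitude, before and
# after the small-magnitude task becomes ten times worse.
before = [100.0, 0.01]
after = [100.0, 0.1]

arithmetic_ratio = arithmetic_mean(after) / arithmetic_mean(before)
geometric_ratio = geometric_mean(after) / geometric_mean(before)

print(arithmetic_ratio)  # barely above 1: the degradation is almost invisible
print(geometric_ratio)   # about 3.16: the degradation is clearly visible
```

With the arithmetic mean, a tenfold degradation on the small-magnitude task changes the combined score by less than 0.1%. With the geometric mean, the same degradation multiplies the score by the square root of 10.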

Empty test set allowed

An empty test split can now be used during training. This was previously possible only using the cv-no-test split method, but now it is available more widely when specifying split sizes, for example with --split_sizes 0.8 0.2 0.
PR #284, #260 related
Issue #279
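A sketch of how split sizes with an empty test fraction could map onto index lists follows. It is a simplified, unshuffled illustration with a hypothetical function name, not the splitting code used by chemprop.

```python
def split_indices(n, sizes):
    """Split range(n) into train/val/test index lists by fractional sizes."""
    train_frac, val_frac, test_frac = sizes
    if abs(train_frac + val_frac + test_frac - 1.0) > 1e-8:
        raise ValueError("split sizes must sum to 1")
    n_train = round(train_frac * n)
    # With an empty test split, validation takes everything that is left,
    # so rounding can never leak items into the test set.
    n_val = n - n_train if test_frac == 0 else round(val_frac * n)
    idx = list(range(n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(10, (0.8, 0.2, 0.0))
print(len(train), len(val), len(test))  # 8 2 0
```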

Updates to conda environment and docker file

Conda environment building will now prefer to use the pytorch channel over the conda-forge channel. The Dockerfile has been updated to use micromamba, allowing for faster environment solves than conda and removing a potential licensing issue.
PR #276

Bug Fixes

Fix MCC loss for multiclass jobs

Corrected a calculation problem in the loss function that was inappropriately returning infinite loss. Also adopted the convention of returning a loss of zero when the loss would otherwise be infinite, as often happens in very unbalanced datasets. Added appropriate unit testing.
PR #309
Issue #306

Correct code error in ence uncertainty evaluation

Corrects an error in the ence uncertainty evaluation method that made that method unusable. Bug was introduced during PR #305.
PR #302
Issue #301

Fixed link to MoleculeNet website

Corrected the link to the MoleculeNet benchmark dataset website in the readme, following MoleculeNet migrating to a new site location.
PR #296

Multitarget uncertainty calibration mve weighting method

Previously, this method only worked for single-task jobs. It has now been extended to work for multitask models as well.
PR #291

Remove unused version.py file

Version tracking in Chemprop no longer uses the version.py file and it was removed.
PR #283

Multiclass argument typo in readme

Corrected a typo where the number of classes used in multiclass classification should have been indicated as --multiclass_num_classes.
PR #281

Repair individual ensemble predictions

Refactoring of the prediction file during the addition of uncertainty functions disabled the option to return the individual predictions of each member of an ensemble of models. The option is now available again.
PR #274


Releases: chemprop/chemprop
