
MAINT Add one-sided set differences for clarity in param validation #23772

Merged: 16 commits into scikit-learn:main, Jul 20, 2022

Conversation

@Micky774 (Contributor) commented Jun 27, 2022

Reference Issues/PRs

Fixes #23744

What does this implement/fix? Explain your changes.

Add one-sided set differences for clarity in param validation

Any other comments?

@Micky774 changed the title from "ENH Add one-sided set differences for clarity in param validation" to "MAINT Add one-sided set differences for clarity in param validation" on Jun 27, 2022
@glemaitre (Member) left a comment

LGTM

@jeremiedbb (Member)

Usually I run pytest -vl sklearn/tests/test_common.py -k check_param_validation to check that the constraints are properly set. It happens that this test also checks that the constraints match the parameters, which errors before showing the message you added here. I think it makes more sense to keep the message here, so I'd remove the test for param match in check_param_validation.

@jeremiedbb added the "Validation" label (related to input validation) on Jun 28, 2022
@Micky774 (Contributor, Author)

@jeremiedbb Should be good now?

@jeremiedbb (Member)

I'm having second thoughts. I don't think that validate_params should ensure that there's a 1-to-1 match between the parameters and the dict of constraints. The reason is that it's a private mechanism and we don't want to force third-party estimators to set or update _parameter_constraints.

If one writes a custom estimator that inherits from a scikit-learn estimator and calls super().fit, it would error if the parameters are not the same. See this example:

class MyEstimator(KMeans):
    def __init__(self, <all KMeans params>, additional_param):
        ...

    def fit(self, X, y=None):
        # do some extra stuff

        super().fit(X, y)  # <- calls self._validate_params and fails because 
                           # additional_param is not in _parameter_constraints

To not break user code (e.g. imbalanced-learn maybe? @glemaitre), I'd be in favor of being conservative and only validating params that are in the dict. We can still raise if a key in the dict doesn't correspond to any of the parameters.
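
As a rough sketch (not the actual diff of this PR; the argument names follow the existing validate_parameter_constraints helper, everything else is illustrative), this conservative behavior could look like:

def validate_parameter_constraints(parameter_constraints, params, caller_name):
    # Only constraint keys that match no actual parameter are an error; this
    # keeps third-party estimators with extra parameters working.
    unexpected = parameter_constraints.keys() - params.keys()
    if unexpected:
        raise ValueError(
            f"The parameter constraints {sorted(unexpected)} of {caller_name} "
            "do not correspond to any parameter of the estimator."
        )

    for name, value in params.items():
        # Parameters without an entry in the dict (for example a parameter
        # added by a third-party subclass, like additional_param above) are
        # simply skipped instead of raising.
        if name not in parameter_constraints:
            continue
        # ... check `value` against parameter_constraints[name] here ...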

For scikit-learn estimators we still want to make sure that there's a 1-to-1 match, which should be ensured by the common test.
What do you think @glemaitre @thomasjpfan ?

@glemaitre (Member)

To not break user code (e.g. imbalanced-learn maybe? @glemaitre),

@jeremiedbb is right. I can think of predictors like BalancedRandomForestClassifier with an additional parameter that would fail if we don't use the new tools.

It would be good not to impose this on third-party estimators for the moment.

@Micky774 (Contributor, Author) commented Jul 6, 2022

To not break user code (e.g. imbalanced-learn maybe? @glemaitre), I'd be in favor of being conservative and only validating params that are in the dict. We can still raise if a key in the dict doesn't correspond to any of the parameters.

Updated this PR to implement this behavior. I added a list of unexpected/excess parameters to the error message as well to make it more informative. I moved the two set differences to the assertion error message in check_param_validation. Let me know what you think.

Edit: To clarify, in this PR the ValueError in validate_parameter_constraints is only raised if _parameter_constraints contains parameters not present in the estimator. For internal strictness, missing parameters are still caught by check_param_validation in the common test, with an informative error message.
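
A minimal sketch of how the two one-sided set differences could surface in the common test's assertion message (the helper name and the exact wording are assumptions, not the code of this PR):

def _check_constraints_match_params(estimator):
    constraint_keys = set(type(estimator)._parameter_constraints)
    param_names = set(estimator.get_params(deep=False))

    # Two one-sided differences, reported separately so the failure says
    # exactly which side is out of sync.
    missing = param_names - constraint_keys      # parameters without a constraint entry
    unexpected = constraint_keys - param_names   # constraint entries without a parameter

    assert not missing and not unexpected, (
        f"Mismatch between _parameter_constraints and the parameters of "
        f"{type(estimator).__name__}: missing constraints {sorted(missing)}, "
        f"unexpected constraints {sorted(unexpected)}."
    )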

@thomasjpfan (Member)

For scikit-learn estimators we still want to make sure that there's a 1-to-1 match

I agree with not being strict for third party estimators and being strict for scikit-learn estimators.

@jeremiedbb (Member) left a comment

LGTM. Just a few remaining nitpicks.

@Micky774 (Contributor, Author) commented Jul 7, 2022

@glemaitre Are you happy with the changes made after you first approved?

@Micky774 (Contributor, Author)

@glemaitre @jeremiedbb Sorry for the spam, just wanted to follow up on this :)

@jeremiedbb (Member)

Still good on my side

@ogrisel (Member) left a comment

LGTM as well.

@ogrisel merged commit 91f0227 into scikit-learn:main on Jul 20, 2022
@Micky774 deleted the param_val_sets branch on July 20, 2022 at 11:47
mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022
Successfully merging this pull request may close these issues.

Add set difference to validate_parameter_constraints ValueError