
MAINT Add one-sided set differences for clarity in param validation #23772

Merged: 16 commits into scikit-learn:main, Jul 20, 2022

Conversation

@Micky774 (Contributor) commented Jun 27, 2022

Reference Issues/PRs

Fixes #23744

What does this implement/fix? Explain your changes.

Add one-sided set differences for clarity in param validation

Any other comments?

@Micky774 changed the title from "ENH Add one-sided set differences for clarity in param validation" to "MAINT Add one-sided set differences for clarity in param validation" on Jun 27, 2022
@glemaitre (Member) left a comment

LGTM

@jeremiedbb (Member)

Usually I run pytest -vl sklearn/tests/test_common.py -k check_param_validation to check that the constraints are properly set. It happens that this test also checks that the constraints match the parameters, which errors before showing the message you added here. I think it makes more sense to keep the message here, so I'd remove the test for param match in check_param_validation.

@jeremiedbb added the "Validation" label (related to input validation) on Jun 28, 2022
@Micky774 (Contributor, Author)

@jeremiedbb Should be good now?

@jeremiedbb (Member)

I'm having second thoughts. I don't think that validate_params should ensure that there's a 1-to-1 match between the parameters and the dict of constraints. The reason is that it's a private mechanism and we don't want to force third-party estimators to set or update _parameter_constraints.

If one writes a custom estimator that inherits from a scikit-learn estimator and calls super().fit, it would error if the parameters are not the same. See this example:

class MyEstimator(KMeans):
    def __init__(self, <all KMeans params>, additional_param):
        ...

    def fit(self, X, y=None):
        # do some extra stuff

        super().fit(X, y)  # <- calls self._validate_params and fails because 
                           # additional_param is not in _parameter_constraints

To not break user code (e.g. imbalanced-learn maybe? @glemaitre), I'd be in favor of being conservative and only validating params that are in the dict. We can still raise if a key in the dict doesn't correspond to any of the parameters.
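
As a rough sketch (not the actual diff of this PR; the argument names follow the existing validate_parameter_constraints helper, everything else is illustrative), this conservative behavior could look like:

def validate_parameter_constraints(parameter_constraints, params, caller_name):
    # Only constraint keys that match no actual parameter are an error; this
    # keeps third-party estimators with extra parameters working.
    unexpected = parameter_constraints.keys() - params.keys()
    if unexpected:
        raise ValueError(
            f"The parameter constraints {sorted(unexpected)} of {caller_name} "
            "do not correspond to any parameter of the estimator."
        )

    for name, value in params.items():
        # Parameters without an entry in the dict (for example a parameter
        # added by a third-party subclass, like additional_param above) are
        # simply skipped instead of raising.
        if name not in parameter_constraints:
            continue
        # ... check `value` against parameter_constraints[name] here ...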

For scikit-learn estimators we still want to make sure that there's a 1-to-1 match, which should be ensured by the common test.
What do you think @glemaitre @thomasjpfan ?

@glemaitre (Member)

To not break user code (e.g. imbalanced-learn maybe? @glemaitre),

@jeremiedbb is right. I can think of predictors like BalancedRandomForestClassifier with an additional parameter that would fail if we don't use the new tools.

It would be good not to impose this on third-party estimators for the moment.

@Micky774 (Contributor, Author) commented Jul 6, 2022

To not break user code (e.g. imbalanced-learn maybe? @glemaitre), I'd be in favor of being conservative and only validating params that are in the dict. We can still raise if a key in the dict doesn't correspond to any of the parameters.

Updated this PR to implement this behavior. I added a list of unexpected/excess parameters to the error message as well to make it more informative. I moved the two set differences to the assertion error message in check_param_validation. Let me know what you think.

Edit: To clarify, in this PR the ValueError in validate_parameter_constraints is only raised if _parameter_constraints contains parameters not present in the estimator. For internal strictness, missing parameters are still caught by check_param_validation in the common test, with an informative error message.
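
A minimal sketch of how the two one-sided set differences could surface in the common test's assertion message (the helper name and the exact wording are assumptions, not the code of this PR):

def _check_constraints_match_params(estimator):
    constraint_keys = set(type(estimator)._parameter_constraints)
    param_names = set(estimator.get_params(deep=False))

    # Two one-sided differences, reported separately so the failure says
    # exactly which side is out of sync.
    missing = param_names - constraint_keys      # parameters without a constraint entry
    unexpected = constraint_keys - param_names   # constraint entries without a parameter

    assert not missing and not unexpected, (
        f"Mismatch between _parameter_constraints and the parameters of "
        f"{type(estimator).__name__}: missing constraints {sorted(missing)}, "
        f"unexpected constraints {sorted(unexpected)}."
    )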

@thomasjpfan (Member)

For scikit-learn estimators we still want to make sure that there's a 1-to-1 match

I agree with not being strict for third party estimators and being strict for scikit-learn estimators.

@jeremiedbb (Member) left a comment

LGTM. Just a few remaining nitpicks.

@Micky774 (Contributor, Author) commented Jul 7, 2022

@glemaitre Are you happy with the changes made after you first approved?

@Micky774 (Contributor, Author)

@glemaitre @jeremiedbb Sorry for the spam, just wanted to follow up on this :)

@jeremiedbb (Member)

Still good on my side

@ogrisel (Member) left a comment

LGTM as well.

@ogrisel merged commit 91f0227 into scikit-learn:main on Jul 20, 2022
@Micky774 deleted the param_val_sets branch on July 20, 2022 at 11:47
mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022
Successfully merging this pull request may close these issues.

Add set difference to validate_parameter_constraints ValueError