Allow MapieRegressor to use K-fold iterator variants with stratification and groups. #393


Contributor

@pidefrem pidefrem commented Jan 2, 2024

Description

Continuing the work done in PR
Allow the use of the GroupKFold cross-validation splitter (see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html), as well as custom cv-splits based on other K-fold variants, for example StratifiedKFold (see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html#sklearn.model_selection.StratifiedKFold).

Fixes #202
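
As a rough illustration of what the two scikit-learn splitters named in the description guarantee, here is a minimal sketch at the scikit-learn level only. The toy data, the variable names, and the choice of three folds are assumptions made for this example; how MAPIE forwards the group labels to the splitter is not shown in this thread, so it is not reproduced here.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, StratifiedKFold

# Hypothetical toy data: 12 samples, 3 groups of 4 samples, balanced binary labels.
X = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([0, 1] * 6)
groups = np.repeat([0, 1, 2], 4)

# GroupKFold: a group never appears in both the train and the test part of a fold.
group_cv = GroupKFold(n_splits=3)
group_test_sets = []
group_leaks = []
for train_idx, test_idx in group_cv.split(X, y, groups=groups):
    group_test_sets.append(set(groups[test_idx].tolist()))
    # True if any group is shared between train and test (should not happen).
    group_leaks.append(not set(groups[train_idx]).isdisjoint(groups[test_idx]))

# StratifiedKFold: each test fold keeps (approximately) the class balance of y.
strat_cv = StratifiedKFold(n_splits=3)
strat_test_counts = [
    np.bincount(y[test_idx], minlength=2).tolist()
    for _, test_idx in strat_cv.split(X, y)
]
```

Both splitters expose the same `split(X, y, groups)` interface, which is why an estimator that accepts a scikit-learn cross-validator only needs to pass the group labels through for group-aware splitting to work.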

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • test_regression.test_results_with_constant_groups
  • test_regression.test_results_with_groups
  • test_classification.test_results_with_constant_groups
  • test_classification.test_results_with_groups

Checklist

  • I have read the contributing guidelines
  • I have updated the HISTORY.rst and AUTHORS.rst files
  • Linting passes successfully: make lint
  • Typing passes successfully: make type-check
  • Unit tests pass successfully: make tests
  • Coverage is 100%: make coverage
  • Documentation builds successfully: make doc

@thibaultcordier
Collaborator

thibaultcordier commented Jan 3, 2024

Hello @pidefrem! Thank you for this contribution. I'll take the time to review it. Just to make sure I understand your proposal: in the title "Allow MapieRegressor to use group split strategy", the split strategy refers to the way the cross-validation is done, while the method itself remains a cross-conformal method.

Don't hesitate to contact me if you need any help.

@pidefrem
Contributor Author

pidefrem commented Jan 3, 2024

Hello @thibaultcordier, yes, it refers to the split methods of the cross-validator used when fitting the MAPIE estimators. Please feel free to suggest any other description that you think is more suitable.

@pidefrem pidefrem changed the title from "Allow MapieRegressor to use group split strategy" to "Allow MapieRegressor to use K-fold iterator variants with stratification and groups." on Jan 3, 2024
@pidefrem pidefrem force-pushed the 202-estimator-groupkfold-split-strategy branch from 17b0bf3 to 7842773 on January 3, 2024 10:56
@pidefrem pidefrem force-pushed the 202-estimator-groupkfold-split-strategy branch from dc21696 to 98fadea on January 3, 2024 11:06
@thibaultcordier thibaultcordier added the labels "enhancement" (New feature or request) and "contributors" (Proposed by contributors.) on Jan 3, 2024
@pidefrem
Contributor Author

Wonderful! Thank you for this very welcome contribution. I don't have many comments to add but just a few to complete your proposal exhaustively:

  • Some code style suggestions.
  • As far as the tests are concerned, you are indeed testing whether MAPIE returns the same results when all samples share the same group (constant or None). But what happens if you use different groups, such as np.concatenate([np.ones(n_samples // 2), 2 * np.ones(n_samples // 2)]) (untested proposal)?
  • Duplicate the tests in the classification test file, because changes have been made in that part of the code.

@thibaultcordier I fixed some issues and added some tests, tell me if it is ok now.
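
The "different groups" case raised in the review can be sketched as follows. This is an illustration only, not the test added in the PR: the data, `n_samples`, the `LinearRegression` model, and the use of `cross_val_predict` to get out-of-fold absolute residuals (a stand-in for cross-conformal conformity scores) are all assumptions, and MAPIE itself is not called.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold, cross_val_predict

n_samples = 20  # hypothetical size; kept even so both groups have equal size
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n_samples)

# Two distinct groups: first half labelled 1, second half labelled 2.
groups = np.concatenate([np.ones(n_samples // 2), 2 * np.ones(n_samples // 2)])

cv = GroupKFold(n_splits=2)

# Out-of-fold predictions: each sample is predicted by a model that
# never saw any sample from the same group during training.
y_oof = cross_val_predict(LinearRegression(), X, y, groups=groups, cv=cv)

# Absolute residuals, used here as simple conformity scores.
scores = np.abs(y - y_oof)

# Record which groups end up in each test fold.
fold_groups = [
    np.unique(groups[test_idx]).tolist()
    for _, test_idx in cv.split(X, y, groups)
]
```

With two groups and two folds, each test fold holds exactly one group, so a test built on this setup can check that results change when the grouping changes, rather than only checking the constant-group case.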

@thibaultcordier
Collaborator

Hello @pidefrem, I'll check your changes this week. Thank you for contacting me about the review. I'll keep you informed.

Collaborator

@thibaultcordier thibaultcordier left a comment

@pidefrem Great! Thank you for these changes. I have no further feedback; this is ready to merge!

Collaborator

@LacombeLouis LacombeLouis left a comment

Hey @pidefrem,
Thank you for this PR! This will be extremely useful. One small change: please remove the virtual environment from the MAPIE folder and from the GitHub repository. It can be stored outside the folder.
If you can do this, we will be able to merge your PR asap!
Thank you again!

Makefile Outdated
Comment on lines 3 to 4
lint:
	flake8 . --exclude=doc,.venv
Collaborator

Hey @pidefrem,
Could you please remove the environment? You can also store the env somewhere other than the MAPIE folder.
Thank you!

.gitignore Outdated (resolved)
Collaborator

@LacombeLouis LacombeLouis left a comment

One small change: please remove the virtual environment from the MAPIE folder and from the GitHub repository. It can be stored outside the folder.

@thibaultcordier thibaultcordier merged commit 50343ec into scikit-learn-contrib:master Feb 9, 2024
8 checks passed
Successfully merging this pull request may close these issues.

[ENHANCEMENT] Allow the use of GroupKFold in MAPIE