Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add MultiAggJoiner, refactor AggJoiner #876

Merged
merged 115 commits into from
Mar 15, 2024

Conversation

TheooJ
Copy link
Contributor

@TheooJ TheooJ commented Jan 16, 2024

Sharing my WIP for issue #810

Adding the MultiAggJoiner also means refactoring the AggJoiner to match the Joiner, in particular:

  • Adding a key argument
  • Supporting polars in the test suite
  • Only accepting a single table as an input
  • Performing multiple small checks instead of one big check

Notes:

  • For now, the check concerning duplicate column names happens after the aggregation in AggJoiner. Later we / I could move it before aggregation, I didn’t want to touch it for now since it is linked to the _polars.aggregate and _pandas.aggregate functions, let me know how you feel about this
  • Should I change the landing page to show the multijoiners, or keep Joiner and AggJoiner ?

Also, thanks @jeromedockes & @Vincent-Maladiere for the discussions !

Copy link
Member

@jeromedockes jeromedockes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot @TheooJ !

skrub/_utils.py Show resolved Hide resolved
skrub/_multi_agg_joiner.py Outdated Show resolved Hide resolved
@glemaitre glemaitre self-requested a review February 23, 2024 11:37
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a first pass. In general, I'm quite happy with what is there.

I assume that we will need an entry in the changelog as well.

doc/assembling.rst Outdated Show resolved Hide resolved
doc/assembling.rst Outdated Show resolved Hide resolved
doc/assembling.rst Outdated Show resolved Hide resolved
skrub/_agg_joiner.py Outdated Show resolved Hide resolved
skrub/_agg_joiner.py Outdated Show resolved Hide resolved
skrub/_multi_agg_joiner.py Outdated Show resolved Hide resolved
skrub/_multi_agg_joiner.py Outdated Show resolved Hide resolved
skrub/_multi_agg_joiner.py Show resolved Hide resolved
skrub/_multi_agg_joiner.py Show resolved Hide resolved
skrub/tests/test_multi_agg_joiner.py Outdated Show resolved Hide resolved
@GaelVaroquaux
Copy link
Member

You need a changelog entry for this (adding an entry in doc/CHANGES.rst)

@glemaitre glemaitre self-requested a review March 4, 2024 21:28
Copy link
Member

@jeromedockes jeromedockes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot @TheooJ !

@glemaitre , I think the main outstanding comment from your previous review was whether passing a list of strings (instead of list of list of strings) for the keys should be allowed, and that has been addressed.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @TheooJ

@glemaitre glemaitre merged commit c0c6870 into skrub-data:main Mar 15, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants