Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust!): separate rolling_*_by from rolling_*(..., by=...) in Rust #16102

Merged
merged 1 commit into from
May 10, 2024

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented May 7, 2024

This is quite big. But it's quite rewarding, and will definitely be a better user experience. It's just quite hard to review.

So, summary of changes:

  • on the Python side, rolling_mean_by now calls self._pyexpr.rolling_mean_by, instead of self._pyexpr.rolling_mean. There's a thin validation layer before getting to the Rust function, to make sure that deprecations/errors are shown
  • on the Rust side:
    • PyExpr.rolling_mean and PyExr.rolling_mean_by now each only have the arguments which are relevant to them (i.e. the former doesn't have closed, the latter doesn't have weights)
    • RollingOptions and RollingOptionsImpl are gone 🔥 There's no just two such struct:
      • RollingOptionsFixedWindow
      • RollingOptionsDynamicWindow
    • convert from crates/polars-plan/src/dsl/function_expr/rolling.rs is gone
    • rolling_*(..., by=....) would take an integer window size, convert it to string, parse it as Duration, and then get the window size out again from window_size.duration_ns(). All those gymnastics can be skipped: the user passes in an integer, and it stays as integer, no need to go via Duration
    • the rolling_window and rolling_window_by features are now separate

As a bonus, rolling_*_by now accepts expressions for by!

The rest is just moving things around


Things that I'd like to do as follow-ups:

  • further split out rolling_window from rolling_window_by features - I think ideally the former should have no knowledge of polars-time at all
  • nulls aren't supported in rolling_*_by. I see no theoretical reason why not, so...I'll work on this
  • no requiring the data to be sorted on the user's behalf in the _by ones

Comment on lines -223 to +229
msg = "in `rolling_min` operation, `by` argument of dtype `i64` is not supported"
msg = r"in `rolling_\*_by` operation, `by` argument of dtype `i64` is not supported"
with pytest.raises(InvalidOperationError, match=msg):
df.select(pl.col("b").rolling_min_by("a", 2)) # type: ignore[arg-type]
df.select(pl.col("b").rolling_min_by("a", "2i"))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that rolling_min_by is split from rolling_min, it no longer accepts integers as input for window_size, and raises a different error

I've slightly modified this one to keep the spirit of the existing test (i.e. asserting that the by column can't be of integer dtype)

Comment on lines -227 to +233
msg = "if `by` argument is passed, then `window_size` must be a temporal window"
msg = "`window_size` duration may not be a parsed integer"
with pytest.raises(InvalidOperationError, match=msg):
df.select(pl.col("a").rolling_min_by("b", 2)) # type: ignore[arg-type]
df.select(pl.col("a").rolling_min_by("b", "2i"))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Comment on lines -251 to +260
msg = "if `by` argument is passed, then `window_size` must be a temporal window"
msg = "`window_size` duration may not be a parsed integer"
with pytest.raises(InvalidOperationError, match=msg):
df.with_columns(pl.col("a").rolling_sum_by("b", 2, closed="left")) # type: ignore[arg-type]
df.with_columns(pl.col("a").rolling_sum_by("b", "2i", closed="left"))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too

@MarcoGorelli MarcoGorelli force-pushed the split-rolling-by-in-rust-too branch from 710b1a9 to 588e2b4 Compare May 9, 2024 11:50
Copy link

codecov bot commented May 9, 2024

Codecov Report

Attention: Patch coverage is 94.14802% with 34 lines in your changes are missing coverage. Please review.

Project coverage is 80.96%. Comparing base (b38aa4b) to head (01d74de).
Report is 1 commits behind head on main.

Files Patch % Lines
...es/polars-plan/src/dsl/function_expr/rolling_by.rs 78.94% 12 Missing ⚠️
py-polars/src/lazyframe/visitor/expr_nodes.rs 0.00% 8 Missing ⚠️
...s-time/src/chunkedarray/rolling_window/dispatch.rs 95.51% 7 Missing ⚠️
py-polars/polars/expr/expr.py 90.90% 4 Missing ⚠️
...olars-core/src/chunked_array/ops/rolling_window.rs 88.88% 1 Missing ⚠️
crates/polars-plan/src/dsl/function_expr/mod.rs 92.85% 1 Missing ⚠️
...polars-time/src/chunkedarray/rolling_window/mod.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16102      +/-   ##
==========================================
+ Coverage   80.95%   80.96%   +0.01%     
==========================================
  Files        1386     1387       +1     
  Lines      178423   178629     +206     
  Branches     2881     2886       +5     
==========================================
+ Hits       144437   144632     +195     
- Misses      33493    33503      +10     
- Partials      493      494       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

#[cfg(feature = "rolling_window")]
#[cfg(feature = "rolling_window_by")]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting this to work was oddly satisfying 😌

Comment on lines -6168 to +6170
by: str,
by: IntoExpr,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the great expressification continues!

@MarcoGorelli MarcoGorelli force-pushed the split-rolling-by-in-rust-too branch 2 times, most recently from aaf9e88 to a7f9d48 Compare May 9, 2024 19:57
Copy link

codspeed-hq bot commented May 9, 2024

CodSpeed Performance Report

Merging #16102 will not alter performance

Comparing MarcoGorelli:split-rolling-by-in-rust-too (01d74de) with main (b38aa4b)

Summary

✅ 35 untouched benchmarks

@MarcoGorelli MarcoGorelli changed the title WIP: separate rolling_*_by from rolling_*(..., by=...) in Rust feat(rust!): separate rolling_*_by from rolling_*(..., by=...) in Rust May 10, 2024
@MarcoGorelli MarcoGorelli marked this pull request as ready for review May 10, 2024 14:15
@github-actions github-actions bot added breaking rust Change that breaks backwards compatibility for the Rust crate enhancement New feature or an improvement of an existing feature rust Related to Rust Polars labels May 10, 2024
@ritchie46
Copy link
Member

So only moving around, no execution logic touched?

@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented May 10, 2024

the execution logic is untouched, that's right

@ritchie46
Copy link
Member

Great! Thanks. That's quite a churn. ^^

@ritchie46 ritchie46 merged commit c525d64 into pola-rs:main May 10, 2024
29 checks passed
@rcliu623
Copy link

rcliu623 commented May 29, 2024

Dumb question -- I definitely see the benefits of splitting rolling_* and rolling_*_by into separate functionalities, though how is this related to removing closed=... from rolling_*? I could see scenarios when a user would like to do rolling window calculation but have control over left-closed or right-closed

Also documentation on rolling_* is stale. At least rolling_std is still presenting closed as a valid input.
https://docs.pola.rs/py-polars/html/reference/expressions/api/polars.Expr.rolling_std.html

@MarcoGorelli
Copy link
Collaborator Author

closed was never supported for rolling_*, so there's no change in functionality there. maybe it should be, but that's a separate issue - fancy opening a new one?

i'll update the docs, thanks

Wouittone pushed a commit to Wouittone/polars that referenced this pull request Jun 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking rust Change that breaks backwards compatibility for the Rust crate enhancement New feature or an improvement of an existing feature rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants