-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Optimize is_sorted
for numeric data
#16333
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli check comments FYI.
@@ -82,8 +82,9 @@ where | |||
if ca.is_empty() { | |||
return Ok(Series::new_empty(ca.name(), ca.dtype())); | |||
} | |||
let ca = ca.rechunk(); | |||
let by = by.rechunk(); | |||
if by.null_count() > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be factored out of branch and be checked before any compute (rechunking).
if by.null_count() > 0 { | ||
polars_bail!(InvalidOperation: "'Expr.rolling_*_by(...)' not yet supported for series with null values, consider using 'DataFrame.rolling' or 'Expr.rolling'") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs checking for ca
too? looks like it's being removed from L107 but not replaced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, L114 can probably now just become
let by_values = by.cont_slice().expect("`by` has already been rechunked and checked for null values");
?
likewise L137
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't I believe, but we should check that higher up indeed. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it is. NVM.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16333 +/- ##
==========================================
+ Coverage 81.37% 81.38% +0.01%
==========================================
Files 1403 1403
Lines 183691 183751 +60
Branches 2954 2954
==========================================
+ Hits 149471 149548 +77
+ Misses 33709 33692 -17
Partials 511 511 ☔ View full report in Codecov by Sentry. |
Should be much faster and failing fast in case of non-sorted data.