perf: Limit the cache size for to_datetime #15826

Merged: 2 commits, Apr 22, 2024

Collaborator

@reswqa reswqa commented Apr 22, 2024

Fix #15736.

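from datetime import date
import polars as pl

# One timestamp per second over a full year: a large, all-unique input.
s = pl.datetime_range(date(2000, 1, 1), date(2001, 1, 1), '1s', eager=True)
strings = s.dt.strftime('%Y-%m-%dT%H:%M:%S')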

Without cache:

%timeit strings.str.to_datetime(format='%Y-%m-%dT%H:%M:%S', cache=False)
1.47 s ± 71.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With cache:

%timeit strings.str.to_datetime(format='%Y-%m-%dT%H:%M:%S', cache=True)
  • Before this PR
18.2 s ± 790 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • After this PR
1.88 s ± 59.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Considering that the test data is large and entirely unique (so the cache never hits), I think the remaining overhead is acceptable.
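To illustrate why a size limit keeps the overhead bounded on all-unique input, here is a minimal pure-Python sketch. It is not the Rust code changed in this PR: the class name `BoundedParseCache`, the `max_size` parameter, and the "stop inserting once full" policy are all assumptions made for illustration.

```python
from datetime import datetime


class BoundedParseCache:
    """Hypothetical size-capped memo for string -> datetime parsing."""

    def __init__(self, fmt, max_size=1024):
        self.fmt = fmt
        self.max_size = max_size  # assumed cap; not a value taken from the PR
        self._cache = {}
        self.hits = 0

    def parse(self, s):
        cached = self._cache.get(s)
        if cached is not None:
            # Repeated input: skip the expensive parse.
            self.hits += 1
            return cached
        value = datetime.strptime(s, self.fmt)
        # Only remember the value while there is room, so unique-heavy
        # input cannot grow the table without bound.
        if len(self._cache) < self.max_size:
            self._cache[s] = value
        return value


cache = BoundedParseCache('%Y-%m-%dT%H:%M:%S', max_size=2)
inputs = [
    '2000-01-01T00:00:00',
    '2000-01-01T00:00:01',
    '2000-01-01T00:00:02',  # cache already full, parsed but not stored
    '2000-01-01T00:00:00',  # repeated value, served from the cache
]
parsed = [cache.parse(s) for s in inputs]
```

With an uncapped table, every unique string would pay for a lookup plus an insert into an ever-growing map; with the cap, the cost per miss stays roughly constant once the table is full.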

@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars labels Apr 22, 2024

codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.38%. Comparing base (0c2783a) to head (dcd16f1).
Report is 30 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15826      +/-   ##
==========================================
- Coverage   81.35%   80.38%   -0.97%     
==========================================
  Files        1379     1263     -116     
  Lines      176619   165368   -11251     
  Branches     2544        0    -2544     
==========================================
- Hits       143686   132933   -10753     
+ Misses      32449    32435      -14     
+ Partials      484        0     -484     


codspeed-hq bot commented Apr 22, 2024

CodSpeed Performance Report

Merging #15826 will not alter performance

Comparing reswqa:to_date_cache (dcd16f1) with main (a078d0c)

Summary

✅ 22 untouched benchmarks

@reswqa reswqa marked this pull request as ready for review April 22, 2024 08:24
Member

@ritchie46 ritchie46 left a comment

Good find. Thanks @reswqa

@ritchie46 ritchie46 merged commit ef69d13 into pola-rs:main Apr 22, 2024
22 checks passed
@reswqa reswqa deleted the to_date_cache branch April 22, 2024 09:41
Successfully merging this pull request may close these issues.

cache=True (default) in to_datetime causes too big slow-down