
pygmt.grd2xyz: Improve performance by storing output in virtual files #3097

Merged: 7 commits merged into main from output/grd2xyz on Mar 13, 2024


@seisman (Member) commented Mar 11, 2024

Changes in this PR:

Preview: https://pygmt-dev--3097.org.readthedocs.build/en/3097/api/generated/pygmt.grd2xyz.html#pygmt.grd2xyz
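As the title says, this PR keeps grd2xyz output in a GMT virtual file (an in-memory object) instead of writing it to a temporary file on disk and reading it back. The sketch below is only a library-free analogy of that difference, not PyGMT's implementation; both function names are hypothetical.

```python
import os
import tempfile


def roundtrip_via_temp_file(rows):
    """Hypothetical 'old' path: format rows as text, write them to a temporary
    file, then read the file back and parse every value again."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        for row in rows:
            # repr() of a float round-trips exactly through float()
            tmp.write(" ".join(repr(v) for v in row) + "\n")
        path = tmp.name
    try:
        with open(path) as f:
            return [tuple(float(v) for v in line.split()) for line in f]
    finally:
        os.remove(path)  # clean up the temporary file


def keep_in_memory(rows):
    """Hypothetical 'new' path: the results are already in memory, so they are
    handed over directly with no text formatting, disk I/O, or parsing."""
    return [tuple(row) for row in rows]


# A tiny lon/lat/z table standing in for grd2xyz output.
rows = [(float(lon), float(lat), float(lon + lat)) for lon in range(-2, 3) for lat in range(-1, 2)]
```

Both paths return the same table; the difference is the extra serialization and disk round-trip in the first one, which grows with the number of output rows.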

@seisman added the run/benchmark (Trigger the benchmark workflow in PRs) label Mar 11, 2024

codspeed-hq bot commented Mar 11, 2024

CodSpeed Performance Report

Merging #3097 will degrade performances by 81.76%

⚠️ No base runs were found

Falling back to comparing output/grd2xyz (2c5aeb3) with main (0b46aad)

Summary

❌ 1 regression
✅ 97 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | output/grd2xyz | Change |
|---|---|---|---|
| test_grd2xyz | 43.1 ms | 236.2 ms | -81.76% |
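The reported change is consistent with comparing speeds rather than times, i.e. change = t_main / t_branch - 1, so -81.76% means the branch runs at roughly 18% of main's speed. This is an inferred reading of the numbers, not CodSpeed's documented formula, and the helper name is hypothetical.

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Percent change in speed (inverse time) from base to head.

    Hypothetical reconstruction of the CodSpeed 'Change' column:
    negative values mean head is slower than base.
    """
    return (base_ms / head_ms - 1.0) * 100.0


# Times from the two CodSpeed reports in this PR (main vs. output/grd2xyz).
print(f"{relative_change(43.1, 236.2):.2f}%")
print(f"{relative_change(43.1, 229.4):.2f}%")
```

Both values land within about 0.01 percentage points of the reported -81.76% and -81.22%; the residual comes from the displayed times being rounded to 0.1 ms.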

@@ -52,44 +49,6 @@ def test_grd2xyz_format(grid):
assert list(xyz_df.columns) == ["lon", "lat", "z"]


def test_grd2xyz_file_output(grid):
@seisman (Member Author) commented on this diff:

The four tests removed here are not directly related to grd2xyz. test_grd2xyz_file_output is already covered by the doctest of the Session.virtualfile_out method. The other three tests will be covered by doctests in PR #3098.

@seisman (Member Author) commented Mar 11, 2024

CodSpeed Performance Report

Merging #3097 will degrade performances by 81.22%

⚠️ No base runs were found

Falling back to comparing output/grd2xyz (f423c7d) with main (0b46aad)

Summary

❌ 1 regression
✅ 97 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | output/grd2xyz | Change |
|---|---|---|---|
| test_grd2xyz | 43.1 ms | 229.4 ms | -81.22% |

Run the test locally:

>>> import pygmt
>>> from pygmt.helpers.testing import load_static_earth_relief
>>> grid = load_static_earth_relief()
>>> %timeit pygmt.grd2xyz(grid=grid, output_type="numpy")
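%timeit is an IPython magic. In a plain Python script, a comparable measurement can be sketched with the standard-library timeit module. The helper below is hypothetical and only approximates %timeit: it uses a fixed loop count instead of choosing one automatically, and reports the mean and population standard deviation of the per-loop times.

```python
import statistics
import timeit


def timeit_like(func, number=100, repeat=7):
    """Return (mean, stdev) of per-loop times in seconds over `repeat` runs
    of `number` loops each, roughly mirroring the %timeit summary line."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_loop = [t / number for t in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Example with PyGMT (needs pygmt and GMT installed), kept as a comment:
# import pygmt
# from pygmt.helpers.testing import load_static_earth_relief
# grid = load_static_earth_relief()
# mean, std = timeit_like(lambda: pygmt.grd2xyz(grid=grid, output_type="numpy"))
# print(f"{mean * 1e3:.2f} ms ± {std * 1e3:.2f} ms per loop")
```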

In the main branch:

5.06 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In this branch:

10 ms ± 185 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So, for this small test, the new version is roughly twice as slow (10 ms vs. 5.06 ms), consistent with the CodSpeed benchmark report.

For larger grids:

>>> import pygmt
>>> for res in ["01d", "30m", "15m", "10m"]:
...     %timeit pygmt.grd2xyz(grid=f"@earth_relief_{res}_g")
...
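For a sense of scale: assuming the @earth_relief_*_g grids are global and gridline-registered, and that grd2xyz writes one row per node, a grid spacing of r degrees yields (360/r + 1) × (180/r + 1) output rows. A small sketch of that arithmetic (the names are illustrative):

```python
# Grid spacing in arc-minutes for each resolution code used above.
SPACING_MIN = {"01d": 60, "30m": 30, "15m": 15, "10m": 10}


def gridline_nodes(spacing_min: int) -> tuple[int, int, int]:
    """Return (n_lon, n_lat, n_total) for a global gridline-registered grid.

    Assumes 360 degrees of longitude and 180 degrees of latitude, with nodes
    on both boundaries (hence the +1 in each direction).
    """
    n_lon = 360 * 60 // spacing_min + 1
    n_lat = 180 * 60 // spacing_min + 1
    return n_lon, n_lat, n_lon * n_lat


for res, spacing in SPACING_MIN.items():
    n_lon, n_lat, total = gridline_nodes(spacing)
    print(f"{res}: {n_lon} x {n_lat} = {total:,} rows")
```

Under these assumptions the output grows from 65,341 rows (01d) to 2,336,041 rows (10m).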

In the main branch:

98.4 ms ± 4.77 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
444 ms ± 45.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.6 s ± 33.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.67 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In this branch:

21.8 ms ± 2.98 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
57 ms ± 7.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
143 ms ± 3.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
287 ms ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

It's clear that for large output datasets, the new version is much faster: from about 4.5x (01d) to about 12.8x (10m) in these runs.
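The speedups implied by the timings above can be computed directly as main-branch time divided by this-branch time (values below 1 mean the branch is slower). The numbers are copied from the outputs in this comment; "static" is an illustrative label for the load_static_earth_relief test.

```python
# Mean per-loop times (seconds) reported in this comment.
main = {"static": 5.06e-3, "01d": 98.4e-3, "30m": 444e-3, "15m": 1.6, "10m": 3.67}
branch = {"static": 10e-3, "01d": 21.8e-3, "30m": 57e-3, "15m": 143e-3, "10m": 287e-3}

# Speedup > 1: this branch is faster than main; < 1: it is slower.
speedup = {key: main[key] / branch[key] for key in main}

for key, ratio in speedup.items():
    print(f"{key:>6}: {ratio:.1f}x")
```

This prints 0.5x for the small static grid and 4.5x, 7.8x, 11.2x, and 12.8x for 01d through 10m, so the gain grows with the output size.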

@seisman changed the title "pygmt.grd2xyz: Refactor using the virtualfile_to_dataset method and get rid of temporal output files" to "pygmt.grd2xyz: Refactor using virtual files instead of temporary files for output" Mar 11, 2024
@seisman changed the title "pygmt.grd2xyz: Refactor using virtual files instead of temporary files for output" to "pygmt.grd2xyz: Improve performanc using virtual files instead of temporary files for output" Mar 11, 2024
@seisman changed the title "pygmt.grd2xyz: Improve performanc using virtual files instead of temporary files for output" to "pygmt.grd2xyz: Improve performance using virtual files instead of temporary files for output" Mar 11, 2024
@seisman added the maintenance (Boring but important stuff for the core devs) and needs review (This PR has higher priority and needs review.) labels Mar 11, 2024
@seisman added this to the 0.12.0 milestone Mar 11, 2024
@seisman marked this pull request as ready for review March 11, 2024 14:20
@michaelgrund added the final review call (This PR requires final review and approval from a second reviewer) label and removed the needs review (This PR has higher priority and needs review.) label Mar 12, 2024
@seisman changed the title "pygmt.grd2xyz: Improve performance using virtual files instead of temporary files for output" to "pygmt.grd2xyz: Improve performance using virtual files for output" Mar 13, 2024
@seisman changed the title "pygmt.grd2xyz: Improve performance using virtual files for output" to "pygmt.grd2xyz: Improve performance by storing output in virtual files" Mar 13, 2024
@seisman merged commit 752305c into main Mar 13, 2024
18 of 20 checks passed
@seisman deleted the output/grd2xyz branch March 13, 2024 05:52
@seisman removed the final review call (This PR requires final review and approval from a second reviewer) and run/benchmark (Trigger the benchmark workflow in PRs) labels Mar 13, 2024
Labels
maintenance (Boring but important stuff for the core devs)

2 participants