Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a static grid, asymmetric region, and xarray.testing for raster tests #1684

Closed
23 tasks done
maxrjones opened this issue Dec 22, 2021 · 23 comments
Closed
23 tasks done
Labels
maintenance Boring but important stuff for the core devs
Milestone

Comments

@maxrjones
Copy link
Member

maxrjones commented Dec 22, 2021

This issue was originally for discussing improvements to the PyGMT test suite and is now dedicated to tracking specific improvements:

  1. Use xarray.testing rather than pygmt.grdinfo
  2. Use static_earth_relief rather than earth_relief
  3. Use asymmetric regions to facilitate identification of lat/lon errors

If anyone wants to help, please edit this comment to add your username next to the module or post a comment below to avoid redundant work.

@maxrjones
Copy link
Member Author

Thanks @willschlitzer for getting started on these tests! As @weiji14 mentioned in #593 (comment), we may want to work to address the likely failures due to upcoming @earth_relief failures during this process of migrating away from grdinfo. I have a few questions/comments regarding the tests:

  1. Should we continue using the @earth_relief remote datasets for grid processing/plotting tests at all? We could instead create a xarray.DataArray with a subset of the V2.1 SRTM15+ data using a new helper function (e.g., pygmt.helpers.testing.static_earth_relief(). Probably no one wants to write out the data for 360 x 180 grid, so this would require determining whether any of the tests using global grids could use a smaller subset. The tests that use @earth_relief are listed below for context. For small grids (10x10), it is ~5% quicker to create a grid than use the remote data and could mean less maintenance work for future SRTM15+ updates.

  2. I think we should move away from regions where the latitudes and longitudes could be switched without notice (e.g., region = [-5, 5, -5, 5]

  3. We may need to adjust the rtol on assert_allclose for some tests - I'll post more on this later but accidentally opened this issue before working out the precision factors.

Processing/plotting tests using load_earth_relief:

Global pixel registration

  • test_grdcut.py - load_earth_relief(registration="pixel")
  • test_grd2cpt.py - load_earth_relief()
  • test_grdfilter.py - load_earth_relief(registration="pixel”)

Global gridline registration

  • test_grdimage.py - load_earth_relief(registration="gridline")
  • test_grdinfo.py - load_earth_relief(registration="gridline")
  • test_grdcontour.py - load_earth_relief(registration="gridline”)
  • test_makecpt.py - load_earth_relief(registration="gridline”)

Symmetric subset

  • test_grdgradient.py - load_earth_relief(resolution="01d", region=[-5, 5, -5, 5])
  • test_grdproject.py - load_earth_relief(resolution="01d", region=[-5, 5, -5, 5])
  • test_grdclip.py - load_earth_relief(resolution="01d", region=[-5, 5, -5, 5]
  • test_grdsample.py - load_earth_relief(resolution="01d", region=[-5, 5, -5, 5], registration=“pixel”)

Other subset

  • test_grd2xyz.py - load_earth_relief(resolution="01d", region=[-1, 1, 3, 5]
  • test_grdtrack.py - load_earth_relief(registration="gridline").sel(lat=slice(-49, -42), lon=slice(-118, -107))
  • test_grdvolume.py - load_earth_relief(resolution="01d", region=[-100, -95, 34, 39])
  • test_grdfill.py - load_earth_relief(registration="pixel", region=[125, 130, -25, -20])

@maxrjones maxrjones added the question Further information is requested label Dec 22, 2021
@maxrjones
Copy link
Member Author

These are the tests that can be expected to fail after the @earth_relief update to V2.3 (n=45):

FAILED ../pygmt/clib/session.py::pygmt.clib.session.Session
FAILED ../pygmt/clib/session.py::pygmt.clib.session.Session.virtualfile_from_grid
FAILED ../pygmt/tests/test_datasets_earth_relief.py::test_earth_relief_01d
FAILED ../pygmt/tests/test_datasets_earth_relief.py::test_earth_relief_01d_with_region
FAILED ../pygmt/tests/test_datasets_earth_relief.py::test_earth_relief_30m
FAILED ../pygmt/tests/test_grd2cpt.py::test_grd2cpt
FAILED ../pygmt/tests/test_grdclip.py::test_grdclip_outgrid
FAILED ../pygmt/tests/test_grdclip.py::test_grdclip_no_outgrid
FAILED ../pygmt/tests/test_grdcontour.py::test_grdcontour
FAILED ../pygmt/tests/test_grdcontour.py::test_grdcontour_labels
FAILED ../pygmt/tests/test_grdcontour.py::test_grdcontour_slice
FAILED ../pygmt/tests/test_grdcontour.py::test_grdcontour_file
FAILED ../pygmt/tests/test_grdcontour.py::test_grdcontour_interval_file_full_opts
FAILED ../pygmt/tests/test_grdcut.py::test_grdcut_file_in_file_out
FAILED ../pygmt/tests/test_grdcut.py::test_grdcut_file_in_dataarray_out
FAILED ../pygmt/tests/test_grdcut.py::test_grdcut_dataarray_in_file_out
FAILED ../pygmt/tests/test_grdcut.py::test_grdcut_dataarray_in_dataarray_out
FAILED ../pygmt/tests/test_grdfill.py::test_grdfill_file_out
FAILED ../pygmt/tests/test_grdfilter.py::test_grdfilter_dataarray_in_dataarray_out
FAILED ../pygmt/tests/test_grdfilter.py::test_grdfilter_dataarray_in_file_out
FAILED ../pygmt/tests/test_grdfilter.py::test_grdfilter_file_in_dataarray_out
FAILED ../pygmt/tests/test_grdfilter.py::test_grdfilter_file_in_file_out
FAILED ../pygmt/tests/test_grdgradient.py::test_grdgradient_outgrid
FAILED ../pygmt/tests/test_grdgradient.py::test_grdgradient_no_outgrid
FAILED ../pygmt/tests/test_grdinfo.py::test_grdinfo
FAILED ../pygmt/tests/test_grdinfo.py::test_grdinfo_file
FAILED ../pygmt/tests/test_grdinfo.py::test_grdinfo_region
FAILED ../pygmt/tests/test_grdproject.py::test_grdproject_file_out
FAILED ../pygmt/tests/test_grdproject.py::test_grdproject_no_outgrid
FAILED ../pygmt/tests/test_grdtrack.py::test_grdtrack_input_dataframe_and_dataarray
FAILED ../pygmt/tests/test_grdtrack.py::test_grdtrack_input_csvfile_and_dataarray
FAILED ../pygmt/tests/test_grdtrack.py::test_grdtrack_input_dataframe_and_ncfile
FAILED ../pygmt/tests/test_grdtrack.py::test_grdtrack_input_csvfile_and_ncfile
FAILED ../pygmt/tests/test_grdtrack.py::test_grdtrack_without_outfile_setting
FAILED ../pygmt/tests/test_grdview.py::test_grdview_with_perspective_and_zscale
FAILED ../pygmt/tests/test_grdview.py::test_grdview_with_perspective_and_zsize
FAILED ../pygmt/tests/test_grdview.py::test_grdview_with_cmap_for_perspective_surface_plot
FAILED ../pygmt/tests/test_grdview.py::test_grdview_on_a_plane
FAILED ../pygmt/tests/test_grdview.py::test_grdview_on_a_plane_with_colored_frontal_facade
FAILED ../pygmt/tests/test_grdview.py::test_grdview_with_perspective_and_zaxis_frame
FAILED ../pygmt/tests/test_grdview.py::test_grdview_surface_plot_styled_with_contourpen
FAILED ../pygmt/tests/test_grdview.py::test_grdview_surface_mesh_plot_styled_with_meshpen
FAILED ../pygmt/tests/test_grdview.py::test_grdview_on_a_plane_styled_with_facadepen
FAILED ../pygmt/tests/test_makecpt.py::test_makecpt_truncated_zlow_zhigh
FAILED ../pygmt/tests/test_makecpt.py::test_makecpt_continuous

@maxrjones maxrjones mentioned this issue Dec 22, 2021
5 tasks
@willschlitzer
Copy link
Contributor

I think this is a good idea. Having a standardized grid should make it easier to write tests, and will save us time in the future. If nothing else, we should move grids to be over-land, as I assume land data is less likely to change instead of ocean data.

@willschlitzer
Copy link
Contributor

@meghanrjones Should we work to change the region arguments for the tests that have symmetric or global regions right now, or wait until the earth_relief update comes and we also (hopefully) move to a local xarray file?

@maxrjones
Copy link
Member Author

I think it would be better to update the region at the same as migrating to a locally defined xarray.DataArray to reduce reviewer work. If we chose to use a function that creates a DataArray rather than the remote files, we do not need to wait for the earth_relief update since we can create the grid based on the contents of SRTM15+V2.1.

@willschlitzer
Copy link
Contributor

Are we planning on using preset DataArrays rather than grids from load_earth_relief? Now that the new Earth relief files have been pushed, it seems like we should decide how this issue will be handled.

@weiji14
Copy link
Member

weiji14 commented Feb 2, 2022

I quite like @meghanrjones's idea at #1684 (comment) to have a pygmt.helpers.testing.static_earth_relief() function which could be a near drop-in replacement for existing tests using pygmt.datasets.load_earth_relief(). Or at least some function in pygmt.helpers.testing which loads a standard grid we can re-use across all tests.

There are some unit tests that run on an actual file (e.g. in test_grdimage.py) rather than an xarray.DataArray grid, so I'm thinking whether it's worth caching an @earth_relief_01d_p_v2.3 file (and possibly the gridline-registered version) at https://github.com/GenericMappingTools/gmtserver-admin/tree/master/cache which pygmt.helpers.testing.static_earth_relief() would pull from. Alternatively, we could generate a purely synthetic grid, even though I feel that testing on semi-real data is better.

@maxrjones
Copy link
Member Author

There are some unit tests that run on an actual file (e.g. in test_grdimage.py) rather than an xarray.DataArray grid, so I'm thinking whether it's worth caching an @earth_relief_01d_p_v2.3 file (and possibly the gridline-registered version) at https://github.com/GenericMappingTools/gmtserver-admin/tree/master/cache which pygmt.helpers.testing.static_earth_relief() would pull from. Alternatively, we could generate a purely synthetic grid, even though I feel that testing on semi-real data is better.

We could also use @tut_bathy.nc for some tests rather than @earth_relief_01d. It's a bit smaller (11 KB versus 114 KB) and wouldn't require a new cache file.

gmt grdinfo @tut_bathy.nc
gmt grdimage @tut_bathy.nc -B -png test
Title: ETOPO5 global topography
Command: grdreformat -fg bermuda.grd bermuda.nc=ns
Remark: /home/elepaio5/data/grids/etopo5.i2
Gridline node registration used [Geographic grid]
Grid file format: ns = GMT netCDF format (16-bit integer), CF-1.7
x_min: -66 x_max: -60 x_inc: 0.0833333333333 (5 min) name: Longitude n_columns: 73
y_min: 30 y_max: 35 y_inc: 0.0833333333333 (5 min) name: Latitude n_rows: 61
v_min: -5475 v_max: -89 name: Topography [m]
scale_factor: 1 add_offset: 0
format: classic
Default CPT: 

test

@willschlitzer
Copy link
Contributor

There are some unit tests that run on an actual file (e.g. in test_grdimage.py) rather than an xarray.DataArray grid, so I'm thinking whether it's worth caching an @earth_relief_01d_p_v2.3 file (and possibly the gridline-registered version) at https://github.com/GenericMappingTools/gmtserver-admin/tree/master/cache which pygmt.helpers.testing.static_earth_relief() would pull from. Alternatively, we could generate a purely synthetic grid, even though I feel that testing on semi-real data is better.

We could also use @tut_bathy.nc for some tests rather than @earth_relief_01d. It's a bit smaller (11 KB versus 114 KB) and wouldn't require a new cache file.

This still seems like it covers a pretty big area/has a lot of data; wouldn't it speed up our plotting tests more to cache a smaller grid file (something like 10 x10 degree squadre at 30 minute resolution)?

@weiji14
Copy link
Member

weiji14 commented Feb 3, 2022

There are some unit tests that run on an actual file (e.g. in test_grdimage.py) rather than an xarray.DataArray grid, so I'm thinking whether it's worth caching an @earth_relief_01d_p_v2.3 file (and possibly the gridline-registered version) at https://github.com/GenericMappingTools/gmtserver-admin/tree/master/cache which pygmt.helpers.testing.static_earth_relief() would pull from. Alternatively, we could generate a purely synthetic grid, even though I feel that testing on semi-real data is better.

We could also use @tut_bathy.nc for some tests rather than @earth_relief_01d. It's a bit smaller (11 KB versus 114 KB) and wouldn't require a new cache file.

This still seems like it covers a pretty big area/has a lot of data; wouldn't it speed up our plotting tests more to cache a smaller grid file (something like 10 x10 degree squadre at 30 minute resolution)?

Ok, how about we make a pygmt.helpers.testing.static_relief() based on a subset of the @tut_bathy.nc? Further to Meghan's point, we probably want a rectangular sample grid (e.g. 4 x 6 pixels) so that we don't get accidental errors with longitude and latitude mixed up.

@willschlitzer
Copy link
Contributor

There are some unit tests that run on an actual file (e.g. in test_grdimage.py) rather than an xarray.DataArray grid, so I'm thinking whether it's worth caching an @earth_relief_01d_p_v2.3 file (and possibly the gridline-registered version) at https://github.com/GenericMappingTools/gmtserver-admin/tree/master/cache which pygmt.helpers.testing.static_earth_relief() would pull from. Alternatively, we could generate a purely synthetic grid, even though I feel that testing on semi-real data is better.

We could also use @tut_bathy.nc for some tests rather than @earth_relief_01d. It's a bit smaller (11 KB versus 114 KB) and wouldn't require a new cache file.

This still seems like it covers a pretty big area/has a lot of data; wouldn't it speed up our plotting tests more to cache a smaller grid file (something like 10 x10 degree squadre at 30 minute resolution)?

Ok, how about we make a pygmt.helpers.testing.static_relief() based on a subset of the @tut_bathy.nc? Further to Meghan's point, we probably want a rectangular sample grid (e.g. 4 x 6 pixels) so that we don't get accidental errors with longitude and latitude mixed up.

That works for me.

@maxrjones maxrjones added maintenance Boring but important stuff for the core devs and removed question Further information is requested labels Feb 4, 2022
@maxrjones maxrjones changed the title Test improvements during migration away from grdinfo? Test improvements during migration away from grdinfo Feb 4, 2022
@maxrjones
Copy link
Member Author

I started working on this but it turns out that it's super annoying to create a static DataArray based on a grid with a 0.0833333333333 grid spacing because of the checks in place for irregular grids. Based on this trial, I would prefer to go back to @weiji14's suggestion to add a small file to the cache based on @earth_relief_01d_p V2.3. I could submit a PR to add a file based on gmt grdcut @earth_relief_01d_p -R-55/-47/-24/-10 -Gstatic_earth_relief.nc to the cache. The output from this command is 1.6 KB with the following metadata:

static_earth_relief.nc: Title: Produced by grdcut
static_earth_relief.nc: Command: grdcut @earth_relief_01d_p -R-55/-47/-24/-10 -Gstatic_earth_relief.nc
static_earth_relief.nc: Remark: Reduced by Gaussian Cartesian filtering (111.2 km fullwidth) from SRTM15_V2.3.nc [Sandwell et al., 2022; https://doi.org/10.1029/2021EA002069]
static_earth_relief.nc: Pixel node registration used [Geographic grid]
static_earth_relief.nc: Grid file format: nf = GMT netCDF format (32-bit float), CF-1.7
static_earth_relief.nc: x_min: -55 x_max: -47 x_inc: 1 name: longitude n_columns: 8
static_earth_relief.nc: y_min: -24 y_max: -10 y_inc: 1 name: latitude n_rows: 14
static_earth_relief.nc: v_min: 190 v_max: 981 name: elevation (m)
static_earth_relief.nc: scale_factor: 1 add_offset: 0
static_earth_relief.nc: format: classic
static_earth_relief.nc: Default CPT: geo

@willschlitzer
Copy link
Contributor

@meghanrjones This works for me! Once you submit that file to the cache I'll update the pull requests to not use a DataArray and instead used that cached grid file.

@seisman
Copy link
Member

seisman commented Feb 5, 2022

@willschlitzer is adding some inline examples in module docstrings (addressing #1686) which access the latest earth relief data by calling the load_earth_relief function. Although these inline examples are skipped in tests, some numbers in these examples will be outdated when new earth relief grids are released.

@willschlitzer
Copy link
Contributor

@willschlitzer is adding some inline examples in module docstrings (addressing #1686) which access the latest earth relief data by calling the load_earth_relief function. Although these inline examples are skipped in tests, some numbers in these examples will be outdated when new earth relief grids are released.

Should I not be using load_earth_relief in the inline examples? I think that's the most accessible way for a new user to get an example grid to use to try a function.

@willschlitzer
Copy link
Contributor

@meghanrjones Now that we have a standard file (which I'm assumed is called with @static_earth_relief), should we be calling the remote file directly or is the plan to add a function to call it?

@maxrjones
Copy link
Member Author

@meghanrjones Now that we have a standard file (which I'm assumed is called with @static_earth_relief), should we be calling the remote file directly or is the plan to add a function to call it?

We can use @static_earth_relief for tests that run on an actual file, but I think we should add this file to pygmt.datasets.list_sample_data and pygmt.datasets.load_sample_data to return an xarray.DataArray for tests on DataArray input. I could submit a PR for this tomorrow AM, but if you or anyone else has time to add this before then please feel free to do so.

@maxrjones maxrjones changed the title Test improvements during migration away from grdinfo Use a static grid, asymmetric region, and xarray.testing for raster tests Feb 7, 2022
@maxrjones
Copy link
Member Author

FYI I updated the checklist at the start of the issue to include all tests that currently use earth_relief except test_grdimage.py which requires a global dataset.

@willschlitzer
Copy link
Contributor

Why wouldn't we use # doctest: +SKIP in the pygmt.clib.session example?

@maxrjones
Copy link
Member Author

Why wouldn't we use # doctest: +SKIP in the pygmt.clib.session example?

The other modules have unit tests in the corresponding pygmt/tests/test_*.py, but that method does not have a unit test in test_clib.py

@willschlitzer
Copy link
Contributor

I'm trying to rewrite the tests for grdtrack. I'm not able to use ocean_ride_points.txt with the static Earth relief grid input. Should we use a different input for grdtrack?

@willschlitzer
Copy link
Contributor

FYI I updated the checklist at the start of the issue to include all tests that currently use earth_relief except test_grdimage.py which requires a global dataset.

Should we rewrite the grdimage tests? They have some of the longest test durations; I would assume they could be sped up significantly if we swapped out the grid to use static_earth_relief.

@maxrjones
Copy link
Member Author

FYI I updated the checklist at the start of the issue to include all tests that currently use earth_relief except test_grdimage.py which requires a global dataset.

Should we rewrite the grdimage tests? They have some of the longest test durations; I would assume they could be sped up significantly if we swapped out the grid to use static_earth_relief.

It may be possible to use @static_earth_relief.nc for a few of the grdimage tests, but others are regression tests that require larger areas than in static_earth_relief. So it would require going through the issues referenced in each test to determine whether the bug could be reproduced with a smaller region.

@seisman seisman closed this as completed Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

No branches or pull requests

4 participants