Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the notes that high-resolution remote datasets don't support slice operations #2378

Closed
seisman opened this issue Feb 21, 2023 · 5 comments · Fixed by #2384
Closed
Labels
documentation Improvements or additions to documentation
Milestone

Comments

@seisman
Copy link
Member

seisman commented Feb 21, 2023

In the documentation of all remote datasets (e.g., https://www.pygmt.org/v0.8.0/api/generated/pygmt.datasets.load_earth_age.html), we have notes like:

The xarray.DataArray grid doesn’t support slice operation, for Earth seafloor crustal age with resolutions of 5 arc-minutes or higher, which are stored as smaller tiles.

The notes are inaccurate:

  1. After PR GMTDataArrayAccessor: Fallback to default grid registration and gtype if the grid source file doesn't exist  #2009 was merged, it's OK to do slice operation on high-resolution remote datasets, although the registration and gtype properties are lost (see Document limitations of GMT xarray accessors #2375 and Make a gmt xarray accessor to store metadata from grdinfo #499 for the reason).
>>> from pygmt.datasets import load_earth_relief
>>> grid = load_earth_relief(resolution="05m", region=[0, 5, 0, 5], registration="pixel")
>>> grid.gmt.registration, grid.gmt.gtype
(1, 1)
>>> grid2 = grid[0:2, 0:2]
>>> grid2.gmt.registration, grid2.gmt.gtype
(0, 0)
  1. The properties will also be lost for low-resolution datasets after some grid operations:
>>> from pygmt.datasets import load_earth_relief
>>> grid =  load_earth_relief()
>>> grid.gmt.registration, grid.gmt.gtype
(0, 1)
>>> grid.encoding
{'zlib': True,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': True,
 'complevel': 9,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (181, 181),
 'source': '/home/seisman/.gmt/server/earth/earth_relief/earth_relief_01d_g.grd',
 'original_shape': (181, 361),
 'dtype': dtype('int16'),
 '_FillValue': -32768,
 'scale_factor': 0.5}
>>> grid2 = grid * 2.0
>>> grid2.encoding
{}
>>> grid2.gmt.registration, grid2.gmt.gtype
(0, 0)

These are all caused by the limitations of the xarray accessors (#499), so it should be documented in a single place (as done in PR #2375). So, I think we should remove these notes for remote datasets, or have a note saying that "The grid registration and gtype information may lose after grid operations." and link to the GMT xarray accessor documentation (i.e., #2375) for reference and workarounds.

@seisman seisman added the question Further information is requested label Feb 21, 2023
@seisman
Copy link
Member Author

seisman commented Feb 26, 2023

I think it makes more sense to remove the notes about "slice operation is not supported for high-resolution datasets", and add a note like:

The registration and coordinate system type of the returned grids can be accessed by the GMT accessors (i.e., DataArray.gmt.registration and DataArray.gmt.gtype respectively).
However, these properties may be lost after specific operations and need to be set manually before further passed to PyGMT functions. Refer to :class:pygmt.GMTDataArrayAccessor for detailed explanations and workarounds.

@seisman
Copy link
Member Author

seisman commented Feb 26, 2023

Ping @GenericMappingTools/pygmt-maintainers for comments on it and review PR #2384.

@weiji14
Copy link
Member

weiji14 commented Feb 26, 2023

In the documentation of all remote datasets (e.g., https://www.pygmt.org/v0.8.0/api/generated/pygmt.datasets.load_earth_age.html), we have notes like:

The xarray.DataArray grid doesn’t support slice operation, for Earth seafloor crustal age with resolutions of 5 arc-minutes or higher, which are stored as smaller tiles.

The notes are inaccurate:

1. After PR [GMTDataArrayAccessor: Fallback to default grid registration and gtype if the grid source file doesn't exist  #2009](https://github.com/GenericMappingTools/pygmt/pull/2009) was merged, it's OK to do slice operation on high-resolution remote datasets, although the registration and gtype properties are lost

Just for context, that sentence was added originally added in #542 to the load_earth_relief function, and has been copied to other load_earth_* functions since then. The slicing issue for high-resolution (higher than 5 arc seconds) remote datasets in that sentence is referring to slicing by grids that have been grdcut, see also #524 for additional context.

I think we'll need to be more specific on which grids support slicing (grids loaded with pygmt.which and load_dataarray) and which don't (grids loaded with pygmt.grdcut):

# different ways to load tiled and non-tiled grids.
# Known issue: tiled grids don't support slice operation
# See https://github.com/GenericMappingTools/pygmt/issues/524
if region is None:
if dataset.resolutions[resolution].tiled:
raise GMTInvalidInput(
f"'region' is required for {dataset.title} resolution '{resolution}'."
)
fname = which(f"@{dataset_prefix}{resolution}{reg}", download="a")
grid = load_dataarray(fname, engine="netcdf4")
else:
grid = grdcut(f"@{dataset_prefix}{resolution}{reg}", region=region)

Maybe reword the sentence to mention that setting a region and/or using high-resolution grids >5 arc minutes will cause slicing to lose the registration/gtype information due to there not being a NetCDF source for grdinfo to read from?

@seisman
Copy link
Member Author

seisman commented Feb 27, 2023

Maybe reword the sentence to mention that setting a region and/or using high-resolution grids >5 arc minutes will cause slicing to lose the registration/gtype information due to there not being a NetCDF source for grdinfo to read from?

My main point is, the registration and gtype properties will be lost after almost any grid operations and slicing is just one of them. So the note should be very general rather than focus on "slicing" only.

@weiji14
Copy link
Member

weiji14 commented Mar 17, 2023

Maybe reword the sentence to mention that setting a region and/or using high-resolution grids >5 arc minutes will cause slicing to lose the registration/gtype information due to there not being a NetCDF source for grdinfo to read from?

My main point is, the registration and gtype properties will be lost after almost any grid operations and slicing is just one of them. So the note should be very general rather than focus on "slicing" only.

Mm, you're right. It's not just slicing that causes the registration and gtype properties to be lost, though slicing is one of the more common operations that could be mentioned as an example. Let me make an inline comment on #2384.

@seisman seisman added documentation Improvements or additions to documentation and removed question Further information is requested labels Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants