
Lat/lon gridded data does not have monotonically increasing latitudes #60

Open
jbusecke opened this issue Oct 17, 2023 · 3 comments


@jbusecke

First of all THANK YOU so much for this effort! Having ERA5 data available in an ARCO format is truly a game changer!

I noticed a small issue: the latitudes of the lat/lon gridded data 1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2/ seem to be in decreasing order:

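```python
import xarray as xr

# Reading from gs:// requires gcsfs; the bucket is public, so anonymous access is used.
ar_full_37_1h = xr.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2/',
    storage_options=dict(token='anon'),
).isel(time=0)
ar_full_37_1h
```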

which makes selecting a region in xarray slightly counterintuitive:

```python
ar_full_37_1h.sel(latitude=slice(-50, 50))
```

returns no latitude values (the latitude dimension has length 0):

while

```python
ar_full_37_1h.sel(latitude=slice(50, -50))
```

gives the desired result:

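The same behaviour can be reproduced without touching the bucket. This is a minimal sketch on a small synthetic dataset (the variable name `value` and the 0.25° grid from 90 to -90 are illustrative assumptions, not read from the Zarr store): label-based slicing in xarray follows the order of the underlying index, so on a descending coordinate the bounds have to be given north first.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the 0.25 degree grid: latitudes run from 90 down to -90
# (721 points), mimicking the descending order described above.
lat = np.arange(90.0, -90.25, -0.25)
ds = xr.Dataset(
    {"value": (("latitude",), np.zeros(lat.size))},  # placeholder data variable
    coords={"latitude": lat},
)

# Ascending bounds on a descending index select nothing.
empty = ds.sel(latitude=slice(-50, 50))
print(empty.sizes["latitude"])  # 0

# Descending bounds (north first) select the band from 50N to 50S, inclusive.
band = ds.sel(latitude=slice(50, -50))
print(band.sizes["latitude"], float(band.latitude[0]), float(band.latitude[-1]))
# 401 50.0 -50.0
```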

If you end up reprocessing the data at some point, I wonder whether something like xarray's ds.sortby('latitude') (or an equivalent) could be added to the pipeline.
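A minimal sketch of what such a `sortby` step does, again on a synthetic descending grid that stands in for the real store (an assumption, not the pipeline's actual code): after sorting, the south-to-north slice works, and each data value moves together with its latitude label. Because the latitudes here are strictly descending, reversing positionally with `isel` gives the same result.

```python
import numpy as np
import xarray as xr

# Synthetic descending latitude grid (illustrative, not the real ERA5 store).
lat = np.arange(90.0, -90.25, -0.25)
ds = xr.Dataset(
    # Data equal to the latitude itself, so we can check values stay aligned.
    {"value": (("latitude",), lat.copy())},
    coords={"latitude": lat},
)

# Reorder so latitudes increase from south to north.
ds_sorted = ds.sortby("latitude")
print(ds_sorted.indexes["latitude"].is_monotonic_increasing)  # True

# The "intuitive" slice now selects the band from 50S to 50N.
band = ds_sorted.sel(latitude=slice(-50, 50))
print(band.sizes["latitude"])  # 401

# Data values moved together with their coordinate labels.
assert (band["value"].values == band["latitude"].values).all()

# Equivalent for a strictly descending coordinate: reverse positionally.
ds_reversed = ds.isel(latitude=slice(None, None, -1))
print(ds_reversed.equals(ds_sorted))  # True
```

Sorting once when the data are written means every reader gets the ascending order for free, whereas sorting after opening adds the reordering to every read.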

@tom-andersson

@jbusecke this is standard for reanalysis data (such as ERA5), although I agree it is counterintuitive.

For example, try loading sample NCEP reanalysis data with xarray:

```python
>>> import xarray as xr
>>> print(xr.tutorial.open_dataset("air_temperature"))
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
```

The latitude values are also in decreasing order.
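Since both orders occur in practice, one way to check the direction of a coordinate before slicing is to ask the pandas index that backs it. A sketch using a locally constructed copy of the 2.5° tutorial latitudes shown above (75.0 down to 15.0, 25 values), so no download is needed:

```python
import numpy as np
import xarray as xr

# Synthetic copy of the tutorial latitudes: 75.0, 72.5, ..., 15.0 (25 values).
lat = np.linspace(75.0, 15.0, 25, dtype="float32")
da = xr.DataArray(np.zeros(lat.size), coords={"lat": lat}, dims="lat")

index = da.indexes["lat"]  # pandas.Index backing the "lat" coordinate
print(index.is_monotonic_increasing)  # False
print(index.is_monotonic_decreasing)  # True
```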

@jbusecke
Author

Oh interesting, I did not know that. Thanks @tom-andersson.

I would personally argue that this 'should' be changed to make the data more analysis-ready, but I guess this is partly personal preference, and it would be good to have more general guidance on this that ARCO producers could refer to.
In fact, I wonder if this falls under a 'tidy array' concept (see this talk from SciPy this year). @dcherian, where would be a good place to discuss this sort of thing?

@dcherian

I agree that this is not ideal, but there are very many datasets like this ;) particularly in the raster imaging space.

See pydata/xarray#1613 for a discussion of a nicer API that ignores the order of the coordinate variable.
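Until such an API exists, one workaround is a small helper that orients the slice bounds to match the coordinate's direction. The function `sel_between` below is hypothetical: it is not part of xarray and not the API discussed in pydata/xarray#1613. It is a minimal sketch that assumes a one-dimensional, monotonic coordinate.

```python
import numpy as np
import xarray as xr


def sel_between(obj, dim, lo, hi):
    """Hypothetical helper: select lo <= coord <= hi along `dim`,
    regardless of whether the coordinate ascends or descends.
    Assumes a monotonic 1-D index; raises ValueError otherwise."""
    lo, hi = min(lo, hi), max(lo, hi)
    index = obj.indexes[dim]
    if index.is_monotonic_increasing:
        return obj.sel({dim: slice(lo, hi)})
    if index.is_monotonic_decreasing:
        return obj.sel({dim: slice(hi, lo)})
    raise ValueError(f"coordinate {dim!r} is not monotonic")


# Synthetic descending and ascending versions of the same 0.25 degree grid.
lat = np.arange(90.0, -90.25, -0.25)
ds_desc = xr.Dataset(
    {"value": (("latitude",), np.zeros(lat.size))}, coords={"latitude": lat}
)
ds_asc = ds_desc.sortby("latitude")

# Same bounds give the same number of points whatever the coordinate order.
print(sel_between(ds_desc, "latitude", -50, 50).sizes["latitude"])  # 401
print(sel_between(ds_asc, "latitude", -50, 50).sizes["latitude"])   # 401
```

Note that the result keeps the dataset's original order; the helper only hides the bound ordering, it does not sort the data.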
