Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cftime resampling error #9108

Closed
dcherian opened this issue Jun 12, 2024 · 7 comments · Fixed by #9116
Closed

cftime resampling error #9108

dcherian opened this issue Jun 12, 2024 · 7 comments · Fixed by #9116

Comments

@dcherian
Copy link
Contributor

dcherian commented Jun 12, 2024

What happened?

Something is very wrong with CFTime resampling for some inputs.

What did you expect to happen?

No error

Minimal Complete Verifiable Example

import dask.array
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"pr": ("time", dask.array.random.random((10,), chunks=(10,)))},
    coords={"time": xr.date_range("0001-01-01", periods=10, freq="D")},
)
ds.resample(time="ME")
ValueError: Data shape (9,) must match shape of object (10,)
@spencerkclark
Copy link
Member

Can you show the output of xr.show_versions()? I am actually not able to reproduce this in the two environments I've tried (one has xarray main installed):

>>> xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:50:49) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.4.3.dev863+gce196d56
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: installed
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.3.8
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: None
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: None
sphinx: None

@dcherian
Copy link
Contributor Author

dcherian commented Jun 13, 2024

Here are the versions:

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.2.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.0
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: 0.23
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.25.0
sphinx: 7.3.7

I get a bunch of these warnings too:

[/Users/deepak/miniforge3/envs/xarray-release/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304](http:https://localhost:8888/lab/tree/repos/devel/xarray/miniforge3/envs/xarray-release/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py#line=303): CFWarning: year=0 was specified - this date[/calendar/year](http:https://localhost:8888/calendar/year) zero convention is not supported by CF
  reference = type(date)(year, month, 1)

@spencerkclark
Copy link
Member

spencerkclark commented Jun 13, 2024

Yeah, I get those warnings too. We may decide to do something to silence those, but I think that's a separate issue (not that it necessarily excuses it, but I think they have existed for a while for this case).

Weirdly I still cannot reproduce the ValueError with an environment built to be just like yours:

$ python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dask.array; import numpy as np; import xarray as xr
>>> ds = xr.Dataset({"pr": ("time", dask.array.random.random((10,), chunks=(10,)))},coords={"time": xr.date_range("0001-01-01", periods=10, freq="D")},)
>>> ds.resample(time="ME")
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: year=0 was specified - this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
  return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:308: CFWarning: this date/calendar/year zero convention is not supported by CF
  return date.replace(year=year, month=month, day=day)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
  return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftimeindex.py:563: CFWarning: this date/calendar/year zero convention is not supported by CF
  return CFTimeIndex(np.array(self) + other)
DatasetResample, grouped over '__resample_dim__'
1 groups with labels 0001-01-31, 00:00:00.
>>> ds.resample(time="ME").mean().compute()
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: year=0 was specified - this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:304: CFWarning: this date/calendar/year zero convention is not supported by CF
  reference = type(date)(year, month, 1)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
  return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:308: CFWarning: this date/calendar/year zero convention is not supported by CF
  return date.replace(year=year, month=month, day=day)
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftime_offsets.py:262: CFWarning: this date/calendar/year zero convention is not supported by CF
  return (reference - timedelta(days=1)).day
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/xarray/coding/cftimeindex.py:563: CFWarning: this date/calendar/year zero convention is not supported by CF
  return CFTimeIndex(np.array(self) + other)
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
<xarray.Dataset> Size: 16B
Dimensions:  (time: 1)
Coordinates:
  * time     (time) object 8B 0001-01-31 00:00:00
Data variables:
    pr       (time) float64 8B 0.4763
>>> xr.show_versions()
/Users/spencer/mambaforge/envs/2024-06-13-cftime-resample-env/lib/python3.11/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.0
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.5.2
distributed: 2024.5.2
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.6.0
cupy: None
pint: 0.23
sparse: 0.15.4
flox: 0.9.8
numpy_groupies: 0.11.1
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.25.0
sphinx: 7.3.7

This is the only diff in versions:

$ diff Deepak Spencer
4c4
< python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
---
> python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]
7,9c7,9
< OS-release: 23.2.0
< machine: arm64
< processor: arm
---
> OS-release: 23.5.0
> machine: x86_64
> processor: i386

dcherian added a commit to dcherian/xarray that referenced this issue Jun 13, 2024
@dcherian
Copy link
Contributor Author

dcherian commented Jul 26, 2024

Here's a minimal reproducer

import xarray as xr
from xarray.coding.cftime_offsets import MonthEnd
from xarray.core.resample_cftime import _get_time_bins

index = xr.date_range("0001-01-01", periods=10, freq="D")
datetime_bins, label = _get_time_bins(
    index,
    freq=MonthEnd(1),
    closed="right",
    label="right",
    origin="start_day",
    offset=None,
)
print(repr(datetime_bins[0]), '\n', repr(index[0]))
datetime_bins[0] < index[0] # should be True!

The output is

cftime.DatetimeGregorian(0, 12, 31, 23, 59, 59, 999999, has_year_zero=True) 
 cftime.DatetimeGregorian(1, 1, 1, 0, 0, 0, 0, has_year_zero=False)

The two have different has_year_zero and may be what's screwing up the comparison (np.searchsorted gets used later).

So if I choose to start with year 2 it's all fine.

@spencerkclark
Copy link
Member

Thanks @dcherian—that diagnosis was super helpful. It makes sense that this is sort of an edge case, given that our original tests did not catch it. I think the issue is likely in here:

def _shift_month(date, months, day_option: DayOption = "start"):
"""Shift the date to a month start or end a given number of months away."""
if cftime is None:
raise ModuleNotFoundError("No module named 'cftime'")
delta_year = (date.month + months) // 12
month = (date.month + months) % 12
if month == 0:
month = 12
delta_year = delta_year - 1
year = date.year + delta_year
if day_option == "start":
day = 1
elif day_option == "end":
reference = type(date)(year, month, 1)
day = _days_in_month(reference)
else:
raise ValueError(day_option)
return date.replace(year=year, month=month, day=day)

In other words we need to ensure that we take whether has_year_zero is True or False into account when computing the new year when shifting months (this is what ends up creating the instance with has_year_zero=True in the example). When I get a chance I'll see if I can push a fix to #9116.

@dcherian
Copy link
Contributor Author

Thanks! Why would this be OS-dependent though?

@spencerkclark
Copy link
Member

spencerkclark commented Jul 31, 2024

There may be an underlying cftime issue related to comparison, e.g. this comparison is True on my machine, but False on yours:

>>> cftime.DatetimeGregorian(0, 12, 31, 23, 59, 59, 999999, has_year_zero=True) < cftime.DatetimeGregorian(1, 1, 1, has_year_zero=False)
<stdin>:1: CFWarning: this date/calendar/year zero convention is not supported by CF
True

but regardless I think generating dates with mismatched has_year_zero is an xarray bug.

spencerkclark added a commit that referenced this issue Aug 3, 2024
* Add test for #9108

* Update xarray/tests/test_groupby.py

Co-authored-by: Spencer Clark <[email protected]>

* Take has_year_zero into account in offset arithmetic

* Fix test

* Add whats-new

* Update doc/whats-new.rst

* Add more detail to what's new entry

* Modify wording slightly, since this does not always happen when the time
coordinate includes "0001-01-01".

---------

Co-authored-by: Spencer Clark <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants