[Bug]: Out of bounds timestamp for temporal averaging #301

pochedls · 2022-08-04T17:46:13Z

What happened?

In trying to compute annual averages for data with non-cf-compliant time units I get an OutOfBoundsError.

What did you expect to happen?

I think this functionality should work with non-cf-compliant time.

Minimal Complete Verifiable Example

import xcdat

fn = '/p/user_pub/climate_work/pochedley1/cmip6_msu/ttt_Amon_ACCESS-ESM1-5_piControl_r1i1p1f1_gr_10101-100012.nc'
ds = xcdat.open_dataset(fn)
ds_yearly = ds.temporal.group_average('tf2', freq='year', weighted=True)

Relevant log output

.
.
.

File ~/code/xcdat/xcdat/temporal.py:1358, in TemporalAccessor._convert_df_to_dt(self, df)
   1351     dates = [
   1352         cftime.datetime(year, month, day, hour)
   1353         for year, month, day, hour in zip(
   1354             df_new.year, df_new.month, df_new.day, df_new.hour
   1355         )
   1356     ]
   1357 else:
-> 1358     dates = pd.to_datetime(df_new)
   1360 return np.array(dates)

.
.
.

ValueError: cannot assemble the datetimes: Out of bounds nanosecond timestamp: 1010-10-01 00:00:00

Anything else we need to know?

This issue does not occur with original data (with cf-compliant datetimes).

The condensed log above shows that this issue arises in the _convert_df_to_dt function when the dataframe is cast to datetime objects using pandas (because year_is_unused = False). year_is_unused is set to False because the mode is set to average and the _freq is not yearly. I don't totally understand why we use cftime for this step for some conditions, but use pandas for others.

Can we use cftime for all cases? I'm wondering if we started by using pandas and then used cftime for some conditions (e.g., see the documentation), but in reality we may be able to simply use cftime. @tomvothecoder – do you remember these details well enough to know?

If I remove the if year_is_unused conditional and re-run the code I don't get an error. But I fail a bunch of unit tests – I'm not sure if this is a unit issue or an actual problem.

Also note that it is possible that this issue affects other functions/functionality (but using cftime might resolve those potential issues).

Environment

main branch

The text was updated successfully, but these errors were encountered:

tomvothecoder · 2022-08-04T18:52:56Z

Thanks for raising this issue @pochedls. I'll be able to take a closer look next week, but it looks like you're beating me to it.

Here's that method's docstring to explain when cftime or pd.to_datetime() is used:

xcdat/xcdat/temporal.py

Lines 1308 to 1318 in 050d57e

 def _convert_df_to_dt(self, df: pd.DataFrame) -> np.ndarray: 

 """Converts a DataFrame of datetime components to datetime objects. 

  datetime objects require at least a year, month, and day value. However, 

  some modes and time frequencies don't require year, month, and/or day 

  for grouping. For these cases, use default values of 1 in order to 

  meet this datetime requirement. 

  If the default value of 1 is used for the years, datetime objects 

  must be created using `cftime.datetime` because year 1 is outside the 

  Timestamp-valid range.

Basically, I didn't take into account a situation where you would compute averages for datasets outside the pd.Timestamp range (e.g., start year 1010).

Can we use cftime for all cases? I'm wondering if we started by using pandas and then used cftime for some conditions (e.g., see the documentation), but in reality we may be able to simply use cftime. @tomvothecoder – do you remember these details well enough to know?

Yes, I think using cftime is the safest thing to do here because it accommodates datasets outside the pd.Timestamp range.

If I remove the if year_is_unused conditional and re-run the code I don't get an error. But I fail a bunch of unit tests – I'm not sure if this is a unit issue or an actual problem.

Also note that it is possible that this issue affects other functions/functionality (but using cftime might resolve those potential issues).

Yes, the solution is to probably remove the year_is_unused conditional altogether and fix the tests (they probably expect datetime objects and not cftime). Also, we should derive the correct cftime object based on the calendar type, similar to decode_non_cf_time().

* attempted fix for #301: use cftime for temporal operations * Update temporal accessor to use specific cftime objects - Update format of `decode_non_cf_time()` conditional statement - Update `time_cf` with encoding dict attributes (xarray behavior) - Update `test_dataset.py` and `test_temporal.py` with specific `cftime` date_type objects - Add `xesmf` to `dev.yml` Co-authored-by: Tom Vo <[email protected]>

pochedls added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Aug 4, 2022

pochedls added a commit that referenced this issue Aug 4, 2022

attempted fix for #301

0d1352c

pochedls mentioned this issue Aug 4, 2022

Use cftime for temporal averaging operations #302

Merged

9 tasks

tomvothecoder pushed a commit that referenced this issue Aug 4, 2022

attempted fix for #301

197281d

tomvothecoder pushed a commit that referenced this issue Aug 10, 2022

attempted fix for #301

f83b7b7

tomvothecoder assigned pochedls and tomvothecoder Aug 10, 2022

pochedls closed this as completed in #302 Aug 10, 2022

pochedls mentioned this issue Aug 12, 2022

[Refactor]: Possibly refactor our logic for converting numeric time values to datetime #310

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Out of bounds timestamp for temporal averaging #301

[Bug]: Out of bounds timestamp for temporal averaging #301

pochedls commented Aug 4, 2022

tomvothecoder commented Aug 4, 2022 •

edited

Loading

[Bug]: Out of bounds timestamp for temporal averaging #301

[Bug]: Out of bounds timestamp for temporal averaging #301

Comments

pochedls commented Aug 4, 2022

What happened?

What did you expect to happen?

Minimal Complete Verifiable Example

Relevant log output

Anything else we need to know?

Environment

tomvothecoder commented Aug 4, 2022 • edited Loading

tomvothecoder commented Aug 4, 2022 •

edited

Loading