[Feature]: Add functionality to accurately set time bounds #412

pochedls · 2023-01-26T18:42:53Z

Is your feature request related to a problem?

Following Discussion #411 – xCDAT should modify how it handles bounds in general and time bounds in particular. xCDAT currently uses the bounds within the dataset, but if they are not present it will generate bounds for each axis (by assuming that the bounds are the midpoint between each axis point). This is problematic for time coordinates because a) time points are frequently not saved out at the midpoint value and b) months are not equal in size so the halfway point between two months do not yield the correct bound.

This is illustrated in Slide 4 of climate_specific_tools.pdf.

For example, ERA5 monthly mean data is saved at midnight on the first day of the month:

import xcdat as xc
fn = '/p/user_pub/climate_work/pochedley1/ecmwf/era5/new/tas_era5.nc'
ds = xc.open_dataset(fn)
ds.time[0]

...
array(cftime.DatetimeGregorian(1979, 1, 1, 0, 0, 0, 0, has_year_zero=False),
dtype=object)
...

xCDAT then autogenerates incorrect bounds:

print(ds.bounds.get_bounds('T')[0])

...
array([cftime.DatetimeGregorian(1978, 12, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 1, 16, 12, 0, 0, 0, has_year_zero=False)],
dtype=object)
...
xcdat_bounds: True

Note that this issue also applies to vertical coordinate which are frequently not equally spaced.

Describe the solution you'd like

Addressing this issue would take a few changes:

By default, auto-generate missing bounds for latitude and longitude only (i.e., do not automatically set bounds for vertical and temporal coordinates). This would require modification of xcdat.open_dataset() (in function and documentation).
Create functions to create sub-daily, daily, and monthly time bounds, e.g., setAxisTimeBoundsMonthly() that set the time bounds at the start and end of the period (e.g., for the ERA5 case above, the bounds should be 1979-01-01 00:00:00 and 1979-02-01 00:00:00).
Consider allowing the user to set the bounds that xcdat creates
- This could be a) an option in xcdat.open_dataset or b) a global option as in CDAT's setAutoBounds() function: Allow the user to determine which of t, x, y, z should have bounds creation, e.g., gen_bounds = None | List['X', 'Y', 'T', 'Z']
- If this were implemented we'd likely need some way of inferring what function to use when setting the time bounds (as in isMonthly())

Describe alternatives you've considered

I think that the minimal solution would be to disable temporal and vertical bounds generation by default on dataset open (users could still call xCDAT's functions to generate bounds at their own risk).

Additional context

I think the above changes would largely address the issue.

I did think about an edge case that I think would be mishandled by related tempoeral functions (e.g., xcdat.temporal.group_average()). Consider a dataset with bounds (so xCDAT would not generate bounds) that are weird (but correct):

Center Point	January 1, 2022 00:00:00		February 1, 2022 00:00:00		March 1, 2022 00:00:00		...
Bounds	December 15, 2022 00:00:00	January 16, 2022 00:00:00	January 16, 2022 00:00:00	February 16, 2022 00:00:00	February 16, 2022 00:00:00	March 16, 2022 00:00:00	...	...

If we were to take an annual average of this data (for 2022), xCDAT would first figure out the weights by taking the difference of the upper and lower bounds (31, 31, 28, ...) and use this vector as the weights in the annual average. But some of the data should not be included in the annual average; according to the bounds ~half of the January data is supposedly in December (and the weight should be 15 days for the annual average in 2022).

I don't think this would be an issue for the average function, but probably does affect climatology and departures.

I think this is likely an edge case and I'm not sure what CDAT does in this situation, but it came to mind in thinking about this issue.

The text was updated successfully, but these errors were encountered:

lee1043 · 2023-01-26T19:10:40Z

I am glad @pochedls brought this subject for the team's attention (thanks, @pochedls!). In PMP workflow I have used cdutil.setTimeBoundsDaily to ensure monthly time series of models to have correct monthly bounds because some models have their time point in the middle of month while others have beginning of month. Use case for cdutil.setTimeBoundsDaily is described in this document (search for setTimeBoundsDaily). I was hoping we can have the similar capability in xCDAT. I second that @pochedls suggests.

Below could be somewhat related (thanks @taylor13 for pointing this to us)
https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#climatological-seasons-ex

taylor13 · 2023-01-26T21:47:16Z

I would note that even for latitude/longitude grids, creating bounds half-way between coordinate values is not always correct. For gaussian grids and other latxlon grids that are not equally spaced, you would get the wrong values, in general, for the bounds. I would only auto-generate bounds after determining whether a coordinate represented equal intervals (of whatever it measures, be it latitude, time, vertical position, or whatever).

pochedls · 2023-01-26T22:06:32Z

I would note that even for latitude/longitude grids, creating bounds half-way between coordinate values is not always correct. For gaussian grids and other latxlon grids that are not equally spaced, you would get the wrong values, in general, for the bounds. I would only auto-generate bounds after determining whether a coordinate represented equal intervals (of whatever it measures, be it latitude, time, vertical position, or whatever).

Thanks for reviewing this @taylor13. If a model doesn't provide bounds and a user needed to do an operation with bounds, I think most people would still want to use the generated bounds even if they were imperfect. Do you think a warning would be appropriate in this situation? The user could then decide if they wanted to not use the dataset (or email the modeling center to ask for the bounds?).

taylor13 · 2023-01-26T22:11:03Z

Yes, I would certainly include a warning, and for most cases one might get away with inaccurate bounds, so generating them is probably what most users will want.

chengzhuzhang · 2023-01-26T23:40:09Z

I appreciate this discussion. Personally, for time dimension, i always thought time bounds are more reliable than the time points, because different data centers store their data at different points of the range (beginning, middle, or end). When the time bounds are missing, to generate the accurate bounds, one needs to know exactly how the data is recorded. I agree that we should provide option for these users (by providing similar function provided by setTimeBoundsFREQ).

However, I was not aware of how to properly use setTimeBoundsFREQ until reading the discussion here #411.

If looping over multiple models, using setTimeBoundsMonthly(s) without specifying the second argument, would assume data recorded at beginning of the time interval for all models, which could result inaccurate bounds if otherwise.

If the data users don't know how the time is recorded, I think to provide an educated guess with a warning is still valuable.

lee1043 · 2023-01-27T00:15:38Z

If looping over multiple models, using setTimeBoundsMonthly(s) without specifying the second argument, would assume data recorded at beginning of the time interval for all models, which could result inaccurate bounds if otherwise.

@chengzhuzhang Regarding setTimeBoundsMonthly in my understanding default second argument is 0, which set time bounds to beginning and end of month, by default. I think the argument option for data recorded at end of month was added to prevent occasions such for example data recorded at 2/29 but somehow with combined with incorrect calendar or operation it could be recognized as 3/1. So the option is to prevent such case (see here). I think such occasion might be less common with current xarray/xcdat data handling.

chengzhuzhang · 2023-01-27T00:45:21Z

Hi @lee1043 thank you for pointing to the code! I guess I was confused by the documentation as follows...

>>> import cdutil, cdms
>>> f = cdms.open('some_monthly_data.nc')
>>> x = f('var')
>>> cdutil.setTimeBoundsMonthly(x)
# The default action of the setTimeBoundsMonthly function assumes that the
time point is at the beginning of the month. To compute bounds assuming that
the time point at the end of the month,
>>> cdutil.setTimeBoundsMonthly(x, 1)

The native E3SM output records time at the end of time intervals, luckily it provides time bounds and xcdat can open the datasets and correct it with time centered.

jypeter · 2023-01-27T15:46:27Z

Thanks @pochedls for generating this feature request!

The user should definitely get warnings when missing bounds are automatically generated at midpoint. The midpoint option is probably a better than nothing poor-man option.

I'm not sure other options than midpoint would make sense. If somebody is going to rewrite/improve some functions, you may want to get rid of the width parameter that appears in

I have tried to follow the source links in xcdat.bounds.BoundsAccessor, but I was quickly lost. I did find a reference to iris and a guess_bounds(bound_position=0.5) function. As a parameter name, bound_position sounds more like a fraction and makes more sense than width, but any other value than 0.5 (i.e midpoint) would not make much sense anyway.

So you might as well get rid of this width parameter, simplify some source code, and simplify the documentation (by only talking about midpoint)!

Would it be possible to have a bounds.add_custom_bounds function? This is probably already possible in xarray, but a nice wrapper provided by xcdat would be nice for the climate community end user. And the wrapper documentation could tell the user what type of xarray compatible parameters you have to pass, and possibly perform some basic sanity checks (e.g. axes values are in the bounds' range, bounds are monotonically increasing or decreasing, this kind of stuff)

taylor13 · 2023-01-27T18:56:08Z

Regarding: "width (float, optional) – Width of the bounds relative to the position of the nearest points, by default 0.5."

I agree with @jypeter that rarely would one specify anything but 0.5, but since that's the default, I don't see any harm retaining it. I think the documentation should make clear how the bounds are generated when data are unequally spaced. I don't know what the algorithm currently does, but when width=0.5, I would place the cell bound half-way between a cell's coordinate value and the next cell's coordinate value. For the first and last cells (where there is no "next cell"), I would place the bound such that the coordinate value lies half-way between its bounds (i.e., the bound would be located at the same distance from the coordinate value as the other bound). If that approach is adopted, then maybe the documentation would be clearer saying:

"width (float, optional) indicates the location of a grid cell bound expressed as a fraction of the distance from a cell's coordinate value to the coordinate value of the neighboring cell. Only when width=0.5 will the cells be contiguously located. The bound for a cell with no neighbor (i.e., the first and last cell of a dimension) will be assumed to be the same distance from the cell's coordinate value as its other bound."

pochedls · 2023-01-27T20:00:54Z

Regarding the start / end of the time period. In thinking about how to implement this, I think by default we would just take the start/end of the month that the time point is in (e.g., 2023-01-01 or 2023-01-15 or 2023-01-31 would all map to the same bounds: [2023-01-01 00:00:00, 2023-02-01 00:00:00]). This is easy to do, because a given time point, t, is stored as a cfTime object and you can directly retrieve the month / year (e.g., t.month would yield 1 in each case) and easily define the bounds from that information.

I think the case where you would want to use stored=1 is when the time point is stored at the end of the time period. For E3SM this would be 2023-02-01 00:00:00 (for January data) and calling t.month would give you 2 (and you don't want to set the start / end bounds to [2023-02-01 00:00:00, and 2023-02-28 00:00:00]). So if E3SM didn't have time bounds, I think we'd want that stored=1 value.

taylor13 · 2023-01-27T20:21:08Z

@pochedls, I was addressing the behavior and documentation of Dataset.bounds.add_bounds(axis, width=0.5), not cdutil.setTimeBoundsMonthly(x). I agree that for monthly data you need a special function along the lines you suggest.

jypeter · 2023-01-30T16:01:14Z

I have tried to figure out how the width parameter works, but I'm not sure I really got it. It is actually used in _create_bounds(axis, coord_var, width)

Good luck if somebody wants to explain how bounds are actually created in a docstring! xcdat is still in version 0.nnn so I'm rather in favor of dropping the width=0.5 default parameter (btw, the code does not check that width is actually in the 0 to 1 range).

Maybe somebody could want to use 0 or 1, if they wanted the upper and lower bounds to have the same values as the coordinates, but then I assume they would really know what they are doing, and I'd rather have them use a bounds.add_custom_bounds function

I'd rather have a clear (i.e not confusing) documentation saying that if there are no existing bounds, a warning will be raised and the bounds will be created at the midpoint (no other option). The documentation can possibly add that the created bounds of the Y axis will be clipped in the -90 to 90 range if the Y axis has no units at all, or if the units are in degrees

pochedls · 2023-01-30T17:17:28Z

I think we should probably remove the width parameter unless someone wants to figure out when someone would set this to width ≠ 0.5 and then add more documentation (and perhaps a check to make sure width is within [0, 1]). Is anyone able to put together a pull request to update this? I guess this could be tacked onto this issue when someone tackles this feature request (which is related to setting time bounds correctly).

taylor13 · 2023-01-30T19:07:39Z

I agree. I misinterpreted what "width" meant. Looking at the code, it calculates a fractional distance of "width" for one bound and "1-width" for the other bound. This ensures that all cells are contiguous. As J-Y @jypeter says, I'd rather have them custom create their own bounds.

arfriedman · 2023-03-08T13:06:42Z

Hi @pochedls @tomvothecoder

I linked to this discussion in response to a question about time bounds on the xarray forum in case you would like to add anything (I couldn't find how to tag you directly): pydata/xarray#7580

tomvothecoder · 2023-03-08T16:58:37Z

We appreciate the xCDAT plug @arfriedman!

You brought up a great idea for us to stay active in the xarray discussions forum to be more involved in the community and also help xCDAT grow.

pochedls · 2023-03-08T17:21:23Z

@arfriedman – just an FYI that there is a PR under review to address the contents of this feature request. This matters for temporal averaging if your dataset does not have time bounds and you need to create them (if the dataset has time bounds they will be used for temporal averaging operations).

tomvothecoder · 2023-03-08T17:26:06Z

@pochedls yup Andrew is on top of it! He even linked to that issue in his comment.

tomvothecoder · 2023-03-08T17:42:29Z

@pochedls yup Andrew is on top of it! He even linked to that issue in his comment.

@pochedls Sorry got confused with your comment about the PR for this issue and not this issue itself.

pochedls added the type: enhancement New enhancement request label Jan 26, 2023

chengzhuzhang mentioned this issue Jan 27, 2023

[Feature]: Allow user selection of reference period in departures #387

Closed

tomvothecoder added this to To do in Next Release (v0.5.0) via automation Feb 1, 2023

tomvothecoder added the Priority: High label Feb 1, 2023

tomvothecoder self-assigned this Feb 1, 2023

pochedls self-assigned this Feb 1, 2023

tomvothecoder mentioned this issue Feb 23, 2023

Update departures and climatology APIs with reference period #417

Merged

9 tasks

pochedls mentioned this issue Feb 24, 2023

Functions to produce accurate time bounds #418

Merged

9 tasks

tomvothecoder moved this from To do to In progress in Next Release (v0.5.0) Mar 20, 2023

tomvothecoder removed this from In progress in Next Release (v0.5.0) Mar 29, 2023

tomvothecoder added this to To do in Next Release (v0.6.0) via automation Mar 29, 2023

tomvothecoder moved this from To do to In progress in Next Release (v0.6.0) Mar 29, 2023

tomvothecoder closed this as completed in #418 Apr 20, 2023

Next Release (v0.6.0) automation moved this from In progress to Done Apr 20, 2023

tomvothecoder added this to the v0.6.0 milestone Sep 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Add functionality to accurately set time bounds #412

[Feature]: Add functionality to accurately set time bounds #412

pochedls commented Jan 26, 2023

lee1043 commented Jan 26, 2023 •

edited

Loading

taylor13 commented Jan 26, 2023

pochedls commented Jan 26, 2023

taylor13 commented Jan 26, 2023

chengzhuzhang commented Jan 26, 2023

lee1043 commented Jan 27, 2023

chengzhuzhang commented Jan 27, 2023

jypeter commented Jan 27, 2023

taylor13 commented Jan 27, 2023

pochedls commented Jan 27, 2023

taylor13 commented Jan 27, 2023

jypeter commented Jan 30, 2023

pochedls commented Jan 30, 2023

taylor13 commented Jan 30, 2023

arfriedman commented Mar 8, 2023

tomvothecoder commented Mar 8, 2023 •

edited

Loading

pochedls commented Mar 8, 2023

tomvothecoder commented Mar 8, 2023

tomvothecoder commented Mar 8, 2023

[Feature]: Add functionality to accurately set time bounds #412

[Feature]: Add functionality to accurately set time bounds #412

Comments

pochedls commented Jan 26, 2023

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

lee1043 commented Jan 26, 2023 • edited Loading

taylor13 commented Jan 26, 2023

pochedls commented Jan 26, 2023

taylor13 commented Jan 26, 2023

chengzhuzhang commented Jan 26, 2023

lee1043 commented Jan 27, 2023

chengzhuzhang commented Jan 27, 2023

jypeter commented Jan 27, 2023

taylor13 commented Jan 27, 2023

pochedls commented Jan 27, 2023

taylor13 commented Jan 27, 2023

jypeter commented Jan 30, 2023

pochedls commented Jan 30, 2023

taylor13 commented Jan 30, 2023

arfriedman commented Mar 8, 2023

tomvothecoder commented Mar 8, 2023 • edited Loading

pochedls commented Mar 8, 2023

tomvothecoder commented Mar 8, 2023

tomvothecoder commented Mar 8, 2023

lee1043 commented Jan 26, 2023 •

edited

Loading

tomvothecoder commented Mar 8, 2023 •

edited

Loading