-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Functions to produce accurate time bounds #418
Conversation
Codecov ReportPatch coverage:
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## main #418 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 14 14
Lines 1308 1425 +117
==========================================
+ Hits 1308 1425 +117
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
This is an awesome turnaround! Thanks for getting to this so fast Steve. Feel free to tag me when it is ready. I can do code review and high-level usability testing, and maybe @lee1043 can a more in-depth usability testing if he's available? |
@pochedls thank you for the PR! @tomvothecoder yes I can also take a look. I will work on it and keep you all posted. |
This PR is not yet ready for review. |
@pochedls Okay, let us know when it's ready. |
Hey @pochedls I'm following about using In the current implementation, Lines 673 to 674 in 1efb807
A possible workaround is to add an optional ds.temporal.infer_freq(dim="time") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended to be a summary of the changes made to help with the review.
Major changes
- xcdat no longer adds temporal bounds by default (it now only adds lat/lon by default). To add time (or height) bounds you need to include
T
in the newbound_axes
argument inopen_dataset
functionality (e.g.,bounds_axes=[‘X’, ‘Y’, ’T’]
) or theaxes
argument inadd_missing_bounds
. - Functionality to add bounds now handles temporal bounds as a special case (see trace below); we set the time bounds to the expected start/end of the time period (e.g., the start/end of the month for monthly data).
- A number of functions of the form
get_FREQ_time_bounds
are added. They are wrapped byget_time_bounds
which is intended to call the correctget_FREQ_time_bounds
function. infer_freq
is now a function (rather than a temporal accessor method)- Added a new
_month_add
function to add N months to a cftime object (N can be negative for subtraction)
API Changes
- Modified
open_dataset
now adds optionalbounds_axes
argument [dataset.py
]open_mfdataset
now adds optionalbounds_axes
argument [dataset.py
]postprocess_dataset
now adds requiredbounds_axes
argument [dataset.py
]_infer_freq(time_coord)
now requires the time coordinate axis and is no longer an accessor method [temporal.py
]add_missing_bounds
removewidth
argument and accept optionalaxes
argument [bounds.py
]add_bounds
removewidth
argument [bounds.py
]_create_bounds
removewidth
argument [bounds.py
]
- New
_month_add(times, delta, calendar)
[temporal.py
]get_yearly_time_bounds(time)
[bounds.py
]get_monthly_time_bounds(time)
[bounds.py
]get_daily_time_bounds(time, frequency=1)
[bounds.py
]get_time_bounds(time)
[bounds.py
]
Trace
open_dataset
/ open_mfdataset
postprocess_dataset
add_missing_bounds
_create_bounds
get_time_bounds
get_yearly_time_bounds
/ get_monthly_time_bounds
/ get_daily_time_bounds
_infer_freq
/ _month_add
- Add optional
bounds_axes
argument to open dataset functionality:open_dataset(..., bounds_axes=[..., "T"])
oropen_mfdataset(..., bounds_axes=[..., "T"])
, which call_postprocess_dataset(..., bounds_axes=[..., "T"])
. By defaultbounds_axes=["X", "Y"]
. The rest of this "trace" assumes"T"
is included inbounds_axes
. - Post-processing routine calls
add_missing_bounds
, which now only operates on thebounds_axes
passed into the function (rather than allX
/Y
/T
/Z
coordinate variables in dataset). add_missing_bounds
calls_create_bounds
. If the coordinate variable being operated on is time, the routine will call the newget_time_bounds
function to create the time bounds (instead of using the generic bounds creation logic).get_time_bounds
calls the refactored_infer_freq
. Based on the inferred frequency eitherget_yearly_time_bounds
,get_monthly_time_bounds
, orget_daily_time_bounds
is called. The function then creates and returns a time coordinate data array.
xcdat/bounds.py
- Modify
add_missing_bounds
such thatwidth
is no longer an argument (always 0.5) and an optionalaxes
argument is added with[“X”, “Y”]
set as the default. - Modify
add_bounds
to removewidth
argument and to acceptbounds
as an optional argument. Ifbounds
are passed in it will add these bounds to the dataset rather than generating new bounds. - Modify
_create_bounds
to removewidth
argument. Modifies functionality: ifcoord_var
is a temporal axis the routine now callsget_time_bounds
to compute the temporal bounds. - Add
get_yearly_time_bounds
,get_monthly_time_bounds
, andget_daily_time_bounds
based on CDAT functionality. These functions create time bounds for annual, monthly, and daily time series.get_daily_time_bounds
will also calculate sub-daily (but not sub-hourly) time bounds if afrequency
argument is supplied. - Add
get_time_bounds
function to call the correctget_FREQ_time_bounds
function based on the refactored_infer_freq
function.
xcdat/dataset.py
open_dataset
,open_mfdataset
, and_postprocess_dataset
are modified to accept an optionalbounds_axes
argument, which defines the axes for which xcdat ensures there are bounds
xcdat/temporal
- Add
_month_add
function to add N months to cftime objects - Change
_infer_freq
to a function, which now requires time_coords as an argument - Update
average
to pass in the now required time coordinate axis argument to_infer_freq
tests/fixtures.py
- Add sub-hourly, hourly, daily, and yearly time axes and bounds
- Refactor
generate_dataset
so that inputs toxr.Dataset
are decided with conditional and thenxr.Dataset
is called with those inputs (reduces code length a little) - Add
generate_dataset_by_frequency
to generate datasets of varying frequencies in order to test functionality to add temporal bounds. I initially added a frequency argument togenerate_dataset
, but it did not pass the pre-commit hooks (one of them complained the function was overly complex: Flake8 C901).
tests/test_bounds.py
- add test_time_bounds_not_added_to_the_dataset_if_not_specified to ensure that time bounds are not added by default when calling
add_missing_bounds
- Fixed
test_returns_bounds_for_dataset_with_time_coords_as_cftime_objects
test (which was not setting time bounds correctly) - Add
test_get_monthly_bounds
,test_get_yearly_bounds
,test_get_daily_bounds
,test_get_hourly_bounds
which check that they each compute the correct time bounds. - Add
test_generate_monthly_bounds_for_eom_set_true
, which is the same as above, except theend_of_month
flag forget_monthly_time_bounds
is set toTrue
. - Add
test_generic_get_time_bounds_function
which checks to ensureget_time_bounds
adds the correct time bounds to the dataset for yearly, monthly, daily, and hourly data. Also ensures that sub-hourly data throws a ValueError.
tests/test_dataset.py
- Change
test_adds_missing_lat_and_lon_bounds
totest_adds_missing_lat_and_lon_bounds_does_not_add_time_bounds
to ensure that time bounds are not added by default when opening a dataset without time bounds - Add
test_adds_missing_lat_and_lon_and_time_bounds
to ensure time bounds are added whenT
is included in thebounds_axes
argument inopen_dataset
.
tests/test_temporal.py
- Add
test_month_add_plus_minus
function to test addition / subtraction of months with_month_add
One unresolved aspect of this PR is that I needed to add |
a421d98
to
175caa6
Compare
Thank you, @pochedls, for making this PR. Could you help me learning how to use this new capability in following scenario? A sample snippet of codes would be very helpful.
I guess some combination of |
I think there are two ways to do this (90% sure – I don't have this PR open on my machine right now):
@tomvothecoder also suggested renaming |
@pochedls In the above my example scenario how do I know |
|
@pochedls thank you for explaining. I was thinking |
No – I don't think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my initial review for this PR with some changes (link to diff with latest commit vs. previous commit).
I'm exploring the potential conversion of the create
functions to accessor methods in a separate review.
Summary of changes:
- Rename
get
time bounds functions tocreate
- Update docstrings and comments for
create
time bounds functions - Add TODO comment to raise exception when freq cannot be inferred in
_infer_freq
- Add
create
time bounds functions to top-level xcdat module__init__.py
and API referencesapi.rst
xcdat/dataset.py
Outdated
add_bounds: bool = True, | ||
bounds_axes: Optional[List[str]] = ["X", "Y"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about combining add_bounds
and bounds_axes
into a single parameter (e.g., add_bounds = ["X, "Y"]
, also accepts None
)?
I'll need to think about this more to see if it is confusing and/or makes the API less intuitive compared to having two separate parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be okay with this (assuming it defaults to ["X", "Y"]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be considered a breaking change IMO. I assume most users keep the default add_bounds=True
argument and xCDAT is not at v1.0.0 yet so we can kinda get away with it (for now).
I'll take a look at this in PR #434.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pochedls and @lee1043, I removed bounds_axes
and refactored add_bounds
to accept a list axes string, None
, and False
.
I kept False
as an option for API backwards compatibility. I'm certain people have been using add_bounds=False
to fix metadata issues and then calling add_bounds()
or add_missing_bounds()
.
Refer to this commit diff for more info: c28b227#diff-e5b713e4bf50c61361496f77fdadcf9d3ffc2fcb39c4c953932cd780ae951e55
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I view this as another improvement to the API I originally proposed. Thanks.
Thank you @tomvothecoder @pochedls! I will work on this tomorrow. |
@tomvothecoder @pochedls I confirm the |
…ds; update add_bounds to accept user specified bounds
…ime_bounds function. Update documentation. Add sub-hourly, sub-daily, daily, and yearly datasets in test fixtures. Add unit tests.
- Add requirements for using temporal accessor class
- Update docstrings and comments for `create` time bounds functions - Add TODO comment to raise exception when freq cannot be inferred in `_infer_freq` - Add `create` time bounds functions to top-level xcdat module `__init__.py` and API references `api.rst`
Co-authored-by: Stephen Po-Chedley <[email protected]>
edd3985
to
1004880
Compare
- Remove `bounds_axes`
Description
PR is intended to reproduce CDAT functionality (e.g.,
setAxisTimeBoundsMonthly()
) to produce accurate time bounds for datasets with missing bounds.As cleanup tasks, this PR will also a) remove the width parameter from generic bound creation routines, b) only autogenerate bounds for lat/lon (others axes, e.g., time, must be specified).
Checklist
If applicable: