[Refactor]: Consider using flox and xr.resample() to improve temporal averaging grouping logic #217
Comments
tomvothecoder
changed the title
[FEATURE]: Improve temporal averaging grouping without the use of pandas MultiIndex
[FEATURE]: Improve temporal averaging grouping logic
Apr 6, 2022
tomvothecoder
changed the title
[FEATURE]: Improve temporal averaging grouping logic
[Refactor]: Improve temporal averaging grouping logic
Nov 9, 2022
tomvothecoder
changed the title
[Refactor]: Improve temporal averaging grouping logic
[Refactor]: Consider using flox to improve temporal averaging grouping logic
Apr 14, 2023
I saw the ping at pydata/xarray#6610. Let me know if you run into issues or have questions.
Thanks @dcherian! I'm looking forward to trying out
tomvothecoder
changed the title
[Refactor]: Consider using flox to improve temporal averaging grouping logic
[Refactor]: Consider using flox and xr.resample() to improve temporal averaging grouping logic
Sep 5, 2024
tomvothecoder
modified the milestones:
FY24Q4 (07/01/24 - 09/30/24),
FY24 Items for Dev Day
Sep 12, 2024
tomvothecoder
modified the milestones:
FY24 Items for Dev Day,
FY25Q1 (10/01/24 - 12/31/24)
Sep 25, 2024
Is your feature request related to a problem?
Currently, grouping by multiple coordinates (e.g., time.year and time.season) requires creating a new set of coordinates before grouping due to the xarray limitations described below.
Related code in xcdat for temporal grouping:
xcdat/xcdat/temporal.py
Lines 1266 to 1322 in c9bcbcd
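To illustrate what "creating a new set of coordinates before grouping" can look like, here is a minimal sketch. It is not the xcdat code referenced above; the sample data, the variable names, and the season-to-mid-month mapping are assumptions made for illustration. It labels each time step with its year and season in a DataFrame, turns each label into a single datetime, and groups by that new coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series: two years of data, one value per month.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time", name="ts")

# Label every time step with the groups it belongs to.
labels = pd.DataFrame(
    {"year": da.time.dt.year.values, "season": da.time.dt.season.values}
)

# Season strings cannot be stored in datetime objects, so map each season
# to a representative middle month (an illustrative choice) and build one
# datetime per (year, season) label.
mid_month = {"DJF": 1, "MAM": 4, "JJA": 7, "SON": 10}
labels["month"] = labels["season"].map(mid_month)
labels["day"] = 1
group_time = pd.to_datetime(labels[["year", "month", "day"]])

# Attach the labels as a new coordinate along "time" and group by it.
da = da.assign_coords(year_season=("time", group_time.values))
seasonal = da.groupby("year_season").mean()
```

Note that this sketch groups each December with the January and February of the same calendar year; it does not shift Decembers into the following year.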
Current temporal averaging logic (workaround for multi-variable grouping):
[...] xarray.DataArray to a pandas.DataFrame,
a. Keep only the DataFrame columns needed for grouping (e.g., "year" and "season" for seasonal group averages), essentially "labeling" coordinates with their groups
b. Process the DataFrame including:
[...] cftime coordinates (season strings aren't supported in cftime/datetime objects)
[...] cftime objects to represent new time coordinates
Describe the solution you'd like
It would be simpler, cleaner, and probably more performant to call something like .groupby(["time.year", "time.season"]) instead (waiting on xarray to support this with flox). This solution would remove much of the internal complexity in the temporal averaging API.
We might be able to achieve this using flox directly.
Additionally, we would need to figure out a way to easily perform the time coordinate processing steps described in 2b directly on xarray objects if we move away from using pandas.DataFrame.
Describe alternatives you've considered
Multi-variable grouping was originally done using pd.MultiIndex, but we shifted away from this approach because this object cannot be written out to netcdf4. Also, pd.MultiIndex is not the standard object type for representing time coordinates in xarray. The standard object types are np.datetime64 and cftime.
Additional context
Future solution through xarray + flox:
[...] xarray version in "Update GroupBy constructor for grouping by multiple variables, dask arrays" (pydata/xarray#6610), we should be able to do this.
[...] flox in GroupBy and resample (pydata/xarray#5734) is now merged, which improves .groupby() performance significantly.
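Since the issue title also mentions xr.resample(), here is a minimal sketch of how resampling might produce per-year seasonal averages without building intermediate label coordinates. The sample data and the choice of the "QS-DEC" frequency (quarters starting in December, March, June, and September) are assumptions for illustration, not a statement of how xcdat does or will implement this.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series: two years of data, one value per month.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time", name="ts")

# Quarter-start bins anchored on December line up with the DJF, MAM, JJA,
# and SON seasons, so each December is binned with the following January
# and February. Each bin is labeled with the first day of its season.
seasonal = da.resample(time="QS-DEC").mean()
```

The first and last bins here are partial seasons (January and February 2000 only, and December 2001 only), and they are averaged over whichever months are available, so a real implementation would need to decide how to treat incomplete seasons.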