[Bug]: incorrect group averaging with missing data #319
Labels
type: bug
Inconsistencies or issues which will cause an issue or problem for users or implementors.
pochedls added a commit that referenced this issue on Aug 25, 2022:
* initial fix for #319
* Refactor `_group_average()`
  - Preserve data variable attributes using `xr.set_options(keep_attrs=True)`
  - Reuse `self._labeled_time` if it is already set in a previous call to `_group_data()`
  - Update group average tests to check data variable test attr is preserved

Co-authored-by: Tom Vo <[email protected]>
What happened?
I took the annual average of some data. In one example grid cell, the values were:
Since two months have the same value and data is missing for all other months, the weighted average should be `-0.02`. Instead, I get `-0.0032786885`; `group_average` seems to be handling the missing data incorrectly.
What did you expect to happen?
I expected to get `-0.02`.
Minimal Complete Verifiable Example
Load some data for an example grid cell:
Note that the first year of data has only two months (of twelve) with data values.
Taking the annual average I get an unexpected result:
This is not the right value. Since there are two values in the first year (and they are the same), the average should simply be `-0.02`.
I think I see what is happening. Each month in the first year is assigned some weight (proportional to the number of days in each month):
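As a rough illustration of day-in-month weighting, the sketch below builds weights for a single year. The year (2000, a leap year) and the normalization to a sum of 1 are assumptions made for the example and are not taken from the data in this report.

```python
import calendar

import numpy as np

# Hypothetical leap year, chosen only for illustration.
year = 2000

# Number of days in each calendar month of that year.
days_in_month = np.array(
    [calendar.monthrange(year, month)[1] for month in range(1, 13)]
)

# Normalize so the twelve weights within the year sum to 1 (assumed convention).
weights = days_in_month / days_in_month.sum()
print(weights.round(4))
```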
If we consider just the first year, then:
A simple weighted average is
WA = sum(T*W)/sum(W)
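A minimal numeric check of this formula, assuming that the two months with data are January and February of a 366-day year. That choice is an assumption, but it is consistent with the value reported above: the two months cover 60 of 366 days, and `-0.02 * 60 / 366` is about `-0.00328`.

```python
import numpy as np

# Assumed setup: a 366-day year where only January and February have data.
days = np.array([31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
T = np.full(12, np.nan)
T[:2] = -0.02
W = days / days.sum()

# Missing months are skipped in the numerator, but every month's weight
# is still counted in the denominator.
naive_average = np.nansum(T * W) / W.sum()
print(naive_average)  # approximately -0.00328
```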
The problem is that there should be no weight assigned if there is no data for a given month/index. The weights should be corrected to reflect this:
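One way to sketch such a correction is to zero out the weight wherever the value is missing before normalizing. The arrays below are illustrative assumptions (January and February of a 366-day year holding `-0.02`), not the actual data or weights from this report.

```python
import numpy as np

# Assumed setup: a 366-day year where only January and February have data.
days = np.array([31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
T = np.full(12, np.nan)
T[:2] = -0.02
W = days / days.sum()

# Give zero weight to every month whose data value is missing.
W_masked = np.where(np.isnan(T), 0.0, W)

# Only the months with data now contribute to the denominator.
corrected_average = np.nansum(T * W_masked) / W_masked.sum()
print(corrected_average)  # approximately -0.02
```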
This weighting matrix would yield `-0.02` (which is the correct answer).
Relevant log output
No response
Anything else we need to know?
I think this is probably handled correctly in `_average` (via xarray `.mean`), but `_group_average` is calling `.sum` instead of `.mean`.
Environment
main branch