Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of data in "nearest" trajectory interpolate #5062

Merged
merged 5 commits into from
Nov 16, 2022

Conversation

lbdreyer
Copy link
Member

@lbdreyer lbdreyer commented Nov 11, 2022

Closes #4463

As raised in #4463 when regridding with iris.analysis.UnstructuredNearest the dtype wasn't being preserved and the mask was being thrown away. These issues turned out to be linked.

Regarding the dtype, previous changes to the underlying trajectory code chose to preserve the behaviour from the initial Iris commit (10 years ago!) that created a resulting new_cube with an empty data array, then filled it.

new_cube.data[..., i] = column.data

But by doing so, it used the data type of the empty array with is (usually) float64.
I don't think it was necessary to preserve this behaviour. All other regrid/interpolation operations AFAIK use the data type from the input cube (e.g. linear trajectory interpolation, or nearest neighbour regrid). So it would be better to be consistent across our operations.

The other issue was the mask not being preserved. This seemed to be a consequence of filling the empty array, which currently looks like:

new_cube.data[:] = source_data

that can cause some problems with missing data, which I believe is what this comment is referring to
# "Fix" problems with missing datapoints producing odd values
# when copied from a masked into an unmasked array.
# TODO: proper masked data handling.

As shown in the example below, if I try to fill any empty array with a masked array, the mask is simply thrown away. This is why I suspect we were first filling with the mdi value

>>> source_data = np.ma.array([1, 4], mask=[0,1], dtype=np.float32)
>>> new_data = np.empty((1,2))
>>> new_data
array([[6.94805265e-310, 6.94805265e-310]])
>>> new_data[:] = source_data
>>> new_data
array([[1., 4.]])

But this won't happen if we just assign cube data to our new data array.

In this PR, in the first commit I have first added a couple unit tests to check the dtype and mask are preserved. The second commit include the fix which makes the tests pass.

@lbdreyer lbdreyer changed the title Add trajectory interpolate tests for checking mask and dtype Fix handling of data in "nearest" trajectory interpolate Nov 11, 2022
@lbdreyer lbdreyer marked this pull request as ready for review November 11, 2022 17:17
@trexfeathers trexfeathers self-assigned this Nov 14, 2022
Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lbdreyer, this must have taken a lot of sleuthing! I agree with your assessment. Also note that a similar change was made to the linear method this year in #4366.

Two points:

  • This deserves a What's New entry
  • My below question about use of fixtures

@lbdreyer
Copy link
Member Author

Thanks for reviewing @trexfeathers ! I believe I have now addressed your comments

@trexfeathers trexfeathers merged commit add1365 into SciTools:main Nov 16, 2022
@lbdreyer lbdreyer deleted the traje_Data branch February 21, 2023 11:18
@stephenworsley stephenworsley mentioned this pull request Jun 28, 2023
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Regridding with iris.analysis.UnstructuredNearest does not preserve dtype and masks
2 participants