Fix handling of data in "nearest" trajectory interpolate #5062

lbdreyer · 2022-11-11T17:00:10Z

As raised in #4463, when regridding with iris.analysis.UnstructuredNearest the dtype wasn't being preserved and the mask was being thrown away. These issues turned out to be linked.

Regarding the dtype, previous changes to the underlying trajectory code chose to preserve the behaviour from the initial Iris commit (10 years ago!) that created a resulting new_cube with an empty data array, then filled it.

iris/lib/iris/analysis/trajectory.py

Line 240 in 61ac271

new_cube.data[..., i] = column.data

But by doing so, it used the data type of the empty array, which is (usually) float64.
I don't think it was necessary to preserve this behaviour. All other regrid/interpolation operations AFAIK use the data type from the input cube (e.g. linear trajectory interpolation, or nearest neighbour regrid). So it would be better to be consistent across our operations.
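To illustrate the dtype point, here is a minimal NumPy-only sketch (not the actual Iris code). The helper names `fill_preallocated` and `allocate_like_source` are hypothetical and exist only for this example. It shows that filling an array allocated with the default dtype casts the values to float64, whereas allocating with the source dtype keeps it.

```python
import numpy as np


def fill_preallocated(source):
    """Hypothetical helper mimicking the old pattern: allocate, then fill."""
    result = np.empty(source.shape)  # np.empty defaults to float64
    result[...] = source  # values are cast to the destination dtype
    return result


def allocate_like_source(source):
    """Hypothetical helper: allocate with the source dtype before filling."""
    result = np.empty(source.shape, dtype=source.dtype)
    result[...] = source
    return result


source = np.arange(6, dtype=np.int16).reshape(2, 3)
old_style = fill_preallocated(source)
new_style = allocate_like_source(source)

print(old_style.dtype)  # float64
print(new_style.dtype)  # int16
```

The values are identical in both cases; only the dtype of the result differs.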

The other issue was the mask not being preserved. This seemed to be a consequence of filling the empty array, which currently looks like:

iris/lib/iris/analysis/trajectory.py

Line 456 in 10f517b

new_cube.data[:] = source_data

This can cause problems with missing data, which I believe is what this comment refers to:

iris/lib/iris/analysis/trajectory.py

Lines 448 to 450 in 10f517b

 # "Fix" problems with missing datapoints producing odd values
 # when copied from a masked into an unmasked array.
 # TODO: proper masked data handling.

As shown in the example below, if I try to fill an empty array with a masked array, the mask is simply thrown away. This is why I suspect we were first filling with the mdi value.

>>> source_data = np.ma.array([1, 4], mask=[0,1], dtype=np.float32)
>>> new_data = np.empty((1,2))
>>> new_data
array([[6.94805265e-310, 6.94805265e-310]])
>>> new_data[:] = source_data
>>> new_data
array([[1., 4.]])

But this won't happen if we just assign cube data to our new data array.
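For comparison, here is a small NumPy-only sketch of the two behaviours side by side. It is an illustration under my own assumptions, not necessarily what the fix in this PR does: the variable names `filled`, `direct` and `preallocated` are made up for the example. It contrasts filling a plain `np.empty` array with using the masked source directly, or with filling a masked array allocated with the source dtype.

```python
import numpy as np
import numpy.ma as ma

source_data = ma.array([1, 4], mask=[False, True], dtype=np.float32)

# Old pattern: fill a plain ndarray. The mask is dropped and the
# result takes the float64 dtype of the pre-allocated array.
filled = np.empty((1, 2))
filled[:] = source_data

# Using the masked source directly keeps both the mask and the dtype.
direct = source_data.reshape(1, 2)

# If pre-allocation is needed, a masked array with the source dtype
# also keeps the mask when it is filled.
preallocated = ma.empty((1, 2), dtype=source_data.dtype)
preallocated[:] = source_data

print(type(filled).__name__, filled.dtype)  # ndarray float64
print(direct.dtype, direct.mask.tolist())  # float32 [[False, True]]
print(preallocated.dtype, preallocated.mask.tolist())  # float32 [[False, True]]
```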

In this PR, the first commit adds a couple of unit tests to check that the dtype and mask are preserved. The second commit includes the fix that makes the tests pass.

trexfeathers

Thanks @lbdreyer, this must have taken a lot of sleuthing! I agree with your assessment. Also note that a similar change was made to the linear method this year in #4366.

Two points:

This deserves a What's New entry
My below question about use of fixtures

lib/iris/tests/unit/analysis/trajectory/test_interpolate.py

lbdreyer · 2022-11-16T16:53:22Z

Thanks for reviewing, @trexfeathers! I believe I have now addressed your comments.

lbdreyer added 2 commits November 11, 2022 16:52

Add trajectory interpolate tests for checking mask and dtype

7dc1fbd

Fix trajectory nearest interpolate data

8d851ce

lbdreyer changed the title ~~Add trajectory interpolate tests for checking mask and dtype~~ Fix handling of data in "nearest" trajectory interpolate Nov 11, 2022

Merge branch 'main' into traje_Data

78ae3d5

lbdreyer marked this pull request as ready for review November 11, 2022 17:17

trexfeathers self-assigned this Nov 14, 2022

trexfeathers requested changes Nov 14, 2022

lib/iris/tests/unit/analysis/trajectory/test_interpolate.py (outdated)

lbdreyer and others added 2 commits November 16, 2022 16:40

Fixtures return values; whats new

43e0c0c

Merge branch 'main' into traje_Data

5966da1

lbdreyer commented Nov 16, 2022

lib/iris/tests/unit/analysis/trajectory/test_interpolate.py

trexfeathers approved these changes Nov 16, 2022

trexfeathers merged commit add1365 into SciTools:main Nov 16, 2022

schlunma mentioned this pull request Dec 2, 2022

Removed unnecessary test that fails with iris 3.4.0 ESMValGroup/ESMValCore#1846

Merged

9 tasks

lbdreyer deleted the traje_Data branch February 21, 2023 11:18

stephenworsley mentioned this pull request Jun 28, 2023

Unify regridders #4754

Open

7 tasks
