Improve floating point representation in generation of regular grids that use a reference point and spacings #3559

alastair-gemmell · 2019-11-22T16:00:21Z

Some regular grids in Iris are generated from a reference point along a given coordinate direction along with an associated spacing.

The way this is currently implemented uses numpy.arange as follows:

iris/lib/iris/coords.py

Line 2293 in b2ce2a3

points = (zeroth + step) + step * np.arange(count, dtype=np.float32)

Where zeroth is the value prior to the first point value, step is the numeric difference between successive point values and count is the number of point values.

As discussed at https://stackoverflow.com/questions/47243190/numpy-arange-how-to-make-precise-array-of-floats (amongst other places), this results in grids that contain fractionally incorrectly represented floats. E.g. using zeroth = 352.84, step = 0.11, count = 146, it produces the following array:

array([352.95 , 353.06 , 353.17 , 353.28 , 353.39 , 353.5 , 353.61002, 353.72 , 353.83002, 353.94 , 354.05002, 354.16 , 354.27002, 354.38 , 354.49002, 354.6 , 354.71002, 354.82 , 354.93002, 355.04 , 355.15002, 355.26 , 355.37003, 355.48 , 355.59003, 355.7 , 355.81 , 355.92 , 356.03 , 356.14 , 356.25 , 356.36002, 356.47 , 356.58002, 356.69 , 356.80002, 356.91 , 357.02002, 357.13 , 357.24002, 357.35 , 357.46002, 357.57 , 357.68002, 357.79 , 357.90002, 358.01 , 358.12003, 358.23 , 358.34003, 358.45 , 358.56 , 358.67 , 358.78 , 358.89 , 359. , 359.11002, 359.22 , 359.33002, 359.44 , 359.55002, 359.66 , 359.77002, 359.88 , 359.99002, 360.1 , 360.21002, 360.32 , 360.43002, 360.54 , 360.65002, 360.76 ], dtype=float32)

One consequence of this is that two grids that should be identical cannot be manipulated together (e.g. subtract one from the other) if one is fully specified from a source file, and the other is generated in this way from a reference point and spacings.

A potential solution as mentioned in the above link is to use an approach involving numpy.linspace instead such as:

start = zeroth + step
end = zeroth + (count * step)
points = np.linspace(start, end, num=count, dtype=np.float32)

Using the same values of zeroth, step and count as above this produces the following array:

array([352.95, 353.06, 353.17, 353.28, 353.39, 353.5 , 353.61, 353.72, 353.83, 353.94, 354.05, 354.16, 354.27, 354.38, 354.49, 354.6 , 354.71, 354.82, 354.93, 355.04, 355.15, 355.26, 355.37, 355.48, 355.59, 355.7 , 355.81, 355.92, 356.03, 356.14, 356.25, 356.36, 356.47, 356.58, 356.69, 356.8 , 356.91, 357.02, 357.13, 357.24, 357.35, 357.46, 357.57, 357.68, 357.79, 357.9 , 358.01, 358.12, 358.23, 358.34, 358.45, 358.56, 358.67, 358.78, 358.89, 359. , 359.11, 359.22, 359.33, 359.44, 359.55, 359.66, 359.77, 359.88, 359.99, 360.1 , 360.21, 360.32, 360.43, 360.54, 360.65, 360.76])

The text was updated successfully, but these errors were encountered:

rcomer · 2023-01-24T14:52:00Z

I believe this issue triggered odd behaviour in Cartopy for me: SciTools/cartopy#2128

rcomer · 2023-01-24T17:24:59Z

My problem pp-fields have

lbrow: 325
bzy: -90.55556
bdy: 0.5555556

The linspace solution would still give me a maximum latitude of 90.00001

larsbarring · 2023-01-25T13:52:41Z

We have seen similar problems in various regional model output datasets. This is why CORDEX specifies that projection (CRS) parameters and coordinate data have to be saved as double precision. But despite this, it occasionally happens that the historic simulation is not perfectly consistent with one (or several) of the future projection simulations. They may have been run on different machines, or using different library or OS versions, or ...

When I way back first stumbled on the problem that concatenation did not work as expected it was suggested that I should filter (round) the coordinate data. OK, this is a reasonable solution when the difference in double precision coordinate values amounts to nanometres in the real world (see this old google group comment).

Another solution is to disregard the 2d latitude/longitude data and recreate this information from the model grid (in my case either rotated pole or Lambert conformal conic grid coordinates). This approach has generally be en successful in alleviating the problem.

But recently a colleague struggled with a historic and future part of an experiment that still could not be concatenated. After quite some data debugging we found out that the problem was that for one of the projection parameters the values in the two files differed at the double precision last significant bit. This was not at all easy to spot and only underlines that generally exact comparisons should not be used with floating point numbers.

If I run the same example as in the first post, but change to float64 the arange and linspace alternatives still do not match each other exactly (16 elements differs by 5.68434189e-14). The question then is how the coordinate data in the OP's source file actually was created, was it with python, the same libraries (and versions)....?

rcomer · 2023-02-09T11:59:27Z

Thanks @larsbarring, yes I think I've concluded that for my case the file format was the problem. The header of my file states that there are 325 rows with spacing of 0.5555556 degrees, so if Iris takes that at face value it will come out with a latitude range of slightly more than 180 degrees. It's not obvious to me what, if anything Iris should do instead. The file is generated by the Met Office Unified Model and so decisions about that format were made decades before we thought about using python!

I don't know about the OP, but it seems likely that that was also Met Office UM data.

I think we should remove the "Good First Issue" label as this is problem is clearly more complicated that was originally assumed.

alastair-gemmell added Experience: Low T-Shirt: Small labels Nov 22, 2019

trexfeathers added the Good First Issue A good issue to take on if you're just getting started with Iris development label Sep 25, 2020

gcaria mentioned this issue May 3, 2021

Use numpy.linspace to generate grids built with reference point and spacings #4113

Closed

bjlittle removed the T-Shirt: Small label May 13, 2021

rcomer removed Experience: Low Good First Issue A good issue to take on if you're just getting started with Iris development labels Feb 9, 2023

ESadek-MO added the Status: Decision Required label May 24, 2023

larsbarring mentioned this issue Feb 20, 2024

Parallelising cubes concatenation #5750

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve floating point representation in generation of regular grids that use a reference point and spacings #3559

Improve floating point representation in generation of regular grids that use a reference point and spacings #3559

alastair-gemmell commented Nov 22, 2019 •

edited

Loading

rcomer commented Jan 24, 2023

rcomer commented Jan 24, 2023

larsbarring commented Jan 25, 2023

rcomer commented Feb 9, 2023

Improve floating point representation in generation of regular grids that use a reference point and spacings #3559

Improve floating point representation in generation of regular grids that use a reference point and spacings #3559

Comments

alastair-gemmell commented Nov 22, 2019 • edited Loading

rcomer commented Jan 24, 2023

rcomer commented Jan 24, 2023

larsbarring commented Jan 25, 2023

rcomer commented Feb 9, 2023

alastair-gemmell commented Nov 22, 2019 •

edited

Loading