Should guess_bounds "do what I mean" for Gregorian monthly data? #4864

Closed
hdyson opened this issue Jul 14, 2022 · 8 comments · Fixed by #6090

@hdyson
Contributor

hdyson commented Jul 14, 2022

✨ Feature Request

Iris' coordinate guess_bounds sets each bound halfway between neighbouring points. For a time coordinate whose points are mid-month in a Gregorian calendar, should iris instead set the bounds to the start and end of each month?
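To make the midpoint rule concrete, here is a minimal sketch of the arithmetic in plain Python. It is not Iris' implementation: `midpoint_bounds` and `to_datetime` are hypothetical helpers, and the sketch assumes that "hours since epoch" means hours since 1970-01-01 00:00 and that the end bounds are extrapolated by reusing the nearest spacing.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed reference for "hours since epoch"


def midpoint_bounds(points, bound_position=0.5):
    """Contiguous bounds placed a fraction of the way between neighbours."""
    if len(points) < 2:
        raise ValueError("need at least two points to guess bounds")
    diffs = [b - a for a, b in zip(points, points[1:])]
    # Extrapolate at both ends by reusing the first and last spacing.
    diffs = [diffs[0]] + diffs + [diffs[-1]]
    return [
        (p - before * bound_position, p + after * bound_position)
        for p, before, after in zip(points, diffs[:-1], diffs[1:])
    ]


def to_datetime(hours):
    """Convert hours since the assumed epoch to a datetime."""
    return EPOCH + timedelta(hours=hours)


# Mid-month points for January to April 1994, in hours since epoch.
points = [210756.0, 211464.0, 212172.0, 212904.0]
bounds = midpoint_bounds(points)
for lower, upper in bounds:
    print(to_datetime(lower), "->", to_datetime(upper))
```

The first bound comes out as 1994-01-01 18:00 to 1994-01-31 06:00, so the cells drift away from the month boundaries because the months have different lengths.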

Motivation

In ANTS, we have this functionality already, but do not support any other time cases. Iris' guess_bounds is more flexible in time handling, so we'd love to retire our limited time handling in favour of using the iris behaviour. In other words, iris handles the general case well, but does not handle this specific case as a user might expect; while ANTS handles this particular case well, but does not handle any other cases for time coordinates. Ideally, iris guess_bounds would give us the best of both worlds.

We can delegate to iris guess_bounds to get the best of both worlds in ANTS. I think the optimal solution though is for this behaviour to be available for all iris users.

ANTS docs are here, for reference: https://code.metoffice.gov.uk/doc/ancil/ants/latest/lib/ants.utils.html#ants.utils.coord.guess_bounds (follow the link there through to the implementation source code). There's also a unit test for the Gregorian case here: https://code.metoffice.gov.uk/trac/ancil/browser/ants/trunk/lib/ants/tests/utils/coord/test_guess_bounds.py?marks=100-111#L100

This comes up fairly frequently when working with model data.

Additional context

Example of current behaviour with iris 3.2:

In [1]: import iris

In [2]: import iris.coords

In [3]: time = iris.coords.DimCoord(points=[210756., 211464., 212172., 212904., 213636., 214368.,
   ...: 215100., 215844., 216576., 217308., 218040., 218772.], units='hours since epoch', standard_name='time')

In [4]: time.units.num2date(time.points)  # Points are mid-month for gregorian calendar
Out[4]: 
array([cftime.DatetimeGregorian(1994, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 2, 15, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 3, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 4, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 5, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 6, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 7, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 8, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 9, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 10, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 11, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1994, 12, 16, 12, 0, 0, 0, has_year_zero=False)],
      dtype=object)

In [5]: time.guess_bounds()

In [6]: time.units.num2date(time.bounds)  # Bounds are not start/end of month
Out[6]: 
array([[cftime.DatetimeGregorian(1994, 1, 1, 18, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 1, 31, 6, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 1, 31, 6, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 3, 1, 18, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 3, 1, 18, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 3, 31, 18, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 3, 31, 18, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 5, 1, 6, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 5, 1, 6, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 5, 31, 18, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 5, 31, 18, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 7, 1, 6, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 7, 1, 6, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 8, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 8, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 8, 31, 18, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 8, 31, 18, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 10, 1, 6, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 10, 1, 6, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 10, 31, 18, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 10, 31, 18, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 12, 1, 6, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(1994, 12, 1, 6, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(1994, 12, 31, 18, 0, 0, 0, has_year_zero=False)]],
      dtype=object)

The same example with ANTS 0.19, which does align the bounds with the months:

In [1]: import iris

In [2]: import ants

In [3]: time = iris.coords.DimCoord(points=[210756., 211464., 212172., 212904., 213636., 214368.,
   ...: 215100., 215844., 216576., 217308., 218040., 218772.], units='hours since epoch', standard_name='time')

In [4]: time.units.num2date(time.points)  # Points are mid-month for gregorian calendar
Out[4]: 
array([real_datetime(1994, 1, 16, 12, 0),
       real_datetime(1994, 2, 15, 0, 0),
       real_datetime(1994, 3, 16, 12, 0),
       real_datetime(1994, 4, 16, 0, 0),
       real_datetime(1994, 5, 16, 12, 0),
       real_datetime(1994, 6, 16, 0, 0),
       real_datetime(1994, 7, 16, 12, 0),
       real_datetime(1994, 8, 16, 12, 0),
       real_datetime(1994, 9, 16, 0, 0),
       real_datetime(1994, 10, 16, 12, 0),
       real_datetime(1994, 11, 16, 0, 0),
       real_datetime(1994, 12, 16, 12, 0)], dtype=object)

In [5]: ants.utils.coord.guess_bounds(time)

In [6]: time.units.num2date(time.bounds)  # Bounds are now start/end of month
Out[6]: 
array([[real_datetime(1994, 1, 1, 0, 0), real_datetime(1994, 2, 1, 0, 0)],
       [real_datetime(1994, 2, 1, 0, 0), real_datetime(1994, 3, 1, 0, 0)],
       [real_datetime(1994, 3, 1, 0, 0), real_datetime(1994, 4, 1, 0, 0)],
       [real_datetime(1994, 4, 1, 0, 0), real_datetime(1994, 5, 1, 0, 0)],
       [real_datetime(1994, 5, 1, 0, 0), real_datetime(1994, 6, 1, 0, 0)],
       [real_datetime(1994, 6, 1, 0, 0), real_datetime(1994, 7, 1, 0, 0)],
       [real_datetime(1994, 7, 1, 0, 0), real_datetime(1994, 8, 1, 0, 0)],
       [real_datetime(1994, 8, 1, 0, 0), real_datetime(1994, 9, 1, 0, 0)],
       [real_datetime(1994, 9, 1, 0, 0),
        real_datetime(1994, 10, 1, 0, 0)],
       [real_datetime(1994, 10, 1, 0, 0),
        real_datetime(1994, 11, 1, 0, 0)],
       [real_datetime(1994, 11, 1, 0, 0),
        real_datetime(1994, 12, 1, 0, 0)],
       [real_datetime(1994, 12, 1, 0, 0),
        real_datetime(1995, 1, 1, 0, 0)]], dtype=object)
@trexfeathers
Contributor

trexfeathers commented Jul 14, 2022

In line with the design decision in #4723, it is more likely that a user argument should be provided, rather than special behaviour for specific cases. We're trying to shy away from Iris 'magically' guessing what the user might want.

If others agree with this, then I guess there's less debate about whether it should be implemented - it would be opt-in behaviour.

@hdyson
Contributor Author

hdyson commented Jul 14, 2022

> it is more likely that a user argument should be provided...

That makes a great deal of sense to me. The current behaviour has iris doing exactly what the user tells it to do, and is consistent with non-time coordinates. A flag such as "align_with_months" (or similar) that opts in to more lenient behaviour, aware that the points follow a mid-month pattern, feels like a user-friendly way to handle the irregularity of the Gregorian calendar.
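As a rough illustration of what such an opt-in mode could compute, here is a sketch in plain Python. `month_aligned_bounds` is a hypothetical name, not an Iris or ANTS function, and the sketch assumes "hours since epoch" means hours since 1970-01-01 00:00. It uses the standard library's proleptic Gregorian `datetime`, which agrees with the Gregorian calendar for dates like these.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed reference for "hours since epoch"
ONE_HOUR = timedelta(hours=1)


def month_aligned_bounds(points_hours):
    """Bound each point by the start of its month and of the next month."""
    bounds = []
    for hours in points_hours:
        point = EPOCH + timedelta(hours=hours)
        start = datetime(point.year, point.month, 1)
        # December rolls over into January of the following year.
        if point.month == 12:
            end = datetime(point.year + 1, 1, 1)
        else:
            end = datetime(point.year, point.month + 1, 1)
        # Convert back to hours since epoch (timedelta / timedelta is a float).
        bounds.append(((start - EPOCH) / ONE_HOUR, (end - EPOCH) / ONE_HOUR))
    return bounds


# Mid-January and mid-December 1994, in hours since epoch.
bounds = month_aligned_bounds([210756.0, 218772.0])
for lower, upper in bounds:
    print(EPOCH + timedelta(hours=lower), "->", EPOCH + timedelta(hours=upper))
```

Because every cell is derived from its own point rather than from its neighbours, the result does not depend on the spacing between points.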

@trexfeathers
Contributor

@hdyson has said their team will put up a PR in due course 👍

@trexfeathers
Contributor

@hdyson this is currently assigned to you, but your team has gone through some changes since we assigned it. Do you still want this?

@bjlittle
Member

@hdyson I've unassigned you from this issue. Are you still keen to see it addressed?

Just wanted to confirm that this isn't a WIP that we don't know about ... otherwise, we'll consider it for future work.

Thanks

@hdyson
Contributor Author

hdyson commented Nov 29, 2023

@trexfeathers, @bjlittle Thanks - you're both spot on. It is functionality we would like, but it's not something that's being actively worked on by us.

trexfeathers added this to the v3.9 milestone on Jan 3, 2024
@trexfeathers
Contributor

Sizing this based on the assumption that we want this convenience to error if there is more than 1 point in any of the months, and that we could offer years at the same time by re-using most of the code.
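The sizing assumptions can be sketched as follows: reject input with more than one point in any month, and get yearly bounds from the same code path. This is only an illustration in plain Python; `period_key`, `period_bounds` and `guess_period_bounds` are hypothetical names, not the Iris API, and the points are taken as ready-made datetimes.

```python
from collections import Counter
from datetime import datetime


def period_key(point, period):
    """Identify the calendar month or year that a datetime falls in."""
    if period == "month":
        return (point.year, point.month)
    if period == "year":
        return (point.year,)
    raise ValueError(f"unsupported period: {period!r}")


def period_bounds(point, period):
    """Return (start, end) of the month or year containing the point."""
    if period == "year":
        return datetime(point.year, 1, 1), datetime(point.year + 1, 1, 1)
    start = datetime(point.year, point.month, 1)
    # December rolls over into January of the following year.
    if point.month == 12:
        return start, datetime(point.year + 1, 1, 1)
    return start, datetime(point.year, point.month + 1, 1)


def guess_period_bounds(points, period="month"):
    """Align bounds to months or years, refusing ambiguous input."""
    # period_key also validates the period name before any bounds are built.
    counts = Counter(period_key(point, period) for point in points)
    crowded = sorted(key for key, count in counts.items() if count > 1)
    if crowded:
        raise ValueError(f"more than one point per {period}: {crowded}")
    return [period_bounds(point, period) for point in points]


monthly = [datetime(1994, 1, 16, 12), datetime(1994, 2, 15)]
print(guess_period_bounds(monthly))

yearly = [datetime(1994, 7, 2), datetime(1995, 7, 2)]
print(guess_period_bounds(yearly, period="year"))
```

Passing the monthly points with `period="year"` raises a `ValueError`, since both fall in 1994. Raising in that case avoids silently producing duplicate cells.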

@hdyson
Contributor Author

hdyson commented Jul 29, 2024

Awesome - thanks!
