Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meshcoord metadata follows face/edge coords where present. #5020

Merged
merged 7 commits into from
Oct 21, 2022

Conversation

pp-mo
Copy link
Member

@pp-mo pp-mo commented Oct 10, 2022

Addresses #4860

@pp-mo pp-mo changed the title New meshcoord metadata logic, barely functioning with some tests fixed. Meshcoord metadata follows face/edge coords where present. Oct 10, 2022
@pp-mo
Copy link
Member Author

pp-mo commented Oct 10, 2022

Ping @schlunma
Hoping you may be able to test against this for usability in light of #4860 ?

This code should give an idea of whether the idea works.
Though it is still very experimental, and I may well make further changes in following days.

Please note
I am proposing to "outlaw" clashes of phenomenon identity (including units) between node- and face/edge-coords.
This may go against some of your suggested testcases in #4860, but ...

  • I believe Iris simply cannot tolerate any conflict of phenomenon or units between a Coord points and its bounds, since we don't treat bounds as a separate object.
  • so likewise, for MeshCoords, that applies also to any disagreements between node- and face/edge-coords metadata

The currently-proposed code is allowing matches where one of the two coords is missing a standard/long-name or units property. And it makes no requirement for attributes to match.
So ... these are precisely the questions that we need to resolve, for which your opinions + testing are much wanted !

@schlunma
Copy link
Contributor

Thanks @pp-mo for these changes!

I fully agree with your suggestions here that iris cannot accept node and face/edge coordinate metadata to be different. The main problem I'm having right now and described in #4860 is that currently they can differ (which even can lead to wrong results when the units differ!), so not allowing these to differ in first place also solves my problem.

Three comments on this:

  1. Is there a particular reason the var_name is fixed to None? What's the reason this is not treated similarily to long_name (i.e., ignored when standard_name is not set)? This is actually the largest problem we have in ESMValTool, since we except the input coordinates to have specific names given by the CMIP data request, and this hardcoded None causes our checks to fail. For all other metadata we could solve this with a workaround (in theory).
  2. I am not entirely sure if matches should be allowed if one of them is empty. Especially for the units this seems dangerous, as you might still end up with points or bounds not matching their units. Thus, I'd rather not allow matches when one of the values is missing. I'm fine with ignoring the long name if the standard name is set and with ignoring the attributes altogether.
  3. I still believe that it would make sense to compare the bounds of the face/edge coordinate (if they are not empty) with the corresponding values calculated from the connectivity and the node coordinate in the MeshCoord constructor. Yes, it's just a warning in the UGRID checker, but in my opinion this is an inconsistency in the data that should not be allowed.

Unfortunately I probably won't have the chance to actually test this before the ESMValTool workshop, but the code changes look really good already. I am happy to focus on this during the workshop.

Cheers, and thanks again for all your work on this! 🚀

@pp-mo
Copy link
Member Author

pp-mo commented Oct 14, 2022

Some initial answers on your points ...
Hopefully, we get to talk next week !

Is there a particular reason the var_name is fixed to None? What's the reason this is not treated similarily to long_name

The iris ".var_name" carries the original file identity of an Iris object which was loaded from a variable in a netcdf file -- and it likewise determines variable name on writing out to netcdf.
A MeshCoord, however, is basically a "convenience" object that has no state of its own, does not represent any netcdf variable, and is never written to an output file (instead, the referenced mesh coordinates and connectivities would be). Since it never was or can be a netcdf variable, a var_name is not appropriate.


I still believe that it would make sense to compare the bounds of the face/edge coordinate (if they are not empty) with the corresponding values calculated from the connectivity and the node coordinate in the MeshCoord constructor. Yes, it's just a warning in the UGRID checker, but in my opinion this is an inconsistency in the data that should not be allowed.

I think we rejected this because of concerns about performance costs with large meshes. The UGRID checker only checks it if the arrays are "small enough", with a user-defined memory size threshold. Which I think works OK for a command utility, but is not really appropriate for Iris.
The UGRID spec also seems to state that, if they exist, bounds values "should" be redundant. And in particular, that they cannot be used instead of a connectivity variable. So, we felt it was justified to simply ignore them and bypass the problems of checking.
Iris explicitly avoids adding bounds to a coordinate in these cases, so it treats this information as "irrelevant" : Since, from Iris PoV, it cannot cause any conflict, I think not checking it is defensible.

Though ... the ugrid-checker position is a bit different, since unlike Iris it is intended to cover all UGRID (**).
So maybe this should be an error, but at the time I felt the door was still open to bounds coding something different from the product of the connectivity and node-coordinates. So as currently proposed it is a "should" rather than a "must"
So it might be worth re-visiting this choice. However, note that all of this is rather lacking oversight by anyone but me, and not yet adopted by UGRID.

(**) with the rather notable exception of 3d meshes / volumes !

@pp-mo
Copy link
Member Author

pp-mo commented Oct 14, 2022

This one is maybe a bit trickier ...

I am not entirely sure if matches should be allowed if one of them is empty. Especially for the units this seems dangerous, as you might still end up with points or bounds not matching their units.

As the meaning of the 'coordinate' metadata in UGRID is nowhere explicitly stated, I'm not confident that there is a solid logic to determine on this. So I wanted to be very tolerant of what we find in a file, since in the past Iris loading code has often proved too strict, and we have had to install extra workarounds + issue warnings so that it can cope with "incorrect" non-CF input rather than failing to load it altogether.

Details... in this case, I reckon that we could+should cope if one or other of the "face/edge" and "node" coordinates does not have a usable identity at all (i.e. a name or units). (maybe this could be seen like a regular bounds-variable, getting its identity from the parent variable). I am also allowing the other to be missing a name or units, so that only a definite conflict will cause an error.
Note, however, that in any case it will not mix information from the two possible metadata-s, but only choose which of them should be used.

However, I now see that when something does go wrong by this definition, it will cause a load failure.
So perhaps I should adapt that strategy to raise a warning instead -- do you think that would be more useful ?
Certainly, it would then make more sense to complain about other unexpected occurences (like name vs no-name), if they are raised as warnings rather than fatal.

N.B. at present, this is rather undermined by the Mesh class, which currently requires coordinates to have names that identify an axis.
But, I have argued elsewhere that this is not really in the spirit of UGRID, and I'm hoping we can fix it to make it more tolerant in future - see #4438, point (2).

@pp-mo
Copy link
Member Author

pp-mo commented Oct 18, 2022

From offline discussion with @bjlittle @schlunma ..

  • for convenience in cmor checking within ESMValTool, really would like MeshCoords to have a (non-empty) var_name
    Specifically, for compatibility with "ordinary" coords, want to be able to extract a MeshCoord with cube.coord(var_name=name)
    • so, this is not really a big deal for Iris, we will just copy var_name with the other metadata
  • @schlunma still thinking that accepting mismatch between name or units of face-coord and node-coord is unwise :
    i.e. should really be an error, not a warning. But one missing (e.g. no units) probably can be tolerant, as currently proposed
    • ok!
  • explained that Iris is ignoring (not loading) bounds of face/edge/node coordinates when loading mesh info,
    so checking would mean allowing that
    • probably not worth it -- can leave out

@bjlittle standing ready to merge !
But we also have some outstanding test errors to fix (and possibly some missing tests, TBD) ..

@pp-mo
Copy link
Member Author

pp-mo commented Oct 19, 2022

Replaced earlier code with much simpler implementation (and tests likewise),
since I realised that the current Mesh limitations mean that many of the complex cases it was trying to address can't actually occur (!)

@pp-mo
Copy link
Member Author

pp-mo commented Oct 19, 2022

The new, intended behaviour, is that:

  • a face/edge MeshCoord now copies all the metadata from the face/edge ("points") coord
    -- i.e., now including var_name.
  • an error is raised if the face/edge and node coords have different standard_name or units
    -- but different long_name/attributes are allowed/ignored (also var_name, obviously).
    • This also now provides that a provided standard_name will not match a missing one ..
    • .. but a valid units is allowed to 'match' a missing one, mostly because otherwise it invalidates lot of our existing testcases.

@pp-mo pp-mo marked this pull request as ready for review October 19, 2022 17:24
@pp-mo
Copy link
Member Author

pp-mo commented Oct 19, 2022

I think I've finally got somewhere with this.
@schlunma will this work for you now ?
@bjlittle care to review ?
Thanks

@bjlittle bjlittle self-assigned this Oct 20, 2022
@bjlittle bjlittle self-requested a review October 20, 2022 09:54
Copy link
Contributor

@schlunma schlunma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all these changes @pp-mo ! I just tested this within ESMValTool and it indeed solves our problems 🎉

I didn't test this in too much detail (e.g., if it successfully fails the metadata of the nodes and faces differs), but the code looks fine to me 👍 Cheers, thanks for all your help on this!!

Copy link
Member

@bjlittle bjlittle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pp-mo This looks really good to me 🤩

Just a minor comment to service, otherwise let's get this banked 👍

lib/iris/experimental/ugrid/mesh.py Show resolved Hide resolved
lib/iris/experimental/ugrid/mesh.py Outdated Show resolved Hide resolved
@bjlittle
Copy link
Member

@pp-mo Do you fancy also adding a whatsnew entry?

@pp-mo
Copy link
Member Author

pp-mo commented Oct 21, 2022

Thanks @bjlittle hope that does ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

MeshCoord constructor always uses metadata of node coordinate regardless of location
3 participants