make CDL tests independent of attribute string typing #2133

marqh · 2016-09-07T12:35:42Z

Behaviour changes in netcdf4-python have triggered test failures with v1.1 and v1.2 of the library.

With these versions, some attributes are typed as string where they previously were not. There is ongoing work and discussion which may alter this behaviour again, somewhere in netcdf-c, netcdf4-python and/or hdf5.

see:
Unidata/netcdf-c#298
Unidata/netcdf4-python#529
Unidata/netcdf4-python#575
amongst others for further details.

My suggestion is an explicit workaround through assertCDL, into _assert_str_same, to allow outputs to be interpreted as the same, even though they are manifestly not.

I'm not sure I like this approach, but I find myself suggesting it.

DPeterK · 2016-09-14T14:58:34Z

This is not a nice change to have to be making - but that, as @marqh's comment above shows, is not our fault, more something that we're having to react to.

Functionally this change looks to do what we need it to (ignore \t\tstring in CDL output). Pragmatically we need something like this change to help us move forward, and I think it can be argued that the rest of the CDL is the same whether \t\tstring is present or not.

My opinion then is that we grit our teeth and merge this 😬 But that's just my opinion; @bjlittle @pelson it would be nice to have some consensus on this...

marqh · 2016-09-26T09:04:48Z

> My opinion then is that we grit our teeth and merge this 😬 But that's just my opinion;

understood

> @bjlittle @pelson it would be nice to have some consensus on this...

indeed

bjlittle · 2016-09-29T11:37:13Z

lib/iris/tests/__init__.py

+ # https://github.com/Unidata/netcdf4-python/issues/575
+ # amongst others for further details.
+ if type_comparison_name == 'CDL' and reference_str != test_str:
+     test_str = test_str.replace('\t\tstring ', '\t\t')


@marqh Matching with a replace in this way is kinda open to making incorrect changes.

Perhaps you could be more rigorous and use a compiled regexp that also anchors the pattern match to the beginning of a line with globbed white space ...
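The difference between the two approaches can be illustrated with a small sketch. The CDL fragment below is made up for illustration (it is not taken from the iris test results), and the pattern is only one possible way of anchoring the match to leading whitespace at the start of a line.

```python
import re

# Hypothetical CDL fragment: the second line is a string-typed attribute,
# the third is an attribute whose *value* happens to contain "\t\tstring ".
cdl = (
    'variables:\n'
    '\t\tstring air_temp:units = "K" ;\n'
    '\t\tair_temp:note = "tabs\t\tstring inside a value" ;\n'
)

# Naive replacement: rewrites every occurrence, including the one
# inside the quoted attribute value.
naive = cdl.replace('\t\tstring ', '\t\t')

# Anchored pattern: only strips "string " when it directly follows
# leading spaces or tabs at the start of a line.
STRING_TYPE = re.compile(r'^([ \t]+)string ', re.MULTILINE)
anchored = STRING_TYPE.sub(r'\1', cdl)
```

Here `naive` has lost the word "string" from inside the attribute value, while `anchored` only drops the type keyword at the start of the line.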

@marqh I really don't like the fact that we're burying this patch in the bowels of the testing framework, where it will be forgotten and perhaps sting us some time (long?) in the future.

Is there an opportunity here to elevate this change to be a registered iris patch? A proper thing (a first class iris citizen) that isn't hidden, but something that is very visible, easy to control, govern and perhaps at some point retire or deprecate.

Specifically, I'm thinking of a private iris patch file per module that contains all relevant patch functionality that we currently apply in that module. The application of such patching (to me) seems quite generic and specific to individual patching cases. But in this case I'm thinking of a specific patch decorator that we would apply to this specific method, where the patch decorator would do the regexp on the test_str, as required.

The specific patch decorator could even only be applied for netCDF v1.1+ and contain all relevant documented/PR details/history for other dev's in the future to understand the context of this specific patch ...

Thoughts? Or am I guilty of over thinking this ... I still believe it would be a simple change, but it's this kind of explicit patching strategy that is of more interest (to me) as we move forward

marqh · 2016-10-03T09:22:41Z

@bjlittle

I have updated the change to use a more careful regex/sub pattern.

> @marqh I really don't like the fact that we're burying this patch in the bowels of the testing framework, where it will be forgotten and perhaps sting us some time (long?) in the future.
>
> Is there an opportunity here to elevate this change to be a registered iris patch? A proper thing (a first class iris citizen) that isn't hidden, but something that is very visible, easy to control, govern and perhaps at some point retire or deprecate.

I don't think I agree at this point. This is a change to our testing to assert that two different files are actually the same, to work around instabilities in netcdf4-python versions.

It is arguable that the files are different and these tests should fail: that this change is wrong in principle. We can make that call.
This PR presents an alternative: that these files are different, but that the difference does not matter, so we should accept it.

So, I don't think this fits a patch strategy. If we don't think these files should test as the same, we need a more fundamental assessment of how to deal with the changes in netcdf4-python:
Unidata/netcdf-c#298
Unidata/netcdf4-python#529
Unidata/netcdf4-python#575

QuLogic · 2016-10-03T19:24:53Z

lib/iris/tests/__init__.py

+ # tab|tab|string |attr_definition.
+ pattern = re.compile('(^\t\t)string (.+)$', re.MULTILINE)
+ # Replace with tab|tab|attr_definition.
+ replacement = r'\t\t\2'


A raw string means the tab isn't interpreted; how does it pass?
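One likely answer is that Python's `re` module processes backslash escapes in the replacement template itself, so the raw string still yields real tabs after substitution. The short check below illustrates this with the pattern and replacement from the diff; the input line is a made-up example.

```python
import re

pattern = re.compile('(^\t\t)string (.+)$', re.MULTILINE)
replacement = r'\t\t\2'

# The raw string holds a literal backslash followed by 't', not a tab...
assert replacement[0] == '\\' and replacement[1] == 't'

# ...but re.sub processes backslash escapes in the replacement template,
# so the substituted text contains real tab characters.
result = pattern.sub(replacement, '\t\tstring foo:units = "K" ;')
```

After substitution, `result` starts with two tab characters followed by the attribute definition.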

QuLogic · 2016-10-03T19:26:12Z

Can this be targeted at 1.10.x instead?

pelson · 2016-10-04T15:23:57Z

We have a number of options here:

we make iris behave consistently across multiple versions of netcdf4-python by forcing byte-string strings (NC_CHAR)
we accept the differences between versions heuristically (when testing), as per this PR
we enforce unicode strings (NC_STRING) when reading and writing from NetCDF
we introduce a heuristic to determine whether we save a string as NC_STRING or NC_CHAR

Enforcing NC_CHAR

Pros:

Simple to implement
This was the behaviour for ~ <=1.0.2 of netcdf4-python
All tests are already assuming this

Cons:

Unicode isn't supported without the user explicitly decoding the loaded byte-strings
Unpredictable type in loaded string depending on netcdf4-python heuristics (may be bytes/string, may be unicode)

Heuristic acceptance of NC_CHAR or NC_STRING when testing

Pros:

Simple to implement

Cons:

We will miss situations where we care about the NetCDF string type and could therefore easily miss an unexpected change of behaviour (though it is questionable whether we actually care).
Unpredictable behaviour for iris user across different netcdf4-python versions
Unpredictable behaviour for iris user when saving a cube which may or may not contain encodable ascii characters

Unicode strings throughout

Pros:

Predictable behaviour
Using unicode

Cons:

CDL may or may not have a string item for different netcdf4-python versions
Roundtrip will change strings to unicode
(Potential?) increased nc file size

Heuristic to save either NC_STRING or NC_CHAR

Pros:

Smaller file size than "Unicode strings throughout"
Repeatable behaviour, but somewhat arbitrary string/unicode saving
Unicode is supported

Cons:

CDL may or may not have a string item for different netcdf4-python versions when saving attributes that cannot be encoded as ASCII
Roundtrip will change unicode to string unless non-ASCII characters are present
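
As an illustration of the last option, a heuristic based on ASCII encodability could be sketched as below. The helper name `netcdf_attribute_type` is hypothetical, and this is not necessarily how any actual implementation decides; it only returns a label and does not touch netcdf4-python.

```python
def netcdf_attribute_type(value):
    """
    Hypothetical helper: return 'NC_CHAR' if the text can be encoded
    as ASCII, otherwise 'NC_STRING'.
    """
    try:
        value.encode('ascii')
    except UnicodeEncodeError:
        # Contains non-ASCII characters, so it needs a unicode string type.
        return 'NC_STRING'
    return 'NC_CHAR'
```

Under this sketch, an attribute such as 'kelvin' would be saved as NC_CHAR, while '°C' would be saved as NC_STRING.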

I'll keep this comment up-to-date as more pros/cons/options emerge.

pelson · 2016-10-04T15:36:46Z

In order to put the above comment together in a coherent way, I was experimenting with our options and have put together #2158. I do not currently advocate that PR over this one, but would like to raise awareness of that as an option. While this PR represents a reduction in test certainty, PR #2158 represents a change in behaviour...

marqh · 2016-10-06T10:36:18Z

> While this PR represents a reduction in test certainty, PR #2158 represents a change in behaviour...

I think we should be wary of a change in behaviour for all files; that would need to be justified.

However, #2158 looks like all the test results are the same, so is this approach actually maintaining the same behaviour through netcdf4-python changes?

In #2158 none of the test results files are altered; apart from pep8 the tests pass.

@pelson So, how is this a behaviour change, how does that manifest?

marqh · 2016-10-06T16:22:58Z

I think that #2158 represents a continuation of current Iris behaviour. I agree that it would be a behaviour reversion for people already using the latest netcdf4-python, but given the limited scope of the change, I think this represents a well-constrained edge case.

I think that #2158 is preferable to this PR and represents a sensible level of protection for Iris' netCDF output from netcdf4-python changes. I suggest this PR is retired, unless #2158 proves problematic.

marqh · 2016-10-06T16:24:40Z

> Can this be targeted at 1.10.x instead?

I think that is a question for #2158, given my last comment.

I will move the comment across to that ticket for further discussion. I am not convinced that it is required, but I have no particular objections. I want this in master and iris 1.11, which I currently feel needs to be imminent.

QuLogic · 2016-10-10T03:08:03Z

#2158 replaced this one?

pelson · 2016-10-10T07:55:52Z

Thanks @QuLogic. It did indeed.

make CDL tests independent of attribute string typing

a88c9fb

marqh added the Status: Work in Progress label Sep 7, 2016

marqh assigned bjlittle Sep 7, 2016

marqh mentioned this pull request Sep 7, 2016

unpin matplotlib #2124

Merged

DPeterK approved these changes Sep 15, 2016

bjlittle reviewed Sep 29, 2016

update with regex sub

81f9bd8

QuLogic reviewed Oct 3, 2016

pelson mentioned this pull request Oct 4, 2016

Heuristically determine if saving a NetCDF attribute as a string or unicode. #2158

Merged

pelson closed this Oct 10, 2016

QuLogic removed the Status: Work in Progress label Oct 10, 2016
