-
Notifications
You must be signed in to change notification settings - Fork 283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make CDL tests independent of attribute string typing #2133
Conversation
This is not a nice change to have to be making - but that, as @marqh's comment above shows, is not our fault, more something that we're having to react to. Functionally this change looks to do what we need it to (ignore My opinion then is that we grit our teeth and merge this 😬 But that's just my opinion; @bjlittle @pelson it would be nice to have some consensus on this... |
# https://github.com/Unidata/netcdf4-python/issues/575 | ||
# amongst others for further details. | ||
if type_comparison_name == 'CDL' and reference_str != test_str: | ||
test_str = test_str.replace('\t\tstring ', '\t\t') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marqh Matching with a replace in this way is kinda open to making incorrect changes.
Perhaps you could be more rigorous and use a compiled regexp that also anchors the pattern match to the beginning of a line with globbed white space ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marqh I really don't like the fact that we're burying this patch in the bowels of the testing framework, where it will be forgotten and perhaps sting us some time (long?) in the future.
Is there an opportunity here to elevate this change to be a registered iris patch? A proper thing (a first class iris citizen) that isn't hidden, but something that is very visible, easy to control, govern and perhaps at some point retire or deprecate.
Specifically, I'm thinking of a private iris patch file per module that contains all relevant patch functionality that we currently apply in that module. The application of such patching (to me) seems quite generic and specific to individual patching cases. But in this case I'm thinking of a specific patch decorator that we would apply to this specific method, where the patch decorator would do the regexp on the test_str, as required.
The specific patch decorator could even only be applied for netCDF v1.1+ and contain all relevant documented/PR details/history for other dev's in the future to understand the context of this specific patch ...
Thoughts? Or am I guilty of over thinking this ... I still believe it would be a simple change, but it's this kind of explicit patching strategy that is of more interest (to me) as we move forward
I have updated the change to use a more careful regex/sub pattern.
I don't think I agree at this point. This is a change to our testing to assert that two different files are actually the same, to work around instabilities in netcdf4-python versions. It is arguable that the files are different and these tests should fail: that this change is wrong in principal. We can make that call. So, I don't think this fits a patch strategy. If we don't think these files should test as the same, we need a more fundamental assessment of how to deal with the changes in netcdf4-python: |
# tab|tab|string |attr_definition. | ||
pattern = re.compile('(^\t\t)string (.+)$', re.MULTILINE) | ||
# Replace with tab|tab|attr_definition. | ||
replacement = r'\t\t\2' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A raw string means the tab isn't interpreted; how does it pass?
Can this be targeted at |
We have a number of options here:
Enforcing NC_CHARPros:
Cons:
Heuristic acceptance of NC_CHAR or NC_STRING when testingPros:
Cons:
Unicode strings throughoutPros:
Cons:
Heuristic to save either NC_STRING or NC_CHARPros:
Cons:
I'll keep this comment up-to-date as more pros/cons/options emerge. |
In order to put the above comment together in a coherent way, I was experimenting with our options and have put together #2158. I do not currently advocate that PR over this one, but would like to raise awareness of that as an option. While this PR represents a reduction in test certainty, PR #2158 represents a change in behaviour... |
I think we should be wary of a change in behaviour for all files, that would need to be justified however, #2158 looks like all the test results are the same, so is this approach actually maintaining the same behaviour through netcdf4-python changes? in #2158 none of the test results files are altered; apart from pep8 the tests pass @pelson So, how is this a behaviour change, how does that manifest? |
i think that #2158 represents a continuation of current Iris behaviour. I agree that it would be a behaviour reversion for people already using the latest netcdf4-python, but given the limited scope of the change, I think this represents a well constrained edge case I think that #2158 is preferable to this PR and represents a sensible level of protection for Iris' netcdf output from netcdf4python changes, I suggest this PR is retired, unless #2158 proves problematic |
i think that is a question for #2158, given my last comment I will move the comment across to that ticket for further discussion. I am not convinced that it is required, but I have no particular objections. I want this in master and iris 1.11 which I currently feel needs to be imminent |
#2158 replaced this one? |
Thanks @QuLogic. It did indeed. |
Behaviour changes in netCDF4-python have triggered test failures for v1.1 and 1.2 of netcdf4-python
With these versions, some attributes are typed at
string
where they previously were not. There is ongoing work and discussion which may alter this behaviour again, somewhere in netcdf-c netcdf4-python and/or hdf5see:
Unidata/netcdf-c#298
Unidata/netcdf4-python#529
Unidata/netcdf4-python#575
amongst others for further details.
My suggestion is an explicit work around through assertCDL, into _assert_str_same to allow outputs to be interpreted as the same, even though they are manifestly not
I'm not sure I like this approach, but I find myself suggesting it.