Failing to load a NetCDF dataset #5067

Open
timo0thy opened this issue Nov 14, 2022 · 14 comments

@timo0thy

timo0thy commented Nov 14, 2022

🐛 Bug Report

I am failing to load a NetCDF file, getting an 'index out of range' error. The properties of the dataset and the error code are provided in the screenshots below. This issue is unique to the latest version of iris as I tried with version 3.2.1 and it works.

How To Reproduce

Steps to reproduce the behaviour:

  1. Calculate Fire Weather Index using the xclim package
  2. Save the output as a NetCDF file using the to_netcdf command
  3. Load the dataset using iris 3.3.1 (the same error is produced no matter whether iris.load, iris.load_cube, or even the xarray plugin to_iris is used)

Expected behaviour

With version 3.2.1 I am able to load and plot the dataset as follows:
image
image

Screenshots

The dataset (opened and plotted using xarray):
image
image

The error code when loading the dataset using iris 3.3.1:
image
image
image

Environment

  • OS & Version: Ubuntu 18.04
  • Iris Version: 3.3.1

@trexfeathers
Contributor

Hi @timo0thy, we're definitely interested in why this behaviour changed recently. But we can't proceed any further without the file. Are you able to share this? Or at the very least share the ncdump -h output? Thanks

@lbdreyer
Member

It looks like the cell_methods attribute in your file is not formatted correctly.

From the looks of your xarray output I can see that your file contains a cell_methods attribute but nothing appears to be displaying which seems to suggest it may be an empty string?

When Iris interprets a file, if it finds a cell_methods attribute, it will take this string and try to parse it to create Iris CellMethod objects. It looks like Iris is struggling to parse the string because it doesn't follow the expected format, and so it throws an error, albeit not a particularly illuminating one! I will look into how we can improve this so that the error is more helpful in future!
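
To make the expected format concrete, here is a minimal sketch of a strict parser for the simplest CF cell_methods form, `name: [name: ...] method`. It is not Iris's actual parser: the function name `parse_cell_methods` and the regular expression are assumptions made for illustration, and it ignores CF extras such as parenthesised comments and `where`/`over` clauses. It only shows why an empty string has nothing to parse.

```python
import re

# Hypothetical simplification of CF section 7.3: one or more "name:" tokens
# followed by a single method word, e.g. "time: mean" or "lat: lon: mean".
_ENTRY = re.compile(r"((?:\w+:\s+)+)(\w+)")


def parse_cell_methods(text):
    """Return a list of (coordinate_names, method) pairs from a cell_methods string.

    Raises ValueError when no entry can be parsed, which includes the empty string.
    """
    entries = []
    for names, method in _ENTRY.findall(text):
        # "lat: lon: " -> ("lat", "lon")
        coord_names = tuple(name.rstrip(":") for name in names.split())
        entries.append((coord_names, method))
    if not entries:
        raise ValueError(f"cannot parse cell_methods attribute: {text!r}")
    return entries
```

With this sketch, `parse_cell_methods("time: mean")` yields one entry, while `parse_cell_methods("")` raises a `ValueError` that names the offending value, which is the kind of message that would have pointed at the attribute directly.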

Assuming the cell_methods attribute exists but is just empty, you could consider removing it prior to loading with Iris. There are different tools for doing this, for example you could use ncatted as follows:
ncatted -O -a cell_methods,fwi,d,, fwi.nc
That will delete the cell_methods attribute from the fwi variable in your fwi.nc file.
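
If NCO is not available, roughly the same clean-up can be done from Python. The sketch below assumes the third-party netCDF4 package and a file you are allowed to modify in place; the helper names `drop_empty_attribute` and `fix_file` are made up for this example. It only deletes the attribute when it is a blank string, so a valid cell_methods value is left alone.

```python
def drop_empty_attribute(variable, name="cell_methods"):
    """Delete attribute `name` from a netCDF4-style variable if it is a blank string.

    `variable` only needs the ncattrs/getncattr/delncattr methods that
    netCDF4.Variable provides. Returns True if the attribute was deleted.
    """
    if name not in variable.ncattrs():
        return False
    value = variable.getncattr(name)
    if isinstance(value, str) and not value.strip():
        variable.delncattr(name)
        return True
    return False


def fix_file(path, var_name="fwi"):
    """Open `path` for in-place editing and drop an empty cell_methods on `var_name`."""
    import netCDF4  # third-party; imported here so the helper above has no dependencies

    with netCDF4.Dataset(path, "r+") as dataset:
        return drop_empty_attribute(dataset.variables[var_name])
```

Calling `fix_file("fwi.nc")` would then play the same role as the ncatted command above, under the assumption that the variable is called fwi as in that command.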

@timo0thy
Author

Hi both,

Thank you very much for looking into this. Please see attached the fwi.nc file (it's renamed to fwi_extract.nc as I had to extract the first 1000 timesteps to pass the file size requirement). Same issue anyway.

fwi_extract.zip

Yes I agree that it's a strangely formatted file, thanks for suggesting the workaround (haven't tried it though, as there's no issue with the older iris version).

Tim

trexfeathers self-assigned this Nov 21, 2022
@trexfeathers
Contributor

@timo0thy thanks for the file.

The change in behaviour was introduced in #4436. As @lbdreyer suspects: the file cell methods are an empty string (""). In the past Iris was able to move on without adding any cell methods to the Cube.

> Assuming the cell_methods attribute exists but is just empty ...

We could actually cope with None, but not an empty string. I don't know if None is possible in a NetCDF file.

@lbdreyer presumably we should look at enabling a skip with a warning, rather than crashing out with an exception?

@pp-mo
Member

pp-mo commented Nov 21, 2022

> I don't know if None is possible in a NetCDF file.

I don't think so. Even attributes have a definite type, string or numeric, and 'None' is not a possible value.

@larsbarring
Contributor

I ran your test file through one of the online CF checkers available (I used this particular one), and it indeed complained that the cell_methods is malformed:

... ...
ERROR: (7.3): Invalid syntax for cell_methods attribute
... ...

CF-1.10 Section 7.3, fourth para begins:

The default interpretation for variables that do not have the cell_methods attribute specified depends on whether the quantity is extensive...

The CF checker evidently takes this to mean that either the attribute is absent, in which case the default kicks in, or it is present, in which case its value must be properly formatted. Personally I think a slightly more relaxed, or lenient, approach might be helpful: focus on whether a method is actually present, i.e. treat the empty string the same as no cell method. A warning about this deviation from CF might then be helpful.

@pp-mo
Member

pp-mo commented Nov 23, 2022

@SciTools/peloton prefer a tolerant + warning approach

@lbdreyer
Member

I'm not keen on the lenient loading approach. The file is malformed, why should a user expect Iris to load it? I would prefer Iris raise a helpful, informative error and then the user could contact the data owner to get the file fixed, or fix it themself.

@pp-mo
Member

pp-mo commented Nov 23, 2022

> @lbdreyer I'm not keen on the lenient loading approach. The file is malformed, why should a user expect Iris to load it? I would prefer Iris raise a helpful, informative error and then the user could contact the data owner to get the file fixed, or fix it themself.

In my opinion and experience, there can be a variety of reasons why this approach is unhelpful to the user.

  1. Iris does not actually require that files are CF encoded
    • we intend to load non-CF files, i.e. general netcdf
    • it's only things which mis-use CF defined terms which cause actual trouble in Iris loading
      • i.e. mis-use of defined attributes like "standard_name", "units", "coordinates" ...
    • in my experience, such terms can be rather freely used by data creators
      who are not aware of or interested in CF (just copying ideas),
      or don't know the whole of CF (which of us does?).
      This is a problem because CF-defined terms are not namespaced (e.g. as "cf_standard_name", "cf_units"), since they were mostly adopted from earlier conventions
  2. it is not always easy for the user to modify an input file
    • usage of netCDF4 or nco tools may not be familiar
    • file is read-only for user
    • file is shared, but uncomfortably large to make a copy
  3. the file may be from a standard data repository.
    which typically has its own defined format conventions, but not always rigorously good CF
    In those cases, it's really not very helpful saying the file is "not good CF"
    Like the "someone else created this" objection, but worse
    A good recent example has been the CORDEX repository, which has caused problems like this one
    In these cases it may also be inappropriate to retrospectively change historical data that has been submitted + accepted.

So, I feel that Iris loading "should have been" largely tolerant from the outset : It is usually possible to see where+how the code can fail to interpret in the usual way. And practically, if something can't be interpreted in the usual way, it is generally possible to ignore it (and maybe raise a warning).
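
As a rough illustration of that "ignore it and maybe raise a warning" pattern, here is a minimal sketch. It is not Iris code: `interpret_attributes` and its arguments are hypothetical, and the choice to keep a rejected value under an `invalid_` prefix is an assumption that mirrors the `invalid_standard_name` attribute visible in the cube printout later in this thread.

```python
import warnings


def interpret_attributes(raw_attrs, interpreters):
    """Interpret file attributes leniently.

    raw_attrs: dict mapping attribute name -> raw value from the file.
    interpreters: dict mapping attribute name -> callable that returns the
        interpreted value or raises ValueError on input it cannot handle.

    Returns (interpreted, leftover). Attributes that fail to interpret are
    not fatal: a warning is issued and the raw value is kept in `leftover`
    under "invalid_<name>".
    """
    interpreted, leftover = {}, {}
    for name, value in raw_attrs.items():
        parser = interpreters.get(name)
        if parser is None:
            # No special meaning: pass the attribute through untouched.
            leftover[name] = value
            continue
        try:
            interpreted[name] = parser(value)
        except ValueError as err:
            warnings.warn(f"Ignoring invalid {name!r} attribute {value!r}: {err}")
            leftover[f"invalid_{name}"] = value
    return interpreted, leftover
```

The point of the design is that one malformed attribute costs the user a warning and some lost metadata, rather than the whole load.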

So far, we do actually do this for a bunch of common errors, all of which were raised in experience :

It feels like I should be writing this in a discussion.
Does anyone think that is worth it ?

@lbdreyer
Member

lbdreyer commented Nov 23, 2022

> It feels like I should be writing this in a discussion.
> Does anyone think that is worth it ?

Yes, perhaps this should be discussed elsewhere. Most of the comments you raise are about lenient/tolerant loading in general rather than this specific issue so I'm not sure it makes sense to address them here?

@wjbenfold
Contributor

> Does anyone think that is worth it?

I'd be interested in that discussion, albeit rather abstractly

@larsbarring
Contributor

> @lbdreyer I'm not keen on the lenient loading approach. The file is malformed, why should a user expect Iris to load it? I would prefer Iris raise a helpful, informative error and then the user could contact the data owner to get the file fixed, or fix it themself.

The CF checker reported on several other problems with the test file. So, in this particular case I think that it might be useful to contact the xclim developers. But I know first hand that it is a struggle to squeeze climate indices (et al.) into CF. Nevertheless, things are slowly improving; just recently the canadian_fire_weather_index was accepted as standard name.

More generally, it is not a viable option to always expect the data owners to fix the files. For example, there are literally tens of thousands of files on ESGF (CMIP5 and CORDEX) that are more or less malformed, and they are never going to be fixed. Still, they are a valuable resource.

@pp-mo
Member

pp-mo commented Jan 4, 2023

I believe #5126 should address this (among some other related problems) -- it seems to work for me.
Can you confirm, @larsbarring ?

@larsbarring
Contributor

Works also for me:

$ git status
On branch cellmethod_tolerance
Your branch is up to date with 'pp-mo/cellmethod_tolerance'.

nothing to commit, working tree clean
$ python
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import iris
>>> iris.__version__
'3.5.0.dev15'
>>> c = iris.load_cube("/home/a001257/Downloads/fwi_extract.nc")
>>> print(c)
Fire weather index / (unknown)      (latitude: 26; longitude: 34; time: 1000)
    Dimension coordinates:
        latitude                             x              -         -
        longitude                            -              x         -
        time                                 -              -         x
    Attributes:
        description                 'Numeric rating of fire intensity.'
        history                     'tas: \npr: \nsfcWind: \nhurs: \nlat: \n[2022-11-18 10:46:26] fwi: FWI(tas=t2m, ...'
        invalid_standard_name       'fire_weather_index'
>>> 
