
Collapsing over a lazy auxiliary coordinate results in a netCDF4 error when the cube is saved as *.nc file. #4599

Closed
schlunma opened this issue Feb 23, 2022 · 14 comments

@schlunma
Contributor

🐛 Bug Report

Collapsing a cube over a lazy auxiliary coordinate results in the following error when the cube is saved as a .nc file:

ValueError: slicing expression exceeds the number of dimensions of the variable

I tested this on two independent machines. Some additional hints:

  • The error only appears in iris=3.2.0.post0, not in iris=3.1.0.
  • Adding a print(cube) between the collapsing and the save somehow fixes this issue.
  • Using a non-lazy auxiliary coordinate also fixes this issue.
  • Using a lazy dimension coordinate alone (without a lazy auxiliary coordinate) also does not trigger the error.

How To Reproduce

import os

import dask.array as da
import numpy as np

import iris
from iris.coords import AuxCoord, DimCoord
from iris.cube import Cube


print("iris version:", iris.__version__)


# Create cube with lazy aux coord and aggregate over this dimension
dim_coord = DimCoord(np.arange(10), var_name='time')
aux_coord = AuxCoord(da.arange(10), var_name='year')  # the "da" is important here!
cube = Cube(np.arange(10),
            var_name='x',
            dim_coords_and_dims=[(dim_coord, 0)],
            aux_coords_and_dims=[(aux_coord, 0)],
           )

cube = cube.collapsed('time', iris.analysis.MEAN)


# Adding a print() somehow fixes this issue
# print(cube)


# Saving this cube gives the error
filename = os.path.expanduser('~/test_iris_32.nc')
iris.save(cube, filename)

Expected behaviour

No error, similar to iris=3.1.0.

Environment

  • OS & Version: openSUSE Tumbleweed 20220221; Red Hat Enterprise Linux Server release 6.10
  • Iris Version: 3.2.0.post0

Additional context

Full traceback
Traceback (most recent call last):
  File "/home/manuel/Tresorit/DLR/scripts/iris/iris32bug.py", line 31, in <module>
    iris.save(cube, filename)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/io/__init__.py", line 436, in save
    saver(source, target, **kwargs)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 3167, in save
    sman.write(
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 1242, in write
    self._add_aux_coords(cube, cf_var_cube, cube_dimensions)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 1569, in _add_aux_coords
    return self._add_inner_related_vars(
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 1534, in _add_inner_related_vars
    cf_name = self._create_generic_cf_array_var(
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 2407, in _create_generic_cf_array_var
    self._create_cf_bounds(element, cf_var, cf_name)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 2069, in _create_cf_bounds
    self._lazy_stream_data(
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 2899, in _lazy_stream_data
    is_masked, contains_fill_value = store(
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 2874, in store
    da.store([data], [target])
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/array/core.py", line 1163, in store
    compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/base.py", line 317, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/threaded.py", line 81, in get
    results = get_async(
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/local.py", line 506, in get_async
    raise_exception(exc, tb)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/local.py", line 314, in reraise
    raise exc
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/local.py", line 219, in execute_task
    result = _execute_task(task, data)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/array/core.py", line 4164, in store_chunk
    return load_store_chunk(x, out, index, lock, return_stored, False)
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/dask/array/core.py", line 4151, in load_store_chunk
    out[index] = x
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/iris/fileformats/netcdf.py", line 972, in __setitem__
    self.target[keys] = arr
  File "src/netCDF4/_netCDF4.pyx", line 4903, in netCDF4._netCDF4.Variable.__setitem__
  File "/home/manuel/mambaforge/envs/xxx/lib/python3.10/site-packages/netCDF4/utils.py", line 335, in _StartCountStride
    raise ValueError("slicing expression exceeds the number of dimensions of the variable")
ValueError: slicing expression exceeds the number of dimensions of the variable
@valeriupredoi

cheers, Manu! This looks to be a direct iris-dask miscommunication issue: calling print realises the coords, which takes them out of dask, hence no problem. Can you please post the dask version in the env too, and maybe check whether bumping or downgrading the dask version results in the correct behaviour? 🍺
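
For reference, a quick way to grab those versions from inside Python (a minimal sketch; conda list works just as well):

# Print the versions of the packages most likely involved (assumes all three are installed).
import dask
import iris
import netCDF4

print("iris:", iris.__version__)
print("dask:", dask.__version__)
print("netCDF4:", netCDF4.__version__)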

@schlunma
Contributor Author

Changing the version of iris changes no package other than iris itself; in both cases dask-core=2022.2.0=pyhd8ed1ab_0.

You are right; print(cube) realizes the coordinates, didn't know that!

Environments:
iris31.txt
iris32.txt

@schlunma schlunma changed the title Collapsing over a lazy auxiliary coordinate results in a netCDF4 error when the cube is save as *.nc file. Collapsing over a lazy auxiliary coordinate results in a netCDF4 error when the cube is saved as *.nc file. Feb 24, 2022
@zklaus
Contributor

zklaus commented Feb 24, 2022

@bjlittle, just a heads-up that this issue probably makes us pin iris <3.2 in our release that is just being prepared, which is a shame really because we are eager to dig into the unstructured stuff.

@valeriupredoi

many thanks @schlunma, both for raising this and for the excellent minimal code to reproduce the issue. I confirm I can reproduce it with the following package specs (the ones that matter):

(rc3-3) valeriu@valeriu-PORTEGE-Z30-C:~$ conda list netcdf
# packages in environment at /home/valeriu/miniconda3/envs/rc3-3:
#
# Name                    Version                   Build  Channel
netcdf4                   1.5.8           nompi_py310hd7ca5b8_101    conda-forge
libnetcdf                 4.8.1           mpi_mpich_h319fa22_1    conda-forge
(rc3-3) valeriu@valeriu-PORTEGE-Z30-C:~$ conda list iris
# packages in environment at /home/valeriu/miniconda3/envs/rc3-3:
#
# Name                    Version                   Build  Channel
iris                      3.2.0.post0        pyhd8ed1ab_0    conda-forge
(rc3-3) valeriu@valeriu-PORTEGE-Z30-C:~$ conda list dask
# packages in environment at /home/valeriu/miniconda3/envs/rc3-3:
#
# Name                    Version                   Build  Channel
dask                      2022.2.0           pyhd8ed1ab_0    conda-forge
dask-core                 2022.2.0           pyhd8ed1ab_0    conda-forge

since the netcdf libs and dask have not changed recently (dask is the youngest, at 12 days old), and since you checked identical environments that differ only in iris, I am confident this is an iris bug. I'll look around and see if I can find the cause, but by all means, let's see what @bjlittle, @lbdreyer and the good SciTools folk can say about it 🍺

@valeriupredoi

the problem occurs when len(elem) exceeds the number of dimensions nDims in netCDF4/utils.py - indeed the slices iterable elem has length 2, [slice(0, 1, None), slice(0, 1, None)], and that raises the exception. In the case of scalar coordinates, in the loop starting at line 298, newElem is constructed from the very last else branch:

        else:
            newElem.append(e)

if there were a break there when nDims == 1, so no more appending, the show runs fine. Obviously this is the netCDF4 library, so it is behaving correctly; by reverse engineering, that means iris is passing too many slices to dask in this particular case, and indeed:

arr = [[0]] with keys = (slice(0, 1, None), slice(0, 1, None))

so there are 2 identical slices but only a single key to save in target - there is a duplication of slices going on somewhere. Maybe you need a set() somewhere? slice objects are unhashable, so it may take a bit more than a set operation
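
To see where the duplication comes from, one can inspect the collapsed coordinate's lazy bounds before saving - a hedged diagnostic sketch reusing the cube from the reproduction code above (cube.coord(var_name=...), has_lazy_bounds() and core_bounds() are standard iris API; the printed values are not asserted here):

# Inspect the laziness and shape of the collapsed scalar coordinate's bounds.
year = cube.coord(var_name='year')
print("lazy points/bounds:", year.has_lazy_points(), year.has_lazy_bounds())
lazy_bounds = year.core_bounds()  # a dask array while the bounds are still lazy
print("bounds type/shape:", type(lazy_bounds), lazy_bounds.shape)
print("bounds chunks:", getattr(lazy_bounds, "chunks", None))  # the chunk layout dask will stream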

@valeriupredoi

OK, I kludged around the problem by adding a check on keys and arr in iris/fileformats/netcdf.py l.968:

    def __setitem__(self, keys, arr):
        if self.fill_value is not None:
            self.contains_value = self.contains_value or self.fill_value in arr
        # kludge: if multiple slice keys arrive for a single-element array, keep only the first key
        if len(keys) > 1 and len(arr) == 1:
            keys = keys[0]
        self.is_masked = self.is_masked or ma.is_masked(arr)
        self.target[keys] = arr

am sure this is (very) wrong but it gives you an idea of the bit that's malfunctioning 😁

@uwefladrich

uwefladrich commented Mar 1, 2022

For what it's worth (given that @valeriupredoi has already dug deeper): just printing the coordinates (points/bounds), and thereby making sure that has_lazy_points/has_lazy_bounds return False, doesn't seem to do the trick, while printing the cube does.

EDIT: See below; I made a mistake while doing this test. When has_lazy_bounds() returns False for the coord in question, things do work afterwards.

@rcomer
Member

rcomer commented Mar 1, 2022

From Iris v3.2, coordinate points and bounds are no longer realised when you print the coordinate (see Feature 12 in the whatsnew). If you access the bounds of the collapsed "year" coordinate, the example runs OK. It doesn't seem to matter if the points stay lazy.
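
In practical terms, a possible workaround until a fix is released could look like this (a sketch based on the reproduction code above; accessing .bounds realises the lazy bounds):

# Workaround sketch: realise the collapsed coordinate's bounds before saving.
cube.coord(var_name='year').bounds  # touching .bounds realises the bounds in place
iris.save(cube, filename)           # the save now completes without the ValueError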

@valeriupredoi

have a look at this test module I wrote precisely to test the behaviour pointed out by @schlunma here, and to account for the cases where the coords and data are lazy or not - it might even be helpful to you guys for testing after you've put in the bugfix: ESMValGroup/ESMValCore#1510

@uwefladrich

From Iris v3.2, coordinate points and bounds are no longer realised when you print the coordinate

I see. Thanks for pointing out!

I redid my test and can basically confirm what you say: printing (realising) the coordinate bounds is enough to make it work; the coordinate points can still be lazy.

@trexfeathers trexfeathers self-assigned this Mar 1, 2022
@trexfeathers
Contributor

Another change for v3.2 is the lazy streaming of all arrays (#4375), which explains why you were not seeing this before v3.2.

No changes have been made to the _FillValueMaskCheckAndStoreTarget class itself, but coordinate saving now goes via this class where it didn't before. It seems likely that I didn't account for the particular properties of a collapsed coordinate's Dask array, which has caused an inappropriate slicing operation to be attempted.

I'll investigate further, thanks for the reproducing code ❤

@trexfeathers
Contributor

trexfeathers commented Mar 11, 2022

The fix has now been released in Iris v3.2.1 🙂.

@zklaus
Contributor

zklaus commented Mar 11, 2022

Great, thanks guys!

@schlunma
Contributor Author

Thanks guys!! 🎉
