assert_identical fails to generate diff when an array is an attribute value #9153

Closed
5 tasks done
DocOtak opened this issue Jun 22, 2024 · 0 comments · Fixed by #9169


DocOtak commented Jun 22, 2024

What happened?

I'm writing a bunch of tests for one of my code bases that check that the output of some operation on a dataset has the expected results, so I'm using xarray.testing.assert_identical in these tests. I discovered that the assertion would occasionally fail with a ValueError rather than the expected AssertionError. Poking at different combinations of inputs, it appears to fail only when comparing Datasets containing non-identical DataArrays that both have an attribute whose values can't be compared with a plain == (for example, a NumPy array).

What did you expect to happen?

An AssertionError to be raised with the appropriate diff.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

ds1 = xr.Dataset({"t1": xr.DataArray([1], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
ds2 = xr.Dataset({"t1": xr.DataArray([2], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
xr.testing.assert_identical(ds1, ds2)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 6
      4 ds1 = xr.Dataset({"t1": xr.DataArray([1], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
      5 ds2 = xr.Dataset({"t1": xr.DataArray([2], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
----> 6 xr.testing.assert_identical(ds1, ds2)

    [... skipping hidden 2 frame]

File ~/.dotfiles/pyenv/versions/3.12.3/envs/jupyter/lib/python3.12/site-packages/xarray/core/formatting.py:974, in diff_dataset_repr(a, b, compat)
    971 summary.append(diff_dim_summary(a, b))
    972 summary.append(diff_coords_repr(a.coords, b.coords, compat, col_width=col_width))
    973 summary.append(
--> 974     diff_data_vars_repr(a.data_vars, b.data_vars, compat, col_width=col_width)
    975 )
    977 if compat == "identical":
    978     summary.append(diff_attrs_repr(a.attrs, b.attrs, compat))

File ~/.dotfiles/pyenv/versions/3.12.3/envs/jupyter/lib/python3.12/site-packages/xarray/core/formatting.py:824, in _diff_mapping_repr(a_mapping, b_mapping, compat, title, summarizer, col_width, a_indexes, b_indexes)
    820 b_attrs = b_mapping[k].attrs
    822 attrs_to_print = set(a_attrs) ^ set(b_attrs)
    823 attrs_to_print.update(
--> 824     {k for k in set(a_attrs) & set(b_attrs) if a_attrs[k] != b_attrs[k]}
    825 )
    826 for m in (a_mapping, b_mapping):
    827     attr_s = "\n".join(
    828         "    " + summarize_attr(ak, av)
    829         for ak, av in m[k].attrs.items()
    830         if ak in attrs_to_print
    831     )

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
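
For illustration, the failure at line 824 can be reproduced with plain dicts and NumPy, without xarray. The sketch below also shows one possible way to collapse the comparison to a single bool. `values_differ` and `differing_attr_keys` are hypothetical names made up for this example; they are not xarray functions, and this is not necessarily how the fix in #9169 works.

```python
import numpy as np


def values_differ(left, right):
    """Return a single bool saying whether two attribute values differ.

    Hypothetical helper for illustration; not part of xarray.
    """
    if isinstance(left, np.ndarray) or isinstance(right, np.ndarray):
        # array_equal collapses the element-wise comparison to one bool.
        return not np.array_equal(left, right)
    return bool(left != right)


def differing_attr_keys(a_attrs, b_attrs):
    """Keys found in only one mapping, or in both with different values."""
    keys = set(a_attrs) ^ set(b_attrs)
    keys.update(
        k for k in set(a_attrs) & set(b_attrs)
        if values_differ(a_attrs[k], b_attrs[k])
    )
    return keys


a_attrs = {"test": np.array([0, 1, 2, 3], dtype="byte")}
b_attrs = {"test": np.array([0, 1, 2, 3], dtype="byte")}

# Same pattern as in the traceback: `!=` on arrays is element-wise, so using
# the resulting array as a condition raises ValueError.
try:
    {k for k in set(a_attrs) & set(b_attrs) if a_attrs[k] != b_attrs[k]}
    naive_error = None
except ValueError as err:
    naive_error = err

print(type(naive_error).__name__)
print(differing_attr_keys(a_attrs, b_attrs))
```

The `isinstance` check only covers `numpy.ndarray`; other array-like attribute values with element-wise `!=` would need their own handling.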

Anything else we need to know?

Comparing the underlying DataArray objects works as expected:

import xarray as xr
import numpy as np

ds1 = xr.Dataset({"t1": xr.DataArray([1], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
ds2 = xr.Dataset({"t1": xr.DataArray([2], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
xr.testing.assert_identical(ds1.t1, ds2.t1)
Traceback
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[3], line 6
      4 ds1 = xr.Dataset({"t1": xr.DataArray([1], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
      5 ds2 = xr.Dataset({"t1": xr.DataArray([2], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
----> 6 xr.testing.assert_identical(ds1.t1, ds2.t1)

    [... skipping hidden 1 frame]

File ~/.dotfiles/pyenv/versions/3.12.3/envs/jupyter/lib/python3.12/site-packages/xarray/testing/assertions.py:215, in assert_identical(a, b, from_root)
    213 elif isinstance(a, DataArray):
    214     assert a.name == b.name
--> 215     assert a.identical(b), formatting.diff_array_repr(a, b, "identical")
    216 elif isinstance(a, (Dataset, Variable)):
    217     assert a.identical(b), formatting.diff_dataset_repr(a, b, "identical")

AssertionError: Left and right DataArray objects are not identical

Differing values:
L
    array([1])
R
    array([2])

This looks potentially related to #3711.

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.3 (main, May 23 2024, 16:39:17) [Clang 15.0.0 (clang-1500.3.9.4)]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development

xarray: 2024.6.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.9.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.24.0
sphinx: None

@DocOtak added the "bug" and "needs triage" labels on Jun 22, 2024
@TomNicholas added the "topic-testing" label and removed the "needs triage" label on Jun 22, 2024