Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accessors are recreated on every access #3205

Closed
djhoese opened this issue Aug 11, 2019 · 3 comments · Fixed by #3209
Closed

Accessors are recreated on every access #3205

djhoese opened this issue Aug 11, 2019 · 3 comments · Fixed by #3209

Comments

@djhoese
Copy link
Contributor

djhoese commented Aug 11, 2019

MCVE Code Sample

  1. Create test_accessor.py in current directory with:
import xarray as xr

@xr.register_dataarray_accessor('test')
class TestDataArrayAccessor(object):
    def __init__(self, obj):
        self._obj = obj
        print("DataArray accessor created")

@xr.register_dataset_accessor('test')
class TestDatasetAccessor(object):
    def __init__(self, obj):
        self._obj = obj
        print("Dataset accessor created")
  1. Run the following code:
import xarray as xr
import numpy as np
import test_accesor

ds = xr.Dataset({'a': xr.DataArray(np.array([1, 2, 3])), 'b': xr.DataArray(np.array([4, 5, 6]))})
ds.test
# Dataset accessor created <accessor created on first access - expected>

ds.test
# <no output - expected>

ds['a'].test
# DataArray accessor created <accessor created on first access - expected>

ds['a'].test
# DataArray accessor created <accessor created on second access - unexpected>

var = ds['a']
var.test
# DataArray accessor created <accessor created on direct instance access - expected/unexpected>

var.test
# <no output - expected>

Expected Output

Based on the xarray accessor documentation I would have assumed that the accessor would stick around on the same DataArray object for the life of the data. My guess is that Dataset.__getitem__ is recreating the DataArray every time from the underlying Variable object which means ds['a'] is not ds['a']?

Problem Description

I'm currently working on an accessor for a new package called geoxarray to address the issues talked about in #2288. One of the cases I'm trying to handle is a NetCDF file with CR standard grid_mapping variables. This means that the easiest way to set a CRS object for all variables in a Dataset is to have the Dataset accessor use the accessor of every DataArray (to cache a _crs property). However, with the way things are working doing something like the below won't work:

ds.geo.apply_grid_mapping()
ds['Rad'].geo.crs  # this is current `None` because the DataArray was recreated

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.6.2

xarray: 0.12.3
pandas: 0.25.0
numpy: 1.17.0
scipy: None
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.2.0
distributed: 2.2.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.0.1
pip: 19.2.2
conda: None
pytest: None
IPython: 7.7.0
sphinx: None

@shoyer
Copy link
Member

shoyer commented Aug 12, 2019

My guess is that Dataset.__getitem__ is recreating the DataArray every time from the underlying Variable object which means ds['a'] is not ds['a']?

Yes, I believe this is the case.

@djhoese
Copy link
Contributor Author

djhoese commented Aug 12, 2019

So I guess the questions are:

  1. Is creating a new DataArray for Dataset.__getitem__ important enough that the above behavior is expected and should be documented as a known limitation/gotcha? Or...

  2. Accessors should work when used like this and the Dataset.__getitem__ creation of DataArrays should be "fixed" or worked around to handle this better?

  3. Or some other option?

@shoyer
Copy link
Member

shoyer commented Aug 12, 2019

I'm leaning towards (1), unfortunately.

Xarray's data model doesn't keep around the DataArray objects that are part of a Dataset, only Variable objects. So even if we cached DataArray objects on a Dataset, they would get lost as soon as you do any sort of more complex computation or data manipulation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants