Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make subclassing easier? #3980

Open
TomNicholas opened this issue Apr 17, 2020 · 9 comments
Open

Make subclassing easier? #3980

TomNicholas opened this issue Apr 17, 2020 · 9 comments

Comments

@TomNicholas
Copy link
Member

TomNicholas commented Apr 17, 2020

Suggestion

We relatively regularly have users asking about subclassing DataArray and Dataset, and I know of at least a few cases where people have gone through with it. However we currently explicitly discourage doing this, on the basis that basically all operations will return a bare xarray object instead of the subclassed version, it's full of trip hazards, and we have the accessor interface to point people to instead.

However, while useful, the accessors aren't enough for some users, and I think we could probably do better. If we refactored internally we might be able to make it much easier to subclass.

Example to follow in Pandas

Pandas takes an interesting approach: while they also explicitly discourage subclassing, they still try to make it easier, and show you what you need to do in order for it to work.

They ask you to override some constructor properties with your own, and allow you to define your own original properties.

Potential complications

  • .construct_dataarray and DataArray.__init__ are used a lot internally to reconstruct a DataArray from dims, coords, data etc. before returning the result of a method call. We would probably need to standardise this, before allowing users to override it.

  • Pandas actually has multiple constructor properties you need to override: _constructor, _constructor_sliced, and _constructor_expanddim. What's the minimum set of similar constructors we would need?

  • Blocking access to attributes - we current stop people from adding their own attributes quite aggressively, so that we can have attributes as an alias for variables and attrs, we would need to either relax this or better allow users to set a list of their own _properties which they want to register, similar to pandas.

  • __slots__ - I think something funky can happen if you inherit from a class that defines __slots__?

Documentation

I think if we do this we should also slightly refactor the relevant docs to make clear the distinction between 3 groups of people:

  • Users - People who import and use xarray at the top-level with (ideally) no particular concern as to how it works. This is who the vast majority of the documentation is for.
  • Developers - People who are actually improving and developing xarray upstream. This is who the Contributing to xarray page is for.
  • Extenders - People who want to subclass, accessorize or wrap xarray objects, in order to do something more complicated. These people are probably writing a domain-specific library which will then bring in a new set of users. There maybe aren't as many of these people, but they are really important IMO. This is implicitly who the xarray internals page is aimed at, but it would be nice to make that distinction much more clear. It might also be nice to give them a guide as to "I want to achieve X, should I use wrapping/subclassing/accessors?"

@max-sixty you had some ideas about what would need to be done for this to work?

@jhamman
Copy link
Member

jhamman commented Apr 17, 2020

Thanks @TomNicholas for opening up this issue. Just cross referencing one more issue: #1097

@keewis
Copy link
Collaborator

keewis commented Apr 19, 2020

There is also #3582 where the recommendation was to wrap instead of subclassing.

@mueslo
Copy link

mueslo commented May 1, 2022

Still relevant

@stale stale bot removed the stale label May 1, 2022
@wmayner
Copy link

wmayner commented Dec 7, 2022

Does xarray have anything like NumPy's dispatch mechanism? This would make it relatively easy to encapsulate domain-specific logic inside a wrapper class, rather than a subclass or an accessor namespace.

@headtr1ck
Copy link
Collaborator

Does xarray have anything like NumPy's dispatch mechanism?

Yes, xarray fully supports this.

@headtr1ck
Copy link
Collaborator

I think we have made good progress to better support this, only requirement is that the __init__ method of the subclass has the same syntax as the original xarray implementation.

Maybe in the future we can relax this as well and use a shortcut internally.

The only thing which probably won't work anytime soon is custom Datasets returning custom DataArrays.

@TomNicholas
Copy link
Member Author

@pydata/xarray should this feature be added to our development roadmap? It's arguably another approach to making more flexible data structures...

@dcherian
Copy link
Contributor

dcherian commented Feb 28, 2023

Following this very long discussion on propagating grid information with Xarray objects, this group wants to subclass and attach unstructured grid information to their derived classes.

I invited them to the meeting tomorrow to discuss a public subclassing interface as proposed here. Come with opinions!

@headtr1ck
Copy link
Collaborator

I will not be able to join tomorrow, so here are my thoughts on this topic:

  • How to work with subclassed Datasets that should return DataArrays? I think the pandas approach is good, we add Dataset.dataarray_cls = DataArray and DataArray.dataset_cls = Dataset which subclasses can overwrite.
  • How to deal with subclasses that change the constructor aka. __init__? Probably we need to introduce a kind of shortcut constructor that will be called instead in functions that create new instances (right now the subclass needs the same constructor signature as xarrays version). But then custom instance attributes will not be set etc. Not sure what is the best approach here.
  • Is the whole thing limited to DataArrays and Datasets or can one also subclass Variables?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants