Add "unique()" method, mimicking pandas #2795

Open
ahuang11 opened this issue Feb 28, 2019 · 6 comments

Comments

@ahuang11
Contributor

ahuang11 commented Feb 28, 2019

Would it be good to add a unique() method that mimics pandas?

import pandas as pd
import xarray as xr
pd.Series([0, 1, 1, 2]).unique()
xr.DataArray([0, 1, 1, 2]).unique()  # not implemented

Output:

array([0, 1, 2])
AttributeError: 'DataArray' object has no attribute 'unique'
@shoyer
Member

shoyer commented Mar 4, 2019

What would .unique() return on xarray.DataArray? For consistency with pandas, I guess it would return a 1D numpy or dask array?

I don't see a lot of value in adding this to xarray, given that all the xarray metadata gets lost by the unique() operation. You might as well just write np.unique(my_data_array.data).
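A minimal sketch of that suggestion, assuming a small in-memory (NumPy-backed) array; the example data and the `x`/`y` dimension names are made up for illustration:

```python
import numpy as np
import xarray as xr

# Illustrative 2D data; the dimension names are arbitrary.
da = xr.DataArray([[0, 1], [1, 2]], dims=("x", "y"))

# np.unique flattens the underlying array and returns the sorted distinct values.
# The result is a plain 1D NumPy array: dims, coords and attrs are all gone.
values = np.unique(da.data)
print(values)
```

The result carries no labels, which is the metadata loss described above.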

@ahuang11
Contributor Author

ahuang11 commented Mar 5, 2019

Right, it would return a 1D numpy or dask array.

I suppose I'm used to simply typing pd.Series().unique() rather than np.unique(pd.Series()).

I use it in for loops primarily.
for season in da['time.season'].unique():
vs
for season in np.unique(da['time.season'].data):
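For reference, a runnable version of the second loop might look like the sketch below. The monthly time axis, the data values, and the `seasonal_means` dictionary are assumptions made up for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Twelve monthly values for one year (illustrative data only).
time = pd.date_range("2000-01-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(12.0), coords={"time": time}, dims="time")

seasonal_means = {}
# "time.season" is a virtual coordinate derived from the datetime index.
for season in np.unique(da["time.season"].data):
    # Keep only the time steps belonging to this season, then average them.
    subset = da.where(da["time.season"] == season, drop=True)
    seasonal_means[str(season)] = float(subset.mean())

print(seasonal_means)
```

If the loop body only computes a reduction per season, `da.groupby("time.season")` may be a better fit than iterating over the unique values by hand.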

@kripnerl

Hi, I also vote for this function. My typical use case:

There is some structure in 3D space and I need to "flatten" it to 2D. Let us say it is axially symmetric, so I assign R and Z coordinates to the points (or r and theta in polar). I want to simplify this using interp; however, it requires unique coordinates.

I have some solution here: https://stackoverflow.com/questions/51058379/drop-duplicate-times-in-xarray

and adapted this into an actual function:

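import numpy as np

def distribure_uniform(ds, N_points=512):
    # Sort by theta and make theta the indexing dimension.
    ds_theta = ds.sortby("theta").swap_dims({"idx": "theta"})

    # Keep the first occurrence of each theta value; interp needs unique coordinates.
    _, index = np.unique(ds_theta['theta'], return_index=True)
    ds_theta = ds_theta.isel(theta=index)

    # Resample onto N_points evenly spaced theta values.
    ds_theta = ds_theta.interp(
        theta=np.linspace(ds.theta.min(), ds.theta.max(), N_points))

    # Switch back to "idx" as the dimension.
    ds_theta = ds_theta.swap_dims({"theta": "idx"})
    return ds_theta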

In an ideal case I would like to write something like this:

def distribure_uniform(ds, N_points=512):

    ds_theta = ds.unique("theta", sorted=False, sort=True)

    ds_theta = ds_theta.swap_dims({"idx": "theta"})
    ds_theta = ds_theta.interp(
        theta=np.linspace(ds.theta.min(), ds.theta.max(), N_points))
    ds_theta = ds_theta.swap_dims({"theta": "idx"})
    return ds_theta
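Until something like `ds.unique` exists, the de-duplication step can be factored into a small helper. The sketch below is only an illustration: `unique_along` is a hypothetical name, not an xarray API, and the toy dataset (a `value` variable with a `theta` coordinate along `idx`) is invented to mirror the layout used above.

```python
import numpy as np
import xarray as xr


def unique_along(ds, name, dim="idx"):
    """Hypothetical helper (not part of xarray).

    Keeps the first point for each distinct value of coordinate `name`
    and returns the result ordered by that coordinate.
    """
    # return_index=True gives the position of the first occurrence of each
    # distinct value, listed in sorted order of the values themselves.
    _, index = np.unique(ds[name].data, return_index=True)
    return ds.isel({dim: index})


# Toy data: theta = 0.5 appears twice along "idx".
ds = xr.Dataset(
    {"value": ("idx", [10.0, 20.0, 30.0, 40.0])},
    coords={"theta": ("idx", [0.5, 0.0, 0.5, 1.0])},
)

deduped = unique_along(ds, "theta")
print(deduped["theta"].values)
print(deduped["value"].values)
```

Because the indices come back in sorted order of `theta`, the result is both de-duplicated and sorted, so a separate `sortby` call is not needed before `swap_dims` and `interp`.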

This was referenced Mar 29, 2021
@aaronsarna

A case I ran into where supporting .unique() in the pandas sense would be helpful is when an object dtype is used to support nullable strings:

>>> ar = xr.DataArray(np.array(['foo', np.nan], dtype='object'), coords={'bar': range(2)}, name='foo')
>>> np.unique(ar.data)
TypeError: '<' not supported between instances of 'float' and 'str'
>>> ar.to_dataframe().foo.unique()
array(['foo', nan], dtype=object)

@aaronsarna

Actually, pd.unique(ar) also works fine here, so maybe there's no need to add it to xarray.

@aaronsarna

I guess the limitation on using pd.unique() is that it requires 1D data. pd.unique(ar.data.flatten()) isn't so painful, but that feels like the kind of thing xarray should do for you.
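A short sketch of that workaround on 2D object-dtype data; the array contents and the `x`/`y` dimension names are invented for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# 2D object array mixing strings and a missing value (illustrative data).
ar = xr.DataArray(
    np.array([["foo", np.nan], ["bar", "foo"]], dtype=object),
    dims=("x", "y"),
)

# pd.unique expects 1D input, so flatten first. It is hash-based rather than
# sort-based, so it never has to compare a str with a float.
uniques = pd.unique(ar.data.ravel())
print(uniques)
```

Unlike `np.unique`, `pd.unique` returns values in order of first appearance rather than sorted, which matters if the result drives a loop.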
