Add "unique()" method, mimicking pandas #2795

ahuang11 · 2019-02-28T18:58:15Z

Would it be good to add a unique() method that mimics pandas?

import pandas as pd
import xarray as xr
pd.Series([0, 1, 1, 2]).unique()
xr.DataArray([0, 1, 1, 2]).unique()  # not implemented

Output:

array([0, 1, 2])
AttributeError: 'DataArray' object has no attribute 'unique'


shoyer · 2019-03-04T07:58:23Z

What would .unique() return on xarray.DataArray? For consistency with pandas, I guess it would return a 1D numpy or dask array?

I don't see a lot of value in adding this to xarray, given that all the xarray metadata gets lost by the unique() operation. You might as well just write np.unique(my_data_array.data).
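For illustration, here is a minimal sketch of that workaround on a small 2D array. The array contents and the names `da` and `unique_values` are made up for the example. It shows that the result is a plain, flattened 1D numpy array with none of the xarray dimensions or coordinates attached.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: a 2D DataArray with labelled dimensions.
da = xr.DataArray(
    [[0, 1, 1], [2, 2, 0]],
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

# np.unique flattens the underlying array and returns the sorted unique values.
unique_values = np.unique(da.data)

# The result is a plain numpy array; dims and coords are not carried over.
print(type(unique_values), unique_values.shape)
```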

ahuang11 · 2019-03-05T00:01:58Z

Right, it would return a 1D numpy or dask array.

I suppose I'm used to simply typing pd.Series().unique() rather than np.unique(pd.Series()).

I use it in for loops primarily.
for season in da['time.season'].unique():
vs
for season in np.unique(da['time.season'].data):
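A runnable sketch of the second form of that loop, assuming a monthly toy series for the year 2000 whose values are simply the month numbers. The names `da`, `season_of` and `season_means` are illustrative, and the `.dt.season` accessor is used here in place of the `'time.season'` shorthand.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: one value per month, equal to the month number.
time = pd.date_range("2000-01-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(1.0, 13.0), dims="time", coords={"time": time})

# Season label ("DJF", "MAM", "JJA", "SON") for every time step.
season_of = da["time"].dt.season

season_means = {}
for season in np.unique(season_of.data):
    # Select the time steps belonging to this season with a boolean mask.
    subset = da.sel(time=season_of == season)
    season_means[str(season)] = float(subset.mean())

print(season_means)
```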

kripnerl · 2020-10-16T11:34:05Z

Hi, I also vote for this function. My typical use case:

There is some structure in 3D space and I need to "flatten" it to 2D. Let us say it is axially symmetric, so I assign R and Z coordinates to the points (or r and theta in polar). I want to simplify this using interp; however, it requires unique coordinates.

I have some solution here: https://stackoverflow.com/questions/51058379/drop-duplicate-times-in-xarray

and adapted it into an actual function:

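import numpy as np

def distribure_uniform(ds, N_points=512):
    # Sort by theta and make it the dimension coordinate in place of "idx".
    ds_theta = ds.sortby("theta").swap_dims({"idx": "theta"})
    # Keep the first occurrence of each theta value; interp needs unique coordinates.
    _, index = np.unique(ds_theta['theta'], return_index=True)

    ds_theta = ds_theta.isel(theta=index)

    # Resample onto N_points evenly spaced theta values.
    ds_theta = ds_theta.interp(
        theta=np.linspace(ds.theta.min(), ds.theta.max(), N_points))

    # Switch the dimension back from "theta" to "idx".
    ds_theta = ds_theta.swap_dims({"theta": "idx"})
    return ds_theta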

In an ideal case I would like to write something like this:

def distribure_uniform(ds, N_points=512):

    ds_theta = ds.unique("theta", sorted=False, sort=True)

    ds_theta = ds_theta.swap_dims({"idx": "theta"})
    ds_theta = ds_theta.interp(
        theta=np.linspace(ds.theta.min(), ds.theta.max(), N_points))
    ds_theta = ds_theta.swap_dims({"theta": "idx"})
    return ds_theta
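The de-duplication step from the first function can be tried in isolation on toy data. This is only a sketch: the values and the names `da`, `first_idx` and `deduped` are made up, and theta is assumed to already be the dimension coordinate. It relies on `np.unique(..., return_index=True)` returning the position of the first occurrence of each unique value.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: the angle 0.5 appears twice along the theta dimension.
da = xr.DataArray(
    np.array([10.0, 20.0, 99.0, 30.0, 40.0]),
    dims="theta",
    coords={"theta": [0.0, 0.5, 0.5, 1.0, 1.5]},
)

# Sorted unique theta values plus the index of the first occurrence of each.
_, first_idx = np.unique(da["theta"].values, return_index=True)

# Positional selection keeps one sample per unique theta value.
deduped = da.isel(theta=first_idx)

print(deduped["theta"].values)
print(deduped.values)
```

After this step the theta coordinate has no repeated values, which is the precondition for the interp call mentioned above.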

aaronsarna · 2024-01-08T16:47:03Z

A case I ran into where supporting .unique() in the pandas sense would be helpful is when an object dtype is used to support nullable strings:

>>> ar = xr.DataArray(np.array(['foo', np.nan], dtype='object'), coords={'bar': range(2)}, name='foo')
>>> np.unique(ar.data)
TypeError: '<' not supported between instances of 'float' and 'str'
>>> ar.to_dataframe().foo.unique()
array(['foo', nan], dtype=object)

aaronsarna · 2024-01-08T16:50:50Z

Actually, pd.unique(ar) also works fine here, so maybe there's no need to add it to xarray.

aaronsarna · 2024-01-08T17:31:29Z

I guess the limitation on using pd.unique() is that it requires 1D data. pd.unique(ar.data.flatten()) isn't so painful, but that feels like the kind of thing xarray should do for you.
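As a sketch of that flatten-then-unique approach on 2D data, assuming a small made-up object-dtype array with one missing entry (the names `flat` and `unique_values` are illustrative). `ravel()` is used here for the flattening; it behaves like `flatten()` but may avoid a copy.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 2D object-dtype array holding strings and a missing value.
ar = xr.DataArray(
    np.array([["foo", np.nan], ["bar", "foo"]], dtype="object"),
    dims=("x", "y"),
    name="foo",
)

# pd.unique expects 1D input, so flatten the underlying numpy array first.
flat = ar.data.ravel()

# Unlike np.unique, pd.unique does not sort, so mixed str/float values are fine.
unique_values = pd.unique(flat)

print(unique_values)
```

The result keeps values in order of first appearance rather than sorted order.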

This was referenced Mar 29, 2021

Add drop duplicates #5089

Closed

Add unique method #5091

Closed

dcherian added topic-interpolation and removed topic-interpolation labels Sep 24, 2024
