
isotropic_powerspectrum fails for extra dimensions #9

Closed
rabernat opened this issue May 26, 2017 · 12 comments
@rabernat
Collaborator

rabernat commented May 26, 2017

I think this is a self-contained minimum reproducible example of #6.

isotropic_powerspectrum doesn't work when there are extra dimensions.

import numpy as np
import xarray as xr
import xrft

da = xr.DataArray(np.random.rand(5, 20, 30),
                  dims=['time', 'y', 'x'],
                  coords={'time': np.arange(5), 'y': np.arange(20), 'x': np.arange(30)})
xrft.isotropic_powerspectrum(da, dim=['y', 'x'])

The error is:

ValueError                                Traceback (most recent call last)
<ipython-input-24-6482ba3ce67d> in <module>()
----> 1 xrft.isotropic_powerspectrum(da, dim=['y', 'x'])

/Users/rpa/RND/Public/xrft/xrft/xrft.py in isotropic_powerspectrum(da, dim, shift, remove_mean, density, window, nbins)
    309     ps = power_spectrum(da, dim=dim, shift=shift,
    310                        remove_mean=remove_mean, density=density,
--> 311                        window=window)
    312     if len(ps.dims) > 2:
    313         raise ValueError('The data set has too many dimensions')

/Users/rpa/RND/Public/xrft/xrft/xrft.py in power_spectrum(da, dim, shift, remove_mean, density, window)
    185         ps /= (np.asarray(N).prod())**2
    186         for i in range(len(dim)):
--> 187             ps /= daft[coord[-i-1]].values
    188 
    189     return xr.DataArray(ps, coords=daft.coords, dims=daft.dims)

ValueError: operands could not be broadcast together with shapes (5,20,30) (5,) (5,20,30) 

The same thing happens for isotropic_crossspectrum.

cc: @roxyboy

rabernat changed the title from "isotropic_powerspectrum fails for" to "isotropic_powerspectrum fails for extra dimensions" May 26, 2017
@roxyboy
Member

roxyboy commented May 26, 2017

I think this issue is resolved unless the user tries to apply windowing to dask arrays in a Python 2.7 environment.

@rabernat
Collaborator Author

Do we have a test case for the example in my original comment?

@rabernat
Collaborator Author

rabernat commented Jul 24, 2017

Now if I try this, I get the error

ValueError: The data has too many or few dimensions. The input should only have the two 
dimensions to take the azimuthal averaging over.

Do we consider this closed?

@roxyboy
Member

roxyboy commented Jul 24, 2017

I couldn't come up with a clean way to do it, so yes: I coded it so that the user has to use a for loop if the data has more than two dimensions.
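
For reference, a minimal sketch of that workaround (looping over the extra dimension and restacking; the variable names here are mine, not xrft API beyond isotropic_powerspectrum):

import numpy as np
import xarray as xr
import xrft

da = xr.DataArray(np.random.rand(5, 20, 30),
                  dims=['time', 'y', 'x'],
                  coords={'time': np.arange(5), 'y': np.arange(20), 'x': np.arange(30)})

# One 2-D isotropic spectrum per time step, concatenated back together.
spectra = [xrft.isotropic_powerspectrum(da.isel(time=t), dim=['y', 'x'])
           for t in range(da.sizes['time'])]
iso_ps = xr.concat(spectra, dim='time')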

@rabernat
Collaborator Author

Ok, let's leave the issue open for now then.

@rabernat
Collaborator Author

I just hit this issue again. It seems weird that we can't take isotropic spectra of data with more than 2 dimensions, provided that we are only taking the FT over 2 of them.

@rabernat
Collaborator Author

It basically comes down to bincount. Here is one possible solution:
https://stackoverflow.com/questions/40591754/vectorizing-numpy-bincount

The other possible solution would be to define a ufunc that does the azimuthal averaging and then call xarray.apply_ufunc. This would work well with dask, while the SO idea above would not.
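
For concreteness, a minimal sketch of the offset trick from that SO answer applied to radial binning (the shapes, r_idx, and nbins here are hypothetical):

import numpy as np

ps = np.random.rand(5, 20, 30)             # e.g. (time, freq_y, freq_x) power values
r_idx = np.random.randint(0, 8, (20, 30))  # radial wavenumber bin of each 2-D point
nbins = 8

n_extra = ps.shape[0]
flat = ps.reshape(n_extra, -1)
# Offset each row's bin indices so one flat bincount handles every row at once.
offset = (r_idx.ravel() + nbins * np.arange(n_extra)[:, None]).ravel()
iso_sum = np.bincount(offset, weights=flat.ravel(),
                      minlength=nbins * n_extra).reshape(n_extra, nbins)
# Divide by the per-bin counts (np.bincount(offset, minlength=nbins * n_extra))
# to turn the per-bin sum into an azimuthal mean.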

@roxyboy
Member

roxyboy commented Apr 10, 2019

It seems that the numpy community is "satisfied" with the user using a for loop for np.bincount with multi-dimensional arrays rather than adding an axis argument: numpy/numpy#4330. In other words, as long as we're using np.bincount to do the azimuthal averaging, I think the user is going to have to use a for loop for isotropic_powerspectrum.
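
Per 2-D slice, that loop looks something like this (a sketch; ridx and the shapes are hypothetical):

import numpy as np

weights = np.random.rand(2, 512)      # flattened power values, one row per time step
ridx = np.random.randint(0, 16, 512)  # radial bin index of each flattened point

# np.bincount has no axis argument, so loop over the leading dimension.
binned = np.stack([np.bincount(ridx, weights=w, minlength=16) for w in weights])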

I tried circumventing the issue by using dask.array.map_blocks, xarray.apply_ufunc and numpy.apply_along_axis, but providing the weights argument for np.bincount significantly complicates the syntax and I couldn't get any of them to work. The calls I tried are:

import numpy as np
import xarray as xr
import dask.array as dsar

da = xr.DataArray(np.random.rand(2, 16, 32),
                  dims=['time', 'y', 'x'],
                  coords={'time': np.array(['2019-04-18', '2019-04-19'],
                                           dtype='datetime64'),
                          'y': np.arange(16), 'x': np.arange(32)})
f = xr.DataArray(da.data.reshape((2, 512)), dims=['time', 'idx'])
func = lambda a, b: np.bincount(a, weights=b)

# ridx (not shown) is the flattened array of radial bin indices.
dsar.map_blocks(func, xr.DataArray(ridx, dims=['idx']),
                f.chunk({'time': 1}), chunks=f.chunks, dtype=np.float64, new_axis=0)
xr.apply_ufunc(func, xr.DataArray(ridx, dims=['idx']), f)
np.apply_along_axis(np.bincount, -1,
                    xr.DataArray(np.tile(ridx, (2, 1)), dims=['time', 'idx']).data,
                    weights=f.data)

@roxyboy
Member

roxyboy commented Apr 10, 2019

@rabernat It's kinda unsatisfying that I conceded to using a for loop, but this PR allows isotropic_powerspectrum to take arrays with up to 4 dimensions.

@shoyer
Contributor

shoyer commented Sep 30, 2020

I also ran into this problem of needing an efficient vectorized multi-dimensional bincount for calculating isotropic power spectra. groupby_bins works, but it can be really slow and doesn't parallelize very well with dask.
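
For context, the groupby_bins route looks roughly like this (a sketch; the freq_y/freq_x coordinate names and the bin edges are my assumptions):

import numpy as np
import xarray as xr
import xrft

da = xr.DataArray(np.random.rand(5, 20, 30),
                  dims=['time', 'y', 'x'],
                  coords={'time': np.arange(5), 'y': np.arange(20), 'x': np.arange(30)})
ps = xrft.power_spectrum(da, dim=['y', 'x'])

# Radial wavenumber magnitude on the 2-D frequency grid.
ps = ps.assign_coords(freq_r=np.sqrt(ps['freq_y']**2 + ps['freq_x']**2))

# Azimuthal average per radial bin: correct, but slow and hard to parallelize.
bins = np.linspace(0, float(ps['freq_r'].max()), 17)
iso_ps = ps.groupby_bins('freq_r', bins).mean()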

Here's a prototype using xarray.apply_ufunc that seems to work well:
https://nbviewer.jupyter.org/gist/shoyer/6d6c82bbf383fb717cc8631869678737

The underlying implementation of the groupby aggregation relies on "numpy-groupies" and could be upstreamed into xarray:
pydata/xarray#4473
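
The pattern is roughly the following (my paraphrase, not the gist's exact code; the function names and the freq_r_bin dimension are mine):

import numpy as np
import numpy_groupies as npg
import xarray as xr

def _isotropic_mean(ps2d, r_idx, nbins):
    # Vectorized group-mean over radial bins via numpy-groupies.
    return npg.aggregate(r_idx.ravel(), ps2d.ravel(), func='mean', size=nbins)

def isotropic_mean(ps, r_idx, nbins, fft_dims=('freq_y', 'freq_x')):
    # apply_ufunc loops over any extra (non-core) dimensions for us and can
    # parallelize across dask chunks.
    return xr.apply_ufunc(
        _isotropic_mean, ps, r_idx,
        input_core_dims=[list(fft_dims), list(fft_dims)],
        output_core_dims=[['freq_r_bin']],
        kwargs={'nbins': nbins},
        vectorize=True,
        dask='parallelized',
        dask_gufunc_kwargs={'output_sizes': {'freq_r_bin': nbins}},
        output_dtypes=[ps.dtype],
    )

This broadcasts the azimuthal average over any number of extra dimensions instead of requiring an explicit for loop.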

@rabernat
Collaborator Author

Wow numpy-groupies is amazing! Could be very useful in xhistogram as well!

@roxyboy
Member

roxyboy commented Dec 9, 2020

PR #128 fixed this issue.
