-
I have been working on code with `dataset.groupby("time.year").quantile(np.linspace(0.01, 0.9, 21), dim="time")`. It basically takes forever (over an hour for ONE dataset), and I have to run this line for upwards of 50 datasets whose results I then have to write to disk as well. Is there a way I can speed this up, or maybe rewrite it in a different way such that I get the same results? I have found that the only way it executes a little faster is by using `skipna=False`.
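For context, a minimal self-contained sketch of the pattern being asked about; the variable name, grid size, and number of timesteps are made up for illustration, and only the groupby/quantile call itself is taken from the question:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for one of the ~50 datasets: 93 timesteps on a 2D grid
# (the real variable names and sizes are not given in the question).
time = pd.date_range("2020-01-01", periods=93)
ds = xr.Dataset(
    {"tas": (("time", "y", "x"), np.random.rand(93, 500, 500))},
    coords={"time": time},
)

# The slow line from the question: 21 quantiles per year along the time dimension.
result = ds.groupby("time.year").quantile(np.linspace(0.01, 0.9, 21), dim="time")
```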
-
`skipna=False` will help a bit but probably not enough. Is it a dask array? Calling `.compute()` (or `.load()`?) before the groupby operation might also help. 93 timesteps with groupby and 21 quantiles seem very few?
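As a rough illustration of that advice (reusing the hypothetical `ds` from the sketch under the question; whether `.load()` helps depends on whether the data is actually dask-backed):

```python
import numpy as np

# If the dataset is lazy (dask-backed), pull it into memory once before the
# groupby so the quantile is not recomputing from chunks:
ds_in_memory = ds.load()  # .compute() is equivalent but returns a new object

# skipna=False takes the faster non-nan-aware quantile path; note that any NaN
# present in a group will then propagate into that group's result.
result = ds_in_memory.groupby("time.year").quantile(
    np.linspace(0.01, 0.9, 21), dim="time", skipna=False
)
```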
-
I also faced some difficulties speeding up quantile calculation. In my case, using `skipna=False` was not an option because handling nan values was mandatory. Thus, I created a library (fastnanquantile) for it (it uses numba under the hood). You can install it using pip: `pip install fastnanquantile`
Replace your code with: `dataarray.groupby("time.year").map(xrcompat.xr_apply_nanquantile, q=np.linspace(0.01, 0.9, 21), dim="time")`. Note that you need to use a DataArray instead of a Dataset. Performance gains depend on the data shape. For time composite creation, where you typically have one small dimension (time) and two large dimensions (coordinates), you can expect much faster computation (10x faster or more). Please see this post for a brief benchmark.
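A sketch of how that suggestion could look end to end, again reusing the hypothetical `ds` from the first sketch; the `from fastnanquantile import xrcompat` import path is an assumption based on the call above (check the library's README), and the variable name is hypothetical:

```python
# pip install fastnanquantile
import numpy as np
from fastnanquantile import xrcompat  # assumed import path for the xrcompat module

# The helper works on a DataArray, so pick one variable out of the Dataset
# (variable name is hypothetical):
da = ds["tas"]

# map() applies the nan-aware quantile helper to each yearly group along time.
yearly_quantiles = da.groupby("time.year").map(
    xrcompat.xr_apply_nanquantile,
    q=np.linspace(0.01, 0.9, 21),
    dim="time",
)
```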
-
Cc @maresb |
-
Thanks @lbferreira and @zoj613! This could be useful for my current work. |