Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conditionally disable bottleneck #5560

Merged
merged 12 commits into from
Aug 12, 2021

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Jul 1, 2021

As this came up in #5424 (and because I can't seem to reliably reproduce expected values in #4972 if it is enabled) this adds a option to disable bottleneck, even if it is installed.

In #5424 it was suggested to also allow replacing bottleneck with numbagg. If that's something we want (and numbagg supports the operations we need) I can try looking into renaming the new option to something like xr.set_options(accelerate_with="bottleneck") (or something else, if someone has a great idea).

Tests are missing because I have no idea how to check that this works (except by mocking bottleneck).

@github-actions
Copy link
Contributor

github-actions bot commented Jul 1, 2021

Unit Test Results

         6 files  ±0           6 suites  ±0   57m 2s ⏱️ ±0s
16 217 tests ±0  14 483 ✔️ ±0  1 734 💤 ±0  0 ❌ ±0 
90 498 runs  ±0  82 324 ✔️ ±0  8 174 💤 ±0  0 ❌ ±0 

Results for commit 3956b73. ± Comparison against base commit 3956b73.

♻️ This comment has been updated with latest results.

@dcherian
Copy link
Contributor

dcherian commented Jul 2, 2021

We use it in other places too (for e.g.):

if (
_USE_BOTTLENECK
and isinstance(values, np.ndarray)
and bn_func is not None
and not isinstance(axis, tuple)
and values.dtype.kind in "uifc"
and values.dtype.isnative
and (dtype is None or np.dtype(dtype) == values.dtype)
):

xr.set_options(accelerate_with="bottleneck")

I like this idea but we should wait for more input.

@max-sixty
Copy link
Collaborator

Nice!

pandas uses use_bottleneck — and same for numba / numexpr.

To what extent would bottleneck & numba be mutually exclusive? Until everything is implemented in numbagg, I guess they won't be, and we might want separate options. Having a single option would be nicer otherwise.

@keewis
Copy link
Collaborator Author

keewis commented Jul 21, 2021

as discussed in the meeting, we will keep use_bottleneck for now. I still don't know how to test this, but otherwise this should be ready for reviews and merging.

Edit: we also don't have alternatives for functions like duck_array_ops.push (used in ffill etc.) so disabling bottleneck would fail the function. Should we just silently use bottleneck, or raise an error?

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I guess you would have to mock bottleneck to properly test this.

@dcherian
Copy link
Contributor

so disabling bottleneck would fail the function. Should we just silently use bottleneck, or raise an error?

Ah this is a good test!

with xr.set_options(use_bottleneck=False):
	with pytest.raises(...):
		dataarray.ffill()

IMO it should raise an error so that use_bottleneck is a "global" control on whether xarray uses bottleneck or not. The context manager gives the user some flexibility to opt-in to using bottleneck where they want to.

@keewis keewis force-pushed the conditionally-disable-bottleneck branch from 4ce8a6a to cfeffa4 Compare July 30, 2021 11:45
@keewis
Copy link
Collaborator Author

keewis commented Aug 11, 2021

I'm not sure how to test rolling: monkeypatching bottleneck.move_sum does not work because Rolling only accesses that on import, i.e. before the test is executed.

Everything else is should be done, though.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a suggestion on the error message.

Can you open a new issue about testing for rolling? I think it is OK to merge without that.

xarray/core/dataset.py Outdated Show resolved Hide resolved
@dcherian
Copy link
Contributor

Thanks @keewis. We can extend the tests for rolling later.

@dcherian dcherian merged commit 3956b73 into pydata:main Aug 12, 2021
@keewis keewis deleted the conditionally-disable-bottleneck branch August 12, 2021 14:48
dcherian added a commit to dcherian/xarray that referenced this pull request Aug 13, 2021
* upstream/main: (34 commits)
  Use same bool validator as other inputs (pydata#5703)
  conditionally disable bottleneck (pydata#5560)
  Refactor index vs. coordinate variable(s) (pydata#5636)
  pre-commit: autoupdate hook versions (pydata#5685)
  Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670)
  Speed up _mapping_repr (pydata#5661)
  update the link to `scipy`'s intersphinx file (pydata#5665)
  Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663)
  pre-commit: autoupdate hook versions (pydata#5660)
  fix the binder environment (pydata#5650)
  Update api.rst (pydata#5639)
  Kwargs to rasterio open (pydata#5609)
  Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633)
  new blank whats-new for v0.19.1
  v0.19.0 release notes (pydata#5632)
  remove deprecations scheduled for 0.19 (pydata#5630)
  Make typing-extensions optional (pydata#5624)
  Plots get labels from pint arrays (pydata#5561)
  Add to_numpy() and as_numpy() methods (pydata#5568)
  pin fsspec (pydata#5627)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Aug 13, 2021
* upstream/main: (307 commits)
  Use same bool validator as other inputs (pydata#5703)
  conditionally disable bottleneck (pydata#5560)
  Refactor index vs. coordinate variable(s) (pydata#5636)
  pre-commit: autoupdate hook versions (pydata#5685)
  Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670)
  Speed up _mapping_repr (pydata#5661)
  update the link to `scipy`'s intersphinx file (pydata#5665)
  Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663)
  pre-commit: autoupdate hook versions (pydata#5660)
  fix the binder environment (pydata#5650)
  Update api.rst (pydata#5639)
  Kwargs to rasterio open (pydata#5609)
  Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633)
  new blank whats-new for v0.19.1
  v0.19.0 release notes (pydata#5632)
  remove deprecations scheduled for 0.19 (pydata#5630)
  Make typing-extensions optional (pydata#5624)
  Plots get labels from pint arrays (pydata#5561)
  Add to_numpy() and as_numpy() methods (pydata#5568)
  pin fsspec (pydata#5627)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bottleneck bug with unusual strides - causes segfault or wrong number
3 participants