[BUG] Integer promotion fixes needed for NumPy 2 for comparison operators #16282

seberg · 2024-07-15T19:05:14Z

While NumPy 2 and pandas promotions should now work mostly correctly on the main branch (tests pass with cupy==12.3.1 (not actually released) and numpy==2.0.0, the comparison operators are still a problem.

As of now the following new test would just fail:

@pytest.mark.parametrize(
    "dtype,val",
    [("int8", 200), ("int32", 2**32), ("uint8", -128), ("uint64", -1)],
)
def test_series_compare_integer(dtype, val):
    # Tests that these actually work, even though they are out of bound.
    force_cast = np.array(val).astype(dtype)
    sr = Series(
        [np.iinfo(dtype).min, np.iinfo(dtype).max, force_cast], dtype=dtype
    )
    assert (sr != val).all()
    assert not (sr == val).any()

    if val < 0:
        assert (sr > val).all()
    else:
        assert (sr < val).all()

Because when comparing an integer series and a 0-D object, we would assume it is OK to use the series' dtype even when the integer does not fit.

The solution to this should not be very complicated. For integer comparison operations change normalize_binop_value to:

Check whether series.dtype.type(other_integer) fits (if yes OK)
If it does not fit, we need some fallback path, either:
- convert to a NumPy array (which will end up upcasting the series), or
- add an immediate return path since in that case the result is known to be all True or all False. (i.e. avoid an upcast and also guarantee correct behavior for any scalar comparison).

I may look into it now that the other things are merged. But if anyone has a preference for the approach, happy to hear it.

N.B. for series to series comparisons, pandas/numpy now compare uint64 and int64 correctly by value (for scalars the above fixes already do that when they work). One could fix that, but it is a separate issue and seems lower priority.

The text was updated successfully, but these errors were encountered:

jakirkham · 2024-08-01T07:49:37Z

If this will be fixed in CuPy 13.3.0, is there anything more for us to do here?

seberg · 2024-08-01T08:01:59Z

This needs to be fixed in CuDF itself to avoid users running into annoying limitations (such as uint32_series > -1 failing).

It isn't a difficult thing as such: We need a custom promotion logic for comparisons, so getting a good-enough fix should be quick. But opened an issue, because I thought someone deeper in cudf may have a better thought on the best approach.

) When Python integers are compared to a series of integers, the result can always be correctly defined no matter the values of the Python integer. This was always a very mild issue. But with NumPy 2 behavior not upcasting the computation result type based on the value anymore, even things like: ``` cudf.Series([1, 2, 3], dtype="int8") < 1000 ``` would fail. (Similar paths could be taken for other integer scalars, but there would be mostly nice for performance.) N.B. NumPy/pandas also support exact comparisons when mixing e.g. uint64 and int64. This is another rare exception that cudf currently does not support. Closes gh-16282 Authors: - Sebastian Berg (https://github.com/seberg) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #16532

seberg mentioned this issue Jul 15, 2024

WIP,API: Enable full Numpy2 compat #15897

Closed

2 tasks

seberg mentioned this issue Aug 12, 2024

Ensure comparisons with pyints and integer series always succeed #16532

Merged

3 tasks

rapids-bot bot closed this as completed in #16532 Aug 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Integer promotion fixes needed for NumPy 2 for comparison operators #16282

[BUG] Integer promotion fixes needed for NumPy 2 for comparison operators #16282

seberg commented Jul 15, 2024 •

edited by bdice

Loading

jakirkham commented Aug 1, 2024

seberg commented Aug 1, 2024

[BUG] Integer promotion fixes needed for NumPy 2 for comparison operators #16282

[BUG] Integer promotion fixes needed for NumPy 2 for comparison operators #16282

Comments

seberg commented Jul 15, 2024 • edited by bdice Loading

jakirkham commented Aug 1, 2024

seberg commented Aug 1, 2024

seberg commented Jul 15, 2024 •

edited by bdice

Loading