Overriding __eq__ does not set __hash__ to None #2191

ixje · 2020-04-28T10:00:35Z

Issue description

The Python documentation states

A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When the __hash__() method of a class is None, instances of the class will raise an appropriate TypeError when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking isinstance(obj, collections.abc.Hashable).

Thus, if we override __eq__ but not __hash__, we should get an unhashable type (e.g. one that cannot be used in a set). Pybind, however, always seems to have __hash__ implemented, regardless of whether we override __eq__. The expected behaviour is that __hash__ is set to None.

Reproducible example code

In Python it exhibits the following behaviour

from collections import abc

class MyClass1:
    pass

class MyClass2:
    def __eq__(self, other):
        return True

def main():
    c1 = MyClass1()
    c2 = MyClass2()
    print(isinstance(c1, abc.Hashable)) # True 
    print(isinstance(c2, abc.Hashable)) # False <- OK

if __name__ == "__main__":
    main()
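The practical consequence for pure-Python classes can be checked directly. The following is a minimal sketch; the names EqOnly and try_add_to_set are illustrative and not part of the original report.

```python
class EqOnly:
    """Defines __eq__ but not __hash__, so Python sets __hash__ to None."""

    def __eq__(self, other):
        return True


def try_add_to_set(obj):
    """Return True if obj can be stored in a set, False if it is unhashable."""
    try:
        {obj}
        return True
    except TypeError:
        return False


print(EqOnly.__hash__ is None)     # the implicit None described in the docs
print(try_add_to_set(EqOnly()))    # unhashable instance
print(try_add_to_set(object()))    # plain object keeps the default hash
```

This is the behaviour the pybind-bound class is expected to match.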

Using pybind

#include <pybind11/pybind11.h>
namespace py = pybind11;

class DummyClass {};
class DummyClass2 {};

PYBIND11_MODULE(DummyClass, m) {
    py::class_<DummyClass>(m, "MyClass1")
            .def(py::init());
    py::class_<DummyClass2>(m, "MyClass2")
            .def(py::init())
            .def("__eq__", [](py::object&) { return true; });
}

from collections import abc
from lib.DummyClass import MyClass1, MyClass2

def main():
    c1 = MyClass1()
    c2 = MyClass2()
    print(isinstance(c1, abc.Hashable)) # True 
    print(isinstance(c2, abc.Hashable)) # True <- NOT OK

if __name__ == "__main__":
    main()

The workaround

A workaround is to manually add .attr("__hash__") = py::none()

I'm assuming it should be possible to detect if there is a __eq__ override without any __hash__ override and then set the attribute. This would resolve the deviating/unexpected behaviour.


bstaletic · 2020-07-07T08:52:11Z

Interesting. Pybind can check that the argument to .def is __eq__ and then set attr("__hash__") to py::none().

ixje · 2020-07-07T09:03:53Z

> Interesting. Pybind can check that the argument to .def is __eq__ and then set attr("__hash__") to py::none().

Note that it should only do that if there is no .def("__hash__", ...) elsewhere for the class. In the following, defining __eq__ must not overwrite __hash__ with py::none():

.def("__hash__", ..)
.def("__eq__", ...)
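The ordering rule can be sketched with a small Python model of a chained builder. ClassBuilder and its define method are hypothetical and only imitate the shape of .def; this is not pybind11's API or its implementation.

```python
class ClassBuilder:
    """Hypothetical Python model of a binding builder; not pybind11's API."""

    def __init__(self, name):
        self.cls = type(name, (), {})

    def define(self, name, func):
        setattr(self.cls, name, func)
        # Only disable hashing when the class has no __hash__ of its own.
        if name == "__eq__" and "__hash__" not in vars(self.cls):
            self.cls.__hash__ = None
        return self  # allow chaining, like .def(...).def(...)


def always_equal(self, other):
    return True


# __hash__ first, then __eq__: the user's __hash__ must survive.
A = ClassBuilder("A").define("__hash__", lambda self: 1).define("__eq__", always_equal).cls
# __eq__ first, then __hash__: the later __hash__ replaces the None.
B = ClassBuilder("B").define("__eq__", always_equal).define("__hash__", lambda self: 2).cls
# __eq__ only: the class becomes unhashable.
C = ClassBuilder("C").define("__eq__", always_equal).cls

print(hash(A()), hash(B()), C.__hash__ is None)
```

Both orderings keep the user-supplied __hash__, and only the class without one ends up unhashable.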

YannickJadoul · 2020-07-08T12:00:36Z

I agree in principle, but it seems quite horrible to do a string comparison with "__eq__" on every call to .def?

ixje · 2020-07-08T14:44:35Z

Is that really a concern? I mean, it is all compile time, and how big are bindings really going to be that this increase will have a noticeable impact?

YannickJadoul · 2020-07-08T15:39:48Z

> Is that really a concern? I mean, it is all compile time, and how big are bindings really going to be that this increase will have a noticeable impact?

Is it? Most of the time it will be, indeed, but in principle you could generate the names dynamically. So this can't be expected to have no effect at runtime, I think. I'm not against it, but it's something to consider.

There are talks about getting benchmarks (both for size and time), so that would make such a discussion a lot easier :-)

sizmailov · 2020-07-08T16:55:54Z

Can this check & fixup be in the py::class_ destructor?

YannickJadoul · 2020-07-08T17:14:22Z

> Can this check & fixup be in the py::class_ destructor?

Interesting idea, but I don't think there's anything stopping you from keeping the py::class_ stored somewhere?
In a typical case, it will be destructed, most likely, but wouldn't it be a horrible, hard-to-discover/trace bug if just by storing an existing variable somewhere, this behaviour would suddenly change?

sizmailov · 2020-07-08T17:36:41Z

I agree, it might lead to a nasty debugging session. I chose the destructor as the closest approximation to a "class bindings are complete" event. Would it be better to have a dedicated py::class_::done method instead? There are probably other things that need to be checked or fixed once the Python class is complete.

YannickJadoul · 2020-07-08T18:36:15Z

> Would it be better to have a dedicated py::class_::done method instead? There are probably other things that need to be checked or fixed once the Python class is complete.

I'm not entirely convinced of that either. Lots of existing projects might depend on that not being there, currently?
For now, we've been able to avoid this, btw.

Actually, maybe the check with 7 chars in __eq__\0 could still be OK (if you take into account the amount of lookups and compares Python does on every call, anyway). But adding it to def seems quite ad-hoc in design.

sizmailov · 2020-07-08T18:57:03Z

I meant an explicit call to a "finalize" method (if that was not clear from my previous comment), something like:

    py::class_<DummyClass2>(m, "MyClass2")
            .def(py::init())
            .def("__eq__", [](py::object&) { return true; })
            .done(); // checks, fix-ups, etc

I don't think it would break any code (it might, but chances are very low).
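The trade-off of an explicit finalize step can be modelled in Python. DeferredBuilder, define and done below are hypothetical names chosen to mirror the proposal; they are not part of pybind11.

```python
class DeferredBuilder:
    """Hypothetical model of the explicit finalize step; not pybind11's API."""

    def __init__(self, name):
        self.cls = type(name, (), {})

    def define(self, name, func):
        setattr(self.cls, name, func)  # no per-call name checks
        return self

    def done(self):
        # Checks and fix-ups run once, after all definitions are known.
        own = vars(self.cls)
        if "__eq__" in own and "__hash__" not in own:
            self.cls.__hash__ = None
        return self.cls


def always_equal(self, other):
    return True


finalized = DeferredBuilder("Finalized").define("__eq__", always_equal).done()
forgotten = DeferredBuilder("Forgotten").define("__eq__", always_equal).cls

print(finalized.__hash__ is None)  # fix-up applied
print(forgotten.__hash__ is None)  # done() never called, still hashable
```

The model keeps define free of special cases, but a class whose done() call is omitted silently keeps the old behaviour.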

YannickJadoul · 2020-07-08T19:31:05Z

Yes. So once we start counting on that (for maybe more crucial things than __eq__ and __hash__), all the existing code that doesn't call done would be wrong?

bstaletic · 2020-07-08T19:56:38Z

Here's a potentially bad idea. Instead of strcmp("__eq__", name), compare with ==, as pointers. Yes, this wouldn't be as robust as lexicographic comparison, but string literals are stored in .data section and all pointers to the same string literal are equal.

My idea breaks if the user copies the string literal onto the stack and passes that pointer, or creates a std::string and passes c_str().

I don't like the idea of done() as it's easy to forget and all the examples and docs need to be updated.

sizmailov · 2020-07-08T20:05:21Z


No, existing code would be as good as it is right now, not worse nor better.

Such a call would be required in some cases, but not for every class. The check that a required call was not made can go in the module destructor (py::class_ instances can be created multiple times for the same C++ class).

@bstaletic I agree, for this issue done() seems to be an unnecessary complication compared to patching def.

YannickJadoul · 2020-07-08T20:06:06Z

Or, best of both worlds: if (name == "__eq__" || !std::strcmp(name, "__eq__"))?

bstaletic · 2020-07-08T20:31:47Z

That might be a good idea. I'd just be explicit regarding that ! and actually compare strcmp to 0.

YannickJadoul · 2020-07-08T20:39:29Z

Let's try tomorrow. Or do you want to make a PR, @ixje?

sizmailov · 2020-07-08T21:17:31Z

> Or, best of both worlds: if (name == "__eq__" || !std::strcmp(name, "__eq__"))?

In the majority of cases name != "__eq__", so this is one pointer comparison worse than a bare strcmp.

wjakob · 2020-07-08T21:46:07Z

I'm not sure if it makes sense to add code to def here for what is a fairly narrow use case. Keep in mind that def() is the most frequently called entry point in pybind11, so the stakes for changing something here are fairly high. My suggestion would be to document this behavior and ask people to set __hash__ to None if they truly want a non-hashable type.

ixje · 2020-07-10T04:13:12Z

I was under the impression that __hash__ and __eq__ were set on the base object (e.g. via tp_hash), but I've come to learn this is not the case. Just documenting the deviating behaviour sounds right then.

YannickJadoul · 2020-07-10T13:04:59Z

OK, then. We could also add it to the py::self == py::self notation, but maybe that would cause more confusion/inconsistencies than not doing it.

@ixje Do you have time to make a PR to add this to the docs somewhere?

ixje · 2020-07-10T13:39:16Z

@YannickJadoul I took a stab

YannickJadoul · 2020-07-10T13:58:28Z

Thanks!

@sizmailov & @bstaletic, please have a look if #2287 looks good to you? :-)


ixje mentioned this issue Jul 10, 2020

Update docs: mention setting __hash__ = None for custom __eq__ method #2287

Closed

sizmailov added a commit to sizmailov/pybind11 that referenced this issue Jul 11, 2020

Set __hash__ to None for types that defines __eq__, but not __hash__

95d6ebd

fixes pybind#2191

sizmailov added a commit to sizmailov/pybind11 that referenced this issue Jul 11, 2020

Set __hash__ to None for types that defines __eq__, but not __hash__

01745f4

fixes pybind#2191

sizmailov mentioned this issue Jul 11, 2020

Set __hash__ to None for types that defines __eq__ but not __hash__ #2291

Merged

sizmailov added a commit to sizmailov/pybind11 that referenced this issue Jul 11, 2020

Set __hash__ to None for types that defines __eq__, but not __hash__

6cd7749

fixes pybind#2191

sizmailov added a commit to sizmailov/pybind11 that referenced this issue Jul 11, 2020

Set __hash__ to None for types that defines __eq__, but not __hash__

88dae38

fixes pybind#2191

YannickJadoul closed this as completed in #2291 Jul 26, 2020

YannickJadoul pushed a commit that referenced this issue Jul 26, 2020

Set __hash__ to None for types that defines __eq__, but not __hash__ (#…

7b067cc

…2291) fixes #2191

taranu mentioned this issue Sep 26, 2022

DM-35554: Add SersicMixComponent lsst-dm/gauss2d_fit#3

Merged

rwgk mentioned this issue Feb 10, 2023

FWD pybind11 google/pybind11clif#2191

Closed
