Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overriding __eq__ does not set __hash__ to None #2191

Closed
ixje opened this issue Apr 28, 2020 · 22 comments · Fixed by #2291
Closed

Overriding __eq__ does not set __hash__ to None #2191

ixje opened this issue Apr 28, 2020 · 22 comments · Fixed by #2291

Comments

@ixje
Copy link

ixje commented Apr 28, 2020

Issue description

The Python documentation states

A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When the __hash__() method of a class is None, instances of the class will raise an appropriate TypeError when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking isinstance(obj, collections.abc.Hashable).

Thus if we override __eq__ but not __hash__ we should get an unhashable type (e.g. cannot be used in a set container). Pybind however seems to always have a __hash__ implemented, regardless if we override __eq__. The expected behaviour is that __hash__ is set to None

Reproducible example code

In Python it exhibits the following behaviour

from collections import abc

class MyClass1:
    pass

class MyClass2:
    def __eq__(self, other):
        return True

def main():
    c1 = MyClass1()
    c2 = MyClass2()
    print(isinstance(c1, abc.Hashable)) # True 
    print(isinstance(c2, abc.Hashable)) # False <- OK

if __name__ == "__main__":
    main()

Using pybind

#include <pybind11/pybind11.h>
namespace py = pybind11;

class DummyClass {};
class DummyClass2 {};

PYBIND11_MODULE(DummyClass, m) {
    py::class_<DummyClass>(m, "MyClass1")
            .def(py::init());
    py::class_<DummyClass2>(m, "MyClass2")
            .def(py::init())
            .def("__eq__", [](py::object&) { return true; });
}
from collections import abc
from lib.DummyClass import MyClass1, MyClass2

def main():
    c1 = MyClass1()
    c2 = MyClass2()
    print(isinstance(c1, abc.Hashable)) # True 
    print(isinstance(c2, abc.Hashable)) # True <- NOT OK

if __name__ == "__main__":
    main()

The workaround

A workaround is to manually add .attr("__hash__") = py::none()

I'm assuming it should be possible to detect if there is a __eq__ override without any __hash__ override and then set the attribute. This would resolve the deviating/unexpected behaviour.

@bstaletic
Copy link
Collaborator

Interesting. Pybind can check that the argument to .def is __eq__ and then set attr("__hash__") to py::none().

@ixje
Copy link
Author

ixje commented Jul 7, 2020

Interesting. Pybind can check that the argument to .def is __eq__ and then set attr("__hash__") to py::none().

Note that it should only do that if there is no .def('__hash__' elsewhere for the object. The following should not overwrite __hash__ again to py::none()

.def("__hash__", ..)
.def("__eq__", ...)

@YannickJadoul
Copy link
Collaborator

I agree in principle, but it seems quite horrible to do a string comparison with "__eq__" on every call to .def?

@ixje
Copy link
Author

ixje commented Jul 8, 2020

Is that really a concern? I mean it is all compile time and how big are bindings really going to be that this will increase will have a noticeable impact?

@YannickJadoul
Copy link
Collaborator

Is that really a concern? I mean it is all compile time and how big are bindings really going to be that this will increase will have a noticeable impact?

Is it? Most of the time, it will be, indeed, but in principle, you could dynamically generate strings. So this can't be expected to have no effect on runtime, I think. I'm not against, but it's something to consider.

There are talks about getting benchmarks (both for size and time), so that would make such a discussion a lot easier :-)

@sizmailov
Copy link
Contributor

Can this check & fixup be in py::class_ desctructor?

@YannickJadoul
Copy link
Collaborator

Can this check & fixup be in py::class_ desctructor?

Interesting idea, but I don't think there's anything stopping you from keeping the py::class_ stored somewhere?
In a typical case, it will be destructed, most likely, but wouldn't it be a horrible, hard-to-discover/trace bug if just by storing an existing variable somewhere, this behaviour would suddenly change?

@sizmailov
Copy link
Contributor

I agree, it might lead to nasty debug session. I've choose destructor as most closest approximation to "class bindings are complete" event. Would it be better to have dedicated py::class_::done method instead? Probably there other things that needs to be checked/fixed after python class construction completion.

@YannickJadoul
Copy link
Collaborator

Would it be better to have dedicated py::class_::done method instead? Probably there other things that needs to be checked/fixed after python class construction completion.

I'm not entirely convinced of that either. Lots of existing project might depend on that not being there, currently?
For now, we've been able to avoid this, btw.

Actually, maybe the check with 7 chars in __eq__\0 could still be OK (if you take into account the amount of lookups and compares Python does on every call, anyway). But adding it to def seems quite ad-hoc in design.

@sizmailov
Copy link
Contributor

I meant explicit call to "finalize" method (if it was not clear from previous comment), smth like

    py::class_<DummyClass2>(m, "MyClass2")
            .def(py::init())
            .def("__eq__", [](py::object&) { return true; })
            .done(); // checks, fix-ups, etc 

I don't think it would break any code (it might, but chances are very low).

@YannickJadoul
Copy link
Collaborator

Yes. So once we start counting on that (for maybe more crucial things than __eq__ and __hash__), all the existing code that doesn't call done would be wrong?

@bstaletic
Copy link
Collaborator

Here's a potentially bad idea. Instead of strcmp("__eq__", name), compare with ==, as pointers. Yes, this wouldn't be as robust as lexicographic comparison, but string literals are stored in .data section and all pointers to the same string literal are equal.

My idea breaks if the user copies the string literal onto the stack and passes that pointer, or creates a std::string and passes c_str().

 

I don't like the idea of done() as it's easy to forget and all the examples and docs need to be updated.

@sizmailov
Copy link
Contributor

del

No, existing code would be as good as it is right now, not worse nor better.

Such call will be required for some cases, but not every class. The check that required call was not made can go to module destructor (py::class_ instances can be created multiple times for same c++ class)

@bstaletic I agree, for this issue done() seems to be unnecessary complication compared to patching def.

@YannickJadoul
Copy link
Collaborator

Or, best of both worlds: if (name == "__eq__" || !std::strcmp(name, "__eq__"))?

@bstaletic
Copy link
Collaborator

That might be a good idea. I'd just be explicit regarding that ! and actually compare strcmp to 0.

@YannickJadoul
Copy link
Collaborator

Let's try tomorrow. Or do you want to make a PR, @ixje?

@sizmailov
Copy link
Contributor

Or, best of both worlds: if (name == "__eq__" || !std::strcmp(name, "__eq__"))?

In majority of cases name !="__eq__", therefore it's one pointer comparison worse than bare strcmp.

@wjakob
Copy link
Member

wjakob commented Jul 8, 2020

I'm not sure if it makes sense to add code to def here for what is a fairly narrow use case. Keep in mind that def() is the most frequently called entry point in pybind11, so the stakes for changing something here are fairly high. My suggestion would be to document this behavior and ask people to set __hash__ to none if they truly want a non-hashable type.

@ixje
Copy link
Author

ixje commented Jul 10, 2020

I was under the impression that __hash__ and __eq__ were set on the base object (e.g. @ tp_hash), but I've come to learn this is not the case. Just documenting the deviating behaviour sounds right then.

@YannickJadoul
Copy link
Collaborator

OK, then. We could also add it to the py::self == py::self notation, but maybe that would cause more confusion/inconsistencies than not doing it.

@ixje Do you have time to make a PR to add this to the docs somewhere?

@ixje
Copy link
Author

ixje commented Jul 10, 2020

@YannickJadoul I took a stab

@YannickJadoul
Copy link
Collaborator

Thanks!

@sizmailov & @bstaletic, please have a look if #2287 looks good to you? :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants