Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docs: mention setting __hash__ = None for custom __eq__ method #2287

Closed
wants to merge 2 commits into from
Closed

Conversation

ixje
Copy link

@ixje ixje commented Jul 10, 2020

close #2191

@ixje
Copy link
Author

ixje commented Jul 10, 2020

I thought the operators section was the closest related. I mostly re-used the wordings from the official Python docs.

YannickJadoul
YannickJadoul previously approved these changes Jul 10, 2020
Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Thanks, @ixje! :-)

@YannickJadoul
Copy link
Collaborator

Just some docs, so I'll give the others involved in #2191 a few days to check, and merge if there are no objections :-)

@bstaletic
Copy link
Collaborator

One thing I've noticed here, though I don't know if anybody else cares, are long lines - significantly longer than the rest of the document.

@sizmailov
Copy link
Contributor

I would say it's fine but a bit hard to grasp from first read. Can this be shortened to something like this:

A class that overrides __eq__ MUST override __hash__ as well. If you want non-hashable type do .attr("__hash__") = py::none()

?

@bstaletic
Copy link
Collaborator

Wording is perfectly fine. I was thinking more like what is just above the block you changed - a single sentence broken into two lines.

    To use the more convenient ``py::self`` notation, the additional
    header file :file:`pybind11/operators.h` must be included.

So just change a few spaces into new lines + indentation.

@sizmailov
Copy link
Contributor

sizmailov commented Jul 10, 2020

My point was that __hash__ must be overwritten whenever __eq__ is. If it's not then it's a bug in binding code and leads to peculiar x in {x} == False.

So phrase

A class that overrides __eq__() and does not define __hash__()

describes class with the bug.

@ixje
Copy link
Author

ixje commented Jul 10, 2020

I actually think @sizmailov is right here. I don't think that for pybind there is a scenario where overriding __eq__ without overriding __hash__ is ever correct. Normal Python creates a sane default by setting __hash__ to None, which pybind doesn't.

@YannickJadoul
Copy link
Collaborator

That's worth a discussion, but if that's the case, that it's strictly incorrect, then we should probably add something to pybind11, and find a way to do so that convinces @wjakob.

If i's not then it's a bug in binding code and leads to peculiar x in {x} == False.

That's not really true, though. The default hash will be based on object identity, so any sane __eq__ would consider more objects to be equal, as any interface that has (x is x) and (x != x) is semantically broken.

>>> class X:
...     def __eq__(self, other):
...             # Fine, yes, normally, you'd compare `self.value == other.value` or so.
...             # The important point is that `__eq__` is less restrictive than `is`.
...             return True
... 
>>> x = X()
>>> x in {x}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'X'
>>> X.__hash__ = object.__hash__
>>> x = X()
>>> x in {x}
True

The weird or wrong situation happens the other way around:

>>> y = X()
>>> x == y
True
>>> x in {y}
False

@YannickJadoul YannickJadoul changed the title Update classes.rst Update docs: mention setting __hash__ = None for custom __eq__ method Jul 10, 2020
@sizmailov
Copy link
Contributor

sizmailov commented Jul 10, 2020

x == y implies hash(x) == hash(y) for hashable types. In python you can violate this contract by explicitly providing bad implementation of __hash__, in pybind you do it implicitly by overriding __eq__ and keeping default implementation.

Here is the quote from python doc:

The only required property is that objects which compare equal have the same hash value;

@YannickJadoul
Copy link
Collaborator

x == y implies hash(x) == hash(y) for hashable types. In python you can violate this contract by explicitly providing bad implementation of __hash__, in pybind you do it implicitly by overriding __eq__ and keeping default implementation.

Here is the quote from python doc:

The only required property is that objects which compare equal have the same hash value;

Yes, true. I'm not saying it might not be a problem, but your example of x in {x} should never fail, I think.

For the record, adding this (untested) code in the right place (def?) would make things work, I believe:

if ((name == "__eq__"  || strcmp(name, "__eq__") == 0) && !attr("__dict__").contains("__hash__")) {
    attr("__hash__") = py::none();
}

@sizmailov
Copy link
Contributor

but your example of x in {x} should never fail, I think.

Probably you are right, not exactly in this form, it should be a bit more complex example, like foo.x in {foo.x}

Once again, name == "__eq__" is almost never true, therefore I think this extra check is pessimisation.

At the moment def seems to be the only candidate. I think we need to measure the impact.

@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Jul 10, 2020

Probably you are right, not exactly in this form, it should be a bit more complex example, like foo.x in {foo.x}

This will still have the same id, so probably not. But OK, the exact details don't matter. It's true that it's breaking Python's assumptions.

Once again, name == "__eq__" is almost never true, therefore I think this extra check is pessimisation.

It might, actually, no? Because almost always™ (?), def will be called with a string literal, and since this is in a header, any sane compiler would reuse a pointer to the same static data and not allocate a new piece of memory ever time (or not?). But yes, could be left out as well. Just calling attr(name) = py::cpp_function(...) will probably already cost a lot more comparisons.

@YannickJadoul
Copy link
Collaborator

It might, actually, no?

Nvm, @sizmailov is correct, since this won't speed up other things than __eq__, since they would still need to strcmp. At any rate, the strcmp should be reasonably fast, compared to the rest of Python? The main issue is still how to do it cleanly.

@sizmailov
Copy link
Contributor

It turns out that there is only one function to patch, the def. There is no overloads, templates, etc.

For benchmark purposes I've copied def with suggested patch and named it def_hash_fix. Here is the gist with snippet, def_hash_fix definition and full benchmark results.

It seems like the obtained data is not reliable. 99% of time def spends in py::cpp_function (according to profiler), therefore benchmarked code takes only 1% of whole time and quite noisy. To overcome this issue I've extracted "overhead" part of patch into separate benchmark. Absolute time per iteration of overhead is 0.08% (yes, 1/12 of percent) of def time.

My best guess is that python runtime is responsibile for this discrepancy. (And as always there is a chance that my benchmarks are flawed)

Here is one of the outputs:

-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
Fixture/def_foo                    674525 ns       674499 ns         7069
Fixture/def_foo_with_hash_fix      696623 ns       696595 ns         7737
Fixture/def__eq__                  700769 ns       700744 ns         8344
Fixture/def__eq__with_hash_fix     653667 ns       653612 ns         7914
Fixture/hash_fix_overhead             568 ns          567 ns      1259413

CPU scaling is disabled, I've tried to run on (almost) idle system

@wjakob
Copy link
Member

wjakob commented Jul 10, 2020

Thank you for looking at performance @sizmailov. While you are at it: does the size of the .so file of the test suite change noticably in size with the patch?

@wjakob
Copy link
Member

wjakob commented Jul 10, 2020

By the way, one thing worth investigating would be to factor out the whole middle part

    attr(cf.name()) = cf;
    if (strcmp(name_, "__eq__") == 0 && !attr("__dict__").contains("__hash__")) {
      attr("__hash__") = none();
    }

into a non-templated function (maybe in the detail namespace) and see how that affects binary size.

@sizmailov
Copy link
Contributor

Here are the numbers

# g++-8.3.0, release mode
# du -b pybind11_tests.cpython-37m-x86_64-linux-gnu.so
 
master           3363296
patch inplace    3453408
patch function   3195360
extracted function
inline void add_class_method(object& klass, const char *name_, const cpp_function &cf) {
  klass.attr(cf.name()) = cf;
  if (strcmp(name_, "__eq__") == 0 && !klass.attr("__dict__").contains("__hash__")) {
    klass.attr("__hash__") = none();
  }
}

@wjakob
Copy link
Member

wjakob commented Jul 10, 2020

Not bad, a 5% size decrease. A good reminder how much we are paying for code that is generic but ends up being templated over and over again. I would be happy with this change (with extra function), please go ahead and put it into the PR. Super-minor: indentation is wrong (4 spaces), and can you call it "cls" or something like that? (it's also called py::class_<> and not py::klass :))


.. note::

A class that overrides ``__eq__()`` and does not define ``__hash__()`` will have its ``__hash__()`` implicitly set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: When a class overrides __eq__() and does not define __hash__(), pybind11 will automatically add a __hash__(). (I think this rewording makes it more clear that this behavior is something pybind11 does.)

@YannickJadoul
Copy link
Collaborator

Great job, @sizmailov! Thank you so much for investigating and getting us some numbers! :-)

By the way, one thing worth investigating would be to factor out the whole middle part

On the long run, this might even be a customization point for users?

@ixje
Copy link
Author

ixje commented Jul 11, 2020

closing in favour of #2291

@ixje ixje closed this Jul 11, 2020
@ixje ixje deleted the patch-1 branch July 11, 2020 18:27
@henryiii henryiii added the docs Docs or GitHub info label Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Docs or GitHub info
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overriding __eq__ does not set __hash__ to None
7 participants