-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify hash and equality operations #47
base: master
Are you sure you want to change the base?
Conversation
Also broaden the test coverage with set operation and equality to different object.
The (macos-latest, 3.6) integration test failure looks like an unrelated internet access failure. |
Thanks for the help @kalvdans! I checked your version of Also please take into account that python's >>> hash(1)
1
>>> hash(100)
100
>>> It only computes the actual hash for other types of values: >>> hash("a")
-2483041677650604276 In your case the calling |
A hash algorithm is used in hash tables to allow for constant time average lookup time. It should not be used for anything else. The old implementation of On 64-bit architectures, the risk is smaller, but it is just fundamentally wrong to use hash for equality. |
You are right. hash() automatically truncates integers to machine precision. I will change that, hang on! |
It is perfectly fine for __hash__() to return an integer.
I'm not sure that I agree since python It seems that the solution you are presenting is a weaker equality condition that the one based on hashing since it uses a single attribute for comparison instead of 69 attributes. I don't think that I would like to rely on a single attribute since the odds of someone accidentally assigning a new value to |
They use the hash for inequality only, for equality they still call the
Yes the documentation for |
I cannot see a strong case to get this merged, would like to continue iterating on this? |
I care a lot about the quality of the mendeleev package, please tag someone else on the PR to get a third opinion. |
Thanks for your concern. I cannot currently see enough reason to devote more attention to this issue. If you find a concrete example (as in https://stackoverflow.com/help/minimal-reproducible-example) that illustrates any vulnerabilities with the current implementation I would be happy to jump in and help address that. |
@mattwthompson, can you have a look at this PR? |
return hash(tuple(sorted(hashable))) | ||
# Allow Element objects as keys in dictionaries. Same atomic | ||
# number means all other properties are equal too. | ||
return self.atomic_number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd personally be weary of hashing elements based on a single property, as it assumes there is no possible way for two elements to differ if their atomic numbers are the same. Because the Element
class is mutable and can be used to generate custom, pesudo-/non-physical elements, it seems much more likely that there would be a collision based on this assumption than the birthday paradox on 32-bit architectures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to remove the __hash__
function until we make the instances immutable. The hashable property wasn't part of #42 after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made an alternative pull request for that, see #51
assert element(12) != "sample string" | ||
|
||
|
||
def test_hashable(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 to having this tested, whether or not the changes to the source code are accepted
Also broaden the test coverage with set operation and equality to
different object.
Follow-up on #43