Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to rapidhash #17422

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Upgrade to rapidhash #17422

wants to merge 8 commits into from

Conversation

Nicoshev
Copy link

@Nicoshev Nicoshev commented Jun 11, 2024

Upgrading 64-bit from wyhash to rapidhash.

The wyhash owner has recognized rapidhash as the official successor in its repo.

rapidhash has shown to be faster and of higher quality than wyhash.

It is also officially compatible with MSVC across all architectures.

References and Relevant Issues

rapidhash is currently the fastest recommended hash function by SMHasher.

rapidhash is currently the fastest passing hash in SMHasher3.

Collision-based study showing a collision probability lower than wyhash and close to ideal.

@Nicoshev
Copy link
Author

@microsoft-github-policy-service agree

This comment has been minimized.

This comment has been minimized.

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gating on @lhecker as the final signoff, as he's the one who chose and integrated wyahsh

src/inc/til/rapidhash.h Outdated Show resolved Hide resolved
src/inc/til/rapidhash.h Outdated Show resolved Hide resolved
src/inc/til/rapidhash.h Outdated Show resolved Hide resolved
src/inc/til/rapidhash.h Outdated Show resolved Hide resolved
src/inc/til/rapidhash.h Outdated Show resolved Hide resolved
@DHowett
Copy link
Member

DHowett commented Jul 8, 2024

Thanks for all the effort here, Nicolas!

I have some concerns, plus a couple "favorite things" about what we're doing today.

Right now, our implementation of wyhash is fully contained and inlined into til, and nobody needs to include any further headers or take on any statics1.

I'm concerned that we are also only replacing the 64-bit hash function, so we're not fully pulling wyhash from our code. We will forever have two OSS hash library dependencies rather than just the one.

All that said, I'd still be interested... but we are also not using it for any data larger than, say, 1kb.

I'm not certain the performance boost for our specific workload is worth switching, all other things taken into consideration. If we were going to be processing untold heaps of data I might feel differently, but... eh. It's written now, but so is the code we had before. 🤷🏻

Footnotes

  1. Doesn't the compiler emit one of these into every translation unit? We've got a lot of TUs that include this...

@Nicoshev
Copy link
Author

Nicoshev commented Jul 9, 2024

Hey @DHowett, thanks a lot for the review!

The global variable should not be replicated accross translation units because rapidhash.h is only used by hash.h, which states '#pragma once' before including it. For extra security, I've added '#pragma once' to rapidhash.h :)

I've also just updated the rapidhash header to a newer version, which has greater compatibility with C++.
The global variable is now marked as 'constexpr' within C++ compilation, hinting the compiler to inline its value whenever used.
Nevertheless, this optimization was still observed when compiling a C file including the header.
The new updated header file also adds the noexcept qualifiers, which allowed the removal of warning supressing code.

Having the 64-bit hash function undocked may be a favorable thing looking forward.
The last commit shows how effortless is to update the hash implementation.
It should be easy to perform any change to the hash function if needed in the future.

wyhash has been deprecated by its owner, it may be counterproductive to use unmaintained code.

@DHowett
Copy link
Member

DHowett commented Jul 9, 2024

Thanks!

#pragma once doesn't prevent a header from being included in every translation unit; it just prevents a header from being included more than once in the same TU!

I'm glad that it was easy to update. That is definitely a point in its favor!

FWIW, though, on maintenance: does wyhash need maintenance? Will rapidhash need maintenance? I'm firmly of the belief that a project can be "done". If a vulnerability is discovered somehow in the hashing algorithm that would put us at risk, the work is the same to fix it as it is to move to a different hash function. Why move now, from a project that is entirely complete, in the absence of such a vulnerability, for the use cases we have?

@Nicoshev
Copy link
Author

Nicoshev commented Jul 9, 2024

It is unclear if wyhash or rapidhash will benefit from maintenance.

New language standards may introduce qualifiers or notions from which the hash functions could benefit.

I do agree that this change is not absolutely necesary, although it does leave the codebase modernized and will produce a slight battery life improvement on devices using the terminal.
Additionally, having available the lastest version of the hash function may be useful for a future arising need.

I can also change the PR to implement rapidhash in-place within hash.h, if that's preferable.

Thanks,
Nicolas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants