Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sha2-mb-scalar for sha2-streaming #670

Merged
merged 47 commits into from
Nov 14, 2022
Merged

Use sha2-mb-scalar for sha2-streaming #670

merged 47 commits into from
Nov 14, 2022

Conversation

msprotz
Copy link
Contributor

@msprotz msprotz commented Nov 4, 2022

This gives a significant performance improvement and paves the way for removing the older, less efficient sha2 implementation authored by yours truly.

HUGE thanks to @polubelova and @R1kM for helping out with those pesky proofs.

Next:

  • switch over sha2/384 and sha2/512 DONE
  • port over EverCrypt.Hash to use those functions
  • ditch Hacl.Hash..SHA2.

@msprotz msprotz requested a review from a team as a code owner November 4, 2022 18:11
@msprotz msprotz changed the title Use sha2-mb-scalar for sha2/224 and sha2/256 in streaming Use sha2-mb-scalar for sha2-streaming Nov 4, 2022
@msprotz
Copy link
Contributor Author

msprotz commented Nov 7, 2022

I pushed a temporary commit that duplicates some code. The goal before merging this PR (after discussing with Marina) is to:

  • expose init_X, update_nblocks_N, update_last_N and finish_N inside of Hacl.SHA2.Scalar32.fst, NOT inline_for_extraction
  • rewrite one-shot hashes to use these functions (instead of inlining them away) --> one-shot hash more readable
  • have streaming use these functions (with a wrapper pattern, like I did already in my most recent commit)
  • hide Hacl.SHA2.Scalar32 behind the streaming interface

@msprotz
Copy link
Contributor Author

msprotz commented Nov 11, 2022

Some performance numbers for reference:

// Franziskus' M1
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
Sha2_256_Streaming       5063 ns         5062 ns       135619 // old
Sha2_256_Streaming       3163 ns         3160 ns       219353 // new

// My 2019 macbook pro (intel core i9)
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
Sha2_256_Streaming                 5069 ns         5062 ns       139356 // old
Sha2_256_Streaming                 4559 ns         4554 ns       152422 // new
Sha2_256                           4742 ns         4727 ns       148652 // old, one-shot
Sha2_MB_256                        4150 ns         4146 ns       165216 // new, one-shot
EverCrypt_Sha2_256_Streaming       5331 ns         5322 ns       130475 // uses old sha2 update, but also has dynamic agility

// Karthik's M1 max
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
Sha2_256_Streaming       5048 ns         5046 ns       138955 // old
Sha2_256_Streaming       3224 ns         3223 ns       216871 // new

// Franziskus' AMD Ryzen 
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
Sha2_256_Streaming       4103 ns         4103 ns       170520 // old
Sha2_256_Streaming       4101 ns         4101 ns       170714 // new

I'm observing some pretty big variation from one run to the next (5%) so take these with a grain of salt. But overall the improvements are clear, especially on M1.

@msprotz
Copy link
Contributor Author

msprotz commented Nov 11, 2022

I did the suggested changes.

 dist/gcc-compatible/Hacl_Ed25519.c                      |   17 +-
 dist/gcc-compatible/Hacl_Ed25519.h                      |    1 -
 dist/gcc-compatible/Hacl_SHA2_Vec128.c                  |  342 +++++++++-------
 dist/gcc-compatible/Hacl_SHA2_Vec256.c                  | 1030 +++++++++++++++++++++++++++---------------------
 dist/gcc-compatible/Hacl_Streaming_MD5.c                |    1 +
 dist/gcc-compatible/Hacl_Streaming_SHA1.c               |    1 +
 dist/gcc-compatible/Hacl_Streaming_SHA2.c               | 2506 ++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------------------------

so it's removing quite a bunch of C code (which is good).

One important change: Hacl_SHA2_Scalar32.c is now bundled inside of Hacl_SHA2_Streaming to make sure the individual helper functions are static inline (which helps perf).

@msprotz
Copy link
Contributor Author

msprotz commented Nov 11, 2022

New benchmars on my intel 2019 macbook pro:

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
Sha2_256_Streaming                 4380 ns         4377 ns       159997 // with the latest code restructuring, slightly faster
Sha2_MB_256                        4160 ns         4156 ns       169533

At least on this machine, sha2-256-mb-streaming is now on-par with OpenSSL-noasm.

Copy link
Contributor

@karthikbhargavan karthikbhargavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Copy link
Contributor

@polubelova polubelova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

@msprotz msprotz merged commit 08af70b into main Nov 14, 2022
@msprotz msprotz deleted the protz_sha2mb_streaming branch November 14, 2022 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants