benchmarks: Sanity check of benchmarks. #337

Open · 12 of 30 tasks
duesee opened this issue Jan 16, 2023 · 3 comments

duesee (Contributor) commented Jan 16, 2023:

  • Examine timing anomalies:

    • P256
    • K256
    • ChaCha20 and ChaCha20Poly1305 in OpenSSL look inefficient
      • (result != 1)
    • OpenSSL_Ed25519_Verify seems to be very slow?
    • OpenSSL_K256_ECDSA_Sign seems to be slow?
    • OpenSSL_K256_ECDSA_Verify seems to be slow?
    • OpenSSL_K256_ECDH seems to be slow?
    • EverCrypt_HKDF_SHA2_256_extract_expand is faster than the SHA2-512 variant.
    • HACL_x25519_51_{,base} seems off.
  • Vectorized variants are slower? Measure with bigger messages to account for CPU init?

    • Measure with bigger message sizes (sizes can be parameterized via Range; see the first sketch after this list)
    • hmac_blake2b, blake2b_vec256 too slow?
    • -> Only on AMD systems.
  • Erroneous setups:

    • Missing hacl_init_cpu_features()
    • Missing EverCrypt_AutoConfig2_init()
    • Use Setup(...) (see the first sketch after this list)
  • Copy&Paste Bugs:

    • Wrong call to DRBG free
      • Use BENCHMARK_CAPTURE to prevent these errors (see the second sketch after this list)
  • Use cases:

    • Ensure we test for the same use cases when comparing benchmarks. This is important because APIs might differ significantly. Define a specific use case, e.g., "Alice wants to verify a given (message: &[u8], signature: &[u8], public_key: &[u8])", and benchmark the whole use case. Pay attention to missing (or extraneous) checks.
    • Either do not use sanity tests while measuring, or always use (the same) sanity tests while measuring (state.PauseTiming() could come in handy, although it is not recommended; see the third sketch after this list)
    • ChaCha20Poly1305 ciphertext != expected_ciphertext
    • Move new_raw_private in OpenSSL_Ed25519_Sign into the benchmark?
    • Move the precomputation in HACL_NaCl_precomputed_combined before the loop to accommodate the use case?
    • Move the precomputation in HACL_NaCl_precomputed_detached before the loop to accommodate the use case?
  • Chore:

    • Agree on benchmark naming scheme
    • Agree where to put BENCHMARK
    • Agree on benchmark order. What comes first, OpenSSL, EverCrypt, or HACL?
    • for (auto _ : state) vs. state.KeepRunning()
      • Some users report slightly different results. We should just agree on one.
  • Unsure:

    • Is there a better way to implement OpenSSL_blake2b_oneshot?
    • Is there a better way to implement OpenSSL_blake2s_oneshot?
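
Regarding the "Erroneous setups" and message-size items above, here is a minimal sketch, assuming Google Benchmark and the hacl-packages headers (header names assumed; `HACL_Example` and its `memcpy` body are only placeholders for a real HACL call): CPU-feature detection is hoisted into a `Setup` hook so it runs before anything is timed, and the message size is parameterized via `Range` so vectorized variants get inputs large enough to amortize any startup cost.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

#include <benchmark/benchmark.h>

// Headers from hacl-packages (names assumed; adjust to the actual tree).
#include "EverCrypt_AutoConfig2.h"  // EverCrypt_AutoConfig2_init()
#include "hacl-cpu-features.h"      // hacl_init_cpu_features()

// Run feature detection once per benchmark run, before the timed loop,
// instead of forgetting it (or accidentally timing it).
static void DoSetup(const benchmark::State&) {
  hacl_init_cpu_features();
  EverCrypt_AutoConfig2_init();
}

static void HACL_Example(benchmark::State& state) {
  std::vector<uint8_t> msg(state.range(0), 0x2a);  // untimed allocation
  std::vector<uint8_t> digest(32);
  for (auto _ : state) {
    // Placeholder for the HACL one-shot call under test.
    std::memcpy(digest.data(), msg.data(), digest.size());
    benchmark::DoNotOptimize(digest.data());
  }
  // Report throughput so runs at different sizes are comparable.
  state.SetBytesProcessed(state.iterations() * state.range(0));
}

// 64 B up to 1 MiB inputs; large sizes amortize SIMD startup cost.
BENCHMARK(HACL_Example)->Setup(DoSetup)->Range(64, 1 << 20);
BENCHMARK_MAIN();
```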
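For the copy-and-paste item (e.g., the wrong DRBG free call), a sketch of the BENCHMARK_CAPTURE idea: the benchmark body, including setup and teardown, exists exactly once, and each implementation is registered by name, so a stale call cannot hide in a forgotten near-duplicate. The `OneShotHash` signature and the `dummy_hash` stand-in are hypothetical; the real HACL/OpenSSL/EverCrypt wrappers would be adapted to one common shape.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// Hypothetical common signature for the implementations under comparison.
using OneShotHash = void (*)(const uint8_t* msg, size_t msg_len,
                             uint8_t* digest);

// Stand-in implementation so the sketch is self-contained.
static void dummy_hash(const uint8_t* msg, size_t msg_len, uint8_t* digest) {
  uint8_t acc = 0;
  for (size_t i = 0; i < msg_len; ++i) acc ^= msg[i];
  digest[0] = acc;
}

// The body exists once: setup, timed loop, and teardown cannot drift
// apart between copies, because there are no copies.
static void Sha2_256_oneshot(benchmark::State& state, OneShotHash hash) {
  std::vector<uint8_t> msg(1024, 0x2a);
  std::vector<uint8_t> digest(32);
  for (auto _ : state) {
    hash(msg.data(), msg.size(), digest.data());
    benchmark::DoNotOptimize(digest.data());
  }
}

// Implementations differ only in the captured argument and the name,
// which also pins down a uniform naming scheme.
BENCHMARK_CAPTURE(Sha2_256_oneshot, HACL, dummy_hash);
BENCHMARK_CAPTURE(Sha2_256_oneshot, OpenSSL, dummy_hash);
BENCHMARK_CAPTURE(Sha2_256_oneshot, EverCrypt, dummy_hash);
BENCHMARK_MAIN();
```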
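For the sanity-test item, a sketch of the state.PauseTiming() pattern (`verify` is a hypothetical stand-in). As the item notes, pausing every iteration adds measurable overhead, which is why it is not recommended; if we use it anyway, we should use it uniformly across all benchmarks being compared.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// Hypothetical stand-in for the signature verification under test.
static bool verify(const uint8_t* msg, size_t msg_len) { return msg_len > 0; }

static void Verify_with_sanity_check(benchmark::State& state) {
  std::vector<uint8_t> msg(64, 0x2a);
  for (auto _ : state) {
    bool ok = verify(msg.data(), msg.size());  // the only timed call
    state.PauseTiming();  // keep the sanity check out of the measurement
    if (!ok) {
      state.SkipWithError("verification failed");
      break;  // abandon the loop on error, as the API requires
    }
    state.ResumeTiming();
  }
}

BENCHMARK(Verify_with_sanity_check);
BENCHMARK_MAIN();
```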
duesee changed the title from "benchmarks: Unify benchmark names (and add a short README)." to "benchmarks: Sanity check of benchmarks." on Jan 19, 2023
duesee self-assigned this on Jan 23, 2023
duesee (Contributor, Author) commented Jan 23, 2023:

@franziskuskiefer, do you want to provide input on this? Otherwise I'll just start and use what I think is appropriate. Naming and order are not super important, but let's settle them now, while it's still easy to do and it makes the next steps easier.

franziskuskiefer (Member) commented:

Some thoughts, but just go ahead:

  • I don't see any timing anomalies anymore, but please double-check.
  • Try to make sure that only the actual function in question is measured, i.e., as little else (allocations etc.) as possible. For example, we don't care about the time it takes to load a public key to verify a signature. We don't have much influence on that. We want to know how long the function takes to verify a signature.
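
To illustrate, a sketch of a regression-style Ed25519 verify benchmark against OpenSSL's EVP API: key generation, signing, and context allocation happen in untimed setup, and the timed loop contains only the verification. Error handling is omitted for brevity.

```cpp
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>
#include <openssl/evp.h>

// Only the verification is timed; key generation, signing, and context
// allocation are untimed setup.
static void OpenSSL_Ed25519_Verify(benchmark::State& state) {
  std::vector<uint8_t> msg(128, 0x2a);

  // Untimed: generate a keypair and a signature to verify.
  EVP_PKEY* pkey = nullptr;
  EVP_PKEY_CTX* pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_ED25519, nullptr);
  EVP_PKEY_keygen_init(pctx);
  EVP_PKEY_keygen(pctx, &pkey);

  size_t sig_len = 64;  // Ed25519 signatures are always 64 bytes
  std::vector<uint8_t> sig(sig_len);
  EVP_MD_CTX* sctx = EVP_MD_CTX_new();
  EVP_DigestSignInit(sctx, nullptr, nullptr, nullptr, pkey);
  EVP_DigestSign(sctx, sig.data(), &sig_len, msg.data(), msg.size());

  EVP_MD_CTX* vctx = EVP_MD_CTX_new();
  for (auto _ : state) {
    // Ed25519 is a one-shot EVP algorithm, so the context must be
    // re-initialised for every verification; that re-init is part of
    // the operation, while the key parsing above is not.
    EVP_DigestVerifyInit(vctx, nullptr, nullptr, nullptr, pkey);
    int ok = EVP_DigestVerify(vctx, sig.data(), sig_len,
                              msg.data(), msg.size());
    benchmark::DoNotOptimize(ok);
  }

  EVP_MD_CTX_free(vctx);
  EVP_MD_CTX_free(sctx);
  EVP_PKEY_CTX_free(pctx);
  EVP_PKEY_free(pkey);
}

BENCHMARK(OpenSSL_Ed25519_Verify);
BENCHMARK_MAIN();
```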

duesee (Contributor, Author) commented Jan 23, 2023:

> Try to make sure that only the actual function in question is measured, i.e., as little else (allocations etc.) as possible. For example, we don't care about the time it takes to load a public key to verify a signature. We don't have much influence on that. We want to know how long the function takes to verify a signature.

I think it would help to talk about comparisons (as with OpenSSL) and regression testing separately: To have comparable benchmarks, we should make sure that we "do the same" for HACL and, e.g., OpenSSL. For example, when the API in HACL is unified such that it always hashes a message before signing, while OpenSSL hashes and signs in two steps, we have to include the hashing step in the OpenSSL benchmark. Otherwise we compare hashing+sign with sign-only (see the sketch below).
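
A generic sketch of that distinction, with hypothetical wrappers (`hash_msg`, `sign_digest`, `sign_msg` standing in for the real APIs): both benchmarks cover the full "sign this message" use case, so the two-step API is charged for its hashing step as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// Hypothetical wrappers standing in for the two API shapes.
static void hash_msg(const uint8_t* msg, size_t len, uint8_t* digest32) {
  digest32[0] = len > 0 ? msg[0] : 0;
}
static void sign_digest(const uint8_t* digest32, uint8_t* sig64) {
  sig64[0] = digest32[0];
}
static void sign_msg(const uint8_t* msg, size_t len, uint8_t* sig64) {
  uint8_t digest[32];
  hash_msg(msg, len, digest);  // the unified API hashes internally
  sign_digest(digest, sig64);
}

// Unified API: the whole "sign this message" use case is one call.
static void Sign_unified(benchmark::State& state) {
  std::vector<uint8_t> msg(1024, 0x2a), sig(64);
  for (auto _ : state) {
    sign_msg(msg.data(), msg.size(), sig.data());
    benchmark::DoNotOptimize(sig.data());
  }
}

// Two-step API: the hashing step is timed too; otherwise we would
// compare hashing+sign against sign-only.
static void Sign_two_step(benchmark::State& state) {
  std::vector<uint8_t> msg(1024, 0x2a), sig(64);
  uint8_t digest[32];
  for (auto _ : state) {
    hash_msg(msg.data(), msg.size(), digest);
    sign_digest(digest, sig.data());
    benchmark::DoNotOptimize(sig.data());
  }
}

BENCHMARK(Sign_unified);
BENCHMARK(Sign_two_step);
BENCHMARK_MAIN();
```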

For regression testing, I agree with your comment. We can put as many things as needed into the setup and only measure the single function we don't want to regress.

For some benchmarks, comparison and regression testing align, but not for all. I will take a look and point out these cases.
