benchmarks: Sanity check of benchmarks. #337

Open · 12 of 30 tasks
duesee opened this issue Jan 16, 2023 · 3 comments

duesee (Contributor) commented Jan 16, 2023:

  • Examine timing anomalies:

    • P256
    • K256
    • ChaCha20 and ChaCha20Poly1305 in OpenSSL look inefficient
      • (result != 1)
    • OpenSSL_Ed25519_Verify seems to be very slow?
    • OpenSSL_K256_ECDSA_Sign seems to be slow?
    • OpenSSL_K256_ECDSA_Verify seems to be slow?
    • OpenSSL_K256_ECDH seems to be slow?
    • EverCrypt_HKDF_SHA2_256_extract_expand is faster than the SHA2-512 variant.
    • HACL_x25519_51_{,base} seems off.
  • Vectorized variants are slower? Measure with bigger messages to account for CPU init?

    • Measure with bigger message sizes (sizes can be parameterized via Range; see the first sketch after this list)
    • hmac_blake2b, blake2b_vec256 too slow?
    • -> Only on AMD systems.
  • Erroneous setups:

    • Missing hacl_init_cpu_features()
    • Missing EverCrypt_AutoConfig2_init()
    • Use Setup(...) (see the first sketch after this list)
  • Copy&Paste Bugs:

    • Wrong call to DRBG free
      • Use BENCHMARK_CAPTURE to prevent these errors (see the second sketch after this list)
  • Use cases:

    • Ensure we test for the same use cases when comparing benchmarks. This is important because APIs might differ significantly. Define a specific use case, e.g., "Alice wants to verify a given (message: &[u8], signature: &[u8], public_key: &[u8])", and benchmark the whole use case. Pay attention to missing (or extraneous) checks.
    • Either do not use sanity tests while measuring, or always use (the same) sanity tests while measuring (state.PauseTiming() could come in handy, although it is not recommended; see the third sketch after this list)
    • ChaCha20Poly1305 ciphertext != expected_ciphertext
    • Move new_raw_private in OpenSSL_Ed25519_Sign into the benchmark?
    • Move the precomputation in HACL_NaCl_precomputed_combined before the loop to accommodate the use case?
    • Move the precomputation in HACL_NaCl_precomputed_detached before the loop to accommodate the use case?
  • Chore:

    • Agree on benchmark naming scheme
    • Agree where to put BENCHMARK
    • Agree on benchmark order. What comes first, OpenSSL, EverCrypt, or HACL?
    • for (auto _ : state) vs. state.KeepRunning()
      • Some users report slightly different results. We should just agree on one.
  • Unsure:

    • Is there a better way to implement OpenSSL_blake2b_oneshot?
    • Is there a better way to implement OpenSSL_blake2s_oneshot?
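
Regarding the "Erroneous setups" and message-size items above, here is a minimal sketch, assuming Google Benchmark and the hacl-packages headers (header names assumed; `HACL_Example` and its `memcpy` body are only placeholders for a real HACL call): CPU-feature detection is hoisted into a `Setup` hook so it runs before anything is timed, and the message size is parameterized via `Range` so vectorized variants get inputs large enough to amortize any startup cost.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

#include <benchmark/benchmark.h>

// Headers from hacl-packages (names assumed; adjust to the actual tree).
#include "EverCrypt_AutoConfig2.h"  // EverCrypt_AutoConfig2_init()
#include "hacl-cpu-features.h"      // hacl_init_cpu_features()

// Run feature detection once per benchmark run, before the timed loop,
// instead of forgetting it (or accidentally timing it).
static void DoSetup(const benchmark::State&) {
  hacl_init_cpu_features();
  EverCrypt_AutoConfig2_init();
}

static void HACL_Example(benchmark::State& state) {
  std::vector<uint8_t> msg(state.range(0), 0x2a);  // untimed allocation
  std::vector<uint8_t> digest(32);
  for (auto _ : state) {
    // Placeholder for the HACL one-shot call under test.
    std::memcpy(digest.data(), msg.data(), digest.size());
    benchmark::DoNotOptimize(digest.data());
  }
  // Report throughput so runs at different sizes are comparable.
  state.SetBytesProcessed(state.iterations() * state.range(0));
}

// 64 B up to 1 MiB inputs; large sizes amortize SIMD startup cost.
BENCHMARK(HACL_Example)->Setup(DoSetup)->Range(64, 1 << 20);
BENCHMARK_MAIN();
```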
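For the copy-and-paste item (e.g., the wrong DRBG free call), a sketch of the BENCHMARK_CAPTURE idea: the benchmark body, including setup and teardown, exists exactly once, and each implementation is registered by name, so a stale call cannot hide in a forgotten near-duplicate. The `OneShotHash` signature and the `dummy_hash` stand-in are hypothetical; the real HACL/OpenSSL/EverCrypt wrappers would be adapted to one common shape.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// Hypothetical common signature for the implementations under comparison.
using OneShotHash = void (*)(const uint8_t* msg, size_t msg_len,
                             uint8_t* digest);

// Stand-in implementation so the sketch is self-contained.
static void dummy_hash(const uint8_t* msg, size_t msg_len, uint8_t* digest) {
  uint8_t acc = 0;
  for (size_t i = 0; i < msg_len; ++i) acc ^= msg[i];
  digest[0] = acc;
}

// The body exists once: setup, timed loop, and teardown cannot drift
// apart between copies, because there are no copies.
static void Sha2_256_oneshot(benchmark::State& state, OneShotHash hash) {
  std::vector<uint8_t> msg(1024, 0x2a);
  std::vector<uint8_t> digest(32);
  for (auto _ : state) {
    hash(msg.data(), msg.size(), digest.data());
    benchmark::DoNotOptimize(digest.data());
  }
}

// Implementations differ only in the captured argument and the name,
// which also pins down a uniform naming scheme.
BENCHMARK_CAPTURE(Sha2_256_oneshot, HACL, dummy_hash);
BENCHMARK_CAPTURE(Sha2_256_oneshot, OpenSSL, dummy_hash);
BENCHMARK_CAPTURE(Sha2_256_oneshot, EverCrypt, dummy_hash);
BENCHMARK_MAIN();
```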
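For the sanity-test item, a sketch of the state.PauseTiming() pattern (`verify` is a hypothetical stand-in). As the item notes, pausing every iteration adds measurable overhead, which is why it is not recommended; if we use it anyway, we should use it uniformly across all benchmarks being compared.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// Hypothetical stand-in for the signature verification under test.
static bool verify(const uint8_t* msg, size_t msg_len) { return msg_len > 0; }

static void Verify_with_sanity_check(benchmark::State& state) {
  std::vector<uint8_t> msg(64, 0x2a);
  for (auto _ : state) {
    bool ok = verify(msg.data(), msg.size());  // the only timed call
    state.PauseTiming();  // keep the sanity check out of the measurement
    if (!ok) {
      state.SkipWithError("verification failed");
      break;  // abandon the loop on error, as the API requires
    }
    state.ResumeTiming();
  }
}

BENCHMARK(Verify_with_sanity_check);
BENCHMARK_MAIN();
```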
duesee changed the title from "benchmarks: Unify benchmark names (and add a short README)." to "benchmarks: Sanity check of benchmarks." on Jan 19, 2023
duesee self-assigned this on Jan 23, 2023
duesee (Contributor, Author) commented Jan 23, 2023:

@franziskuskiefer, do you want to provide input on this? Otherwise I'll just start and use what I think is appropriate. Naming and order are not super important, but let's settle them now, while it's still easy to do and it makes the next steps easier.

franziskuskiefer (Member) commented:

Some thoughts, but just go ahead:

  • I don't see any timing anomalies anymore, but please double-check.
  • Try to make sure that only the actual function in question is measured, i.e., as little else (allocations etc.) as possible. For example, we don't care about the time it takes to load a public key to verify a signature. We don't have much influence on that. We want to know how long the function takes to verify a signature.
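
To illustrate, a sketch of a regression-style Ed25519 verify benchmark against OpenSSL's EVP API: key generation, signing, and context allocation happen in untimed setup, and the timed loop contains only the verification. Error handling is omitted for brevity.

```cpp
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>
#include <openssl/evp.h>

// Only the verification is timed; key generation, signing, and context
// allocation are untimed setup.
static void OpenSSL_Ed25519_Verify(benchmark::State& state) {
  std::vector<uint8_t> msg(128, 0x2a);

  // Untimed: generate a keypair and a signature to verify.
  EVP_PKEY* pkey = nullptr;
  EVP_PKEY_CTX* pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_ED25519, nullptr);
  EVP_PKEY_keygen_init(pctx);
  EVP_PKEY_keygen(pctx, &pkey);

  size_t sig_len = 64;  // Ed25519 signatures are always 64 bytes
  std::vector<uint8_t> sig(sig_len);
  EVP_MD_CTX* sctx = EVP_MD_CTX_new();
  EVP_DigestSignInit(sctx, nullptr, nullptr, nullptr, pkey);
  EVP_DigestSign(sctx, sig.data(), &sig_len, msg.data(), msg.size());

  EVP_MD_CTX* vctx = EVP_MD_CTX_new();
  for (auto _ : state) {
    // Ed25519 is a one-shot EVP algorithm, so the context must be
    // re-initialised for every verification; that re-init is part of
    // the operation, while the key parsing above is not.
    EVP_DigestVerifyInit(vctx, nullptr, nullptr, nullptr, pkey);
    int ok = EVP_DigestVerify(vctx, sig.data(), sig_len,
                              msg.data(), msg.size());
    benchmark::DoNotOptimize(ok);
  }

  EVP_MD_CTX_free(vctx);
  EVP_MD_CTX_free(sctx);
  EVP_PKEY_CTX_free(pctx);
  EVP_PKEY_free(pkey);
}

BENCHMARK(OpenSSL_Ed25519_Verify);
BENCHMARK_MAIN();
```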

duesee (Contributor, Author) commented Jan 23, 2023:

> Try to make sure that only the actual function in question is measured, i.e., as little else (allocations etc.) as possible. For example, we don't care about the time it takes to load a public key to verify a signature. We don't have much influence on that. We want to know how long the function takes to verify a signature.

I think it would help to talk about comparisons (as with OpenSSL) and regression testing separately: To have comparable benchmarks, we should make sure that we "do the same" for HACL and, e.g., OpenSSL. For example, when the API in HACL is unified such that it always hashes a message before signing, while OpenSSL hashes and signs in two steps, we have to include the hashing step in the OpenSSL benchmark. Otherwise we compare hashing+sign with sign-only (see the sketch below).
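
A generic sketch of that distinction, with hypothetical wrappers (`hash_msg`, `sign_digest`, `sign_msg` standing in for the real APIs): both benchmarks cover the full "sign this message" use case, so the two-step API is charged for its hashing step as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// Hypothetical wrappers standing in for the two API shapes.
static void hash_msg(const uint8_t* msg, size_t len, uint8_t* digest32) {
  digest32[0] = len > 0 ? msg[0] : 0;
}
static void sign_digest(const uint8_t* digest32, uint8_t* sig64) {
  sig64[0] = digest32[0];
}
static void sign_msg(const uint8_t* msg, size_t len, uint8_t* sig64) {
  uint8_t digest[32];
  hash_msg(msg, len, digest);  // the unified API hashes internally
  sign_digest(digest, sig64);
}

// Unified API: the whole "sign this message" use case is one call.
static void Sign_unified(benchmark::State& state) {
  std::vector<uint8_t> msg(1024, 0x2a), sig(64);
  for (auto _ : state) {
    sign_msg(msg.data(), msg.size(), sig.data());
    benchmark::DoNotOptimize(sig.data());
  }
}

// Two-step API: the hashing step is timed too; otherwise we would
// compare hashing+sign against sign-only.
static void Sign_two_step(benchmark::State& state) {
  std::vector<uint8_t> msg(1024, 0x2a), sig(64);
  uint8_t digest[32];
  for (auto _ : state) {
    hash_msg(msg.data(), msg.size(), digest);
    sign_digest(digest, sig.data());
    benchmark::DoNotOptimize(sig.data());
  }
}

BENCHMARK(Sign_unified);
BENCHMARK(Sign_two_step);
BENCHMARK_MAIN();
```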

For regression testing, I agree with your comment. We can put as many things as needed into the setup and only measure the single function we don't want to regress.

For some benchmarks, comparison and regression testing align, but not for all. I will take a look and point out these cases.
