Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ChaCha20 implementation with an optimized version #215

Merged
merged 1 commit into from
Jul 23, 2016
Merged

Replace ChaCha20 implementation with an optimized version #215

merged 1 commit into from
Jul 23, 2016

Conversation

aead
Copy link
Contributor

@aead aead commented Jul 23, 2016

No description provided.

@codecov-io
Copy link

codecov-io commented Jul 23, 2016

Current coverage is 77.82% (diff: 73.33%)

Merging #215 into master will decrease coverage by 0.31%

@@             master       #215   diff @@
==========================================
  Files           147        145     -2   
  Lines         13586      13445   -141   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          10615      10463   -152   
- Misses         2687       2695     +8   
- Partials        284        287     +3   

Powered by Codecov. Last update 85d6e1a...63ca613

@v2ray
Copy link
Collaborator

v2ray commented Jul 23, 2016

  1. Do you have benchmark data for your implementation?
  2. Are you going to write asm code for x86 and arm as well?

@aead
Copy link
Contributor Author

aead commented Jul 23, 2016

To 1:
The important for you:

  • BenchmarkCipher64-8        455.39 MB/s
  • BenchmarkCipher16K-8     1043.57 MB/s
  • BenchmarkXORKeyStream64-8     146.72 MB/s
  • BenchmarkXORKeyStream16K-8    1032.01 MB/s

The ChaCha20Poly1305 AEAD:

  • BenchmarkSeal64B-8    61.81 MB/s
  • BenchmarkSeal1K-8    421.12 MB/s
  • BenchmarkOpen64B-8    64.90 MB/s
  • BenchmarkOpen1K-8    377.11 MB/s

On Intel i7-3632QM - Better on my Skylake Notebook

To 2.
May I'll find the time to build an 386 asm version, but there's an reference implementation (for x86 and arm), which won't be slower than the old one in the internal package...
ARM assembly is not planned (from my side - maybe some else do this), because of the lack of time and a missing ARM setup (Pi + Go-ARM etc.)

@aead
Copy link
Contributor Author

aead commented Jul 23, 2016

If someone want's to verify the benchmark results:
go get github.com/aead/chacha20
cd $GOPATH/src/github.com/aead/chacha20
go test ./... -v -bench=Benchmark

@v2ray v2ray merged commit 08526a3 into v2ray:master Jul 23, 2016
@v2ray
Copy link
Collaborator

v2ray commented Jul 25, 2016

I have to rollback this change, as the chacha20 (with 8 bytes nonce) encryption are reportedly broken. I am not sure what went wrong. Will reply back after investigation.

@aead
Copy link
Contributor Author

aead commented Jul 25, 2016

Oh! Very surprising! - If checked the implementation with 8-byte nonces right know again and pushed some test vectors for it... Seems all okay...
Do you encrypt more than 64 x (2^31 -1) bytes ?! The implementation can't handle that, because it is designed for the IETF-ChaCha20 version....

An endian issue maybe?! Do you use a big-endian machine?!

@v2ray
Copy link
Collaborator

v2ray commented Jul 25, 2016

I found a difference between my implementation and yours. Here is the code and output on 64 bit macOS: https://gist.github.com/v2ray/9f767596bbd671d441b9a3f41c2e1128

Although I have no clue which one is correct.

@aead
Copy link
Contributor Author

aead commented Jul 25, 2016

Thanks for the report 👍
It was a bug in my implementation: In a special corner case (twice or more 128 byte blocks) the counter is incremented only by 1, not by 2 ...
Fixed it right now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants