Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ChaCha20Poly1305 implementation #221

Merged
merged 1 commit into from
Jul 18, 2016
Merged

Replace ChaCha20Poly1305 implementation #221

merged 1 commit into from
Jul 18, 2016

Conversation

aead
Copy link
Contributor

@aead aead commented Jul 17, 2016

Improve AEAD speed with slightly faster poly1305 implementation.
Avoid memory allocations whenever possible. (AEAD)
But currently missing AVX2 support.

BenchmarkSeal64B-8      40.97 MB/s
BenchmarkSeal1K-8       183.82 MB/s
BenchmarkSeal64K-8    406.37 MB/s
BenchmarkOpen64B-8     45.79 MB/s
BenchmarkOpen1K-8     181.14 MB/s
BenchmarkOpen64K-8   417.22 MB/s

Improve AEAD speed with slightly faster poly1305 implementation.
Avoid memory allocations whenever possible. (AEAD)
But currently missing AVX2 support.

BenchmarkSeal64B-8     1561 ns/op       40.97 MB/s
BenchmarkSeal1K-8      5570 ns/op      183.82 MB/s
BenchmarkSeal64K-8     161271 ns/op    406.37 MB/s
BenchmarkOpen64B-8     1747 ns/op       45.79 MB/s
BenchmarkOpen1K-8      5741 ns/op      181.14 MB/s
BenchmarkOpen64K-8     157116 ns/op    417.22 MB/s
@codecov-io
Copy link

codecov-io commented Jul 17, 2016

Current coverage is 86.81%

Merging #221 into master will decrease coverage by 0.08%

@@             master       #221   diff @@
==========================================
  Files            67         67          
  Lines          4129       4132     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           3588       3587     -1   
- Misses          312        314     +2   
- Partials        229        231     +2   

Powered by Codecov. Last updated by 0412926...233bd5e

@aead
Copy link
Contributor Author

aead commented Jul 18, 2016

The Travis-CI build seems to fail because of a network error...

unable to access 'https://go.googlesource.com/tools/

@lucas-clemente
Copy link
Member

Thanks for this PR! I restarted the travis build and it worked, so I'm going ahead with merging this now.

I've had a quick look at your implementation, and it indeed looks better with regard to allocations. It also allows us to use in-place encryption, which will probably be done in our current milestone (see #217).

However, I'm not sure if you're aware that we're only using AES-GCM right now. We switched since it is around an order of magnitude faster (with the current implementations, see below). Due to a lack of time I didn't build proper negotiation for chacha yet (the client gets to decide, see #201), but I will do this soon-ish.

Just pasting my benchmark results for future reference (of course they don't cover GC pressure):

Old implementation:
BenchmarkChaCha20Poly1305Seal1K-8         500000          2804 ns/op     365.13 MB/s
BenchmarkChaCha20Poly1305Open1K-8         500000          2808 ns/op     364.60 MB/s

New implementation:
BenchmarkSeal1K-8                 500000          2745 ns/op     373.03 MB/s
BenchmarkOpen1K-8                 500000          2732 ns/op     374.76 MB/s

AES-GCM:
BenchmarkAESGCMSeal1K-8      3000000           466 ns/op    2196.10 MB/s
BenchmarkAESGCMOpen1K-8      3000000           440 ns/op    2326.29 MB/s

@lucas-clemente lucas-clemente merged commit f1766f6 into quic-go:master Jul 18, 2016
@aead
Copy link
Contributor Author

aead commented Jul 18, 2016

Thanks for merging!

It's quite hard to beat AES-GCM on amd64 with AES-NI - Maybe AVX2 has a chance (but both poly1305 and chacha20 must use it).
I'm working on that as soon as I have some time left.

Thanks for the benchmark reference - quite helpful!

@aead
Copy link
Contributor Author

aead commented Jul 21, 2016

Significant performance improvement for poly1305 through amd64 assembly.
But still not comparable with AES-NI:

BenchmarkSeal64B-8       57.89 MB/s
BenchmarkSeal1K-8       298.51 MB/s
BenchmarkSeal64K-8     688.23 MB/s
BenchmarkOpen64B-8     47.62 MB/s
BenchmarkOpen1K-8     302.07 MB/s
BenchmarkOpen64K-8   678.04 MB/s

@lucas-clemente
Copy link
Member

Very cool work! 👍 On my machine it's around 420, up from 370.

BenchmarkSeal1K-8                1000000          2435 ns/op     420.48 MB/s
BenchmarkOpen1K-8                 500000          2431 ns/op     421.14 MB/s

Once QUIC gets adopted by mobile browsers (who like choosing chacha over AES) this will become really useful :)

@aead
Copy link
Contributor Author

aead commented Jul 25, 2016

Thanks 😄
I've some working (experimental) AVX2 code for ChaCha20 giving a great speedup again, but only go-1.7 will support AVX2 asm instructions (VPSHUFD etc.). The current go 1.7beta-3 have still some problems with AVX2 instructions - so readable asm-code requires the final go 1.7 release.

The main problem for you now is the compilation. If I add the AVX2 code to the repo (as soon as the release is available), only the 1.7 builds will pass ( <= 1.6.3 will fail). As far as I know Go does not support something like "version depended compilation", so I cannot fix this...
Do you want to keep support for older versions of Go ( <= 1.6.3) ?

Currently i think i'll create a AVX2 branch for chacha20 and as far as possible (without breaking other's code) make it the master...

Some benchmarks for AVX2 code on a Skylake i7-6500U:
Without AVX2:

  • BenchmarkSeal1K-4 565.44 MB/s
  • BenchmarkOpen1K-4 559.03 MB/s

With AVX2:

  • BenchmarkSeal1K-4 791.02 MB/s
  • BenchmarkOpen1K-4 779.34 MB/s

@lucas-clemente
Copy link
Member

lucas-clemente commented Jul 25, 2016

I would like to keep support for 1.6 for the near future.

However, I think it's possible to use Go's build tags for that purpose, see https://golang.org/pkg/go/build/#hdr-Build_Constraints.

In short, if you put // +build go1.7 into the AVX2 files, they will only be compiled on go 1.7 or up. Similarly for the old implementation.

@aead
Copy link
Contributor Author

aead commented Jul 25, 2016

Thanks for the info - perfect 👍
Okay, AVX2 will be supported as soon as 1.7 is stable...

@aead
Copy link
Contributor Author

aead commented Jul 26, 2016

SSSE3 support available: aead/chacha20@b2542ac
Performance improvement about 8-10% on my machine
Further fixed a critical bug detected by v2ray See aead/chacha20@0302457
Dependency update necessary...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants