Replace ChaCha20Poly1305 implementation #221

aead · 2016-07-17T21:25:14Z

Improve AEAD speed with slightly faster poly1305 implementation.
Avoid memory allocations whenever possible. (AEAD)
But currently missing AVX2 support.

BenchmarkSeal64B-8 40.97 MB/s
BenchmarkSeal1K-8 183.82 MB/s
BenchmarkSeal64K-8 406.37 MB/s
BenchmarkOpen64B-8 45.79 MB/s
BenchmarkOpen1K-8 181.14 MB/s
BenchmarkOpen64K-8 417.22 MB/s

Improve AEAD speed with slightly faster poly1305 implementation. Avoid memory allocations whenever possible. (AEAD) But currently missing AVX2 support. BenchmarkSeal64B-8 1561 ns/op 40.97 MB/s BenchmarkSeal1K-8 5570 ns/op 183.82 MB/s BenchmarkSeal64K-8 161271 ns/op 406.37 MB/s BenchmarkOpen64B-8 1747 ns/op 45.79 MB/s BenchmarkOpen1K-8 5741 ns/op 181.14 MB/s BenchmarkOpen64K-8 157116 ns/op 417.22 MB/s

codecov-io · 2016-07-17T21:27:46Z

Current coverage is 86.81%

Merging #221 into master will decrease coverage by 0.08%

@@             master       #221   diff @@
==========================================
  Files            67         67          
  Lines          4129       4132     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           3588       3587     -1   
- Misses          312        314     +2   
- Partials        229        231     +2

Powered by Codecov. Last updated by 0412926...233bd5e

aead · 2016-07-18T08:18:45Z

The Travis-CI build seems to fail because of a network error...

unable to access 'https://go.googlesource.com/tools/

lucas-clemente · 2016-07-18T11:00:24Z

Thanks for this PR! I restarted the travis build and it worked, so I'm going ahead with merging this now.

I've had a quick look at your implementation, and it indeed looks better with regard to allocations. It also allows us to use in-place encryption, which will probably be done in our current milestone (see #217).

However, I'm not sure if you're aware that we're only using AES-GCM right now. We switched since it is around an order of magnitude faster (with the current implementations, see below). Due to a lack of time I didn't build proper negotiation for chacha yet (the client gets to decide, see #201), but I will do this soon-ish.

Just pasting my benchmark results for future reference (of course they don't cover GC pressure):

Old implementation:
BenchmarkChaCha20Poly1305Seal1K-8         500000          2804 ns/op     365.13 MB/s
BenchmarkChaCha20Poly1305Open1K-8         500000          2808 ns/op     364.60 MB/s

New implementation:
BenchmarkSeal1K-8                 500000          2745 ns/op     373.03 MB/s
BenchmarkOpen1K-8                 500000          2732 ns/op     374.76 MB/s

AES-GCM:
BenchmarkAESGCMSeal1K-8      3000000           466 ns/op    2196.10 MB/s
BenchmarkAESGCMOpen1K-8      3000000           440 ns/op    2326.29 MB/s

aead · 2016-07-18T12:25:32Z

Thanks for merging!

It's quite hard to beat AES-GCM on amd64 with AES-NI - Maybe AVX2 has a chance (but both poly1305 and chacha20 must use it).
I'm working on that as soon as I have some time left.

Thanks for the benchmark reference - quite helpful!

aead · 2016-07-21T13:22:50Z

Significant performance improvement for poly1305 through amd64 assembly.
But still not comparable with AES-NI:

BenchmarkSeal64B-8 57.89 MB/s
BenchmarkSeal1K-8 298.51 MB/s
BenchmarkSeal64K-8 688.23 MB/s
BenchmarkOpen64B-8 47.62 MB/s
BenchmarkOpen1K-8 302.07 MB/s
BenchmarkOpen64K-8 678.04 MB/s

lucas-clemente · 2016-07-21T16:47:26Z

Very cool work! 👍 On my machine it's around 420, up from 370.

BenchmarkSeal1K-8                1000000          2435 ns/op     420.48 MB/s
BenchmarkOpen1K-8                 500000          2431 ns/op     421.14 MB/s

Once QUIC gets adopted by mobile browsers (who like choosing chacha over AES) this will become really useful :)

aead · 2016-07-25T14:55:07Z

Thanks 😄
I've some working (experimental) AVX2 code for ChaCha20 giving a great speedup again, but only go-1.7 will support AVX2 asm instructions (VPSHUFD etc.). The current go 1.7beta-3 have still some problems with AVX2 instructions - so readable asm-code requires the final go 1.7 release.

The main problem for you now is the compilation. If I add the AVX2 code to the repo (as soon as the release is available), only the 1.7 builds will pass ( <= 1.6.3 will fail). As far as I know Go does not support something like "version depended compilation", so I cannot fix this...
Do you want to keep support for older versions of Go ( <= 1.6.3) ?

Currently i think i'll create a AVX2 branch for chacha20 and as far as possible (without breaking other's code) make it the master...

Some benchmarks for AVX2 code on a Skylake i7-6500U:
Without AVX2:

BenchmarkSeal1K-4 565.44 MB/s
BenchmarkOpen1K-4 559.03 MB/s

With AVX2:

BenchmarkSeal1K-4 791.02 MB/s
BenchmarkOpen1K-4 779.34 MB/s

lucas-clemente · 2016-07-25T15:35:49Z

I would like to keep support for 1.6 for the near future.

However, I think it's possible to use Go's build tags for that purpose, see https://golang.org/pkg/go/build/#hdr-Build_Constraints.

In short, if you put // +build go1.7 into the AVX2 files, they will only be compiled on go 1.7 or up. Similarly for the old implementation.

aead · 2016-07-25T16:10:34Z

Thanks for the info - perfect 👍
Okay, AVX2 will be supported as soon as 1.7 is stable...

aead · 2016-07-26T18:44:03Z

SSSE3 support available: aead/chacha20@b2542ac
Performance improvement about 8-10% on my machine
Further fixed a critical bug detected by v2ray See aead/chacha20@0302457
Dependency update necessary...

lucas-clemente merged commit f1766f6 into quic-go:master Jul 18, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace ChaCha20Poly1305 implementation #221

Replace ChaCha20Poly1305 implementation #221

aead commented Jul 17, 2016 •

edited

Loading

codecov-io commented Jul 17, 2016 •

edited

Loading

aead commented Jul 18, 2016

lucas-clemente commented Jul 18, 2016

aead commented Jul 18, 2016

aead commented Jul 21, 2016

lucas-clemente commented Jul 21, 2016

aead commented Jul 25, 2016

lucas-clemente commented Jul 25, 2016 •

edited

Loading

aead commented Jul 25, 2016

aead commented Jul 26, 2016 •

edited

Loading

Replace ChaCha20Poly1305 implementation #221

Replace ChaCha20Poly1305 implementation #221

Conversation

aead commented Jul 17, 2016 • edited Loading

codecov-io commented Jul 17, 2016 • edited Loading

Current coverage is 86.81%

aead commented Jul 18, 2016

lucas-clemente commented Jul 18, 2016

aead commented Jul 18, 2016

aead commented Jul 21, 2016

lucas-clemente commented Jul 21, 2016

aead commented Jul 25, 2016

lucas-clemente commented Jul 25, 2016 • edited Loading

aead commented Jul 25, 2016

aead commented Jul 26, 2016 • edited Loading

aead commented Jul 17, 2016 •

edited

Loading

codecov-io commented Jul 17, 2016 •

edited

Loading

lucas-clemente commented Jul 25, 2016 •

edited

Loading

aead commented Jul 26, 2016 •

edited

Loading