
Vmess: Use aes-128-gcm by default on arm64 platform #812

Closed
wants to merge 1 commit into from

astrataro
Contributor

AES instructions are available on most arm64 phones, since ARMv8-A defines them as part of its (optional) Cryptography Extension. Modern browsers such as Chrome also prefer aes-128-gcm on arm64, leaving chacha20-poly1305 preferred only on 32-bit ARM.
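A minimal sketch of what an architecture-based default looks like, assuming the selection keys off `runtime.GOARCH`. The helper name `defaultSecurityForArch` is hypothetical, the cipher names are plain strings rather than V2Ray's own config types, and treating amd64 as already defaulting to aes-128-gcm is an assumption; arm64 is the case this PR adds.

```go
package main

import (
	"fmt"
	"runtime"
)

// Cipher names used for illustration only; this sketch does not reproduce
// V2Ray's real security-type definitions.
const (
	aes128GCM        = "aes-128-gcm"
	chacha20Poly1305 = "chacha20-poly1305"
)

// defaultSecurityForArch is a hypothetical helper that picks the VMess
// default cipher purely from the target architecture.
func defaultSecurityForArch(goarch string) string {
	switch goarch {
	case "amd64", "arm64": // assumption: amd64 already uses AES; arm64 is the proposed addition
		return aes128GCM
	default:
		// 32-bit arm and everything else: assume no AES instructions.
		return chacha20Poly1305
	}
}

func main() {
	fmt.Printf("%s -> %s\n", runtime.GOARCH, defaultSecurityForArch(runtime.GOARCH))
}
```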

Speed test on Snapdragon 821 (MSM8996AB Pro):

# openssl speed -elapsed -evp chacha20-poly1305
You have chosen to measure elapsed time instead of user CPU time.
Doing chacha20-poly1305 for 3s on 16 size blocks: 12813542 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 64 size blocks: 7206714 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 256 size blocks: 2782860 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 1024 size blocks: 762759 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 8192 size blocks: 97398 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 16384 size blocks: 48783 chacha20-poly1305's in 3.00s
OpenSSL 1.1.0f  25 May 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/lib/ssl\"" -DENGINESDIR="\"/usr/lib/arm-linux-gnueabihf/engines-1.1\""
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20-poly1305    68338.89k   153743.23k   237470.72k   260355.07k   265961.47k   266420.22k

# openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 21963425 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 14156835 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 6218991 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 1874332 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 247873 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 16384 size blocks: 122891 aes-128-gcm's in 3.00s
OpenSSL 1.1.0f  25 May 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/lib/ssl\"" -DENGINESDIR="\"/usr/lib/arm-linux-gnueabihf/engines-1.1\""
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm     117138.27k   302012.48k   530687.23k   639771.99k   676858.54k   671148.71k
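The table rows follow directly from the "Doing ... in 3.00s" lines: openssl reports thousands of bytes per second, i.e. operation count × block size ÷ 3.00 s ÷ 1000. The short Go sketch below recomputes both rows and the aes-128-gcm/chacha20-poly1305 ratio from the counts copied above; it adds no new measurements. On this device the ratio works out to about 1.7x at 16-byte blocks and about 2.5x at 16384-byte blocks.

```go
package main

import "fmt"

// Block sizes used by `openssl speed` in the runs above.
var blockSizes = []int{16, 64, 256, 1024, 8192, 16384}

// Operation counts copied from the "Doing ... in 3.00s" lines above.
var chachaOps = []int64{12813542, 7206714, 2782860, 762759, 97398, 48783}
var aesOps = []int64{21963425, 14156835, 6218991, 1874332, 247873, 122891}

// Each run lasted 3.00 seconds of elapsed time.
const seconds = 3.00

// kBps converts an operation count into the unit openssl prints:
// thousands of bytes processed per second.
func kBps(ops int64, blockSize int) float64 {
	return float64(ops) * float64(blockSize) / seconds / 1000
}

func main() {
	for i, size := range blockSizes {
		c := kBps(chachaOps[i], size)
		a := kBps(aesOps[i], size)
		fmt.Printf("%5d bytes: chacha20-poly1305 %10.2fk  aes-128-gcm %10.2fk  ratio %.2fx\n",
			size, c, a, a/c)
	}
}
```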

PS. Maybe we should let the Go runtime check CPU instruction sets rather than CPU architectures when choosing the default cipher. On many older amd64 CPUs without AES-NI, chacha20-poly1305 is also much faster than aes-128-gcm.
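A minimal sketch of the feature-based alternative, assuming the decision can be reduced to two hardware flags: AES round instructions and carry-less multiply (which GCM uses to speed up GHASH). The `cpuFeatures` struct and `defaultSecurity` function are hypothetical; in a real build the flags could be filled from the `golang.org/x/sys/cpu` package (for example `cpu.X86.HasAES`/`cpu.X86.HasPCLMULQDQ` or `cpu.ARM64.HasAES`/`cpu.ARM64.HasPMULL`). Requiring both flags before choosing AES is a design assumption of this sketch.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for real CPU feature detection.
type cpuFeatures struct {
	hasAES   bool // hardware AES instructions (AES-NI on x86, ARMv8 AES)
	hasCLMUL bool // carry-less multiply for GHASH (PCLMULQDQ on x86, PMULL on ARMv8)
}

// defaultSecurity picks aes-128-gcm only when both AES and carry-less
// multiply are available; otherwise it falls back to chacha20-poly1305,
// which does not depend on special instructions to run fast.
func defaultSecurity(f cpuFeatures) string {
	if f.hasAES && f.hasCLMUL {
		return "aes-128-gcm"
	}
	return "chacha20-poly1305"
}

func main() {
	oldAmd64 := cpuFeatures{} // e.g. an amd64 CPU without AES-NI
	modernArm64 := cpuFeatures{hasAES: true, hasCLMUL: true}
	fmt.Println(defaultSecurity(oldAmd64))    // chacha20-poly1305
	fmt.Println(defaultSecurity(modernArm64)) // aes-128-gcm
}
```

Hardware flags are only half of the picture: the default only pays off if the Go crypto implementation in use actually takes advantage of those instructions on the target platform.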

@codecov

codecov bot commented Jan 13, 2018

Codecov Report

Merging #812 into master will decrease coverage by 0.13%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #812      +/-   ##
==========================================
- Coverage   74.38%   74.24%   -0.14%     
==========================================
  Files         193      193              
  Lines        8857     8857              
==========================================
- Hits         6588     6576      -12     
- Misses       1696     1710      +14     
+ Partials      573      571       -2
Impacted Files Coverage Δ
common/protocol/headers.go 86.36% <100%> (ø) ⬆️
transport/internet/websocket/connection.go 65.95% <0%> (-8.52%) ⬇️
proxy/vmess/vmess.go 87.5% <0%> (-8.34%) ⬇️
app/proxyman/mux/session.go 71.01% <0%> (-7.25%) ⬇️
proxy/vmess/outbound/outbound.go 73.91% <0%> (-2.18%) ⬇️
app/proxyman/inbound/dynamic.go 68.86% <0%> (-1.89%) ⬇️
proxy/socks/server.go 75.78% <0%> (-1.57%) ⬇️
proxy/socks/client.go 81.15% <0%> (-1.45%) ⬇️
app/proxyman/mux/mux.go 68.24% <0%> (-0.86%) ⬇️
proxy/vmess/inbound/inbound.go 76.87% <0%> (-0.63%) ⬇️
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 72e9ef8...e232504.

@xiaokangwang
Contributor

Currently, Go does not use the AES hardware acceleration available on arm64; this is being actively worked on upstream.

golang/go#18498

This PR should at least be postponed until the related upstream PR is merged.

@DarienRaymond
Contributor

Same as @xiaokangwang. The change on the Go side will probably be released in Go 1.10 (around Aug 2018), and we will follow afterwards.

3 participants