Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neon impl of ChaCha20 (better size & perf) #9701

Open
wants to merge 17 commits into
base: development
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Introduce chacha20_neon_inc_counter
Signed-off-by: Dave Rodgman <[email protected]>
  • Loading branch information
daverodgman committed Oct 15, 2024
commit b0a9055c7a54f937be48aef1c22f07078428f699
15 changes: 10 additions & 5 deletions tf-psa-crypto/drivers/builtin/src/chacha20.c
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,14 @@ static inline uint32x4_t chacha20_neon_vrotlq_7_u32(uint32x4_t v)
return vsriq_n_u32(x, v, 25);
}

// Increment the 32-bit element within v that corresponds to the ChaCha20 counter
static inline uint32x4_t chacha20_neon_inc_counter(uint32x4_t v)
{
const uint32_t inc_const_scalar[4] = { 1, 0, 0, 0 };
const uint32x4_t inc_const = vld1q_u32(inc_const_scalar);
return vaddq_u32(v, inc_const);
}

static inline void chacha20_block(uint32x4_t a,
uint32x4_t b,
uint32x4_t c,
Expand Down Expand Up @@ -330,14 +338,11 @@ int mbedtls_chacha20_update(mbedtls_chacha20_context *ctx,
uint32x4_t c = vld1q_u32(&ctx->state[8]);
uint32x4_t d = vld1q_u32(&ctx->state[12]);

const uint32_t inc_const_scalar[4] = { 1, 0, 0, 0 };
const uint32x4_t inc_const = vld1q_u32(inc_const_scalar);

/* Process full blocks */
while (size >= CHACHA20_BLOCK_SIZE_BYTES) {
chacha20_block(a, b, c, d, output + offset, input + offset);

d = vaddq_u32(d, inc_const);
d = chacha20_neon_inc_counter(d);

offset += CHACHA20_BLOCK_SIZE_BYTES;
size -= CHACHA20_BLOCK_SIZE_BYTES;
Expand All @@ -348,7 +353,7 @@ int mbedtls_chacha20_update(mbedtls_chacha20_context *ctx,
/* Generate new keystream block and increment counter */
memset(ctx->keystream8, 0, CHACHA20_BLOCK_SIZE_BYTES);
chacha20_block(a, b, c, d, ctx->keystream8, ctx->keystream8);
d = vaddq_u32(d, inc_const);
d = chacha20_neon_inc_counter(d);

mbedtls_xor_no_simd(output + offset, input + offset, ctx->keystream8, size);

Expand Down