Forum | Documentation | Website | Blog

Skip to content
Snippets Groups Projects
Commit 5172d322 authored by Eric Biggers's avatar Eric Biggers Committed by Herbert Xu
Browse files

crypto: arm/blake2s - add ARM scalar optimized BLAKE2s

Add an ARM scalar optimized implementation of BLAKE2s.

NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too
small for NEON to help.  Each NEON instruction would depend on the
previous one, resulting in poor performance.

With scalar instructions, on the other hand, we can take advantage of
ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an
implementation get runs much faster than the C implementation.

Performance results on Cortex-A7 in cycles per byte using the shash API:

	4096-byte messages:
		blake2s-256-arm:     18.8
		blake2s-256-generic: 26.0

	500-byte messages:
		blake2s-256-arm:     20.3
		blake2s-256-generic: 27.9

	100-byte messages:
		blake2s-256-arm:     29.7
		blake2s-256-generic: 39.2

	32-byte messages:
		blake2s-256-arm:     50.6
		blake2s-256-generic: 66.2

Except on very short messages, this is still slower than the NEON
implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8,...
parent bbda6e0f
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment