Speed was probably not the criterion, otherwise blake3 seems like an even better c...

tptacek · on Dec 31, 2021

I think speed is definitely a big part of this, though the speedup comes primarily from getting rid of superfluous calls to RDRAND. Blake2s is already in the kernel (after a fashion) for WireGuard itself; I don't think Blake3 is. An additional nerdy point here is that the extraction phase of the LKRNG is already using ChaCha (there's a sense in which a CSPRNG is really just the keystream of a stream cipher), and Blake2s and ChaCha are closely related.
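To make the "a CSPRNG is really just the keystream of a stream cipher" remark concrete, here is a minimal sketch of the ChaCha20 block function (as specified in RFC 7539) used as a byte generator. The helper name `keystream_random_bytes` is made up for illustration, and this is not the kernel's construction: a real CSPRNG also has to handle seeding and rekeying, which this omits.

```python
import struct

MASK = 0xFFFFFFFF  # keep all arithmetic within 32 bits


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK) | (x >> (32 - n))


def quarter_round(s, a, b, c, d):
    """ChaCha quarter round applied in place to four words of the state."""
    s[a] = (s[a] + s[b]) & MASK
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha20_block(key, counter, nonce):
    """Return one 64-byte keystream block (32-byte key, 12-byte nonce)."""
    state = list(struct.unpack("<4I", b"expand 32-byte k"))
    state += struct.unpack("<8I", key)
    state += [counter & MASK]
    state += struct.unpack("<3I", nonce)
    w = state[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        # Column rounds
        quarter_round(w, 0, 4, 8, 12)
        quarter_round(w, 1, 5, 9, 13)
        quarter_round(w, 2, 6, 10, 14)
        quarter_round(w, 3, 7, 11, 15)
        # Diagonal rounds
        quarter_round(w, 0, 5, 10, 15)
        quarter_round(w, 1, 6, 11, 12)
        quarter_round(w, 2, 7, 8, 13)
        quarter_round(w, 3, 4, 9, 14)
    # Feed-forward: add the original state to the permuted state
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(w, state)))


def keystream_random_bytes(key, nonce, n):
    """Hypothetical helper: n 'random' bytes are just the first n keystream bytes."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += chacha20_block(key, counter, nonce)
        counter += 1
    return bytes(out[:n])
```

Encrypting with a stream cipher means XORing this output into the plaintext; a generator built this way simply hands the same bytes to the caller instead. Everything then rests on the key being secret and unpredictable.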

pedrocr · on Dec 31, 2021

So should we expect a blake3 switch sometime in the future? It seems to be a refinement of blake2 that makes it more amenable to optimization while keeping most (all?) of its qualities. Being well suited to optimization across architectures would also make it ideal for the kernel, and it seems the reference implementation has already done a lot of the heavy lifting.

tptacek · on Dec 31, 2021

I doubt it, but who knows?

Kubuxu · on Dec 31, 2021

Blake3 would give only a small incremental perf improvement, from the reduced number of rounds (10 to 7).

The majority of Blake3's perf benefit comes from its Merkle-tree structure and SIMD processing of multiple input streams at the same time.

oconnor663 · on Dec 31, 2021

> multiple input streams at the same time

Just in case this isn't clear, BLAKE3 breaks a single large input up into many chunks, and it hashes those chunks in parallel. The caller doesn't need to provide multiple separate inputs to take advantage of the SIMD optimizations. (If you do have multiple separate inputs, you can actually use similar SIMD optimizations with almost any hash function. But because this situation is rare, libraries that provide this sort of API are also rare. Here's one of mine: https://docs.rs/blake2s_simd/1.0.0/blake2s_simd/many/index.h....)
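A rough sketch of the chunk-and-tree idea, using Python's standard `hashlib.blake2s` for the leaves and parents. This is a toy, not BLAKE3: the function names, the use of the `person` parameter for domain separation, and the handling of an odd node are all assumptions made for illustration, and the output will not match any real BLAKE3 digest. The only detail borrowed from BLAKE3 is the 1024-byte chunk size.

```python
import hashlib

CHUNK_SIZE = 1024  # BLAKE3 also splits its input into 1024-byte chunks


def hash_leaf(chunk: bytes) -> bytes:
    # Toy domain separation: leaves and parents use different `person` tags.
    return hashlib.blake2s(chunk, person=b"leaf").digest()


def hash_parent(left: bytes, right: bytes) -> bytes:
    return hashlib.blake2s(left + right, person=b"parent").digest()


def toy_tree_hash(data: bytes) -> bytes:
    """Hash one input by splitting it into chunks and merging them in a binary tree."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]

    # Each leaf depends only on its own chunk, so these calls are independent.
    # That independence is what lets an implementation hash several chunks
    # at once in SIMD lanes; here they simply run one after another.
    level = [hash_leaf(c) for c in chunks]

    # Combine neighbours pairwise until a single root remains.
    while len(level) > 1:
        nxt = [hash_parent(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is carried up unchanged (toy choice)
        level = nxt
    return level[0]
```

The caller passes a single buffer, e.g. `toy_tree_hash(big_blob)`, and the parallelism is found inside the one input. A plain sequential hash cannot do this, because each of its blocks depends on the state left by the previous one.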