Hacker News
Lessons from the History of Attacks on Secure Hash Functions (z.cash)
124 points by luu on Feb 25, 2017 | 29 comments



From what I understand, the Feb 23rd SHA-1 attack was possible because they figured out how to get the internal state (160 bits, or 5 words of 32 bits) to match from two separate pieces of data. After that, additional data could be appended to both pieces, as long as it was identical.

The internal state would play back the same sequence from there on, just like two random number generators starting from the same seed.
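
Roughly, the structure of that argument in code (a toy Merkle-Damgård-style loop with a deliberately tiny state so a collision can be brute-forced quickly; the compression function is a made-up stand-in, not real SHA-1):

    import hashlib
    from itertools import count

    BLOCK = 4   # toy block size in bytes
    STATE = 2   # deliberately tiny state so collisions are easy to find

    # Toy Merkle-Damgard-style hash; the "compression function" is truncated
    # SHA-256 over state||block -- a stand-in to show the structure only.
    def compress(state, block):
        return hashlib.sha256(state + block).digest()[:STATE]

    def toy_hash(message):
        state = b"\x00" * STATE
        padded = message + b"\x00" * (-len(message) % BLOCK)
        for i in range(0, len(padded), BLOCK):
            state = compress(state, padded[i:i + BLOCK])
        return state

    # Brute-force two different one-block messages that drive the internal
    # state to the same value (a collision in the toy hash).
    seen = {}
    for i in count():
        m = i.to_bytes(BLOCK, "big")
        s = toy_hash(m)
        if s in seen:
            a, b = seen[s], m
            break
        seen[s] = m

    suffix = b"identical data appended to both"
    assert toy_hash(a) == toy_hash(b)
    # The state "plays back the same sequence" from the collision point on,
    # so the collision survives any identical suffix:
    assert toy_hash(a + suffix) == toy_hash(b + suffix)

(In the real attack the two colliding prefixes also have the same length, so the length padding at the end matches as well.)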

Here is a comparison of internal state sizes:

https://en.wikipedia.org/wiki/SHA-0#Comparison_of_SHA_functi...

SHA-256 is susceptible to this same flaw; it would just take longer, because it has about 128 bits of collision security instead of less than 80 for SHA-1. It looks like only SHA-3 really "gets it", with a 1600-bit state size.

After all of the effort put into making highly pseudo-random hash functions, it's a wonder that the state size was only the size of the hash. By comparison, Mersenne Twister's state size is 19937 bits (624 words of 32 bits minus 31 bits):

http://www.quadibloc.com/crypto/co4814.htm

Does anyone know why hash algorithms keep using such small state sizes, leaving us vulnerable to this same issue?


A 128-bit work factor is 2^48 (~2.81*10^14) times larger than an 80-bit one (80 bits is the brute-force cost for SHA-1 collisions, and SHAttered came in below even that), and comparing SHA-2 to a Mersenne Twister is just picking two numbers and saying "this one is less than that one" without context.

EDIT: "The computational effort spent on our attack is estimated to be equivalent to 2^63.1 SHA-1 calls" [1]. Comparing that to 2^128 (the birthday bound for SHA-256) just doesn't apply.

[1] http://shattered.it/static/shattered.pdf
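
Back-of-envelope, just restating those numbers (Python; figures taken from the paper and the generic birthday bounds):

    sha1_generic   = 2 ** 80     # birthday bound for a 160-bit digest
    sha1_shattered = 2 ** 63.1   # cost reported in the SHAttered paper
    sha256_generic = 2 ** 128    # birthday bound for a 256-bit digest

    print(2 ** 48)                          # ~2.81e14: generic SHA-256 vs generic SHA-1
    print(sha1_generic / sha1_shattered)    # ~1.2e5: how far below brute force SHAttered got
    print(sha256_generic / sha1_shattered)  # ~3.4e19: the gap being pointed at above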


A 1:1 state-to-input ratio is just the optimal performance value (more input than state lowers security, and the other way around lowers efficiency). SHA-3 still has that ratio; it just supports consuming more bytes at a time. Increasing this size too much reduces efficiency for small input sizes.

SHA-512/256 is in a pretty good spot, as the internal state is twice the recommended hash security size, and the output is equal to the recommended size (256-bit output for a 128-bit security level). Common tree hash constructions will want to take 2 inputs and reduce them to 1 output, which is exactly the ratio SHA-512/256 provides.


What you really care about for tree hashes is the block-size-to-output ratio. SHA-512/256 has a 1024-bit block size, so it is less optimal for tree hashes than SHA-256, which has a 512-bit block size. This is assuming you don't need any extra data in the internal nodes.
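
To make the ratio concrete, a minimal binary tree hash sketch (plain SHA-256 via hashlib; the leaf/node prefixes and the duplicate-last-digest rule are arbitrary illustrative choices, not a standard construction):

    import hashlib

    # Each internal node hashes the concatenation of its two children's
    # 32-byte digests: the "2 inputs -> 1 output" ratio discussed above.
    def leaf(chunk):
        return hashlib.sha256(b"\x00" + chunk).digest()

    def node(left, right):
        return hashlib.sha256(b"\x01" + left + right).digest()

    def tree_hash(chunks):
        level = [leaf(c) for c in chunks]
        while len(level) > 1:
            if len(level) % 2:          # duplicate the last digest on odd levels
                level.append(level[-1])
            level = [node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    print(tree_hash([b"chunk0", b"chunk1", b"chunk2"]).hex())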


Could you expand on this comment and/or provide links to relevant reading?

Much appreciated!


> Does anyone know why hash algorithms keep using such small state sizes, leaving us vulnerable to this same issue?

Because small internal state is not actually the issue. SHA-1 was broken with 2^63 work, not 2^80 work. That discrepancy is why SHA-1 was deemed broken back in 2005, not because people regularly do 2^80 work.


I can't speak specifically to SHA-1/2, but typically the criteria for the standards take into account lower-end hardware that must be able to execute the algorithms with limited memory and processing power (especially embedded systems).

For example, smart cards and terminals for the payments/banking industry. These algorithms came out in the 90s, when you often had 8-bit processors and very little memory to work with.


This is a consequence of the Merkle–Damgård construction. It has little to do with the state size.

SHA-3 uses a different construction, and so is immune to this particular type of extension attack.


I thought about this more and realized I was wrong. Sponge construction doesn't help in this case. It's too late to edit.


This issue only means that once you find a collision, you can find more collisions by extending it. Generally speaking, it is expected that by the time a collision is found in a hash function it has already been mostly phased out, so this flaw is not seen as particularly important.


> Another interesting pattern that I perceive in these results is that maybe sometime between 1996 (Tiger) and 2000 (Whirlpool), humanity learned how to make collision-resistant hash functions,

I actually feel that this can be generalized even further: at some point people learned to create unbreakable algorithms. There is literally no mainstream crypto algorithm from the 2000s onward that has seen any significant breakage. And very likely there never will be, with one exception: quantum computers will break modern ECC.

I think there was simply a dark age of crypto research with 90s algorithms and earlier. Which isn't surprising: back then people were fighting over whether it was even legal to do that kind of research.


This is far too optimistic - just look at the "History" chart. The average age of 90s hashes when they were broken was 10-15 years. It's equally probable that the "modern" algorithms are just too young for us to see them broken.


There's an aspect you're missing: for both major hash breakages there was a ~10-year warning phase (for MD5 the first breakthrough was in 1996, for SHA-1 in 2004), where it was basically clear these hashes were bad, just no one had done the full attack yet. There's no such warning from any modern hash yet.


SHA-2 was originally published in 2001. The digest sizes are also much bigger. I don't think it's really 'equally probable'. There are even cryptographers who think it's not at all probable, ever.


> quantum computers will break modern ECC

And RSA. And Diffie-Hellman.


They're far older than 2000.

My point was: modern ECC (X25519 and friends) is the only post-2000 mainstream algorithm family where any realistic breakage is in sight.


And only if large quantum computers are possible. If it turns out they're not, X25519/Ed25519 are safe.

But I'm not willing to bet the farm on that risk. PQCrypto, please. :)


"Unbreakable algorithms"? Save for the OTP - in the mathematical, not the rubber hose sense - as far as we know, there is no such thing.


It should've been clear from my comment that I don't mean "provably theoretically unbreakable", but "practically unbreakable". Meaning no human will build a machine in the near future to break them.


Oh, my comment was actually meant as a simple clarification. It may have come out a bit harsher than that, I now realize in the clear, cold light of morning.


The broken SPHINCS links should perhaps point to the paper https://sphincs.cr.yp.to/sphincs-20150202.pdf


I'd never heard of this before. Is it as good as it looks?


There's a second hypothesis that explains why (second) preimage attacks are not commonly published: after a hash collision is produced, the crypto community has no further interest. How much was ever published about, say, MD4 after the 1995 document that this article cites, showing a practical piece of software for generating collisions? I haven't tried to answer that question, but I suspect the answer is "not much".


I could be wrong about this. For instance, http://dl.acm.org/citation.cfm?id=2148250.2148477 https://pdfs.semanticscholar.org/868b/443aec1e87951c8a896b12... "Improved preimage attack on one-block MD4" is a 2012 publication improving the preimage attack on MD4 from 2^107 to 2^95. So 17 years after a practical collision attack on MD4, academics ARE still working on it as a cryptographic primitive, but despite that they have NOT found a practical preimage attack on it.


> Hash-based digital signatures are secure (resistant to forgery) as long as the hash function they are built on has second-pre-image resistance

I am not very experienced with this, but isn't this clearly wrong?

If I have a controllable collision (like SHA-1), I can get someone to sign document A, then destroy all evidence of document A's existence and claim they signed document B.

Isn't it essential that a digital signature scheme is immune against such an attack?
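
To make the worry concrete, a toy sketch (HMAC over the digest stands in for a real asymmetric signature here, purely to show that verification only ever sees the digest):

    import hashlib, hmac

    KEY = b"signer's secret"   # stand-in for a private key

    def toy_sign(document):
        # "sign the SHA-1 digest of the document"
        return hmac.new(KEY, hashlib.sha1(document).digest(), hashlib.sha256).digest()

    def toy_verify(document, sig):
        return hmac.compare_digest(toy_sign(document), sig)

    # If doc_a and doc_b were a real colliding pair (like the SHAttered PDFs),
    # a signature over doc_a would verify for doc_b too. These placeholders
    # don't collide, so this prints False.
    doc_a = b"placeholder for colliding document A"
    doc_b = b"placeholder for colliding document B"
    print(toy_verify(doc_b, toy_sign(doc_a)))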


I think generally you're signing things you generated, like your own binaries, your code, that sort of thing. The part where collisions matter is if you're signing someone else's documents, like a CA signing TLS certs, or someone in the web of trust signing the keys of someone based on their ID.


Generally people won't sign garbage or random documents, so Document A has to be of a very specific subset of documents a person would sign. Then finding Document B constitutes a second pre-image attack.


You're right, as long as the format doesn't contain a place to hide all of the garbage (e.g. crap not rendered in a PDF by the PDF viewer).


ed25519 is listed as sensitive to hash collision.

I believe that's an error: the official site¹ puts "collision resilience" in the list of features.

(1) http://ed25519.cr.yp.to/



