The SHA-256 "weakness" is a bit disingenuous. As far as I'm aware, in the 10 years since that paper was published the "weakness" hasn't been extended _at all_. The main reason for the SHA3 competition was that we were all concerned at the time that SHA2 would be broken and there'd be nothing to replace it with. However, as it happens, SHA2 has remained rock solid, so the outcome of SHA3 was to pick something (Keccak) maximally different from SHA2, since in the intervening years cryptographic diversity became more desirable than an actual replacement. Both SHA2 and SHA3 are fit for use in production systems -- I would personally pick SHA2 unless I had some specific reason to benefit from SHA3 (e.g. the incremental hashing modes that the sponge construction makes possible).
The problem here is that the green cells mean "considered strong" and the yellow cells mean... something else. But among practitioners, SHA-2 is still very much considered strong.
I'm also not clear on what the chart claims is going on with RIPEMD-160.
Probably 80-bit collision resistance would be my guess. But that's not a concern for applications that require preimages to attack (e.g. Bitcoin 1... keys).
It looks like it's important to switch to a truncated variant of SHA-2 (like SHA-384 or SHA-512/256, for example), since they are more robust: there's no trivial length extension attack, because they don't put the whole internal state in the output.
That would be the case if one cares more about how violently you fall if SHA-2 is ever broken -- but I think you should care.
That's not how I would put it. Existing protocols that use full-width SHA-2 aren't inherently suspect for doing so, but new designs that use SHA-2 directly might benefit from picking a truncated variant. It's worth remembering that most uses of SHA-2 in real protocols occur in MACs and PRFs, where length extension either isn't a problem or is addressed specifically.
Another way to look at it is, not needing HMAC is a feature of SHA-3.
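Concretely, with Python's stdlib the difference looks something like this sketch (key and message are placeholder values):

    # With SHA-2 you reach for HMAC to get a safe MAC; with SHA-3, a plain
    # prefix-keyed hash is already safe because there is no length extension.
    import hashlib
    import hmac

    key = b"0" * 32          # placeholder key
    msg = b"some message"    # placeholder message

    # SHA-2: don't use sha256(key + msg) directly -- wrap it in HMAC.
    tag_sha2 = hmac.new(key, msg, hashlib.sha256).hexdigest()

    # SHA-3: the simple prefix construction is fine (KMAC standardizes this idea).
    tag_sha3 = hashlib.sha3_256(key + msg).hexdigest()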
In any case: SHA-2 --- in any of its variants --- remains strong. JP Aumasson, one of the Blake2 designers, is fond of saying that SHA-2 will probably never be broken algorithmically.
Using SHA-512/256 also has the advantage of being considerably faster than SHA-256 on 64-bit CPUs; the performance difference is close to 50%. As such, I see no reason to use SHA-256 for anything[1] -- you're better off just using SHA-512/256, and the net storage is the same as well (32 bytes).
[1]: Assuming you're running on a 64-bit platform...
For hashing large data, yes. For password derivation schemes or hash trees, no. SHA-512 is only faster per byte because it processes larger (128-byte) blocks; a single SHA-256 compression is cheaper, which is what matters when you're hashing lots of small inputs.
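If you want to check this on your own hardware, a rough hashlib/timeit sketch like the following will do (numbers vary a lot by CPU and OpenSSL build):

    # Rough micro-benchmark: SHA-512 tends to win per byte on long inputs on
    # 64-bit machines, while SHA-256's cheaper compression function can win
    # on many small inputs.
    import hashlib
    import timeit

    big = b"\x00" * (1 << 20)    # 1 MiB of bulk data
    small = b"\x00" * 32         # e.g. a Merkle tree node

    for name in ("sha256", "sha512"):
        h = getattr(hashlib, name)
        t_big = timeit.timeit(lambda: h(big).digest(), number=200)
        t_small = timeit.timeit(lambda: h(small).digest(), number=200_000)
        print(f"{name}: {t_big:.3f}s for 200 x 1 MiB, {t_small:.3f}s for 200k x 32 B")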
EDIT: I realized you were asking about "design your protocols against length extension", not "what is length extension". Sorry. Ignore this, but I'll leave the response for anyone else who is curious
---
SHA-{1,2} are vulnerable to length extension attacks because of the way their Merkle-Damgard construction works. Basically, if you have the result HASH(X) and know the length of X, you can calculate HASH(X+padding+P) without knowing 'X' itself. This is problematic because it means you can effectively "extend" some cryptographically hashed data with an arbitrary suffix.
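To make the mechanics concrete, here's a toy Merkle-Damgard construction in Python (emphatically not real SHA-2 -- real padding also encodes the total message length, so an attacker has to know or guess len(X); everything below is illustrative only):

    # The digest IS the final chaining state, so anyone who has HASH(X) can
    # just keep feeding blocks.
    import hashlib

    BLOCK = 16

    def compress(state: bytes, block: bytes) -> bytes:
        # Stand-in compression function; real MD hashes use their own round function.
        return hashlib.sha256(state + block).digest()[:BLOCK]

    def pad(msg_len: int) -> bytes:
        # Simplified padding: 0x80 then zeros up to the block boundary.
        padlen = BLOCK - (msg_len % BLOCK)
        return b"\x80" + b"\x00" * (padlen - 1)

    def toy_hash(msg: bytes, state: bytes = b"\x00" * BLOCK) -> bytes:
        msg += pad(len(msg))
        for i in range(0, len(msg), BLOCK):
            state = compress(state, msg[i:i + BLOCK])
        return state  # the whole internal state leaks out as the digest

    secret = b"only the server knows this"
    h = toy_hash(secret)

    # Attacker knows h and len(secret), but not secret itself.
    suffix = b";admin=true"
    forged_msg = secret + pad(len(secret)) + suffix   # what the server ends up hashing
    forged_tag = toy_hash(suffix, state=h)            # computed without the secret

    assert toy_hash(forged_msg) == forged_tag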
You can immunize SHA-2 against length extension attacks in one of two ways:
- Truncate the hash value. For example, use SHA-512 but truncate the result to 256 bits (the standardized SHA-512/256 is essentially this, though it also uses different initial values). This works because the 256-bit output cannot be reused as the starting state for further compression (SHA-512 needs a 512-bit chaining value, but the attacker only sees 256 bits of it, which makes extension impossible).
- Although it is not standard SHA-2 anymore, you can use a trick from Niels Ferguson and John Kelsey (from 2001!) to immunize it -- simply XOR a constant into the final block before running the final compression function. Unfortunately this suggestion was not adopted for SHA-2: http://www.cs.utsa.edu/~wagner/CS4363/SHS/dfips-180-2-commen...
Alternatively just use a modern hash design like BLAKE2, SHA-3, or a number of others -- these avoid the entire problem. I'd suggest just sticking to one of those two.
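For reference, all of these escape hatches are one-liners with Python's hashlib (a sketch; each digest is 32 bytes and none of them exposes the full internal state):

    import hashlib

    data = b"message to protect"

    # 1. Truncation: hash with SHA-512, keep only 256 bits of output.
    trunc = hashlib.sha512(data).digest()[:32]

    # 2. BLAKE2 with a 32-byte output.
    b2 = hashlib.blake2b(data, digest_size=32).digest()

    # 3. SHA-3.
    s3 = hashlib.sha3_256(data).digest()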
I object to the notion that SHA-3 and Blake2 are "modern" hash functions and that SHA-2 isn't. SHA-2 remains a first-line hashing recommendation among crypto designers. The Noise protocol framework, for instance, includes it, as does NaCl; the authors of both of these systems had hashes available that didn't have length extension.
If I had to rank those three hash functions, SHA-2 wouldn't be in last place.
I wouldn't do any such thing without thorough analysis. It very well might have unexpected consequences.
You can concatenate hash functions, so you get at least the best security from both. But the question is: Why would you want to do any such thing? Just choose one of them. Both are safe.
Almost nobody has SHA-3-capable hardware though, and getting it is a lot of work (either pay money or work on an FPGA, I guess). I'd argue SHA-3 being efficient in hardware is almost completely irrelevant for 99% of all users of cryptographic software. For the vast majority of people, I think fast software implementations are way more important -- especially as systems like NFV and SDN come into play at large scale (people want bog-standard x86 boxes for this stuff).
I almost wish SHA-3 had been a dual pick between a fast software hash and a fast hardware hash. As it stands, Keccak being so slow in software is majorly limiting IMO. The more interesting aspect of SHA-3 is the sponge, so you can really turn Keccak into an entire Swiss Army knife of crypto tools, if you know what you're doing.
But as it stands, if I have to pick a modern hash, I almost always pick BLAKE2 instead of SHA-3, primarily because I rarely need the sponge design and also because it's dramatically faster in software. Stuff like this is really important on my Cortex-M4...
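For what it's worth, BLAKE2 also covers the keyed-MAC use case directly, so like SHA-3 it doesn't need HMAC; a minimal sketch with Python's hashlib (key and message are placeholders):

    import hashlib

    key = b"0" * 32                   # placeholder key
    msg = b"firmware image bytes"     # placeholder message

    tag = hashlib.blake2b(msg, key=key, digest_size=32).hexdigest()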
Unless my math is off, the combined power of the bitcoin network could find collisions in seconds (ignoring SHA-1 vs SHA-256). It isn't too unreasonable to assume that kind of hardware power would be available to nation states.
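Back-of-the-envelope, with rough figures (the SHAttered estimate of roughly 2^63.1 SHA-1 compressions and an assumed network rate of a few exahashes per second):

    # Rough, assumed figures; this waves away the SHA-1 vs SHA-256 difference,
    # as the parent comment does.
    attack_cost = 2 ** 63.1      # SHA-1 compressions (SHAttered estimate)
    network_rate = 3e18          # hashes per second, assumed ballpark for 2017
    print(f"{attack_cost / network_rate:.0f} seconds")   # a handful of seconds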
And the next fifteen years of Moore's Law will take that down to, what, 1 GPU month even without further algorithmic improvements? Which are anticipated?
I still see things that use 2-digit years, twenty years after the last millennium bug should have been fixed.
"Next fifteen years of Moore's Law?" The recent failure of Intel's "tick-tock" alternation of process shrinkage and new architecture suggests that however performance improves in the next 15 years, projecting the last 15 years' Moore's Law forward is a bad idea. For crypto stuff, I'd think about how quantum computing may advance by the 2030s.
> Explain why a simple collision attack is still useless, it's really the second pre-image attack that counts
Why is this the "non-expert reaction?" It's correct, right?
And why go to the trouble of making a timeline with "Broken" and "Collision found" without a "Second pre-image found"?
I'm genuinely puzzled (and have asked it here on HN before [1]), but I unfortunately suspect that an honest answer lies somewhere along these lines: "Second pre-images are a hell of a lot more difficult to find than collisions, so if we waited around for a second pre-image to be found we'd never get to dance around like headless chickens and talk about really scary shit (which typically requires a second pre-image) as 'now practical'. That would make the whole field a lot less sexy, and cut into our 'expert' consulting fees..."
EDIT: I'm of course not advocating staying with SHA-1. There's absolutely no good reason to. Even years ago when I was last involved in choosing a cryptographic hash function (even truncated) SHA-256 was obviously a much better choice.
I had a similar thought, but actually prefer this way.
People who understand things well enough will know what first collision means, so can moderate their response.
Others who are less familiar with the ridiculous levels of subtlety around this sort of thing are better off being given the simple message that sha1 is now legacy in all cases. Helps to avoid mistakes.
Simple "truths" are sometimes useful. I can see that.
I'm more surprised though at the resentful attitude towards the actual truth (in the non-alternative sense): collisions are useless in many cases and the necessary second pre-image is much more difficult to find.
At least in a forum like HN I'd expect intellectual honesty to prevail.
EDIT: Removed the incorrect example out of pure shame! :P
/u/pvg linked below how the ability to generate collisions for MD5 was used to obtain a fake CA certificate. It's not obvious to me that this would not work with SHA-1 certificates, and that no other important things we use have similar weaknesses. (Neither do I know for sure that it would work, but "collisions are useless" seems like a dangerous simplification in the other direction. I suspect for many, simply replacing SHA-1 with something deemed better is easier than thoroughly evaluating the risks involved with not doing so.)
Now we're getting somewhere in terms of intellectual discussion! :)
I of course agree we should replace SHA-1. But I still think a more intellectually honest discussion is meaningful. For example, after reading the link you referred to I'm rather convinced that X.509 is pretty seriously flawed, and could easily be redesigned to be collision resistant. Why not talk about that? Why not do it?
As I understand it, CAs have mitigated the collision attacks by forcing a random serial number they generate into the certificates. Since that's part of the hash, collision attacks are no longer practical.
Doing x509 still means having to parse ASN.1 though, and nobody seems to actually like it.
Unless your system is very simple (and even if it's only moderately simple), it's non-trivial to really be sure that a second-preimage attack is truly necessary to break any part of your system, and that a collision attack truly cannot break any part of it.
Is it...? I've designed a system that used the SHA-256 hash of an RSA public key to establish trust between a client and a server. The client gets the SHA-256 hash as part of its static configuration (out of band). When the client connects to the server over DTLS it gets the server's public key, and checks that the key hashes to the configured value.
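In code, the check amounts to something like this sketch (the pinned value and the function name are placeholders; the DTLS plumbing is left out):

    import hashlib
    import hmac  # for constant-time comparison

    PINNED_HASH = bytes.fromhex(
        "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
    )  # placeholder value from static configuration

    def key_is_trusted(server_public_key_der: bytes) -> bool:
        presented = hashlib.sha256(server_public_key_der).digest()
        return hmac.compare_digest(presented, PINNED_HASH)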
Isn't it rather obvious that no third party can attack this system even if they can create an SHA-256 collision?
Sure, you can dream up some weird scenario where someone could set up two servers with different RSA public keys that would hash to the same value, so that the client could be fooled into connecting to one when it thought it was connected to the other. But to me it seems quite obvious that that's outside the threat model / irrelevant.
I'd say this is a "very simple system". Real use-cases are more complex and have to deal with various issues you brushed aside by saying "out of band".
Especially in situations where you don't trust whatever is giving you the hash (e.g. because you haven't authenticated them yet, or because the application inherently means you can't trust them even if you have authenticated them), collision resistance is most likely necessary.
Well, looking around the last few days, this SHA-1 collision is breaking various pieces of software (svn in particular). You can't count on two documents with the same hash being the same document.
Sure, there is broken, and then even more broken, but SHA-1 is already, for most practical purposes, useless.
Sure, if the claims stopped at SHA-1 being "useless" then I have no complaints. But generally the headless chicken dance tends to quickly move on to all kinds of practical exploitability; which simply isn't true.