Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There are 2^256 potential outputs for SHA-256, while the number of potential inputs is infinite. Therefore, the same output can be generated with different inputs, although finding such "collisions" by chance is extremely unlikely


The claim is not that every output has a unique input, which would not be correct, and seems to be what you are addressing.


at 1:08 in the video, that is exactly what he claims:

"So every piece of data in the world has its own unique hash digest."

This is false for the reasons apeescape describes: every piece of data in the world has its own hash digest, but these hash digests are not unique.


Yes that sentence is technically incorrect, but practically correct. We've never found a collision and though we expect it to be theoretically possible, even common if you consider "all possible inputs" and the pigeonhole principle, for practical purposes hash outputs are unique because nobody considers "all possible inputs" when evaluating probabilities.

I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. Because following that with "technically, it's more 'practically' unique, theoretically there are collisions but you won't encounter them with probability > 2^-256" (or whatever it is) just confuses the topic to them more than just summarizing. You have to admit that most people won't go on a 200h adventure to learn about the state space of 256+ bits and how to conceptualize tiny statistical probabilities, so there must be a point where you have to cut the explanation to an approximation of the truth. This is true in every field.


I don't like to leave holes like this in people's comprehension. It's OK if people don't end up with an intuitive feeling for how relatively unlikely different things that don't actually happen are, but I want them to be aware of that category as distinct from things which can't happen because the type of argument needed is different.

The air molecules in the room you're in can't all gather in one corner because that's not possible, it's forbidden by conservation rules.

But they won't gather in two opposite corners only because that's so tremendously unlikely, it would be allowed by conservation but statistically it's ludicrous.

The same is true at the opposite end of the spectrum. Almost all real numbers are normal (in all bases) but the nature of "Almost all" in mathematics is different in an important way from "All" and I want people to grasp this difference when I'm discussing properties of numbers. It definitely is not true that all real numbers are normal, you probably rarely think about any normal numbers at all.


> I don't like to leave holes like this in people's comprehension.

I agree. I think this wording would be better than in my previous comment, what do you think?

    it's reasonable to say that hash outputs are *almost surely* unique


> I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. [...] theoretically there are collisions but you won't encounter them

You could have said exactly the same thing about MD5 right up until you couldn't. Then you could have said "oh yeah well MD5 is broken, but it's safe to assume you'll never find one for SHA-1", right up until we did. So if you say "oh yeah well SHA-1 is broken, but it's safe to assume you'll never find one for SHA-256", I disagree.

It would be one thing if collisions in hash functions were found by just repeatedly hashing things until you find a collision. If that were the case, then yes, I'd agree with you on those 1-in-2^256 odds, at least for a while. But by and large, that's not what happens. Over time, weaknesses are found in algorithms which allow you shrink the search space, which significantly changes your odds.


Kind of agree w you, but still feel adding a few words by way of a disclaimer about collisions is much better than presenting as plain truth something that merely approaches it.


On the other hand, if we can count "every piece of data in the world" then we can estimate the probability of having a collision.


I see what you mean, but it sounds like the output is unique, and we probably agree that in this field you need to use sentences that cannot be easily misinterpreted.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: