The bloom filter will tell you whether _the hash of your password_ is in the set...

lightbyte · on June 15, 2018

>Imagine that password hashes are six characters, and `abcdef` is the only pwned password we know about. A smaller version of this tool would reject your password if it hashes to anything that starts with `abc` (such as `abc000`, `abcdee`, `abcdef`, etc).

Doesn't that not work very well at all though? If these "abcXXX" strings are hashes, than just because they share a prefix doesn't mean the unhashed password are even remotely similar.

"good-password-123-dog-xxx..." could hash to "abcdeg" and "password" could be the thing that hashed to "abcdef".

HellsMaddy · on June 16, 2018

You're completely right, but I think you're misinterpreting the usefulness of this bloom filter. It would most likely serve only as the first layer in a password strength check.

Because there is no possibility of a false negative, you know that if the bloom filter does not include the hash you gave it, the password definitely does not exist in the original corpus. In this case, the password is probably at least semi-unique. If you're using this bloom filter in the context of a web application registration process, this would likely be the case for anyone trying to sign up with a strong/unique password, e.g. a random one generated by a password manager.

However, if the bloom filter maybe contains the hash, you would likely fall back to some more expensive strength check, like querying the full HIBP database to see if this exact password definitely exists in there.

psobot · on June 15, 2018

Similarity between passwords doesn't really matter for the purposes of this checker - all it's looking for is a _possible_ exact match. If your password hashes to "abcXXX", then there are two possibilities:

- "abc" is in the bloom filter, and so your password _might_ be in the list (depending on what the remaining characters past "abc" are)

- "abc" is not in the bloom filter, so your password _is definitely not_ in the list.

This results in a high rate of false positives (times when a password is incorrectly thought to be part of the list, but it's not) but that's deemed to be an acceptable tradeoff for this algorithm.

sbarker · on June 15, 2018

Collisions in modern, properly implemented algorithms are rare. http://everydayinternetstuff.com/2015/04/hash-collision-prob... https://www.google.com/amp/s/www.theregister.co.uk/AMP/2017/...

lightbyte · on June 15, 2018

Collisions for the entire hash are rare, but this is just a prefix of the hash. I imagine it is significantly more common for a collision in the prefix.

ubernostrum · on June 15, 2018

I maintain a library that talks to the Have I Been Pwned password database, which uses a 5-hexadecimal-digit prefix of the SHA1 hash.

Someone filed a PR asking to add a cache for HIBP's responses, and I did some quick math and came up with ~1200 as a birthday-paradox number for the first five digits of a SHA1 hex digest (assuming uniform/random distribution of hex digits).

tylersmith · on June 15, 2018

If the bloom filter returns positive you check the HIBO DB by getting all the hashes with that prefix and check the returned list for the full hash.

basch · on June 15, 2018

The average number of hashes returned is 478

The smallest is 381 (hash prefixes "E0812" and "E613D")

The largest is 584 (hash prefixes "00000" and "4A4E8")

https://www.troyhunt.com/ive-just-launched-pwned-passwords-v...

FroshKiller · on June 15, 2018

Little code running in the real world any given moment is modern or properly implemented.