Password checkup: from 0 to 650k users in 20 days

moviuro · on April 4, 2019

Troy Hunt already nailed the perfect service and API. Google's solution here is not documented, and clearly not as usable as Troy's [0] (anyone with openssl(1) + curl(1) can check whether they've been pwned right from the CLI [1]).

[0] https://haveibeenpwned.com/API/v2#PwnedPasswords

[1] https://gitlab.com/moviuro/pass-hibp/blob/master/hibp.bash
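For comparison, the Pwned Passwords range API in [0] works roughly like this: SHA-1 the password, send only the first 5 hex characters, and search the returned suffixes locally. A minimal Python sketch, assuming the `https://api.pwnedpasswords.com/range/{prefix}` endpoint and its `SUFFIX:COUNT` line format; the network call is kept separate so the matching logic can be exercised offline.

```python
from __future__ import annotations

import hashlib
import urllib.request


def split_sha1(password: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(body: str, suffix: str) -> int:
    """Parse a range response ('SUFFIX:COUNT' per line); return the count for suffix, or 0."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Only the 5-char prefix leaves the machine; the match happens locally."""
    prefix, suffix = split_sha1(password)
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    return count_in_range(body, suffix)
```

`pwned_count("password")` makes a live request; `count_in_range` can be checked against a saved response body.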

kerng · on April 4, 2019

Why not team up with haveibeenpwned, which has been around for years? Seems like this is doing the same thing.

wheelerwj · on April 4, 2019

[flagged]

harikb · on April 4, 2019

They try to do a decent job. They don't send your password to Google. Give them credit where it is due. Unlike, say, another SV company that decided to ask for users' email passwords to auto-verify their emails.

fouc · on April 4, 2019

This is a reference to Facebook https://news.ycombinator.com/item?id=19559617

bradknowles · on April 3, 2019

How does this compare to https://haveibeenpwned.com/Passwords ?

vichu · on April 3, 2019

It does seem very similar to the implementation of HIBP's Pwned Passwords by Junade Ali[0]. I haven't delved into the nitty gritty details/differences between the two, but they do seem to use similar techniques to guarantee k-anonymity.

A key difference, at a glance, is the inclusion of usernames to be paired with the leaked passwords.

[0] Junade Ali's write-up https://blog.cloudflare.com/validating-leaked-passwords-with...

codeddesign · on April 4, 2019

Wow... that was the longest article I have ever seen that has absolutely nothing to do with the title.

zaroth · on April 4, 2019

First they create a lookup table of encrypted (blinded) hashes of each (username, password) pair that they've found on the darknet, and index this table by the first two bytes of the unencrypted hash of the username and password.

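   // H = Argon2(username + password); H[0:1] = its first two bytes (16 bits)
   // H^b = H blinded with Google's secret key b; many entries share a prefix
   Lookup[H[0:1]].append(H^b);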
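A toy Python sketch of that table construction, under loud assumptions: SHA-256 stands in for Argon2 (the real hash is deliberately ~1 s each), and exponentiation modulo the Mersenne prime 2^127 - 1 stands in for the elliptic-curve blinding, so `H^b` becomes `pow(x, b, P)`. The names `slow_hash`, `hash_to_group` and `build_lookup` are hypothetical. It is insecure and only shows the shape: bucket by the first two bytes of the unblinded hash, store the blinded value.

```python
import hashlib
import secrets
from collections import defaultdict

P = 2**127 - 1  # Mersenne prime; toy stand-in for an elliptic-curve group (NOT secure)


def slow_hash(username: str, password: str) -> bytes:
    # Stand-in for Argon2(username + password); SHA-256 is NOT memory-hard.
    return hashlib.sha256((username + password).encode("utf-8")).digest()


def hash_to_group(h: bytes) -> int:
    # Map the digest into [1, P-1] so exponentiation mod P is well defined.
    return int.from_bytes(h, "big") % (P - 1) + 1


def build_lookup(leaked, b: int) -> dict:
    lookup = defaultdict(list)
    for username, password in leaked:
        h = slow_hash(username, password)
        prefix = h[:2]  # the comment's H[0:1]: bytes 0 and 1, a 16-bit prefix
        lookup[prefix].append(pow(hash_to_group(h), b, P))  # H^b
    return lookup


b = secrets.randbelow(P - 3) + 2  # server's secret key, never sent to clients
table = build_lookup([("alice@example.com", "hunter2")], b)
```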

The hash function is Argon2 run with a time cost of 3 and a memory cost of 256 MB. They claim a 100M-record database took 1,200 compute-days to process, and the actual database is 4 billion records, so presumably they spent 48,000 compute-days initializing the database (but they don't state that explicitly). One run of Argon2(3, 256) takes about 1 second. [1]
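A quick Python check of that arithmetic, assuming cost scales linearly with record count. The same figures also imply roughly one second per hash, consistent with the benchmark in [1].

```python
SECONDS_PER_DAY = 86_400

sample_records = 100_000_000    # 100M records, per the blog post
sample_compute_days = 1_200     # reported cost for that sample
full_records = 4_000_000_000    # full dataset size

# Linear scaling from the sample to the full set (the commenter's presumption).
full_compute_days = sample_compute_days * full_records / sample_records
# Implied cost of a single Argon2 evaluation.
seconds_per_hash = sample_compute_days * SECONDS_PER_DAY / sample_records

print(f"{full_compute_days:,.0f} compute-days")     # 48,000 compute-days
print(f"{seconds_per_hash:.2f} s per Argon2 hash")  # 1.04 s per Argon2 hash
```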

H[0:1] is a 16-bit hash prefix of the (username, password) pair, which is used to query the dataset. That means each query selects approximately 1 in 2^16 of the records, so for a 4-billion-record set each query should be expected to return about 60,000 results. They state the query actually returns ~1MB of data for each lookup.

[Side Note: I like that they have chosen to use just a 16-bit prefix, versus HIBP's 20-bit prefix, which to me is a little too selective if someone is pre-screening an online attack.]
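The expected bucket size is just the dataset size divided by 2^bits. A small Python check for the 4-billion-record set, applying HIBP's 20-bit prefix to the same set purely for comparison (HIBP's own corpus is a different size):

```python
DATASET_SIZE = 4_000_000_000  # records, per the post


def expected_per_query(n_records: int, prefix_bits: int) -> float:
    # A uniform hash spreads records evenly over 2**prefix_bits buckets.
    return n_records / 2 ** prefix_bits


for bits in (16, 20):
    print(f"{bits}-bit prefix: ~{expected_per_query(DATASET_SIZE, bits):,.0f} records per query")
# 16-bit prefix: ~61,035 records per query
# 20-bit prefix: ~3,815 records per query
```

Each extra prefix bit halves the crowd a query hides in, so 16 bits versus 20 bits is a 16x difference in anonymity set size.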

Here's where it gets a little neat.

You send the first two bytes of the hash of your username and password, along with an encryption of your full hash, call it 'H^a'. The server returns all the H^b for your prefix, along with your (H^a)^b which is H^ab.

When you get back your H^ab along with all the cracked H^b, you can unblind H^ab back to H^b using your 'a' (which is random and ephemeral), because the EC encryption is commutative. So cool. Now you have your hash (which Google may never have seen before!) in the form of H^b, and Google never saw the plaintext.

Essentially Google remotely encrypted your plaintext with their key, and you never saw their key, and they never saw your plaintext. But this allows you to now check if your hash (in the form of H^b) is in the set of ~60,000 H^b that Google returned. If it is, then they have your username and password in their dataset.
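A toy end-to-end Python sketch of that exchange, with the same loud assumptions: SHA-256 stands in for Argon2, and exponentiation modulo the prime 2^127 - 1 stands in for the elliptic-curve operations. This is the classic commutative Pohlig-Hellman-style exponentiation trick, not Google's actual code, and it is not secure at this size. `Server`, `is_leaked` and the other names are hypothetical. Unblinding works because raising to a and then to a^-1 mod (P-1) is the identity for any nonzero x mod P (Fermat's little theorem).

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy group: integers mod a Mersenne prime (NOT an elliptic curve, NOT secure)


def slow_hash(username: str, password: str) -> bytes:
    # Stand-in for Argon2(username + password).
    return hashlib.sha256((username + password).encode("utf-8")).digest()


def to_group(h: bytes) -> int:
    return int.from_bytes(h, "big") % (P - 1) + 1  # value in [1, P-1]


def invertible_key() -> int:
    # Random exponent coprime to P-1, so x -> x^k can be undone with k^-1 mod (P-1).
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


class Server:
    def __init__(self, leaked):
        self.b = secrets.randbelow(P - 3) + 2  # Google's secret key
        self.table = {}
        for user, pw in leaked:
            h = slow_hash(user, pw)
            self.table.setdefault(h[:2], []).append(pow(to_group(h), self.b, P))

    def query(self, prefix: bytes, h_a: int):
        # Return every H^b in the bucket, plus the client's value raised to b: (H^a)^b.
        return self.table.get(prefix, []), pow(h_a, self.b, P)


def is_leaked(server: Server, username: str, password: str) -> bool:
    h = slow_hash(username, password)
    a = invertible_key()                                     # ephemeral client key
    bucket, h_ab = server.query(h[:2], pow(to_group(h), a, P))
    h_b = pow(h_ab, pow(a, -1, P - 1), P)                    # unblind: (H^ab)^(1/a) = H^b
    return h_b in bucket
```

The server only ever sees the prefix and H^a; the client only ever sees values already raised to b.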

This is different from HIBP because Google is taking the risk of holding the actual (username, password) tuples in the form of... essentially... a keyed hash. Troy was explicitly not willing to take that risk.

This also lets anyone in the world essentially perform an online attack against Google's 4-billion-record database, as fast as they can run Argon2 on whatever candidate {username, password} values they want to test, plus or minus any additional rate limiting. The blog post says they rely on Argon2 for the rate limiting.
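A back-of-the-envelope Python sketch of what "as fast as they can run Argon2" means, using the ~1 s and 256 MB per-hash figures above. The machine spec (32 cores, 64 GB of RAM) is a hypothetical example, not from the post; concurrency is capped by whichever runs out first, cores or memory.

```python
def guesses_per_day(cores: int, ram_gb: float,
                    seconds_per_hash: float = 1.0, mb_per_hash: int = 256) -> float:
    # Each Argon2 call holds ~mb_per_hash of RAM on one core for ~seconds_per_hash.
    parallel = min(cores, int(ram_gb * 1024 // mb_per_hash))
    return parallel * 86_400 / seconds_per_hash


print(f"{guesses_per_day(cores=32, ram_gb=64):,.0f} guesses/day")  # 2,764,800 guesses/day
```

With these numbers, cores rather than RAM are the bottleneck; that per-guess cost is the only brake on the attacker unless the service adds its own rate limiting.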

But I was just able to use this to make 20 guesses against my dad's password (based on his email address) before finding a match, meaning that password has been leaked at some point, and may still be in use. Of course it had my brother's name in it.

If services like this become pervasive, it may be a valid argument for not using per-service usernames, e.g. plus addressing, assuming the canonicalization doesn't strip that out, because it lets attackers use the service to target not just your (username, password) but effectively (site, username, password) directly. However, their dataset is, after all, leaked/cracked passwords, so your creds are already up for grabs at that point, and I'm not sure why attackers would use Google's service versus just building their own database from the available sources.

[1] - https://gist.github.com/Indigo744/e92356282eb808b94d08d9cc6e...

andreareina · on April 4, 2019

What's the benefit of doing it this way instead of how Troy does it?

zaroth · on April 4, 2019

Troy tells you that someone, somewhere, used that same password and it was cracked.

Google tells you that your username password combination specifically was cracked.

One is a yellow caution flag. The other is a three-alarm fire. Both are useful!

toyg · on April 4, 2019

This post right here is why I still read HN. Thanks!

_b8r0 · on April 4, 2019

If anyone wants to do something like this offline in an AD environment at work, Safepass[1] does some pretty cool things to get better password coverage.

[1] - https://safepass.me/

keyle · on April 4, 2019

Question is, how do you turn this into a profitable outcome without alienating users?

sokoloff · on April 4, 2019

I think, in general, Google wins when the internet experience of random users is better (more trustworthy and lower friction). They can profit from increased internet usage and trust driving additional traffic to paid-search and AdSense sites.

Seems it could be an overall win for users and Google.

wheelerwj · on April 4, 2019

This is a story 1) that has nothing to do with its title, and 2) is about how a team that WORKS AT GOOGLE got 600k Chrome extension users in the first few weeks, and how they “secure” the app from their own company learning your password.