
Any developer today who is building an application and isn't using something like Argon2, bcrypt, or scrypt should have started planning a migration away from whatever they're currently using yesterday. There is no reason to use anything less than those three, and continued use is, in my mind, negligence.

If at all possible you shouldn't be storing passwords to begin with and instead relying on another service for authentication.

This should be the takeaway from this article.




> If at all possible you shouldn't be storing passwords to begin with and instead relying on another service for authentication.

Should we all be using "Login with LinkedIn", then?

Passwords are always difficult to deal with, even when using bcrypt. Who knows if bcrypt will still be considered secure in 5 years? How long would it take to implement a change that uses a new hashing algorithm for new logins while still accepting the old algorithm for old ones? When should you erase the passwords of inactive users who haven't logged in and thus still use the old algorithm? (If you are interested in this problem, Django's user model uses a pretty straightforward and good approach[1].)
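To make that concrete: Django keeps PASSWORD_HASHERS as an ordered list. The first entry is used to hash new passwords, the rest are still accepted for verification, and a hash made with an older entry is transparently upgraded to the first algorithm on the next successful login. A sketch of such a settings.py (which hashers are available depends on your Django version, and the bcrypt ones need the bcrypt library installed):

    PASSWORD_HASHERS = [
        # Used for all newly set passwords:
        'django.contrib.auth.hashers.BCryptSHA256PasswordHasher',
        # Hashes made with these are still verified, then upgraded on login:
        'django.contrib.auth.hashers.PBKDF2PasswordHasher',
        'django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher',
    ]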

Outsourcing them is not the answer. It is a good idea to offer it for the user's convenience, but I hate it when websites only let you log in with "your" favorite social network. And even then, by outsourcing the passwords you are risking your users' privacy by handing them to Google/Facebook/etc. It even hurts the privacy of users who don't use Facebook to authenticate, because Facebook can see that user X visited your website (and sometimes even every URL they visited on that website). This is because those "Login with" and "Like" buttons hit Facebook's and Google's servers on every page load.

[1]: https://docs.djangoproject.com/en/1.9/topics/auth/passwords/

Edit: Forgot the link, thanks!


> Outsourcing them is not the answer

It very much is, if you're outsourcing to someone who can do it with greater competence than the average team can. Keeping current on the crypto, designing with the ability to sunset algorithms in mind, continuous pen testing, investing in physical security/network security/HSMs/you name it definitely isn't cheap or easy. Unless you're in the business of doing _that_ you're almost certainly better off having someone do it for you.

That said, I'm with you on the social logins front. I have (had?) hope for OpenID Connect as an alternative, so it would be great if someone neutral like Mozilla jumped on the bandwagon.


You are right, if the company you outsource your task to is actually better than you. In LinkedIn's case, outsourcing was the wrong decision because they used pretty bad "tools". You should also only outsource if you trust the other company to be competent and to protect your interests. For instance, I would have trusted Mozilla with their OpenID alternative but not Google, Facebook, and LinkedIn (though I'm pretty sure Google knows how to keep login data safe, I'm more worried about privacy in that case).

In this case, what I would do is use a framework that makes getting those things wrong hard. Django is a great example of that. It provides you with a generic user model that does password handling for you, and it enables a few middlewares by default to protect you against CSRF, clickjacking, and more. While Django can be really slow, and hard to use when doing something "unusual", you can learn a lot from it. I don't know many frameworks that make security so much easier. In Go, you can do all those things as well, but that requires you to be aware of those security measures in order to use them, which is not ideal for junior developers or "fast moving startups that don't have time to invest in security measures".
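For instance, here is roughly what a default Django settings.py wires in out of the box (the setting is MIDDLEWARE_CLASSES in 1.9; names and defaults vary by version):

    MIDDLEWARE_CLASSES = [
        'django.middleware.security.SecurityMiddleware',
        'django.contrib.sessions.middleware.SessionMiddleware',
        'django.middleware.common.CommonMiddleware',
        'django.middleware.csrf.CsrfViewMiddleware',                # CSRF tokens
        'django.contrib.auth.middleware.AuthenticationMiddleware',
        'django.middleware.clickjacking.XFrameOptionsMiddleware',  # anti-clickjacking
    ]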


Disclaimer: I work for Auth0.

I couldn't agree more with this comment.

Storing usernames and passwords is really hard. Using a secure hashing algorithm is trivial compared to other time-consuming problems like handling brute-force attacks, or other kinds of anomalies like a distributed attack trying a database of emails and passwords leaked from other services.


https://en.wikipedia.org/wiki/Pluggable_authentication_modul...

That is pretty much the first thing I do when I inherit a project with authentication. You don't need to make another company your application's doorman; there are a lot of PAM backends you can run on premises that "do it for you". If you have the competency to manage a LAMP stack, then you can likely handle a well-tested, existing authentication server.

All the years in physical security might have broken my brain, because I am always surprised by how willing people are to leak information that doesn't need to be leaked. One project I was pulled into was on the precipice of uploading millions of customers' addresses to Google's geolocation API - had I not been able to bring the lead to his senses, I might have made a run for the network closet.


PAM is great, and it's especially great as a layer of indirection, but I can't agree with your overall point that using PAM = problem solved. To your "no harder than LAMP" point: most teams can't competently manage a security-critical LAMP stack. They're in good company, given that big companies and governments get pwned with great regularity. Survival requires defense in depth, and that gets expensive. It's a matter of everything from policy (are there two-man change controls on firewall rules, do separate teams own the route tables and the firewall, do separate teams develop/audit/deploy security-critical code) to hardware (is private key material stored on an HSM, are sensitive services physically isolated, does entropy come from a hardware RNG). Most small companies aren't thinking about those things.

Also, given that the P is for pluggable, what's the backend? You wouldn't use pam_unix for users outside your org. A DB? Now you're back to square one. LDAP+Kerberos/AD? That beats the DB but it doesn't do anything for your defense in depth requirement.


> ...I can't agree with your overall point that using PAM = problem solved.

I don't think we have the same problem definition. I'm saying that it solves the problem of authentication implementation details - where the just-enough-to-be-dangerous types screw up (salting, keyspace, the keeping current on crypto part). LDAP can certainly be leveraged for defense in depth, authorization vs authentication, but that is much less off-the-shelf. This also provides some separation between the authentication server and braindead PHP scripts that barf the results of ";select * from users;".

> Also, given that the P is for pluggable, what's the backend?

Kerberos is the obvious choice for authentication, LDAP integration for authorization if you're needing a fine granularity. You'd really have to go out of your way to end up with a PAM that dumps right into a DB with a poor crypto policy - I've never seen it. You could use /etc/passwd - but you're right, you wouldn't want to... the option is nice though.

I don't disagree that a company that makes money primarily on identity management could do it better, if you assign a low value to the information that is necessarily leaked. But let me just point out the context in which we are having this conversation: LinkedIn offered such a service, as does Facebook - both have suffered breaches. While that isn't how they made their money, plenty of people used the service - following the letter of your advice, if not the spirit of it.


The point of bcrypt, though, is that it is (more) future-proof, in that hash generation is slow as balls and can be made exponentially slower by increasing the difficulty.

Barring some algorithmic weakness, this slowness translates directly into the brute-force attack, making it take that much longer for an attacker. Bcrypt also has a property that makes it harder for a GPU to crack: its key setup constantly reads and rewrites a small in-memory table (the Blowfish S-boxes), which maps poorly onto GPU hardware.

As for migrating passwords: you can use a fallback password hasher that checks against the previous hash should the main one fail - then, once the login is successful, re-hash with the newest algorithm!
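A minimal sketch of that fallback pattern, assuming the Python bcrypt package as the new hasher, a legacy salted-SHA-256 scheme, and a hypothetical record layout:

    import bcrypt, hashlib, hmac

    # Hypothetical stored record: {"algo": "sha256", "salt": "...", "hash": "..."}
    def verify_and_upgrade(password: str, record: dict) -> bool:
        if record["algo"] == "bcrypt":
            return bcrypt.checkpw(password.encode(), record["hash"].encode())
        # Fallback: try the legacy scheme.
        legacy = hashlib.sha256((record["salt"] + password).encode()).hexdigest()
        if not hmac.compare_digest(legacy, record["hash"]):
            return False
        # Login succeeded under the old scheme: re-hash with the new one.
        record["algo"] = "bcrypt"
        record["hash"] = bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
        return True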


You can't really know how future-proof it is, though. As far as I know, nobody has proven it to be unbreakable. Right now we can't break it, we can only brute-force it, but maybe tomorrow a mathematician finds some property that lets you calculate all possible inputs of a certain length for a given hash within a reasonable time. Or maybe they find another way that doesn't involve brute force (statistics, ...).

What I'm saying is that passwords are hard in general (to store, to enforce policies on properly, ...) and just because bcrypt is "unbreakable" right now doesn't mean it still will be in 5 years.

Your last paragraph describes nicely what needs to be done, but is your code ready for that? Maybe your database's password/salt column only allows X characters, and now you need to alter the schema. Maybe something else expects the passwords to have that specific format (another microservice, some script, ...). Re-hashing a password can be hard if this possibility wasn't considered from day one.


> How long would it take to implement a change which updates the hashing algorithm for new logins while still using the old algorithm for old logins?

As long as you remember to store the cost parameter along with each hash, it's just a matter of increasing the default cost for new hashes and reusing the old cost for existing ones.
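bcrypt makes this easy because the cost is embedded in the stored string itself ("$2b$12$..." means cost 12). A sketch with the Python bcrypt package (CURRENT_COST is an illustrative choice):

    import bcrypt

    CURRENT_COST = 12

    def check_password(password: bytes, stored: bytes):
        if not bcrypt.checkpw(password, stored):
            return False, stored
        # Field 2 of "$2b$<cost>$<salt><digest>" is the cost parameter.
        if int(stored.split(b"$")[2]) < CURRENT_COST:
            # Older, cheaper hash: replace it while we have the password.
            stored = bcrypt.hashpw(password, bcrypt.gensalt(rounds=CURRENT_COST))
        return True, stored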


I was talking about the algorithm itself. Let's say bcrypt is considered insecure in 5 years (even at high work factors) and you are supposed to switch to another hash algorithm. Are your code and database able to handle that? If so, great, but I don't think many people implement this, and depending on your code, it can get difficult.


> Django's user model uses a pretty straight forward and good approach[1]

You forgot to post the link.


Not OP, but here's the official Django doc on the topic, including a section further down about upgrading the hash without needing a login:

https://docs.djangoproject.com/en/1.9/topics/auth/passwords/...

Here's a blog post which covers the same topic in an easy-to-understand form, including why computationally-expensive password hashing is important:

http://tech.marksblogg.com/passwords-in-django.html


> ...instead relying on another service for authentication.

If you are suggesting something like on-premises Kerberos, I agree - better yet: just design with PAM in mind. But if you are suggesting something like "login with facebook" then I have to disagree - unless your goal is to add a serious point of failure, major dependency, and huge source of information leakage all in one step.


Unsalted hashes were negligence already in 2012...


I complained to my bank that their 12-character password limit suggests they are storing the passwords themselves rather than hashes (a hash's size doesn't depend on the password's length). Their reply was little more than: don't worry about it, you aren't responsible for fraud. I asked them to add some kind of second-factor authentication (I'm a fan of TOTP systems) and was told they are thinking about making that available for their business accounts.

It bothers me that my most valuable login is probably my weakest.


I'm glad they fixed this, but until relatively recently (last year), Charles Schwab had the following password requirements:

* between 6 and 8 characters

* alphanumeric

* no symbols

* case-insensitive

[1] is a nice writeup of exactly how broken this was until they changed it recently.

[1] - http://www.jeremytunnell.com/posts/swab-password-policies-an...


> If at all possible you shouldn't be storing passwords to begin with and instead relying on another service for authentication

This is wrong. Using another login service can introduce all sorts of new issues; a big one is privacy.


> There is no reason to be using anything less than [Argon2, bcrypt or scrypt] and continued use is in my mind negligence.

PBKDF2 is fine too.


How is this better than double-salted SHA-256, i.e. sha256("secret", sha256("secret2", $pwd))?

The double SHA/MD5 would even give rainbow tables a super hard time, right?


> How is this better than double-salted SHA-256, i.e. sha256("secret", sha256("secret2", $pwd))?

ITYM hmac-sha256("secret", hmac-sha256("secret2", $pwd)), but that's neither here nor there; the operation requires either two (your version) or four (my HMAC version) SHA256 operations, which really isn't much: your version would make an exhaustive password search twice as expensive; mine would make it four times as expensive. Neither is very much.

Note too that salts are not secret; they must be stored with the password.

> The double SHA/MD5 would even give rainbow tables a super hard time, right?

If you're using even a single high-entropy salt, a rainbow table is useless. What's not useless is just trying lots and lots and lots of potential passwords: first sha256("salt", sha256("salt2", "12345")), then sha256("salt", sha256("salt2", "hunter2")), then sha256("salt", sha256("salt2", "opensesame")), then sha256("salt", sha256("salt2", "love")) and on and on and on.

'But that will take forever!' you might cry. Not really: the article noted a 209.7 million hashes per second rate. At that rate, one could try every possible date in a century (36,525 possibilities: lots of people use birthdays or anniversaries in passwords) in U.S., British, German, French date formats (x4) with the top 100 male and female names prepended and appended (x400: 100 each, before & after), with and without spaces between (x2) in approximately half a second. If one adopted your approach, it'd slow it down to just over a second; under my approach, it'd be 2¼ seconds. Not very good.

PBKDF2, bcrypt, scrypt & Argon2 all attempt to slow this down by requiring not one or two or four hash operations, but thousands or hundreds of thousands. scrypt goes further by also requiring lots of memory accesses, since memory access is slow on GPUs while raw hashing is cheap. Argon2 goes even further still.
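To illustrate in Python's standard library (the iteration count is illustrative; tune it to your hardware):

    import hashlib, os

    password, salt = b"hunter2", os.urandom(16)

    # One SHA-256 invocation: all the double-hash scheme costs per guess.
    fast = hashlib.sha256(salt + password).digest()

    # PBKDF2 with hundreds of thousands of iterations: each guess now
    # costs an attacker roughly 200,000x the work of a single hash.
    slow = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)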

Under any of the above, it might require a full second on even a GPU cluster to verify a password, which would mean that it'd take 3 years and 8 months to try all of those possibilities earlier. While a real-world system wouldn't be tuned to take a full second on a GPU cluster (it'd be slower on a server), it might very well be tuned to take say 10 or 100 milliseconds, which is still relatively slow for someone trying every single hash but relatively fast for validating a user login.


PBKDF2 is not for password storage.


PBKDF2 is absolutely for password storage. In fact RFC 2898 specifically notes that use case (for KDFs in general):

> Another approach to password-based cryptography is to construct key derivation techniques that are relatively expensive, thereby increasing the cost of exhaustive search.


Do you have a source for that? It was my understanding that PBKDF2 is 'good enough for now', but not necessarily the most future-proof of techs, given how easily the algorithm is optimised for GFX cards.


Most of the attacks described in this article aren't solved by any of those though, right? They protect each account against having its hash trivially reversed, but if you run these advanced dictionary attacks you can still crack people with weak passwords.

Maybe I'm missing something?


No. You cannot effectively use the techniques you can use against SHA/MD5 to attack the three I mentioned.

SHA and MD5 can be calculated entirely in a CPU's registers without having to touch RAM. Bcrypt, for example, requires repeated access to an in-memory table that it keeps modifying as part of the calculation, slowing down the process. A GPU has so little fast memory per thread that this cannot be done effectively in parallel.

All one has to do with bcrypt is adjust the difficulty, and any advances in GPU technology or whatever are nullified.
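With the Python bcrypt package, that difficulty knob is a single parameter (cost 12 here is illustrative):

    import bcrypt

    # rounds is a log2 cost: raise it by 1 and hashing takes twice as
    # long, cancelling out a doubling of the attacker's hardware speed.
    hashed = bcrypt.hashpw(b"correct horse battery staple",
                           bcrypt.gensalt(rounds=12))
    assert bcrypt.checkpw(b"correct horse battery staple", hashed)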


Thanks. I thought I saw an article about a recent leak of salted, bcrypted passwords where they cracked the weak ones. The conclusion was that there is no way to prevent a weak password from being compromised, so the only defense is to require longer, more complex passwords.


A weak password is a weak password no matter how good the hashing is. I think that you're referring to this bit from last year:

http://www.pxdojo.net/2015/08/what-i-learned-from-cracking-4...

The author of this piece was doing about 156 hashes per second, and after just over five days he had only cracked 4,000 account passwords - roughly 0.01% of Ashley Madison's supposed userbase. To run the 14,000,000-odd passwords from rockyou.txt against every single AM account at that rate would take on the order of a hundred thousand years.


Weak passwords are still weak, but things like bcrypt buy you time. A bcrypt configuration might have hashing take 250ms on a server's Xeon CPU (as opposed to nanoseconds with a sha-based hash). You can try the password 'password' on say 50k hashes that were dumped, but it will take about 3.5 hours. You can get this time down by parallelizing or with better hardware like GPUs / FPGAs / ASICs, but you're still looking at a base time of 3.5 hours per attempt to go through each account. Or if you're only targeting one account, 3.5 hours to try 50k words.

If the server had bcrypt configured to take a second per password, multiply everything by 4. And so on. What I think is needed as a supplement to "use bcrypt/scrypt" is an "and use it this way, so you don't accidentally open your server to DoSing or give users a poor experience". A hash that takes 10 seconds to compute on a Xeon is great from a security standpoint if your hashes get leaked, but it's a terrible user experience to wait 10+ seconds to log in, and if you have to service multiple logins at once, your server won't be able to do anything else if you call bcrypt/scrypt synchronously.
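One sketch of the "use it this way" part: push the hash off the main loop so concurrent logins don't stall everything else (assuming Python's asyncio and the bcrypt package):

    import asyncio
    import bcrypt

    async def verify_login(password: bytes, stored_hash: bytes) -> bool:
        loop = asyncio.get_running_loop()
        # checkpw burns ~250ms of CPU at a healthy cost factor; run it in
        # the default thread pool instead of blocking the event loop.
        return await loop.run_in_executor(None, bcrypt.checkpw,
                                          password, stored_hash)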


> If the server had bcrypt configured to take a second per password, multiply everything by 4. And so on. What I think is needed as a supplement to "use bcrypt/scrypt" is an "and use it this way, so you don't accidentally open your server to DoSing or give users a poor experience". A hash that takes 10 seconds to compute on a Xeon is great from a security standpoint if your hashes get leaked, but it's a terrible user experience to wait 10+ seconds to log in, and if you have to service multiple logins at once, your server won't be able to do anything else if you call bcrypt/scrypt synchronously.

If you're in a situation where you need to hash the user's password for every action, your application has far worse problems than which password storage method is in play. Migrating from an inadequate password storage method to a better one also presumes you haven't done something completely wrong with session state.

[edit]

I misread the poster I was responding to, assuming they meant the user re-authenticating on each request. My thought process was that if they're storing credentials in a cookie in lieu of a session ID, then that needs to be addressed first, before even going down the avenue of correcting password storage.


I wasn't talking about verifying a password for every action, but simply having more than one user of your service at once.


They're not "solved", but the attacks are made 3 to 6 orders of magnitude more expensive:

See https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a27...

"8x Nvidia GTX 1080 Hashcat Benchmarks"

TL;DR:

    Hashtype: MD5                        Speed: 200.3   GH/s
    Hashtype: SHA1                       Speed: 68771.0 MH/s
    Hashtype: bcrypt, Blowfish(OpenBSD)  Speed: 105.7   kH/s
    Hashtype: scrypt                     Speed: 3493.6  kH/s
    Hashtype: PBKDF2-HMAC-SHA512         Speed: 3450.1  kH/s
    Hashtype: PBKDF2-HMAC-MD5            Speed: 59296.5 kH/s

You can't protect against people using "password123" or "dadada" as their passwords, but for those of us using long randomly generated passwords bcrypt makes cracking it well outside the realms of possibility for anyone short of nation-state attackers. (I bet the NSA can get quite a lot more than 100kH/s for bcrypt if they're determined enough, but I wonder if even _they_ can throw 6 orders of magnitude more compute at the task?)


Some of these numbers don't make sense on their own. The point of bcrypt, scrypt, and PBKDF2 is that their difficulty is configurable.

Digging into the comments on that gist, it says that the bcrypt benchmark used a workfactor of 5 (= 32 rounds). The lowest possible bcrypt workfactor is 4 (= 16 rounds).

For comparison, the default for login hashes on OpenBSD is 8 (= 256 rounds) for standard accounts and 9 (= 512 rounds) for the root account. OpenBSD's bcrypt_pbkdf(3) function uses a workfactor of 6 (= 64 rounds) and is intended to be called with multiple rounds itself: signify(1) uses 42 rounds (a bit over the equivalent of bcrypt with a workfactor of 11), and ssh-keygen(1) defaults to 16 (roughly equivalent to bcrypt with a workfactor of 10 (= 1024 rounds)).

The point I'm trying to make here is that the bcrypt benchmark uses a ridiculously low workfactor, and it looks like the scrypt and PBKDF2 ones did too.
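A quick way to feel what those workfactors mean, as a timing sketch with the Python bcrypt package (the cost is a log2, so each +1 doubles the rounds and the time):

    import time
    import bcrypt

    for cost in (4, 8, 12):
        start = time.perf_counter()
        bcrypt.hashpw(b"benchmark", bcrypt.gensalt(rounds=cost))
        elapsed = time.perf_counter() - start
        print(f"cost {cost} ({2 ** cost} rounds): {elapsed:.3f}s")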


Yep - and even in its "ridiculously low workfactor" configuration, it's around 6 orders of magnitude slower than SHA. In context, the author of that piece was trying to show off the hash rate, so he would be expected to err on the side of making his monster 8-GPU rig look better rather than worse.

Any of bcrypt, scrypt, or PBKDF2 can easily (as in, by design, via the hash function's parameters) be made however much slower you need, so long as your normal login process is still "fast enough". I had a WordPress site a while back where I couldn't wind the bcrypt plugin up much past 11 before the inexpensive webhosting would time out before login succeeded. (And yeah, feel free to mock me for "securing" WP with bcrypt - it was mostly because I wanted to be confident that if/when the site got exploited, the hashes in the DB wouldn't be too easily attackable for anyone who'd used a decent password.)


bcrypt uses a unique salt per password. The hash output by the bcrypt function is actually several delimited fields joined into a single string.

This means that if multiple users use a common password like 'pass123', the hashes stored in the database will still each be unique. An attacker trying to reverse all of the password hashes has to attack every single one individually, in a targeted manner, rather than using pre-computed rainbow tables or generating hashes from a wordlist and scanning the database for matches.
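A quick demonstration with the Python bcrypt package - the same password hashed twice yields two different strings, and each still verifies:

    import bcrypt

    h1 = bcrypt.hashpw(b"pass123", bcrypt.gensalt())
    h2 = bcrypt.hashpw(b"pass123", bcrypt.gensalt())
    assert h1 != h2                        # fresh salt each time
    assert bcrypt.checkpw(b"pass123", h1)  # both still verify
    assert bcrypt.checkpw(b"pass123", h2)
    # Each hash looks like b"$2b$12$<22-char salt><31-char digest>".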


What are the advantages of bcrypt compared to SHA based hashes with unique salts?


General-purpose cryptographic hash functions like MD5 (now broken), SHA1, SHA256, etc. are designed to be computationally cheap, i.e. fast.

Salting protects against rainbow tables [1], but it doesn't change the fact that computing a SHA256 hash is fast.

Password hash functions like PBKDF2, bcrypt, scrypt, Argon2 are designed to be computationally expensive, to make a password-cracking endeavor take even longer.

Argon2, the winner of the Password Hashing Competition and the current state-of-the-art, for example, has two ready-made variants: Argon2d is more resistant to GPU cracking, while Argon2i is more resistant to time-memory tradeoff attacks [2].

[1] https://en.wikipedia.org/wiki/Rainbow_table

[2] https://github.com/p-h-c/phc-winner-argon2
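As a sketch, here is what using it from Python looks like with the argon2-cffi package (the variant and parameters in the output depend on the library version's defaults):

    from argon2 import PasswordHasher

    ph = PasswordHasher()         # sane default time/memory/parallelism costs
    stored = ph.hash("hunter2")   # e.g. "$argon2id$v=19$m=...,t=...,p=...$<salt>$<digest>"
    ph.verify(stored, "hunter2")  # True on a match; raises VerifyMismatchError otherwise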


bcrypt is significantly slower to compute - something like 5 or 6 orders of magnitude slower. (See my other comment in here with numbers for cracking various hash types on an 8-GPU rig.)


Can I achieve the same by applying SHA x times?


If X is millions (or even billions), maybe, but you shouldn't. Just use one of the real password-hashing algorithms. Never roll your own hashing scheme and assume it's secure enough. It won't be.
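For what it's worth, PBKDF2 is essentially the vetted, standardized version of "apply a hash X times". A sketch of the contrast in Python:

    import hashlib, os

    password, salt = b"hunter2", os.urandom(16)

    # The hand-rolled version of "apply SHA X times"; avoid this:
    h = salt + password
    for _ in range(1_000_000):
        h = hashlib.sha256(h).digest()

    # The vetted equivalent: PBKDF2 is iterated HMAC-SHA256 with the
    # details (salting, chaining, output length) done correctly.
    key = hashlib.pbkdf2_hmac("sha256", password, salt, 1_000_000)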


Of course. I just wanted to get an idea of the reasons without going too deep into the mathematical details. For projects I would just use Argon2 or bcrypt.



