Hacker News new | past | comments | ask | show | jobs | submit login
A New, Simple Way to Salt your Hashes (codespatter.com)
22 points by pyroman on Aug 4, 2008 | hide | past | favorite | 69 comments



Don't do this. This is no better than using a single fixed salt for an entire site: An attacker who has access to your password file can iterate through possible passwords and for each password perform one computation to check if that password matches any of your users.

The whole point of salting is to reduce an attacker to checking one possible user/password pair at once. You don't get this benefit if you use the method described in this article.

The only attacker I can see this defending against is an attacker who has a precomputed table of password MD5s and doesn't have (hasn't bothered to create) a precomputed table of password-repeated-twice MD5s. So technically the method described is slightly more secure than simply storing (user, MD5(password)) pairs, since it avoids the "google to reverse MD5 hashes" pseudo-attack. But if you're going that low, you're pretty much lost already.

Don't do this.


... and, I assume you'd agree, if your password storage mechanism involves a single ultra-fast SHA1 or MD5 hash, salts are deck chairs on the Titanic anyways.


I'm not sure I agree, or else I'm missing your point.

Assuming a scenario of using a single MD5 hash action on a user password, and a user with a password of "password1", no salt means that the simplest dictionary + number rainbow attack will find your password early on.

Add in a randomly generated multi character salt, and now you'd need a 100% coverage 12 or 16+ character/byte rainbow file, which even with a single MD5 sum hash is I believe still well beyond the abilities of a modern comp farm.

So while I don't recommend using MD5 anymore for password hashing, a randomized multi-character salt still makes the difference between easily hackable, and virtually impossible to extract the password.

Or am I missing something?


Or am I missing something?

Yes: Users pick stupid passwords; and each user's salt is stored along with the hashed password.

Suppose someone has a password of "a" and you get access to a password file containing (user, salt, MD5(salt || password)) tuples. You start with single-letter passwords, and you promptly find this user's password. The salt meant that you had to check if each user's password was "a" individually instead of checking them all at once -- but if the password function is fast enough and a user's password is simple enough, you'll break it anyway.


Sure, but if someone has a password of 'tim5RaWk5', you'll waste a year trying to crack that one account.

With out any salt you could have 30-50% of all the accounts cracked using an existing rainbow table in about an hour.

Also, most sites require minimum password lengths (while this does shrink the overall attack space, it dramatically increases the minimum attack space per password). So brute forcing a salted password of 'a' seems like an unreasonable example. I agree, salted or unsalted, a password of 'a' is a bad password. The differnce is when it's a salted or unsalted password of 'my1pass' or 'T1mmy' much less an actually secure/random/long password.


The space of passwords consisting of two english-y words, possibly encoded in one of several elitespeak alphabets, seperated by a non-alphabetic character? Still much smaller than the 2^56 you get from fully random isprint() passwords.

Again: we can go back and forth about how likely it is I will crack your excellent passwords, and I hear you, but from 1991 through ~2004, iterative password cracking was the only way it was done, and it was and is freakishly effective.

Salts don't do anything at all to slow down iterative cracking tools. bcrypt kills them completely.


Yes. You're missing the attack that actually happens in the real world:

aardvark<salt> ac23b37db0039dda62896bb21f312755

abandon<salt> 5877f7f5836225442bf103c2e7d41a3f

abdominal<salt> 93e399d54bcf903ce686122118a5f52f

etc etc etc etc

The problem is that MD5 is lightning fast, as is SHA1 and even, for this application, SHA256. Unix password schemes have been designed for decades with the goal of not being as fast as message digest functions, so that attackers can't burn through entire dictionaries on the fly in reasonable amounts of time.


Wait. You're suggesting that a fast algorithm allows a real-time comprehensive dictionary + variant attack with modern hardware?

My understanding is that a Xeon 3 GHz proc can do about 4 million hashes per second, and a quad core intel core2 can do about 4.3 million hashes per second, so for a potentially mixed case, alpha numeric password of 8 chars, that's at least 2 years. Figure that you'll find it by the midway point, that's 1 year of CPU burnt on a single account.

I'm not sure what you mean by the attack that happens in the real world. Do you mean instead of doing lookups against a rainbow table, people just brute force the password with billions of login attempts? Or something else?

I know people who have used rainbow tables for attacks in the real world using either password files that weren't protected, or db table backup dumps they gained access to. Those attacks would have been useless with any sort of salt. Without a salt, they were trivial (in my personal experience with large corporate databases, 2-3 out of ten m5d hashed passwords can be found by Googling for the hash (and hence finding it in someone's rainbow table).

Can you further explain what you mean by a real world attack, and how a salt doesn't help?


http://majuric.org/software/cudamd5/

According to the calculations on the above site you could exhaust the 8 character key space in 16 days.


The part you're missing: /usr/share/dict/words is 234,000 (thousand, with a 't') "hashes" long.


1) That's only useful if the password for the account you're trying to hack uses a word in dict/words. Obviously attack spaces can be reduced by rules and whatnot, but I'm not sure that's really the part I'm missing.

2) The "thousand, with a 't'" remark seems unnecessarily snarky. I don't have thin skin here, but from my end I've been trying to communicate my understandings of password attacks, my confusion, and either educate or learn in the process.

3) I will certainly look at bcrypt for future projects. It looks interesting. I still think salting hashes is a good idea. I don't agree that the impact of salting is so minor as to be compared to deck chairs on the titanic. For corporations which have a standard/requirement to use a SHA (or god forbid MD5) hashing algorithm, hashing is a VERY VERY VERY good idea.


(1) The point is, if you're considering "average length 8 character passwords", there's a temptation to assume you're working with 2^56 bits of entropy. In reality, you're getting nothing even close to that; only a tiny minority of users use truly random passwords. If you're not using random passwords, the speed of your hash starts to matter a lot.

(2) Sorry. I meant to sound emphatic, not nasty.

(3) These are all hashing schemes; I'm arguing against using blazingly-fast hashing for password storage. If you're stuck with SHA, Colin's right; use a "stretched" version that iterates several thousand times (just re-hash the hash in a loop), and use a 32 bit nonce (err salt), generated on the fly for each user every time their password is changed, stored in an integer column in the user table.


Now you're making things complicated. :-)

You're absolutely right that a slow password hash should be used -- I left that part out because I wanted to make the non-saltiness of the "salt" clear without confusing people by talking about all the other things that they could do wrong.


Something else I think you'll probably agree with, even though it sounds hyperbolic: if this stuff sounds complicated, it's because it's supposed to be complicated.

There's an "easy" answer, which is simply "don't hack this stuff up yourself". The free options here are strong. Why waste your time with trivia?


Doesn't having a hash function that takes, say, 100x as long to compute as md5, only increase the cracking time by a factor of 100? Is that all there is to it?


Is that all there is to it?

Yes, but typically password hashes are constructed to be 1000x slower or more. A common construction is to iterate the hash, e.g., to store MD5(MD5(MD5(...(MD5(salt || password)...))) -- this allows you to trade password verification performance against password cracking time however you like, just by varying the number of times you iterate the hash.


Well take the case of bcrypt and md5. Assuming bcrypt takes about 300 times as long^ as md5, then that would be the difference between cracking a password in a day versus a year.

You'd need someone very patient to wait a whole year for your reddit password. And if you change your password every 6-12 months anyway, they are most likely screwed.

^ I picked 300 out of a hat, I think that bcrypt is usually 2-3 orders of magnitude slower, but no reason you can't recursively apply it to stretch it out even further.


The time it takes bcrypt to verify one password is a tunable parameter; that's the fundamental feature of the algorithm. If you want a password check to take one full second, you can do that. A dictionary attack against one six-character password on that system, even offline against a captured hash, would take years.


I don't say this enough, this is why I love this forum!! Thanks, Colin!


Can anyone take a stab at explaining the fetish webdevs have for password salt schemes? I don't get it. Colin will probably agree with me: people who build authentication systems don't spend a lot of time thinking about salts. Of course, most of them wouldn't waste their time redesigning a password storage system; they'd use Mazieres/Provos "bcrypt" and be done with it.

If an article like this makes even a couple of neurons in your head light up, you have better things to do than reinvent the password storage wheel. Use somebody else's good system. This is almost security 101.


Colin will probably agree with me: people who build authentication systems don't spend a lot of time thinking about salts.

I agree, in the sense that people don't spend a lot of time thinking about breathing. We just do it. Grab 256 bits from /dev/random (or from your internal entropy pool, if you have one) and stick a salt or nonce onto password you hash and every protocol packet you send.

Of course, most of them wouldn't waste their time redesigning a password storage system; they'd use Mazieres/Provos "bcrypt" and be done with it.

For a straight-forward "users logging into a website", absolutely. But most people who design authentication systems have more stringent requirements, like "allow users to authenticate themselves in such a way that a bogus server can't steal their credentials", at which point things get a bit more interesting. :-)


It would be great if we could just give a consistent, clear recommendation for people to use bcrypt here. You're a BSD person, I assume you like Mazieres and Niels as much as I do. MD5(MD5(MD5(etc))) is still pretty silly. Why hack? Do it right.


MD5(MD5(MD5(etc))) is still pretty silly. Why hack? Do it right.

I wasn't suggesting that people should use iterated MD5 has a password hash. I was giving it as an example of how a password hash can be made more brute-force-resistant by doing extra work.

It would be great if we could just give a consistent, clear recommendation for people to use bcrypt here.

I'm not arguing with that. Hey everybody, do what Thomas says and use bcrypt!

But -- maybe due to my background in academia -- I think it's really important for people to understand why the authentication schemes they keep on inventing are bad. You're doing a good job of jumping in and telling everybody to use bcrypt -- so I'm taking care of explaining the cryptography so that people will understand what they're doing wrong and what bcrypt does right.


You're also stronger on crypto than I am. The background is neat, but we probably don't want to fall into the trap that Schneier did, documenting a lot of trivia and failing to provide clear explanations, so that we wind up with applications that use CAST and Interlock instead of TLS. =)


Probably because salting is one of the easier crypto ideas to grok and one of the safer ones to get wrong? Salting is just a constant factor optimization (assuming password length is bounded and short). It's designed to remove the opportunity to do a dictionary lookup and force the attacker to actually compute the password hash. But it doesn't change the algorithmic time of the attack at all. So it's "safe" to pontificate about.

At worst, you're going to make an attack available only to a person with a botted cluster of machines accessible to a single guy with a laptop. Meh.


For all you ruby/rails nuts out there, ruby already has a good gem that makes using bcrypt easy:

http://bcrypt-ruby.rubyforge.org/

It even handles the salting for you, so you don't have to think about it.


I've seen the misconception that a salt is a fixed string before, I don't know where people get that from. You don't pick a salt and use it site-wide, you pick a different salt for each thing you're encrypting, so the standard salting mechanism is more secure than this


Early on, people used fixed string salts. Some people learned and advanced.


Can you cite this? I think you're making it up. It makes no sense for people early on to have misunderstood salts so badly. That kind of misunderstanding takes years to develop.


I am mistaken. (Faulty memory.) The early Unix salts were random, though they were only two characters.


To repeat the comment I made on the site:

The problem with this is you have created a pattern which could (and probably does) introduce a cryptographic weakness. If someone knew or could guess you were using this tactic then they might be able to exploit it.

While I don’t have an example of a specific weakness to MD5 to hand one of the basic rules of exploiting algorithms is knowing some of the source material or knowing about patterns within it.

The point of using a salt is to use a piece of unknown material which is unique in each string. It’s common to generate random junk to use rather than anything meaningful.

Personally I always take "new insight" into security tactics with a pinch of salt until they have been hammered on by a few people.


This for some reason struck me as a Homer Simpson moment - where he suggests every car come with one of those ribbons on the antenna so you can find it easier!


That's really a great analogy.


I use a unique random salt when hashing passwords in a db.

This is combined with another salt I keep in my codebase.

If the data in the db gets out, they still do not have salt stored in my codebase (unless the entire server has been compromised).

Right or wrong? Any thoughts?


Definitely better than what I was suggesting on the site.


So the unique salt has to be transmitted from the server to the client in order to do authentication? Why not just use a hash of the username?


why would you need to transmit the unique salt? why would you transmit anything other than the form to the client? all the checks would be done on the server side.

a) user enters username/password

b) server looks up in a file the unique string and appends it to the password

c) server finds username's salted-password hash

d) server extracts hash from stored salted-password hash (possibly the first 4 characters or whatever)

e) server concatenates salt, plaintext password and unique string and hashes via sha-N/md5

f) server compares to stored salted-password and if same performs login. otherwise sends generic "invalid username/password combination" message.


That's exactly what I'm doing.

I think stcredzero is concerned about the plaintext password passing through the pipes.

In this case, unique salts and hashes are used to simply protect the database; when the data is stolen, a rainbow attack will not effectively work.

Hashing those passwords also protects the user who uses the same email/id/password everywhere; if your db is compromised, you don't also compromise the user's identity (e-mail accounts, banking, etc).

It depends on the app, but I also think sending salts to the client and doing some md5/javascript in the browser is generally overkill.


SSL certs are like $15/year. Not sure why people don't just get them instead of playing with md5/javascript...


Could not agree any more strongly about this. You still need to use bcrypt to store passwords, because you will eventually screw up Unicode and SQL on a complicated form somewhere and give up your whole password database. But if you're willing to spend ever a quarter person/day on password auth, there is no economy at all to skipping SSL.


You can't rely entirely on SSL. You're still vulnerable to man in the middle attacks, among other things.


No, the whole point of SSL is that it is not vulnerable to MITM attacks, because the client carries anchor keys.


If you're connecting to a website that looks the same, but the URL is slightly different, it won't help. What's more, the URL doesn't have to be different for the entire user's session.


If you want to call that a "man in the middle attack", I won't argue, but as far as I'm concerned, that's a UI problem.


Could you explain further what sort of MITM attack you mean for which SSL doesn't help? (Are you assuming users who don't insist on an SSL connection with a recognized domain?)


DNS poisoning enables a MITM attack even if you're connecting to your safe, legitimate domain.


The certificate needs to match the domain and bear a signature chain that traces back to your root certificate store. SSL was designed to assume that the DNS is totally insecure, which of course it always has been.


SSL throws all sorts of warnings if the cert doesn't match the domain.


The problem with Man in The Middle attack is that it's a fundamental problem. It's on the same level as the Two Generals problem. In a way it's a variation of it. The only way to deal with it is to have a 2nd channel. Everyone who has to deal with security would be well advised to actually study this, and not blithely wave it away:

http://www.webmasterworld.com/website_technology/3711575.htm

Read the whole thread and learn. SSL can only protect savvy users. Depending on how sensitive your data is, this can still be a problem.

Combining this with DNS poisoning would be particularly bad.


This isn't true at all. Mutual key agreement between two unrelated parties talking for the first time is hard problem. SSL gets rid of that problem by having every participant in the system rendezvous with 10+ CA's before they talk to anyone.

The SSL problem is mutual key agreement between two related parties communicating for the first time, and it's an easy one to solve.

There is a really persistant meme that SSL breaks when the DNS breaks, because all that happens when your certificate doesn't match or verify is that you get a warning. That warning says SSL isn't working anymore. You're not supposed to click through it. Real applications that use SSL under the hood don't pop up warnings: they freak out and quit.


"There is a really persistant meme that SSL breaks when the DNS breaks, because all that happens when your certificate doesn't match or verify is that you get a warning. That warning says SSL isn't working anymore. You're not supposed to click through it."

I didn't know it was this bad, Browsers should "freak out" and totally refuse to proceed with the page then.


The problem with this is that SSL certificates can become nonverifiable through innocuous circumstances --- for instance, by expiring, or by moving. Most providers and most users would not accept a hard failure in this case.

And there you have one of the biggest problems with DNSSEC --- without a massive software revamp, there's no "soft" failure mode. gethostbyname() doesn't have a warning channel.


OK, you agree SSL does protect savvy users from MITM attacks.

You're talking about users who can be tricked into what are essentially non-SSL or degraded SSL (certificates from untrustworthy authorities) sessions, because they misread the URL bar or ignore warnings. That's a legitimate concern, most users are easily tricked, but that's not a vulnerability of SSL itself.


My assertion is that you can't just use SSL and leave it at that. In that case, you're only going to protect savvy users, and not even all of those.


That may or may not be true, but what is certain is that additional mucking around with salts, Javascript crypto, and protocols isn't going to improve the story you get with HTTPS.


a real attack might be like this

find rainbow table with a char space of

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQURSTUVWXYZ0123456789!@#$%^&*()_+=-';:"`

you now have a working hash map of 27-80gb of data.

upon looking through the user/pass table you see that the user/pass table has a salt key for each record. at this point you know that some portion of the passwords will be reversible with the salt being a confirmation of finding the correct password.

i know some of the rainbow tables out there are really fast at looking up a reverse md5 or sha1.

pass look up in rainbow table you will at least find a collision, and if you are lucky, you will find the pass concatenated to the salt entry in the table.

so if a hacker has your user/pass table, yikes.... ;-/ you are better off restricting read/write privileges to separate services. i would hate for a hacker to connect to your DB and to write their own salt and pass.

we were working on this for one of the defcon games last year.

NEVER REPLACE OBFUSCATION FOR REAL SECURITY.

fighting rainbow tables requires larger hash functions that generate larger hashspaces that are less likely to fit on a single machine. i still wonder how long it will take for a proper cluster of GPUs to virtualize the hash space of a rainbow table into computational power.


Thought exercise:

Then why not salt your hashes using some kind of lookup table on the password? With a bigger lookup table, you have a better chance of having a unique salt for most of your users. In fact, the bigger the table the better. But that takes up space, so why not use a function? But what kind of function? It should be cryptographically secure. Wait, I know, a hash function would be perfect!

Basically, this is just adding another gimpy home-grown round to your hash function. It will make the attacker's job slightly harder, but as others have pointed out, you can still match any password from the file.

I conclude that the best thing to do is to use a hash of the username as the password's salt.


OK, I don't know that much about cryptography and salting (I let users of my scripts set a long 'pass-phrase' which gets used as the salt), but surely using a hash of the username would be no better than the method discussed in the post?

Given that usernames are public - you know, for instance, that mine is jem - what level of security would that provide?


Using a known salt isn't great, but it still makes extracting the password orders of magnitude more difficult.

Hashed passwords are typically attacked using a rainbow file. Basically you use a dictionary file to hash words from the dictionary file and store the plaintext word and the resulting hash together in a file/db/whatever. After you've hashed every word in the dictionary, you start doing combinations: wordword, wordnumber, wordnumberword, etc... Of course each time your add another element, you're exponentially increasing the time it takes to generate your rainbow file.

Using a multi-character hash is basically adding another element to the password on behalf of the end user. Basically if we say it takes 100 hours (totally made up number, I suspect it's much higher-depending on password rules) to generate a comprehensive enough rainbow file to find an average password, if you add a 1 character alpha numeric salt, it now takes 3,600 hours to find that password. If you add 2 characters, it now takes 129,600 hours. And so on.

Using a public username (and having your attackers know that's what you're using) means that you have to generate a full rainbow file for EACH user. So while it's not as good as a decent sized random salt, you can no longer take 100 hours of work and hope to extract 8,000 passwords from a site's password db/file. You have to do 100 hours of work PER user, and still have a 30-50% success rate. So to get the same 8,000 passwords you'd need to do 1,600,000 hours of work (assuming a 50% hit rate).

So assuming you personally are not a specific target, the site itself is numberOfUsers* more time consuming to extract X passwords from, than a site that just uses MD5 with no salt.


Hashed passwords are not normally attacked with rainbow tables. This is a huge misconception. The overwhelming majority of password attacks are done with iterative password crackers. "Rainbow table" crackers are a recent phenomenon, but password crackers are one of the oldest and most effective tricks in the book.


How does an iterative password cracker work?


The salt is "439a". The MD5 hash of a<439a> is "57ec1dc4757768c9961ec5a46eb2a36f". Did that match? No? The MD5 hash of aa<439a> is "91b0655c6ab36ee9fc9fea0bda7f78ab". Did that match? No? The MD5 hash of aal<439a> is "9c9e018c05d4d7a36b31eb17e3b71c7d". Did that match? Yes? You win.

The point is, MD5 is extremely fast. Speed is one of the top 3 design goals of MD5, SHA1, and SHA256.


As I mention in my other post, with current hardware, my understanding is that that type of attack would take on average one year per account.

Obviously strict password rules (minimum length, etc...) shrink the attack space dramatically, but it's still a long time per hash/account. Given the number of sites/applications out there that don't use salts, and hence ARE vulnerable to a 1 hour attack using an existing rainbow file, I'm not sure why a cracker (who isn't targeting your site/app over every other site/app specifically) would bother.


You might direct that question to Alec Muffett, who wrote Crack, and Solar Designer, who wrote John the Ripper, both of which have cracked salted hash passwords for over a decade.


How about using 2048 bit RSA as a hash function? That's going to be a lot slower.


Using RSA directly as a block transform is extremely dangerous; the thing you have to remember about RSA is that, more than most other crypto operations, RSA is just a simple math problem. You are vanishingly unlikely to use RSA directly in your code without introducing a vulnerability.

That said, two responses:

(1) Yes, MD5+RSA is slower (and thus better) than just MD5.

(2) But we're quibbling, because bcrypt is tunably slower, designed specifically for this problem, free, and available for most dev environments.


Source of confusion: bcrypt is also the name for a utility using Blowfish. bcrypt-ruby looks pretty cool.


Here's what I do:

$password = hash_hmac(sha256,$_POST['password'],"pound_on_your_keyboard_here");

256 bits of hashed goodness. Goodluck breaking that with a rainbow table.


While on the topic of password best practices, are there any characters which should not be allowed in passwords?

I've always assumed anything is safe since it's getting hashed anyway before hitting the db, but this got me wondering.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: