Beat hackers at their own game: Part 2
Marty Weiner | Pinterest engineering manager, BlackOps
Most people are surprisingly uneducated on how to store passwords.
Let’s do some engineering. What are the requirements for a good password storage system? I propose these simple two:
- 1. Users should be able to authenticate or create a new account by providing a unique identifier (e.g., email) and a plaintext password.
- 2. Password recovery difficulty should be maximized.
A non-requirement — you should never need to recover the exact password a user used. The most common reason cited for password recovery is being able to email that password to the user if they lose their password. Many sites do this, and it’s bad, bad, bad. Any time I see a site that can send me my password, I KNOW they’re storing their passwords weakly (and so do the bad guys). Refactor your account recovery system to create a new random password and send that to the user.
Looking at the last several major leaks, here are a few common ways passwords are stored:
Wow. This makes me cry. This means that the passwords are stored with absolutely no encryption / hashing / anything (at least make it sporting for the hacker!). Yahoo! Voices was guilty of this sin.
I believe storing plaintext passwords may violate requirement two. Let’s stop here and cry in our beers for a few minutes that this is ever done.
2. 1-way cryptographical hash
A 1-way cryptographical password hash is intended to convert your password to a long string of characters that is very hard, if not impossible, to reverse. So, if a hijacker gets the hash of your password, hopefully they’ll never be able to recover it. Examples include MD5 and SHA1 and look like this:
MD5 (“123456") = e10adc3949ba59abbe56e057f20f883e
MD5 (“seinfeld”) = 9fab36cc63eac0a9951edd4e6a6ac6f8
SHA1 (“123456”) = 7c4a8d09ca3762af61e59520943dc26494f8941b
This is the most common way passwords are stored incorrectly. LinkedIn was guilty of this sin. Hell, Microsoft got it wrong with their ubiquitous NT hash and LM hash.
So why is this a bad way to store passwords? Here’s where the fun begins.
Have you heard of rainbow tables? If so, you’re golden, skip this paragraph. If not, read on. A rainbow table is a very large hash table of password hashes back to the unhashed password (smarty description here, one I can read here). With a rainbow table, I can recover any one SHA1 password in picoseconds. It turns out you can construct a rainbow table for all alphanumeric passwords of nine characters or less and store it in just 864GB in a few days (with this). Or, you can go download one from www.freerainbowtables.com (no, really!). With such a table and modern EC2 offerings (see r3.8xlarge on this table), you should be able to hit 50 to 100 million passwords converted per second. How many users does your site have?
3. 1-way cryptographical hash with salt
Do you avoid salt in real life? Here’s some light reading that might be important to your health.
If your password storage is also low in salt, read on.
Password salting is the practice of adding an extra bit of data to the password before you protect it. For instance:
MD5 (“seinfeld|81837210385") = 864069725cb607a13f097ef623f7eb75
MD5 (“seinfeld|93982716363") = 2d12282c267774a57368f6470b16650f
Then, you store the hash and the salt in the password field. Given a decently large salt that’s different for every account, you cannot reasonably make a rainbow table to store all salted hashed passwords. Great!
But, by definition cryptographical hashes are FAST to compute. There’s a reason for this. They were never meant for security. They were meant for error detection in transitions and storage. If you transmit a 4GB file and even 1 bit changes, the MD5 of the file changes vastly:
MD5 (“seinfeld”) = 9fab36cc63eac0a9951edd4e6a6ac6f8
MD5 (“Seinfeld”) = 5b01e11129355ca058d358dfcae3ce2c
Great for error detection, but the speed of these hashes is so fast, you can create a rainbow table in hours to days. A rainbow table of all seven or fewer character passwords takes minutes to compute. If your cryptographical hashed and salted set of passwords gets out, your overall user base is still very at risk, especially your high-risk targets like celebrities or employees.
4. 1-way slow hash with salt
Let’s make computing the hash slow! I swear, this is the end of the ride.
With salt, making a globally useful rainbow table is damn near impossible. Using a slow hash makes creating a rainbow table for any one user exceptionally difficult or brute force crunching nearly impossible. Like years or decades or millennia impossible. Even better, as computers get stronger, these methods include a difficulty amount called the cost parameter. If computational strength increases by 10x, pump up the cost! Well, in theory that will work (quantum computers? la la la la la la la).
At this point, hijackers are down to brute forcing. Your high target users with the most common passwords are still at risk, so consider disallowing the most common passwords.
Bcrypt has been around since 1999 and has matured quite a bit. It’s available in most languages and has a lot of eyeballs on it. But, it’s starting to show a little age. In particular, the computational strength stored with each Bcrypt password cannot be dialed up. So, in five years when computer power increases substantially, you’ll have a harder time pushing up the power for all of your passwords. Not too big of a deal — worst case, if something better comes along you can always wrap your Bcrypt passwords with the next big thing.
If you’re feeling feisty, have a look at Scrypt and friends. Scrypt, in particular, is gaining in popularity especially among crypto-currencies. But, like any relatively new technologies, there are risks. The many implementations have not had as much time to mature, and there are fears for the underlying maths. Currently, I’m not aware of any websites that use Scrypt to store passwords.
If I started a new site now, I’d go with Bcrypt.
How do I use Bcrypt?
It’s super easy. In Python, you can create a new Bcrypt password for user registration or password changes with the following:
bcrypt_pw = bcrypt.hashpw(plain_password, bcrypt.gensalt() )
And you would store the bcrypted_password in your database. To verify a password the user just entered, do the following:
allow_login = bcrypt.hashpw(plain_password, bcrypt_pw) == bcrypt_pw
Bcrypt passwords look like this:
bcrypt.hashpw(“123456”, bcrypt.gensalt()) = '$2a$12$HqXXZ1TTb4Z0OV9jYUS7X.A1tYCPWRn6FhSVcsGDxjXm92BWw35yS'
For the curious, the 2a defines the format. 12 is the cost factor. The goop before the period is the salt, and the goop after is the hash.
Here are some Bcrypt libraries you can use:
- Java: http://www.mindrot.org/projects/jBCrypt/
- Python: pip install bcrypt
- Ruby: gem install bcrypt-ruby
- Node: https://github.com/ncb000gt/node.bcrypt.js/
- Go: import “code.google.com/p/go.crypto/bcrypt”
- C/C++: http://openwall.com/crypt/
- Perl: Here, but seriously… Perl?
How do I convert my current site to Bcrypt fast?
My prime audience is those of you out there who are reading this and realizing that your 10+ million users’ passwords are SHA-1’d or one of the other weak methods. The good news is converting your entire system to Bcrypt can be done overnight quite easily.
If your passwords are not salted, creating and verifying new passwords will now be a bcrypt of your previous method. If already using SHA-1, you’ll now do the following:
bcrypt_pw = bcrypt.hashpw(sha1(plain_password), bcrypt.gensalt() )
allow_login = bcrypt.hashpw(sha1(plain_password), bcrypt_pw)==bcrypt_pw
You don’t gain or lose any security by keeping the SHA-1 password around, but you can now convert all your passwords immediately. You don’t have to brute force discover all of your users’ passwords, force reset or wait for them to login so you can convert the password.
If you have a system with a per account salt, such as SHA-1(pw+salt), it’s slightly tougher. You now need to store the bcrypted password and the previous salt:
bcrypt_pw = bcrypt.hashpw(sha1(plain_password+sha1salt), bcrypt.gensalt() )
And store bcrypt_pw and sha1salt in your database so you can do the following:
allow_login = bcrypt.hashpw(sha1(plain_password+sha1salt), bcrypt_pw)==bcrypt_pw
Be sure that you don’t log plaintext passwords. Most common plaintext passwords are found when sites log errors during login, log user edits, etc.
If you have backups of old database dumps (hopefully you do) or dump your database for use in map-reduce, consider wiping whatever you don’t need or cleansing what you do need soon.
That’s all you need to know! Go to it! Write me at email@example.com when you’re done. I’d love to hear your experience and challenges!
Last Last Thoughts
For the sake of fun story telling, I left off one other way passwords are commonly stored incorrectly:
5. 2-way encryption
Encryption is not necessary except if you want to recover a specific user’s account. This violates requirement 2 of good password storage from above.
Adobe was guilty of storing their passwords this way. They used 3DES, which is an encryption method. 3DES is also unsalted and therefore susceptible to decrypting via rainbow table attacks. Worse, if your hacker has recovered your encryption key, your database is now as good as plaintext.
It gets worse. Adobe also stored hints. I think this XKCD comic says it best:
Marty Weiner is a manager on the spam and abuse engineering team