(Skip down to How NOT to store passwords if you want to jump to the punchline)
You’re Welcome, Hackers
Passwords are the greatest gift we ever gave to the hacker community. An elegant combination of human nature and poor dev choices brings Christmas in July for hackers. And January. And February. And well, every month, yes even December.
We’re witnessing a sharp rise in the number of credentials being stolen and accounts being hijacked across the web. Each new set of hijacked credentials gets a spammer into the inner sanctum of your friends on every social platform where they can spam as you. Spammers create content with links to their websites where they entice people with ads, knockoff products, and scams. Many times, surprisingly, the knockoff products do actually work as promised. In one study, Berkeley researchers purchased many products from these sites and actually received them. In the case of even the knockoff Viagra, the product contained the proper active ingredients!
Spammers are awesome marketers. While their marketing is illegitimate, spammers are highly motivated. They’re great at posting targeted content, getting a lot of views. They typically make a LOT of money by posting content to a site which links back to spammy sites where they monetize users.
To post content, spammers first need a registered account on your site. There are two ways to get accounts. (1) They can make or buy fake accounts to post content. To be successful in this game requires building legitimate looking accounts and a fair amount of effort to get users to click on spammy content. A spammer may need to make “friends” with many good users so their posts are visible. (2) Alternatively, spammers can hijack your good account. Now spammers can get their content right in front of your friends’ faces, which typically increases visibility and propensity to click. In this game they must acquire passwords, and they must use them before their competitors use them and before anti-spam security teams discover them.
In one well constructed estimate, spammers can make up to $2.4M in a month. To spam with hijacked accounts, spammers purchase accounts from hackers or employ hackers who try to breach company databases. In the last three years, Adobe leaked 150M emails and passwords, LinkedIn leaked 6.5M, eHarmony 1.5M, Snapchat 4.6M, Gawker 1.3M, Yahoo Voices 0.5M, Last.FM 2.5M, and Forbes 1M (cite). And there are more. These companies do protect user passwords, but this protection is often disastrously weak in the face of how passwords are used by people and how they’re stored by devs.
Problem the first — Humans
The human brain is a complicated meat sack with many competing inputs and many things to remember. Passwords are a tolerated nuisance. This sack of fats and lipids is also amazingly good at underestimating risk. The result is we users often (A) have simple passwords, or (B) use the same password in many different places, or (C) both.
It’s also common for people to have a set of 3 passwords:
1. Their simple password for Farmville and Candy Crush and Cranberry Silo Security Simulator 2014
2. Their moderate password for Twitter and Facebook and Pinterest and Gmail and Secret
3. Their strong password for their bank (though “strong” is typically an over-exaggeration)
If a hijacker has a user’s password for one site, they generally have it for the rest where it has been “shared”. If I’m a spammer wanting to maximize profits, damn straight I’m going to try these credentials on all sites I can get my hands on. Your website included.
Problem the second — Devs
I listed several sites that leaked their passwords. They all used less than ideal methods for storing passwords. Why? My theory is this sort of thing is just not taught in school or discussed often. Commonly, whoever first wrote user authentication used something that sounded okay. Since that day, nobody has wanted to touch the authentication layer. Is it worth the engineering effort to perform a big scary change in preparation for an event that may never happen?
I think yes. If you are a product manager who makes the decision to indefinitely delay fixing your passwords, you should be liable when you hand your users’ keys, which happen to match their bank keys, to a Russian spammer.
If we can fix either the human element or how devs store passwords, we could make hijackers’ jobs much harder. Let’s talk solutions.
One solution is to fix humans or get rid of them. The problem with this approach is it’s generally impossible or bad for business. Even pushing your users to have stronger or unique passwords can decrease sign up rates in significant ways. You’ll have to weigh that balance with your growth czar. At a minimum, I’d recommend disallowing the 10,000 or so most common passwords, passwords that are the username or a variant of it, and passwords unique to your site (like ‘linkedin’ for LinkedIn). By disallowing these passwords, you make brute force against your database much harder.
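A signup check along these lines could look like the sketch below. The tiny word sets and the "username plus up to three extra characters" rule are assumptions for illustration; a real deployment would load a full common-password list and tune the variant rule.

```python
# Hypothetical sketch: reject the passwords an attacker would try first.
# COMMON_PASSWORDS stands in for a real list of ~10,000 entries.
COMMON_PASSWORDS = {"123456", "password", "qwerty", "letmein", "seinfeld"}
SITE_WORDS = {"linkedin"}  # words specific to your own site (placeholder)


def is_password_allowed(username: str, password: str) -> bool:
    """Return False for common, site-specific, or username-based passwords."""
    candidate = password.lower()
    user = username.lower()
    if candidate in COMMON_PASSWORDS or candidate in SITE_WORDS:
        return False
    # Reject the username itself and short variants such as "alice1" or "alice123".
    if user and candidate.startswith(user) and len(candidate) <= len(user) + 3:
        return False
    return True
```

Running the check at registration and on every password change keeps the weakest choices out of the database in the first place.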
I’m not going to dive further into fixing the human element until I perfect my global mind-control ray (patent pending).
Solution two, don’t use passwords. Companies like Toopher and Clef are now offering ways to use your Phone-As-Your-Identity (PAYI). Facebook and Google offer single sign-on (SSO) options where they manage user authentication data so you don’t have to. At Pinterest we offer SSO via Facebook and Google and are now exploring PAYI mechanisms. A spammer sitting 6,500 miles away in Bulgaria who needs a million or more accounts to be successful would need to hack into as many phones. That’s far harder since the spammer must hunt for cracks in venerable Telco/Google/Apple security rather than for the databases of a website that has not yet spent many person-years on security. It gets better.
With passwordless login, the insecure human aspect is removed almost entirely. Users can’t give away their credentials for chocolate (caveat caveat caveat) because users don’t have access to the credentials buried in their phones. Better, users can’t use simple passwords or share them on multiple sites even if they wanted to. Needless to say, I’m closely following and pushing for this trend, but until passwordless login is the norm we must continue to clean up the password mess our forefathers left us.
Solution three, store your users’ passwords properly. We, the devs, can solve this whole problem by simply storing passwords in a way that does not allow a hijacker to use them. Encryption, hashing, something…
Databases love to get leaked especially if they contain awesome data. Naturally you should work toward adopting a strong security stance to protect your database in the first place. But, should a hacker get a quick view of your database, make that view unappealing. Remember rule #1 of spamdom: “spammers want money or power”. A database with strong password protection and a bunch of recipe descriptions isn’t worth much. If hackers find you store your passwords correctly, they are less likely to extract your data — why bother? If your passwords still do leak, let’s minimize the risk to your users, your neighbors’ users, as well as the risk to your PR reputation.
How NOT to store passwords
Most people are surprisingly uneducated on how to store passwords.
Let’s do some engineering. What are the requirements for a good password storage system? I propose these simple two:
1. Users should be able to authenticate or create a new account by providing a unique identifier (e.g., email) and a plaintext password
2. Password recovery difficulty should be maximized
A non-requirement: you should never need to recover the exact password a user used. The most common reason cited for password recovery is being able to email that password to the user if they lose it. Many sites do this, and it’s bad bad bad. Any time I see a site that can send me my password, I KNOW they’re storing their passwords weakly (and so do the bad guys). Refactor your account recovery system to create a new random password and send that to the user.
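Generating that replacement password needs nothing more than a secure random source. Here is a minimal sketch using Python's standard `secrets` module; the function name, alphabet, and length are assumptions, not anything prescribed above.

```python
import secrets
import string

# Letters and digits only, so the password survives being typed from an email.
ALPHABET = string.ascii_letters + string.digits


def generate_temporary_password(length: int = 16) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The recovery flow would then overwrite the stored hash with the hash of this new value. Nothing in it ever needs to read the old password back.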
Looking at the last several major leaks, here are a few common ways passwords are stored:
1) Holy shit. Plaintext.
This means that the passwords are stored with absolutely no encryption / hashing / anything. Yahoo Voices did this.
I believe storing plaintext passwords may violate requirement 2. Let’s stop here and cry in our beers for a few minutes that this is ever done.
2) 1-way cryptographical hash
A 1-way cryptographical password hash is intended to convert your password to a long string of characters that is very hard, if not impossible, to reverse. So, hopefully if a hijacker gets the hash of your password, they’ll never be able to recover your password. Examples include MD5 and SHA1 and look like this:
MD5 (“123456”) = e10adc3949ba59abbe56e057f20f883e
MD5 (“seinfeld”) = 9fab36cc63eac0a9951edd4e6a6ac6f8
SHA1 (“123456”) = 7c4a8d09ca3762af61e59520943dc26494f8941b
This is the most common way passwords are stored incorrectly. LinkedIn was guilty of this sin. Hell, Microsoft got it wrong with their ubiquitous NT hash and LM hash.
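You can reproduce the "123456" digests with Python's standard `hashlib`. This is a small sketch; the helper names are made up for illustration.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Unsalted MD5 of a password, as a hex string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def sha1_hex(password: str) -> str:
    """Unsalted SHA-1 of a password, as a hex string."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


print(md5_hex("123456"))   # e10adc3949ba59abbe56e057f20f883e
print(sha1_hex("123456"))  # 7c4a8d09ca3762af61e59520943dc26494f8941b
```

Note that the same password always produces the same digest, on every site and for every user.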
So why is this a bad way to store passwords?
Here’s where the fun begins. Have you heard of rainbow tables? If so, you’re golden, skip this paragraph. If not, read on. A rainbow table is a very large precomputed table that maps password hashes back to the unhashed password (smarty description here, one I can read here). Strictly speaking it stores chains of hashes rather than every hash, trading a little lookup time for a lot of disk space. With a rainbow table, I can recover any one SHA1 password covered by the table almost instantly. It turns out you can construct a rainbow table for all alphanumeric passwords of 9 characters or less in a few days and store it in just 864GB (with this!). Or, you can go download one from www.freerainbowtables.com (no really!). With such a table and modern EC2 offerings (see r3.8xlarge on this table), you should be able to hit 50 to 100 million passwords converted per second. How many users does your site have?
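To see why precomputation hurts, here is a toy version of the idea. It is a plain dictionary from hash to password, not a real rainbow table (those use hash chains to save space), and the four-word list is a placeholder for the millions of entries an attacker would use.

```python
import hashlib


def build_lookup_table(candidates):
    """Map SHA-1 hex digest -> plaintext for every candidate password."""
    return {hashlib.sha1(p.encode("utf-8")).hexdigest(): p for p in candidates}


# Placeholder wordlist; computed once, reusable against every leaked database.
table = build_lookup_table(["123456", "password", "seinfeld", "qwerty"])

leaked_hash = "7c4a8d09ca3762af61e59520943dc26494f8941b"  # SHA1("123456")
print(table.get(leaked_hash))  # 123456
```

The expensive part happens once, up front. After that, every unsalted hash in every leak is a single lookup.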
3) 1-way cryptographical hash with salt
Do you avoid salt in real life? Here’s some light reading that might be important to your health.
If your password storage is also low in salt, read on.
Password salting is the practice of adding an extra bit of data to the password before you protect it. For instance:
MD5 (“seinfeld|81837210385”) = 864069725cb607a13f097ef623f7eb75
MD5 (“seinfeld|93982716363”) = 2d12282c267774a57368f6470b16650f
Then you store the hash and the salt in the password field. Given a decently large salt that’s different for every account, you cannot reasonably make a rainbow table to store all salted hashed passwords. Great!
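A minimal sketch of per-account salting, following the "password|salt" shape of the examples. The `salt$hash` storage format and the function names are assumptions for illustration. MD5 is used only to mirror the examples, and as the next paragraphs explain, this is still not a safe way to store passwords.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def hash_with_salt(password: str, salt: Optional[str] = None) -> str:
    """Return 'salt$hash', generating a fresh random salt when none is given."""
    if salt is None:
        salt = secrets.token_hex(16)  # 128 random bits, different for every account
    digest = hashlib.md5(f"{password}|{salt}".encode("utf-8")).hexdigest()
    return f"{salt}${digest}"


def verify_salted(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt, _ = stored.split("$", 1)
    return hmac.compare_digest(hash_with_salt(password, salt), stored)


first = hash_with_salt("seinfeld")
second = hash_with_salt("seinfeld")
print(first != second)  # True: same password, different salts, different hashes
```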
But, by design cryptographic hashes are FAST to compute. There’s a reason for this. They were never meant for storing passwords. They were meant for quickly verifying integrity, such as error detection in transmission and storage. If you transmit a 4GB file and even 1 bit changes, the MD5 of the file changes vastly:
MD5 (“seinfeld”) = 9fab36cc63eac0a9951edd4e6a6ac6f8
MD5 (“Seinfeld”) = 5b01e11129355ca058d358dfcae3ce2c
Great for error detection, but these hashes are so fast that you can create a rainbow table for a single salt in hours to days. A rainbow table of all passwords of 7 or fewer characters takes minutes to compute. If your cryptographically hashed and salted set of passwords gets out, your overall user base is still very much at risk, especially your high risk targets like celebrities or employees.
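The sketch below shows that a salt does not slow down an attacker who targets one account. It brute forces a salted MD5 over short lowercase passwords. The "password|salt" format follows the earlier examples; the tiny keyspace and the password "cat" are assumptions chosen so it finishes instantly.

```python
import hashlib
import itertools
import string


def salted_md5(password: str, salt: str) -> str:
    return hashlib.md5(f"{password}|{salt}".encode("utf-8")).hexdigest()


def brute_force(target_hash: str, salt: str, max_length: int = 4):
    """Try every lowercase password up to max_length characters."""
    for length in range(1, max_length + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            guess = "".join(letters)
            if salted_md5(guess, salt) == target_hash:
                return guess
    return None


salt = "81837210385"  # the attacker reads this straight from the leaked row
leaked = salted_md5("cat", salt)
print(brute_force(leaked, salt))  # cat
```

The salt is stored next to the hash, so it only blocks precomputed tables shared across accounts. Each individual guess is still as cheap as one MD5.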
4) 1-way slow hash with salt
Let’s make computing the hash slow! I swear, this is the end of the ride.
With salt, making a globally useful rainbow table is damn near impossible. Using a slow hash makes creating a rainbow table for any one user exceptionally difficult and brute force crunching nearly impossible. Like years or decades or millennia impossible. Even better, these methods include a tunable difficulty setting called the cost parameter, so they can keep pace as computers get stronger. If computational strength increases by 10x, pump up the cost! Well, in theory that will work (quantum computers? la la la la la la la).
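Here is a rough illustration of a cost parameter. Bcrypt is not in Python's standard library, so this sketch swaps in PBKDF2 from `hashlib` as a stand-in slow hash. The iteration counts and the `cost$salt$hash` storage format are assumptions for the demo.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def slow_hash(password: str, iterations: int, salt: Optional[bytes] = None) -> str:
    """Return 'iterations$salt$hash' using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # The cost is stored with the hash so new hashes can use a higher one later.
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_slow(password: str, stored: str) -> bool:
    iterations, salt_hex, _ = stored.split("$")
    candidate = slow_hash(password, int(iterations), bytes.fromhex(salt_hex))
    return hmac.compare_digest(candidate, stored)


# Raising the cost by 10x makes every guess roughly 10x more expensive.
for iterations in (10_000, 100_000):
    start = time.perf_counter()
    slow_hash("seinfeld", iterations)
    print(iterations, f"{time.perf_counter() - start:.4f}s")
```

A legitimate login pays this cost once. An attacker pays it for every guess against every account.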
At this point, hijackers are down to brute forcing. Your high target users with the most common passwords are still at risk, so consider disallowing the most common passwords.
Bcrypt has been around since 1999 and has matured quite a bit. It’s available in most languages and has a lot of eyeballs on it. But, it is starting to show a little age. In particular, the cost factor stored with each existing Bcrypt hash cannot be dialed up without the original password. So, in 5 years when computer power increases substantially, you will have a harder time pushing up the power for all of your passwords. Not too big of a deal: worst case, if something better comes along you can always wrap your Bcrypt passwords with the next big thing.
If you’re feeling feisty, have a look at Scrypt and friends. Scrypt, in particular, is gaining in popularity especially among crypto-currencies. But, like any relatively new technologies, there are risks. The many implementations have not had as much time to mature, and there are fears for the underlying maths. Currently I’m not aware of any websites that use Scrypt to store passwords.
If I started a new site now, I’d go with Bcrypt.
How do I use Bcrypt?
It’s super easy. In Python, you can create a new bcrypt password for user registration or password changes with the following:
bcrypt_pw = bcrypt.hashpw(plain_password, bcrypt.gensalt())
And you would store bcrypt_pw in your database. To verify a password the user just entered, do the following:
allow_login = bcrypt.hashpw(plain_password, bcrypt_pw) == bcrypt_pw
Bcrypt passwords look like this:
bcrypt.hashpw(“123456”, bcrypt.gensalt()) = '$2a$12$HqXXZ1TTb4Z0OV9jYUS7X.A1tYCPWRn6FhSVcsGDxjXm92BWw35yS'
For the curious, the 2a defines the format. 12 is the cost factor. The next 22 characters of goop (in this example, everything up to and including the period) are the salt, and the remaining 31 characters are the hash.
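If you ever need to inspect stored hashes, for example to find rows with a low cost factor, the fields can be pulled apart with plain string handling. In this sketch the class and function names are made up for illustration. It relies on the bcrypt string layout: a 22-character salt followed by a 31-character hash.

```python
from typing import NamedTuple


class BcryptParts(NamedTuple):
    version: str
    cost: int
    salt: str
    digest: str


def parse_bcrypt(stored: str) -> BcryptParts:
    """Split a '$version$cost$salt+hash' bcrypt string into its fields."""
    _, version, cost, rest = stored.split("$")
    # The salt is the first 22 characters; the remaining 31 are the hash.
    return BcryptParts(version, int(cost), rest[:22], rest[22:])


parts = parse_bcrypt("$2a$12$HqXXZ1TTb4Z0OV9jYUS7X.A1tYCPWRn6FhSVcsGDxjXm92BWw35yS")
print(parts.version, parts.cost)  # 2a 12
```

The period is just another character in bcrypt's encoding alphabet, so it is safer to split by position than to split on it.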
Here are some Bcrypt libraries you can use:
Python: pip install bcrypt
Ruby: gem install bcrypt
Go: import “golang.org/x/crypto/bcrypt”
Perl: Here, but seriously.. Perl?
How do I convert my current site to Bcrypt fast?
My prime audience is those of you out there who are reading this and realizing that your 10+ million users’ passwords are SHA-1’d or one of the other weak methods. The good news is converting your entire system to Bcrypt can be done overnight quite easily.
If your passwords are not salted, creating and verifying passwords will now be a bcrypt of your previous method. If using SHA-1, you will now do the following:
bcrypt_pw = bcrypt.hashpw(sha1(plain_password), bcrypt.gensalt())
allow_login = bcrypt.hashpw(sha1(plain_password), bcrypt_pw)==bcrypt_pw
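You don’t gain or lose any security by keeping the SHA-1 step inside the Bcrypt, but you can convert all your passwords now by running Bcrypt over the SHA-1 hashes already in your database. Once every row is converted, drop the old SHA-1 values. You don’t have to brute force discover all your users’ passwords, force reset, or wait for them to login so you can convert the password.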
If you have a system with a per account salt such as SHA-1(pw+salt), it’s slightly tougher. You now need to store the bcrypted password and the previous salt:
bcrypt_pw = bcrypt.hashpw(sha1(plain_password+sha1salt), bcrypt.gensalt())
And store bcrypt_pw and sha1salt in your database so you can do the following:
allow_login = bcrypt.hashpw(sha1(plain_password+sha1salt), bcrypt_pw)==bcrypt_pw
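The overnight conversion can be sketched end to end with the standard library alone. Here PBKDF2 stands in for Bcrypt, since Bcrypt is a third-party package. The row layout, names, and iteration count are all assumptions for illustration. The SHA-1 value is wrapped as a hex string, which avoids raw NUL bytes that some bcrypt implementations handle poorly.

```python
import hashlib
import hmac
import secrets
from typing import Optional

ITERATIONS = 100_000  # assumed cost; tune for your hardware


def sha1_hex(plain_password: str, sha1salt: str = "") -> str:
    """The legacy scheme: SHA-1(pw + salt) as a hex string."""
    return hashlib.sha1((plain_password + sha1salt).encode("utf-8")).hexdigest()


def wrap_legacy_hash(legacy_sha1_hex: str, wrap_salt: Optional[bytes] = None) -> str:
    """Wrap an existing SHA-1 hex digest in a slow hash; no plaintext needed."""
    if wrap_salt is None:
        wrap_salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", legacy_sha1_hex.encode("ascii"), wrap_salt, ITERATIONS
    )
    return f"{wrap_salt.hex()}${digest.hex()}"


def allow_login(plain_password: str, sha1salt: str, stored: str) -> bool:
    wrap_salt_hex, _ = stored.split("$")
    candidate = wrap_legacy_hash(
        sha1_hex(plain_password, sha1salt), bytes.fromhex(wrap_salt_hex)
    )
    return hmac.compare_digest(candidate, stored)


# Batch job: convert every row using only what is already in the database.
legacy_rows = {
    "alice": {"sha1": sha1_hex("seinfeld", "93982716363"), "sha1salt": "93982716363"}
}
converted = {
    user: {"wrapped": wrap_legacy_hash(row["sha1"]), "sha1salt": row["sha1salt"]}
    for user, row in legacy_rows.items()
}
```

The converted rows keep the old per-account salt but no longer contain the bare SHA-1 value. Logins keep working without any user having to do anything.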
Be sure that you don’t log plaintext passwords. Most commonly, plaintext passwords are found when sites log errors during login, or log user edits, etc.
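One low-effort guard is to scrub request parameters before they reach any logger. A minimal sketch follows; the field names in `SENSITIVE_KEYS` are hypothetical and should match whatever your own forms and APIs actually use.

```python
# Hypothetical field names; extend to match your own login and settings forms.
SENSITIVE_KEYS = {"password", "new_password", "old_password", "password_confirm"}


def scrub(params: dict) -> dict:
    """Return a copy of request parameters that is safe to write to a log."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in params.items()
    }


print(scrub({"email": "user@example.com", "password": "123456"}))
# {'email': 'user@example.com', 'password': '[REDACTED]'}
```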
If you have backups of old database dumps (hopefully you do!) or dump your database for use in map-reduce, consider wiping whatever you don’t need or cleansing what you do need soon.
That’s all you need to know! Go to it! Write me at email@example.com when you’re done. I’d love to hear your experience and challenges!
Last Last Thoughts
For the sake of fun story telling, I left off one other way passwords are commonly stored incorrectly:
5) 2-way encryption
Encryption is not necessary unless you want to recover a specific user’s password, and recoverable passwords violate requirement 2 of good password storage from above.
Adobe was guilty of storing their passwords this way. They used 3DES, which is an encryption method. Their encryption was also unsalted, so identical passwords produced identical ciphertext, which lets an attacker group accounts by password and guess the most popular ones first. Worse, if your hacker has recovered your encryption key, your database is now as good as plaintext.
It gets worse. Adobe also stored hints. I think this XKCD comic says it best: