How really to store and check passwords (and API tokens, which are passwords)

Dean Valentine
Published in Leonard Cyber
Sep 19, 2020

A cliché in posts detailing password storage schemes is to finish by telling the sysadmins and generalist web developers not to store credentials in-house at all. I disagree with this prescription, mostly because I understand the practical reasons that make this difficult. Identity and access management is infinitely more expensive to course-correct when you aren’t the one authenticating your users. Writing auth is tricky, yes, but not only are you otherwise resigning yourself to lock-in off the bat, the size of a company’s information security department has not historically been a good indicator of how well credentials are secured. I think the type of devs who are conscientious enough to deliberately hand over their users’ fates to the “sign in with Twitter” buttons are also conscientious enough to use Argon2 instead of SHA2, and are not going to qualify for a lot of the benefit. There are also people like me who will outright refuse to use a site that forces them to connect some social media profile or email in order to log in. You’re better off learning how to build or at least identify a passable authentication pipeline now, leaving yourself the option of expanding or modifying it based on business need later.

Before we start, let’s talk about what we’re actually preventing through well-thought-out password storage and update policies. The types of attacks you need to be seriously worried about as a developer are conditional on what you’re protecting. From a threat modeling perspective you can loosely consider three categories of web applications, when controlled for size:

  1. Applications that don’t handle money or other things of extractable monetary value (like server time, physical products, etc.).
  2. Applications that do handle money or other things of extractable monetary value.
  3. Applications that handle cryptocurrency, which is money but with built-in money-laundering for whoever steals it.

If your product is in category #1, and you’re not being entrusted with other apps that fall into categories #2 or #3, portions of this post may be overkill. Use your own judgement. Your resources might be better spent towards preventing SQLi, or wormable XSS, or horrible admin panel compromises, or some social engineering avenue of total site compromise instead. I would still follow these guidelines anyways, because it’ll be a small amount of investment for something that’s going to be hard to change when you’ve got lots of users, but I can’t fault you for not really caring. Just remember that while you personally may not be guarding anything important, lots of your users almost certainly reuse passwords in other places, and they care about your security.

When we get to the second category, all of those in-group memes that information security professionals parrot to each other to feel important and economically necessary actually begin to coincide with reality. As with anything else, prioritize where necessary, but I think the measures I talk about in this post are ones that can qualify as “necessary-but-not-sufficient”.

If your app is in category #3, and any implementation in this blog post is something you haven’t done or replaced with a better alternative, God help you.

Provision #1: Cryptographic Hashing

Hopefully if you’re making these types of business and architectural decisions you know that passwords are not supposed to be stored as-is in your database. If you don’t, now you do. You also need to make sure that you are not, as is sometimes common, inadvertently logging or recording the passwords your users sign up with anywhere in persistent storage. A rule of thumb you can use is that if your app is writing unencrypted credentials from your users to disk at any point in the signup or login process, via a database or anything else, something needs to be fixed.

The reason this isn’t done is that it’s unnecessary for your website to actually know the password. All that the site has to do during login is verify that the password your users entered is the same as the one they signed up with, which does not require that it be written down anywhere. Instead, what should be stored is the “cryptographic hash” of the password, the result of a one-way function that turns the password into a fixed-size string. When users log in again, the hash of the password they entered can be compared with the hash stored on disk. This way, if anyone gets access to or leaks the site’s datastore, they don’t have the raw passwords, they have a “hash” of the password which they must then try to reverse by running lots of possible passwords through whatever hashing algorithm you chose.

And hopefully if you’ve heard of cryptographic hashing, you’ve also heard that you shouldn’t store passwords without something called a “salt”, which is an additional random input to the hashing algorithm used to augment the hashing process. If you use a cryptographic hashing algorithm by itself, hackers can use a publicly available “rainbow table”, which is just a giant pre-computed map of passwords to hashes for that algorithm. With a random and long enough salt, each password has to be cracked individually, and hackers can’t share or generate these tables in advance.
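To make the salt point concrete, here is a small sketch. It uses SHA-256 purely as an illustration of what a salt changes (a fast hash like this is not what you should store passwords with), and the function names and the sample password are made up for the example.

```python
import hashlib
import secrets


def unsalted_digest(password: str) -> str:
    # Illustration only: a bare fast hash, the kind a precomputed table targets.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_digest(password: str, salt: bytes) -> str:
    # Still a fast hash, so still not suitable for real storage.
    # The only thing being demonstrated here is the effect of the salt.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users who happen to pick the same password.
alice_salt = secrets.token_bytes(16)
bob_salt = secrets.token_bytes(16)

same_without_salt = unsalted_digest("hunter2") == unsalted_digest("hunter2")
same_with_salt = (
    salted_digest("hunter2", alice_salt) == salted_digest("hunter2", bob_salt)
)

print(same_without_salt)  # True: one table lookup covers both accounts
print(same_with_salt)     # False, barring a collision between the random salts
```

The salt is not a secret and is stored next to the hash; its job is only to make every stored hash a separate cracking problem.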

Unfortunately when many web application developers choose a cryptographic hash for their password hashing they tend to choose something like SHA-1/2/3, or maybe PBKDF2. Instead, you should use a recent “password hashing algorithm” (PHA) like Argon2 or scrypt. The differences between the two are manifold:

  1. PHAs are designed to be difficult to perform, and regular cryptographic hashing algorithms are designed for speed. The magic of AWS or my botnet can try trillions of SHA1 hashes in an hour for a couple hundred dollars. Using a good password hashing algorithm can cripple that to maybe hundreds of thousands, and allows you to specify how much RAM, parallelization, and CPU time to use so that the algorithm is maximally difficult but still feasible for the hardware you run it on. A 100ms delay is almost unnoticeable to users who are signing in, but paralyzing to someone who has to try 50,000 other options before they get to “p@ssw0rd2!” in their wordlist.
  2. While not usually important in this context, some PHAs feature better side-channel consideration. Side-channel attacks are means by which you can glean sensitive information from the implementation side effects of a cryptographic operation, like through timing information, power consumption, or hardware usage. Normally, these require local access on the machine and intimate knowledge of hardware, but not always.
  3. PHAs are also designed to limit asymmetry in the hash power of attackers and regular users. In the case of SHA1/2/3, specific ASIC (application-specific integrated circuit) chips are sometimes developed to perform massive amounts of hashes at one time. Password hashing algorithms attempt to limit the advantage that can be gained through this approach, and make cracking proportional to the amount of general purpose hardware that the attacker has.

As opposed to the standard SHA3+Salt scheme, password hashing algorithms make database password dumps almost as difficult to leverage as online password cracking attacks, where the attacker just tries different logins with a bot via your login page.
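As a rough sketch of what this looks like in practice, here is scrypt via Python’s standard library (`hashlib.scrypt`, which needs a Python build linked against OpenSSL 1.1 or later). The cost parameters, the 16-byte salt, and the function names are my assumptions for the example, not recommendations; tune the costs to your own hardware until a single hash lands around your latency budget.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Assumed cost parameters; adjust for your own hardware and latency budget.
SCRYPT_N = 2**14              # CPU/memory cost, must be a power of two
SCRYPT_R = 8                  # block size
SCRYPT_P = 1                  # parallelization factor
SCRYPT_MAXMEM = 64 * 1024 * 1024  # upper bound on memory scrypt may use
KEY_LEN = 32                  # length of the derived hash in bytes


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        maxmem=SCRYPT_MAXMEM,
        dklen=KEY_LEN,
    )


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, hash) to store for a newly chosen password."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), stored_hash)
```

At signup you would store both values returned by `hash_password`; at login you would load them for the account and call `verify_password`. If you store the cost parameters alongside each hash, you can raise them later without invalidating old accounts.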

Provision #2: Symmetric Encryption with a well-secured key

Another cliché in password storage how-to’s is to first give encryption as an example of a bad solution, before introducing password hashing and presenting it as the correct alternative. Guess what? There’s no reason you can’t do both. You can start by hashing the passwords users give you to sign up, and then before storing or retrieving those hashes, encrypt or decrypt them with a site-wide key.

Sometimes it’s suggested to me that because one has some datastore service (Heroku PostgreSQL, Amazon S3) that “encrypts all database contents at rest”, using a second decryption key is unnecessary. The problem is that your users table is not encrypted from your application. Those features are for keeping a few types of compromises of Heroku or AWS or Google Cloud Platform from leading to database dumps, but they don’t help keep it secure from people who find a SQL injection, unsecured admin panel, or employee laptop. Manual decryption prevents the hackers that get sideways access to your datastore from being able to grab the hashes in the first place, even if they’re making the same SQL select statements that your team does in your application code.

Then it’s suggested that this is pointless, because whatever attacker compromises a password database almost certainly has access to the key used to decrypt those password hashes during login. This is historically not correct; the Adobe breach indicated many terrible things about Adobe’s security, but the key used to (3DES-ECB, regrettably) encrypt their passwords was never found. I think this reflects a misconception about how this is done in practice: you shouldn’t store the key used to encrypt items in your database in that database. If you do, that really does negate a lot of the benefit. Pass the key into your app via an environment variable, and use whatever credential management solution (Hashicorp Vault, etc.) you’ve got going to store Postgres passwords or API keys to store the decryption key long term.
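Here is a sketch of the key-handling side of this. One loud caveat: Python’s standard library ships no symmetric cipher, so this example swaps in a keyed HMAC over the password hash as a stand-in for the encryption step. With a vetted crypto library you would encrypt and decrypt the stored hash instead; the part that carries over unchanged is that the key comes from the environment and never touches the database. The variable name `PASSWORD_HASH_KEY` and the function names are hypothetical.

```python
import hashlib
import hmac
import os


def load_site_key(env_var: str = "PASSWORD_HASH_KEY") -> bytes:
    # Hypothetical variable name. The value is assumed to be hex-encoded and
    # injected at deploy time by whatever secrets manager you already use.
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; refusing to run without it")
    key = bytes.fromhex(value)
    if len(key) < 32:
        raise RuntimeError("site key should be at least 32 bytes")
    return key


def seal_hash(password_hash: bytes, site_key: bytes) -> bytes:
    # Stand-in for encryption: a keyed transform of the already-computed
    # password hash. Without site_key, a dumped value cannot be tested
    # against password guesses offline.
    return hmac.new(site_key, password_hash, hashlib.sha256).digest()


def check_sealed(password_hash: bytes, sealed: bytes, site_key: bytes) -> bool:
    # Recompute the keyed value at login and compare in constant time.
    return hmac.compare_digest(seal_hash(password_hash, site_key), sealed)
```

The trade-off of the stand-in is that, unlike real encryption, you cannot decrypt and re-encrypt stored values when you rotate the key, so treat it as an illustration of where the key lives rather than a drop-in for the scheme described above.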

Provision #3: Signup restrictions

Password requirements get a bad rap — and generally they’re implemented poorly — but the idea is sound. The best way to ensure a password isn’t cracked is to use a good password. So don’t let your users use “qazwsxedc” as a password, or else your fancy hashing is useless (at least for those really uncooperative users). The real problem with these requirements is that the site is only loosely approximating “entropy of the password” through their 8-characters-and-one-number-and-one-special-character rules. People hate these naive measures of password strength, and rightly so; whenever I generate a 64 character alphanumeric password for KeePassXC, and it gets rejected, I immediately wonder what the devs were smoking.

The solution here is simple. Now that you’re using a good password hashing algorithm, and it will take most good solid attackers around ten milliseconds per password try, just download a wordlist of the top ten million passwords or so. It doesn’t really matter that it’s representative of your users in particular, so long as it’s compiled from actual leaks, because we can afford five million or so in buffer. When your customers try to use a password on that list, reject it and explain that it’s on this list.
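A minimal sketch of that check might look like the following. The function names and the three-entry sample list are stand-ins of mine; in real use you would feed `load_blocklist` an open file handle for whichever leaked-password wordlist you downloaded.

```python
from typing import Iterable, Set


def load_blocklist(lines: Iterable[str]) -> Set[str]:
    """Build a set of banned passwords from a one-password-per-line source."""
    blocklist = set()
    for line in lines:
        # Strip only the line ending; other whitespace may be part of a password.
        password = line.rstrip("\r\n")
        if password:
            blocklist.add(password)
    return blocklist


def is_blocked(password: str, blocklist: Set[str]) -> bool:
    """Exact-match lookup against the banned list."""
    return password in blocklist


# Tiny in-memory stand-in for a real wordlist file, e.g.
#   with open(path_to_wordlist, encoding="utf-8", errors="ignore") as f:
#       blocklist = load_blocklist(f)
sample_lines = ["123456\n", "password\n", "qazwsxedc\n"]
blocklist = load_blocklist(sample_lines)

print(is_blocked("qazwsxedc", blocklist))  # True
```

A Python set holding ten million strings costs real memory, so a database table with an index or a Bloom filter may suit you better; the lookup logic stays the same. For a sense of scale using the numbers above: at roughly ten milliseconds per guess, running a ten-million-entry list against a single salted hash costs about 100,000 seconds, a bit over a day.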

People might complain about not knowing in advance whether or not their password is blacklisted while they sign up, but this way you can be sure that your users are at least not going to be using one of those first ten million common passwords if someone gets the hashes. Perhaps you will be unlucky and your users will concentrate around the next couple million and some site-specific ones, but for the most part you’ll ensure that even uncooperative users’ passwords take days to be cracked cold, assuming the attacker managed to get the decryption key.

And by this point you’ve basically removed most of the danger that comes from offline password cracking attacks. If I know someone is using this scheme, I’m more worried about the attack vector that was used to dump the users table than I am the actual passwords that got leaked in the process. Or other data like API tokens or “reset password” nonces that should be hashed but aren’t. That’s not going to prevent the media from publishing news articles pushing the free credit monitoring like crazy, though, so maybe keep diligently using those prepared statements.

Provision #4: HIBP

This one might be controversial, but I’m putting it here anyways.

Password reuse is now your largest remaining problem when it comes to securing user credentials, if you’ve been following this guide so far. Other, less savory sites are going to eventually leak plaintext or SHA1 versions of passwords associated with the same email address or username that your users signed up with. Most of your users will have dozens of accounts on dozens of different sites, and unless you’re a cryptocurrency exchange (and thus clearly not in need of this guide, right?) most of them will not use password managers.

Have I Been Pwned is a site that keeps a database of 500 million breached passwords, and it is updated regularly with new breaches. I suggest that at signup, if your user’s password comes up clear against the initial 10M wordlist, you also check that it doesn’t exist in this database via their published, no-rate-limit API. The API works by asking you to SHA1 hash the password and send it the first 5 characters of the hexadecimal representation. It will respond in turn with the remainder of the SHA1 hash of every password in its database that has that prefix, which you can check for exact matches. This way, according to HIBP, by performing this lookup you don’t have to give them your password directly; you just give the SHA1 prefix, which could be the prefix to annnyyyything. They even allow you to request padded responses, so an observer can’t make a guess at which section of the database you’re looking at based on how large the response is.
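The lookup can be sketched roughly like this, using the Pwned Passwords range endpoint and its `Add-Padding` request header. Each response line carries the remaining 35 hex characters of a hash plus a count, and padding entries come back with a count of 0. The function names, the `User-Agent` string, and the five-second timeout are my own choices for the example, and the fetcher is passed in as a parameter so the matching logic can be exercised without touching the network.

```python
import hashlib
import urllib.request
from typing import Callable

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def fetch_range(prefix: str) -> str:
    """Ask the range endpoint for every hash suffix under a 5-char prefix."""
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={
            "Add-Padding": "true",  # request a padded response
            "User-Agent": "password-check-sketch",  # placeholder identifier
        },
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")


def is_breached(password: str, fetch: Callable[[str], str] = fetch_range) -> bool:
    """Return True if the password's SHA1 appears in the returned range."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        # Padding entries carry a count of 0 and are not real matches.
        if candidate.strip().upper() == suffix and count.strip() not in ("", "0"):
            return True
    return False
```

Only the five-character prefix ever leaves your server; the comparison against the full hash happens locally. You will also want to decide what signup does when the request fails or times out, since failing closed makes your signups depend on a third party’s uptime.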

Obviously some people will grumble at the imagined scenario of having their nice new Argon2 hashes partially compromised, because maybe an attacker who can connect these queries to accounts can maybe throw out every possible password that doesn’t have a cousin with 3BC0F as a SHA1 prefix. But I think this is more than worth it. There are a couple things about this service that mitigate this risk (and a couple things you can do to further mitigate it):

  1. The HIBP people seem like pretty standup guys. I trust them. Don’t you?
  2. There aren’t any API keys required to access this endpoint, which means HIBP doesn’t necessarily have to know it’s you who’s calling. If HIBP keeps logs of these queries (I sure hope they don’t), and they ever get leaked, it’s an extra step to tie them to your service based on IP address or some other metadata. It’s also an extra step to tie them to a particular session. You can even try to proxy your connection, if you want to be extra about it.
  3. If you don’t keep logs of these queries (please don’t), it’s somewhat unrealistic for any attacker to be able to time-correlate signups with what you asked for on the HIBP API. Without those logs that you keep and they steal, the fact that you asked if the user had a password of “X” but also maybe “Y” and “Z” that one time is just not very helpful info.
  4. You’re not actually letting users sign up with these passwords. Sure, they might just append them with a 1 or ! once you reject them and move on, but your attacker has to deal with finding the several dozen mutations surrounding the hashes in HIBP’s database rather than just the password itself.
  5. Nothing in their TOS says you can’t throw out a random string or five to HIBP every time you check a password, just to keep them on their toes.

Again, use your own judgement, but in return, any password your user might have gotten leaked in the past, or that another person used and got leaked in the past, they won’t use on your service. Can’t ask for easier, more thorough password reuse prevention than that.

The cliché response that you shouldn’t store passwords is hinting at something, though, and I think it’s that you shouldn’t leave your users with just one form of authentication. If you are a #2 service, encourage your accounts to have a second factor of authentication set up: ideally a Yubikey, but barring that Duo or time-based one-time passwords (TOTP). A password can be as random as you want, but ultimately it will be compromised through a number of different ways that often have little to do with your site’s security. I don’t think I’ve ever even heard of someone leaking Google Authenticator or Authy keys. It just never happens. Don’t let this inevitably lost nonce be instead the only barrier between me and total account takeover.
