Password and Credential Management in 2018 🔒

State of the art security for the most valuable secrets

Photo by Samuel Zeller on Unsplash

Introduction

Many people will read the headline and probably think: “No, not another piece of advice that I should hash passwords — uh”. But stop. You’ll learn a lot more here. Promised.

We will cover the “perfect” way to handle password credentials, from the moment a user types them into a form on the client-side until the moment they are stored in the database. (Nothing is absolutely perfect, and of course I would be more than happy about any suggestions for improvement in the comments 😉.) Furthermore, we will look into common errors that happen when developers store other credentials such as tokens and secrets.

Chapters

  • Password Handling in 2018
  • Other Credentials (Token, Secrets, …)
  • Credits, License, References

Password Handling in 2018

Many developers know that they should hash passwords. Most know that they should use a per-password salt to mitigate rainbow table attacks (What is a rainbow table?). Most developers also know that they shouldn’t use SHA-*, but instead a KDF, a hashing function specifically designed for password hashing. [1][2][3]

In this short “my best practices” guide we will cover the things mentioned above and a bit more. Firstly, we will permute the password on the client-side. Secondly, we will encrypt the final hash before writing it into our database, similar to how Dropbox does it. Following the “Dropbox way” of presenting password protection, I created a diagram that shows all the cryptographic layers. In the figure below, everything regarding encoding (e.g. base64, hex, etc.) is omitted. [4]

Multiple layers of protection for passwords ✔️

Although I really like the onion-diagram above, I think for explanation purposes another figure, based on the flow through the system, is easier to understand.

The password flow

Everything starts with our users entering the password into our website and submitting the login. Here comes the first layer, which most developers think is irrelevant. Before we send the username and password over the wire, we perform a single SHA3-512 round on the plain-text password plus a unique name for our service, for example the domain. This means that if the user logs in at auth.example.com, we compute SHA3-512("plain-text-password-from-user" + "auth.example.com"). Why we add this public, well-known “salt”, which is the same for every user of the service, is explained later.
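A minimal sketch of this permutation step could look as follows. In a real deployment it would run in the browser (i.e. in JavaScript); Python is used here purely for illustration. The function name `permute_password`, the UTF-8 encoding and the hex output encoding are my assumptions, not something prescribed by the scheme.

```python
import hashlib


def permute_password(password: str, service_name: str) -> str:
    """Return the hex-encoded SHA3-512 digest of password + service_name.

    Assumptions for this sketch: both inputs are encoded as UTF-8 and the
    digest is hex-encoded before it is sent to the server.
    """
    data = (password + service_name).encode("utf-8")
    return hashlib.sha3_512(data).hexdigest()


# The value below, not the plain-text password, is what leaves the client.
permuted = permute_password("plain-text-password-from-user", "auth.example.com")
print(len(permuted))  # 128 hex characters, i.e. 64 bytes
```

The same password combined with a different service name yields an unrelated value, which is the property the public “salt” is there to provide.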

Password flow through the software system, with ever increasing security 🔒

Right now most developers think: “Don’t we have https for keeping the password secure?”. And that’s right. But keeping the password secure from eavesdropping, etc. was never the intention of this step. In fact, try to see this step as a conversion function: after this round the permuted version (i.e. the hash) becomes the user’s so-to-say “actual” password, which gets submitted to the server-side.

Why should we do this?

Simple: It shows respect for the user’s password and that you are aware of the fact that it is, in most cases, not exclusive to your software. Additionally, we gain a few smaller security bonuses (castle approach): There is no way we could ever accidentally store the user’s plain-text password in our logging system, unlike GitHub and Twitter, which both admitted in May 2018 that they had found plain-text passwords in their logging systems. Also, the user’s password would be slightly protected in a MITM attack or on a compromised server. “Slightly” because the strength of a single SHA3-512 hash depends purely on the user’s input, which is admittedly not very good most of the time. The last point is that client-side hashing is the only simple way to prove you are not “farming passwords” ✔️ [5][6][7][8]

Let’s think about the last statement and our assumption about users reusing their passwords: If every site started to use the client-side hashing approach with SHA3-512(password), the whole idea of respecting the user’s password privacy would be destroyed, as every service could use the hash against other services (as is currently possible with plain-text passwords). Therefore this approach wouldn’t give us any enhancement if deployed widely. However, if every login system added a globally unique salt (e.g. the domain, as in SHA3-512(password + domain)), each website’s server-side would get a different “permuted” password, even if the user chose the same password for every service.

The only two drawbacks to this approach are that:

  • you cannot enforce password policies on your server-side. Although, whether password policies make sense or not is a different question anyway. I’ll cover that in a later post.
  • in case you ever change your company’s domain, you need to either keep your old domain in the hashing code or do hashing scheme updates transparently during user authentication (e.g. you do the client-side hashing with the old and the new domain, send both to the server, check if the hash calculated with the old domain matches the database entry and update it with the value calculated with the new domain)
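The transparent update described in the second point could be sketched like this on the server-side. Everything here is illustrative: `record` is a hypothetical database row with `salt` and `digest` fields, and `stand_in_kdf` is only a placeholder (PBKDF2 from the Python standard library with a deliberately low iteration count) for whatever password hash you actually use. Checking the new-domain hash first, so that already migrated users still log in, is my own addition.

```python
import hashlib
import hmac


def stand_in_kdf(permuted_hex: str, salt: bytes) -> bytes:
    # Placeholder for the real password hash (e.g. bcrypt or Argon2).
    # The iteration count is kept low only to keep the example fast.
    return hashlib.pbkdf2_hmac("sha512", bytes.fromhex(permuted_hex), salt, 1_000)


def login_with_domain_migration(record: dict, old_hash: str, new_hash: str) -> bool:
    """Verify a login and upgrade the stored digest to the new domain.

    old_hash / new_hash are the client-side SHA3-512 values (hex) computed
    with the old and the new domain respectively.
    """
    salt = record["salt"]
    # Users who were already migrated match on the new-domain hash.
    if hmac.compare_digest(stand_in_kdf(new_hash, salt), record["digest"]):
        return True
    # Not migrated yet: verify against the old-domain hash, then upgrade.
    if hmac.compare_digest(stand_in_kdf(old_hash, salt), record["digest"]):
        record["digest"] = stand_in_kdf(new_hash, salt)
        return True
    return False
```

Once all active users have been migrated, the old-domain branch (and the old domain in the client code) can be removed.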

Normalization

OK, let’s continue with our password flow: The username and the permuted version of the password get transmitted over https (!!) to our server. Normally it is recommended to perform a single round of SHA3-512 now. This would be done to normalize the output to a fixed 64 bytes, because a few password hashing functions truncate their input after N bytes (for example, bcrypt truncates its input after 72 bytes or at a NUL byte), which reduces the entropy of the password. Other password hashing algorithms (PBKDF2) are vulnerable to DoS attacks if passwords can be arbitrarily long. [9]

Because the client-side permutation of the password already was a normalization, this shouldn’t concern us, as long as we check whether the client-side provided string is a valid representation of a SHA3-512 hash. If it is we pass it into our KDF — if not we must abort, as we got a tampered malicious input.
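The validity check can be very small. The sketch below assumes the client sends the hash as a lowercase hex string (128 characters for 64 bytes); if you use another encoding such as base64, the pattern has to change accordingly. The function name is made up for this example.

```python
import re

# 64 bytes of SHA3-512 output, hex-encoded in lowercase (an assumption).
_SHA3_512_HEX = re.compile(r"[0-9a-f]{128}")


def is_valid_permuted_password(value: str) -> bool:
    """Return True only if value looks like a hex-encoded SHA3-512 digest."""
    # fullmatch rejects trailing newlines and any extra characters.
    return _SHA3_512_HEX.fullmatch(value) is not None
```

Anything that fails this check never reaches the KDF, which also puts a hard limit on the input length.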

Client-side password permutation and normalization

KDF (Password hashing functions)

🔑 Speaking about KDFs: There are a few acceptable algorithms from which you can choose — namely: Argon2, bcrypt, scrypt, PBKDF2.

Argon2 won the Password Hashing Competition in July 2015, out of 24 candidates. Since then nobody has found a real attack vector against it. Therefore most cryptographers believe that Argon2 is highly unlikely to fall victim to attacks that make it worse in practice than one of the others, and subsequently recommend using it. In Argon2 you can specify not just a single cost parameter, as in bcrypt, but three parameters: number of iterations, memory consumption and number of threads. For all of Argon2’s benefits, bcrypt has been out there since 1999. That’s close to 20 years without major vulnerabilities! Therefore it can be seen as much more battle-tested than Argon2. Also, not all cryptography libraries provide first-class Argon2 support. In these cases you should use bcrypt. [10][11]

In 2009 scrypt, a bcrypt-like function that requires more RAM and is therefore more resistant to hardware-accelerated attacks, was published. Unfortunately, due to its massive memory requirements it’s very hard to scale and practically not usable for an authentication system. Lowering the memory usage is not feasible, as it then becomes, technically, weaker than bcrypt. Therefore its main usage is only in places where spending hundreds of megabytes of memory and multiple seconds’ worth of CPU time on a single hash computation isn’t a problem (e.g. protecting the encryption key for your computer’s main hard disk).

The most widely deployed algorithm is probably PBKDF2, although it shouldn’t be your choice if you build a new application nowadays, except if you need FIPS-certification.

In the end everything comes down to your personal taste and how conservative you are (and you usually should be, when thinking about cryptography). I personally have only used bcrypt until now, but I will switch to Argon2 for the next project. [12]

OWASP, a big online community that tries to increase web application security, through freely-available articles, methodologies, documentations and tools, recommends the following in their “Password Storage Cheat Sheet”: [13]

- Argon2 is the winner of the password hashing competition and should be considered as your first choice for new applications;
- PBKDF2 when FIPS certification or enterprise support on many platforms is required;
- scrypt where resisting any/all hardware accelerated attacks is necessary but support isn’t.
- bcrypt where PBKDF2 or scrypt support is not available.

Usage of KDF

The usage of KDFs is pretty self-explanatory: The credential-specific salt is loaded from the database and used together with the client-side provided hash to compute the KDF output.

Server-side salting with a strong KDF. On the left the conservative way with bcrypt. On the right the futuristic version with Argon2d

As you have seen in the diagram above, after the initial password permutation in the frontend, I maintain two different branches: the left one, which is a little bit more conservative, and the right one, which is a little bit more futuristic. Both are completely safe, and the choice mainly depends on one’s preferences.
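To make the data flow concrete, here is a small sketch of the salting and hashing step. Please note the swap: bcrypt and Argon2 are not part of the Python standard library, so this example uses PBKDF2-HMAC-SHA512 from `hashlib` purely as a stand-in to keep it self-contained. With a bcrypt or Argon2 library the structure stays the same. The function names, the 16-byte salt and the iteration count are assumptions of this sketch, not recommendations.

```python
import hashlib
import os

# Assumed work factor for this sketch; tune it for your own hardware.
DEFAULT_ITERATIONS = 600_000


def new_salt() -> bytes:
    """Create a fresh credential-specific salt from the OS CSPRNG."""
    return os.urandom(16)


def derive_hash(permuted_hex: str, salt: bytes,
                iterations: int = DEFAULT_ITERATIONS) -> bytes:
    """Run the client-provided hash through the (stand-in) KDF.

    permuted_hex is the already validated, hex-encoded SHA3-512 value
    received from the client; it is decoded to its 64 raw bytes first.
    """
    client_hash = bytes.fromhex(permuted_hex)
    return hashlib.pbkdf2_hmac("sha512", client_hash, salt, iterations)
```

On registration you would call `new_salt()` once and store the salt next to the result. On login you load that salt again and recompute `derive_hash` with the value the client just sent.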

Symmetric Encryption

🔒 After the KDF our password is computationally secure, i.e. implausible to recover. Nothing is impossible, so “implausible” here refers to the computational hardness assumption: the hypothesis that a particular problem cannot be solved efficiently (where efficiently typically means “in polynomial time”). [14][15]

Still we perform a last step before persisting it into our database. We encrypt the hash using a symmetric encryption algorithm like AES256-GCM or ChaCha20-Poly1305 as this makes a database dump absolutely worthless for brute-force attacks. That’s a fact that can be inferred from thermodynamics: [16]

“These numbers have nothing to do with the technology of the devices; they are the maximums that thermodynamics will allow. And they strongly imply that brute-force attacks against 256-bit keys will be infeasible until computers are built from something other than matter and occupy something other than space.”, Bruce Schneier

All of this holds only IF we manage to keep the key secure (and, of course, no significant vulnerabilities are found in the algorithm used and its implementation). AES256-GCM and ChaCha20-Poly1305 are used because they provide AEAD. [17]

Symmetric encryption before persisting the hash into a database. On the left the conservative way with AES256-GCM. On the right the futuristic version with ChaCha20-Poly1305

In order to keep the key as secure as we can (without taking advantage of HSMs) we make use of Hashicorp’s Vault and its ability to do EaaS (Encryption-as-a-Service 😅). We send the output of the KDF to our Vault instance, get the encrypted hash back and store it inside our database. The next time the user wants to log in, we load the encrypted hash from the database, decrypt it with Vault and compare it with the hash generated for this authentication cycle. Don’t forget to do a constant-time comparison (i.e. be resistant against timing side-channel attacks). Some people will probably say it’s not important, as we only compare hashes. I would advise you to do it anyway, as it’s a good habit whenever you make a comparison-related security decision. For example: the (very good) supplementary cryptography package provided by the Go team also does it in its bcrypt implementation. [18][19][20][21][22]
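The login-side comparison could be sketched as follows. The `decrypt` parameter is a hypothetical placeholder for whatever call you use to have Vault decrypt the stored value; no real Vault client API is shown here. The constant-time comparison itself uses `hmac.compare_digest` from the Python standard library.

```python
import hmac
from typing import Callable


def verify_login(encrypted_hash: bytes, computed_hash: bytes,
                 decrypt: Callable[[bytes], bytes]) -> bool:
    """Compare the stored hash with the one computed for this login.

    encrypted_hash: the value loaded from the database.
    computed_hash:  the KDF output for the current authentication cycle.
    decrypt:        hypothetical stand-in for the Vault decryption call.
    """
    stored_hash = decrypt(encrypted_hash)
    # compare_digest takes time independent of where the inputs differ,
    # unlike a plain == comparison, which may stop at the first mismatch.
    return hmac.compare_digest(stored_hash, computed_hash)
```

Passing the decryption step in as a function keeps the comparison logic testable without a running Vault instance.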

The final result

All layers of protection previously explained combined in one picture

Please let me know your thoughts about this way of handling user passwords in the comments. How do you handle it at the moment? Is something new to you or should be explained in more detail? Let me know!

Other Credentials (Token, Secrets, …)

Although most developers nowadays care at least a little bit about how they should store their passwords, many don’t think about other credentials like tokens, secrets, etc. (admittedly I didn’t either in the past). In order to fully understand why we should also protect those, let’s have a short look back at the reasons why I originally chose to protect the passwords of our users so much. After thinking about this question long enough, most people arrive at the following two assumptions:

If a database leak happens, we don’t want the attacker …
  1. … to somehow regain access to our users’ plain-text passwords, because most users use the same password on multiple sites.
  2. … to be able to use the credentials to authenticate against the service. (Of course this point is only correct, if the database leak has come in a different way than a full system compromise. But the underlying assumption in the “hash-your-credentials” field is to prevent an attacker with read-only access from escalating to higher power levels, e.g. to impersonate an actual user [23])

It’s the second point that most developers just don’t think of when they hand out registration tokens, account-recovery tokens, API tokens, etc., although these are mostly the same as passwords, as they can be used to authenticate against the service (sometimes they are even more powerful than a password, because they don’t enforce 2FA or MFA).

Let’s think about that in the scenario of account-recovery tokens:

  1. User requests the recovery-token and server-side generates the token (hopefully from a CSPRNG) and encodes it into something readable (helpful tip for readability of token: [*]) [24]
  2. The server stores it inside your database and then sends it to your user
  3. User is very security educated and knows that he should write the token down twice on two pieces of paper and store both in distinct places that are fire-resistant. (This is how I would recommend you store your TOTP secrets, recovery tokens, etc.) [25]
  4. Half a year later a hacker manages to get a complete dump of your account-recovery table. You and your team haven't noticed it yet, so you can’t just invalidate the tokens.
  5. Even though the users haven't done anything wrong regarding their account security, the hacker can enter each account-recovery code into your website, therefore impersonate the actual user and change emails and passwords for each account.
  6. Hopefully you can roll back all the changes from a backup, but earning back the trust of your users definitely will be a challenge

[*] Advice: for encoded strings presented to the user I personally prefer lowercase base32 strings split up into groups of 3 or 4 characters, for example nr6i vzbv h3so thfc. The reason is rather simple: it’s case-insensitive, easier to write down, and users are less likely to make an error than with tokens like zF61bS1lwnO04eq3.

Real world example

Sites including Twitter, Google, etc. support recovery tokens, and they can also show you the token again if you failed to write it down in the first place 😲. That means they store it in plain-text! An absolute no-go for credentials. I have no doubt that a company like Google is capable of securing those tokens, but I would love to hear the reasoning for why they aren’t hashed.

Twitter “Get backup code” ❌ (Screenshot captured: 2018–08–12)
Google “Show Backup Codes” ❌ (Screenshot captured: 2018–08–12)

Implementation

I don’t want to say too much on the implementation side, as most developers can figure it out on their own once they are aware of such problems. Very short: of course you should also hash special tokens, API keys, etc., including a per-entry salt.

I personally handle it this way (for account-recovery tokens — e.g. something the user should write down physically):

  • Generate 20 bytes of cryptographic randomness (CSPRNG) for our account recovery token. Depending on the use-case the amount of bytes generated should be adjusted. (For example: when creating API-keys, there is nothing against using 64 bytes of randomness.)
  • Encode it (like described above). This results in a 32 character long string that should look similar to this:
usru kbvj nmvg xly5 4qh3 jnk6 jd2n iadm
  • Generate an additional 32 random bytes and use them as the salt for hashing your token with either a KDF (if performance is secondary) or plain SHA3-512
  • Store the first 10 characters (roughly 6 bytes) of your token (usru kbvj nm) as ID, together with the salt and the generated hash, here called “secret”. My experience: In an actual implementation I also do, depending on the latency and performance requirements, symmetric encryption on the secret field (same process as for passwords).
Information that is persisted into the database
  • Send the plain token to your user and afterwards erase it from memory

Your user can now write down the token. If he enters it later, your service will take the first 10 characters of the input and try to find the entry with the same ID. If one is found, the service will take the salt and the complete user input, hash them and compare the result to the secret. Again, please do a constant-time comparison (i.e. timing side-channel attack resistance), as it’s a good habit whenever you make comparison-related security decisions.
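Put together, the steps above could be sketched like this, using only the Python standard library. The function names, the dictionary used as a stand-in for the database table, the `salt + token` concatenation order and the whitespace/lowercase normalization of the user input are assumptions of this sketch. It uses the plain SHA3-512 variant and leaves out the optional symmetric encryption of the secret field.

```python
import base64
import hashlib
import hmac
import secrets

ID_LENGTH = 10  # the first 10 characters of the token serve as lookup ID


def create_recovery_token() -> tuple[str, dict]:
    """Return (plain token for the user, record to persist)."""
    raw = secrets.token_bytes(20)  # 20 bytes from the CSPRNG
    # 20 bytes encode to exactly 32 base32 characters, without padding.
    token = base64.b32encode(raw).decode("ascii").lower()
    salt = secrets.token_bytes(32)
    secret = hashlib.sha3_512(salt + token.encode("utf-8")).digest()
    record = {"id": token[:ID_LENGTH], "salt": salt, "secret": secret}
    return token, record


def format_for_user(token: str) -> str:
    """Split the token into groups of 4 characters for readability."""
    return " ".join(token[i:i + 4] for i in range(0, len(token), 4))


def verify_recovery_token(user_input: str, records: dict[str, dict]) -> bool:
    """Look the entry up by ID, re-hash the full input and compare."""
    token = "".join(user_input.split()).lower()  # drop spaces, ignore case
    record = records.get(token[:ID_LENGTH])
    if record is None:
        return False
    candidate = hashlib.sha3_512(record["salt"] + token.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, record["secret"])
```

A typical round trip: call `create_recovery_token()`, persist the record under its `id`, show `format_for_user(token)` to the user once, and later pass whatever the user types into `verify_recovery_token`.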

The End. Thanks for reading, Florian

Credits

A big thank you goes to

  • Lukas Kurz for proofreading the text for linguistic correctness and comprehensibility before I published it.
  • Tim Heckman for giving me technical feedback and advice on many of the article’s chapters before I published it.

Need help?

I always have an open ear — florian@harwoeck.at — just contact me!

License

This article and all images included are subject to the Creative Commons BY-NC-ND 4.0 license. Florian Harwöck 2018


References

[1] Rainbow table, Wikipedia, https://en.wikipedia.org/wiki/Rainbow_table

[2] What is a rainbow table?, crypto.stackexchange.com, 2014–07–29, https://crypto.stackexchange.com/questions/1058/how-can-rainbow-tables-be-used-for-a-dictionary-attack/1063#1063

[3] Key derivation function, Wikipedia, https://en.wikipedia.org/wiki/Key_derivation_function

[4] “How Dropbox securely stores your passwords”, Devdatta Akhawe, 2016–09–21, https://blogs.dropbox.com/tech/2016/09/how-dropbox-securely-stores-your-passwords/

[5] “GitHub Accidentally Recorded Some Plaintext Passwords in Its Internal Logs”, Catalin Cimpanu, 2018–05–01, https://www.bleepingcomputer.com/news/security/github-accidentally-recorded-some-plaintext-passwords-in-its-internal-logs/

[6] “Twitter Admits Recording Plaintext Passwords in Internal Logs, Just Like GitHub”, Catalin Cimpanu, 2018–05–03, https://www.bleepingcomputer.com/news/security/twitter-admits-recording-plaintext-passwords-in-internal-logs-just-like-github/

[7] Man in the middle attack, Wikipedia, https://en.wikipedia.org/wiki/Man-in-the-middle_attack

[8] Password farming and reuse, https://xkcd.com/792/

[9] “Long passwords are good, but too much length can be a DoS hazard”, Dan Goodin, 2013–09–16, https://arstechnica.com/information-technology/2013/09/long-passwords-are-good-but-too-much-length-can-be-bad-for-security/

[10] Password Hashing Competition, https://password-hashing.net/

[11] Argon2 Recommended Parameters, “otus”, 2016–06–18, https://crypto.stackexchange.com/questions/37137/what-is-the-recommended-number-of-iterations-for-argon2/37140#37140

[12] “following the “next big thing” is not generally a good idea in the world of cryptography”, Stephen Touset, 2017–02–01, https://crypto.stackexchange.com/questions/43471/storing-parameters-in-argon2-hash-as-potential-security-issue/43473#43473

[13] Password Storage Cheat Sheet, OWASP, 2018–07–19, https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet

[14] Implausible Terminology, Paul Uszak, 2017–04–24, https://crypto.stackexchange.com/questions/46718/impossibility-vs-implaussiblity/46861#46861

[15] Computational hardness assumption, Wikipedia, https://en.wikipedia.org/wiki/Computational_hardness_assumption

[16] “How much would it cost in U.S. dollars to brute force a 256 bit key in a year?”, “ir01”, 2011–11–09, https://crypto.stackexchange.com/questions/1145/how-much-would-it-cost-in-u-s-dollars-to-brute-force-a-256-bit-key-in-a-year/1160#1160

[17] AEAD — Authenticated Encryption with Associated Data, Wikipedia, https://en.wikipedia.org/wiki/Authenticated_encryption

[18] Hardware security module, Wikipedia, https://en.wikipedia.org/wiki/Hardware_security_module

[19] Vault by Hashicorp, https://www.vaultproject.io/

[20] Constant Time Comparison, Dave Thompson, 2016–08–17, https://crypto.stackexchange.com/questions/39429/why-not-use-or-in-constant-time-comparison

[21] “Timing attack and good coding practices”, “Biv”, 2016–11–21, https://crypto.stackexchange.com/questions/41691/timing-attack-and-good-coding-practices/41698#41698

[22] Golang bcrypt Implementation, https://github.com/golang/crypto/blob/master/bcrypt/bcrypt.go#L111

[23] “Why passwords should be hashed?”, Thomas Pornin, 2011–11–01, https://security.blogoverflow.com/2011/11/why-passwords-should-be-hashed/

[24] “What is the difference between CSPRNG and PRNG?”, Thomas Pornin, 2013–12–18, https://crypto.stackexchange.com/questions/12436/what-is-the-difference-between-csprng-and-prng/12441#12441

[25] TOTP — Time-based One-time Password algorithm, Wikipedia, https://en.wikipedia.org/wiki/Time-based_One-time_Password_algorithm