Passwords 101: The Ins and Outs of Password Authentication

Istvan Lam
Tresorit Engineering
6 min read · Jun 19, 2020

Nowadays, passwords are used everywhere as a way of authenticating that you are really you. While other types of authentication like fingerprint readers exist, the majority of systems still rely on good old password protection. But how much do people really know about how passwords work? I’m going to start tackling the subject by taking a look at how systems check that you typed the right password — without ever storing it.

The theory behind creating secure passwords is actually pretty simple. Commonly accepted wisdom goes like this: passwords are like underwear, so keep them private, don’t share them, and don’t leave them lying around. In other words, users are encouraged to choose a different, random, and strong password for each and every service they use, and to replace them regularly. And that will keep them safe, right?

Well, in practice, it may not be entirely true … or quite that straightforward. So, you need to prepare yourself for some hurdles on the way. I love XKCD’s sarcastic way of poking fun at theoretical and practical crypto mindsets when it comes to passwords.

Source: XKCD

What is authentication anyways?

Let’s start here, as the term is often misused or overused. It’s important to clarify that authentication is the process of proving a claimed identity. Nothing more, nothing less. Picture this: when you go through passport control at the airport and the border guard checks your passport, then looks at you to confirm it’s the same person, that is a form of authentication. All those times you call up your better half, and say into the phone, “Hey, it’s me”? You wouldn’t think it, but that is actually you using voice authentication. When you log in to Tresorit with your password and second factor, that’s how we know that you are really you.

That being said, there are obvious ways to get around authentication: IDs can be forged, voices recorded, passwords stolen. So when you design an authentication system, you need to do it in a way that minimizes the risk of someone being able to bypass it.

Authentication is a basic security function and therefore a prerequisite for making access control decisions, also known as authorization. But it is important to highlight that authentication is not the same as authorization. Authentication is when border security makes sure that you are the person in the passport photo. Authorization is when, based on your nationality, visas, history, etc., they decide whether to let you in.
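To make the difference concrete, here is a minimal sketch of the border control example. The traveller records, the `authenticate` and `authorize` functions, and the return strings are all made up for illustration; the point is only that the two checks answer different questions and happen in that order.

```python
# Hypothetical traveller records keyed by passport number (illustrative data only).
PASSPORTS = {
    "X123": {"photo": "face-alice", "has_visa": True},
    "Y456": {"photo": "face-bob", "has_visa": False},
}


def authenticate(passport_no: str, observed_face: str) -> bool:
    """Authentication: does the person match the identity they claim?"""
    record = PASSPORTS.get(passport_no)
    return record is not None and record["photo"] == observed_face


def authorize(passport_no: str) -> bool:
    """Authorization: is this already-proven identity allowed to enter?"""
    return PASSPORTS[passport_no]["has_visa"]


def border_control(passport_no: str, observed_face: str) -> str:
    # Authentication always comes first; authorization builds on its result.
    if not authenticate(passport_no, observed_face):
        return "rejected: identity not proven"
    if not authorize(passport_no):
        return "rejected: not permitted to enter"
    return "admitted"


print(border_control("X123", "face-alice"))
print(border_control("Y456", "face-bob"))
print(border_control("X123", "face-bob"))
```

Bob is who he says he is, so he passes authentication, but he is still turned away at the authorization step.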

There are 3 basic factors which translate to different authentication approaches:

What you know: passwords and the like

What you have: hardware tokens, physical keys

What you are: biometrics, like your face or your fingerprints

You’ve probably come across the term 2-factor authentication. This is where it comes from: it means combining two of the factors above. Although hardware tokens and biometric authentication are quite exciting topics, first we need to understand how passwords work and why 2-factor is important.
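One detail that is easy to miss: the two factors have to come from different categories. As a rough sketch (the `Factor` enum, the credential names, and `is_two_factor` are my own illustrative inventions, not part of any real system), a password plus a PIN is still only one factor:

```python
from enum import Enum


class Factor(Enum):
    KNOW = "what you know"
    HAVE = "what you have"
    ARE = "what you are"


# Hypothetical mapping from credential type to the factor it belongs to.
CREDENTIAL_FACTOR = {
    "password": Factor.KNOW,
    "pin": Factor.KNOW,
    "hardware_token": Factor.HAVE,
    "fingerprint": Factor.ARE,
}


def is_two_factor(credentials: list[str]) -> bool:
    """True only if the credentials span at least two distinct factor categories."""
    factors = {CREDENTIAL_FACTOR[c] for c in credentials}
    return len(factors) >= 2


print(is_two_factor(["password", "hardware_token"]))
print(is_two_factor(["password", "pin"]))
```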

Some of the advantages of passwords, to give you a bit of context, are that they are simple, intuitive, and cheap to implement. The keyboard is there anyway, while a fingerprint reader is an extra piece of hardware for you to include in the system.

Unfortunately, there are quite a few disadvantages as well:

1. Passwords must be memorized by the user, so users tend to choose guessable passwords, or use the same password on multiple systems

2. Something about the password needs to be stored by the verifier and this can be used for offline attacks

3. Passwords can be snooped on by keystroke logging, shoulder surfing, etc.

4. Passwords are easy to reveal and share; a user might be attacked by a phishing site, or the user may write it on a sticky note

Model of password-based authentication

There are many ways you can use your password: through a touchscreen, a website, or Wi-Fi, to give a few examples. In all cases, what you input is a username and a password (on your smartphone it may seem that you have a PIN without a username, but the username is actually in a hidden field, automatically provided by the system).

The verifier system uses the given username to look up the right record in the password table, which stores data derived from the password, f(pwd), rather than the password itself. It then applies the same function to the password you just typed. If the stored value matches the freshly derived one, authentication is successful; otherwise, it fails.
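The model above can be sketched in a few lines. Everything here is an assumption for illustration: the table contents, the `verify` name, and especially the choice of SHA-256 as `f`, which is only a placeholder (the rest of this article explains why a bare hash is not good enough in practice).

```python
import hashlib
import hmac


def f(pwd: str) -> str:
    """Placeholder derivation function; the model does not prescribe a specific f."""
    return hashlib.sha256(pwd.encode("utf-8")).hexdigest()


# Hypothetical password table: username -> f(pwd). The password itself is never stored.
password_table = {"alice": f("correct horse battery staple")}


def verify(username: str, pwd: str) -> bool:
    stored = password_table.get(username)
    if stored is None:
        return False  # unknown username
    # Derive from the submitted password and compare with the stored value.
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, f(pwd))


print(verify("alice", "correct horse battery staple"))
print(verify("alice", "wrong guess"))
```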

Model of password-based authentication

For a long time, systems did not use any derivation function like the one pictured above. They just stored the passwords in plain text, which amounts to using a trivial derivation function that simply returns its input. The issue here is that if the password table leaks somehow, plain text passwords can get into the wrong hands. Despite this risk, sadly many low-budget systems still use such methods.

This is especially dangerous if people are reusing their passwords in multiple systems. Imagine the following: let’s say you have recycled your Twitter password while buying something in a small, random webshop which stores passwords in plaintext. For a hacker, it is much easier to hack the small webshop first and then get access to your Twitter account, rather than directly hacking Twitter.
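Here is a toy version of that scenario. All names and data are hypothetical (`leaked_webshop_table`, `big_site_login`, the example accounts); the sketch only shows that a plaintext leak from one site can be replayed directly against another.

```python
# Hypothetical data: a small webshop's leaked table, stored in plain text.
leaked_webshop_table = {
    "alice@example.com": "hunter2",
    "bob@example.com": "s3cret",
}

# Hypothetical second, bigger service. For brevity its table is a plain dict here;
# how it stores passwords does not matter, because the attacker simply logs in.
_big_site_passwords = {
    "alice@example.com": "hunter2",         # same password reused
    "bob@example.com": "unique-to-this-site",
}


def big_site_login(username: str, pwd: str) -> bool:
    return _big_site_passwords.get(username) == pwd


def accounts_exposed_by_reuse(leak: dict[str, str], login) -> list[str]:
    """Replay every leaked (username, password) pair against another login function."""
    return [user for user, pwd in leak.items() if login(user, pwd)]


exposed = accounts_exposed_by_reuse(leaked_webshop_table, big_site_login)
print(exposed)
```

Alice reused her password, so her account on the bigger site falls without that site ever being breached. Bob did not, so the leak stops at the webshop.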

Hashes

A hash function is a basic cryptographic function. There are many algorithms out there, such as MD5, SHA1, SHA256, Whirlpool, etc. (though many like MD5 or SHA1 are not considered to be secure anymore). Hash functions create a “fingerprint” from the input so that there is only one way to revert them: by trying out each and every possible input.

There is a very, very low chance that for a given hash, you will be able to find the original string by guessing. To represent how small it is, here is an example: let’s say you choose one random atom in the universe. Anywhere, a billion light-years away, near, far, it could be any atom really. Then let’s say you spin your laser pointer while standing on Earth, and randomly point to the sky. It is statistically much more likely that you will successfully point at your selected atom than that you will find a matching input by just trying.

Let’s see it in action:

SHA256(“apple”) = 3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b

SHA256(“apple1”) = 6f677963023a2ed99caf05f73ef9797d34022bca02970a2bd98c00366c4b1aa4

As you can see, the output completely changed even though I only altered one character from the original input. That is also a very important property of hashes — the output pseudo-randomly changes as the input changes.
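If you want to try this yourself, here is a small sketch using Python’s standard `hashlib` module. The helper names `sha256_hex` and `differing_bits` are my own; the second one counts how many of the 256 output bits differ between the two digests.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """SHA-256 of the UTF-8 encoded text, as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Number of bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


h1 = sha256_hex("apple")
h2 = sha256_hex("apple1")

print(h1)
print(h2)
print(differing_bits(h1, h2), "of 256 bits differ")
```

For a well-behaved hash function you would expect roughly half of the bits to flip, even for a one-character change in the input.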

A lot of people then thought, “OK, we can prevent password leakage by storing only the hashes of the passwords. Hashes are secure and impossible to decrypt.”

Unfortunately, there’s a problem with that too. Search for the SHA256 hash of the word “apple” in Google. As you can see, Google finds pages that map the hash back to the original “apple”.

Google search for the SHA256 value of “apple”

Hash functions are strong and practically impossible to reverse only if the variation of the input is huge. But in practice, the potential input variation is small. Let’s take English words for instance. It is easy to find the original input by simply going through all the words, hashing them, and checking whether the result matches the hash you are looking for. You can also store these pairs in a database, and there you go: a quick way to reverse a few hashes. Storing all variations would require more megabytes of storage than the number of atoms in the known universe, but storing only a subset is, in fact, doable.

There are many ways to crack a hashed password database, including the infamous rainbow table cracking. Even if a user chooses a relatively strong password, hashed password tables can be cracked easily. Never fear though; we are not doomed. These hacks can be prevented with some nifty cryptographic algorithms, but this topic deserves a separate article.

Istvan Lam
Tresorit Engineering

CEO and co-founder of Tresorit. I have a cryptography engineering background and now deal mostly with business and entrepreneurship.