A Gentle Explanation of Hash Functions

How to crack passwords and speed up your code, all with the power of hashing.

Jasmine Webb
Aug 20, 2020

Hashing is an important topic for programmers and computer science students to be familiar with. This article is aimed at students and at programmers with a few months to a year of coding experience.

What Hashing Is

Hashing: using a mathematical function to turn a piece of data (such as a string) into a value, typically of a fixed size

Hashes are mostly used for three things:

  1. Storing stuff without actually knowing what it is
  2. As a convenient way to remember where you put something
  3. To make sure the thing you received is the thing you wanted

That’s super confusing, but bear with me.

How it works

Put another way, hashing performs a non-reversible operation on a thing that turns it into a completely different thing, but if you do it again with the same input, you always get the same output.

It’s a bit like hard-boiled eggs. You can’t un-boil an egg, but you know what you’ll get if you put a raw egg in boiling water for 6-8 minutes. In much the same way, you can’t un-hash something.

Photo by Jason Leung on Unsplash

Here’s a very simple “hashing function” for integers: it divides a number by 10 and returns the remainder:

def modulo10(egg):
    # Divide by 10 and keep only the remainder
    return egg % 10

If egg is 55, it returns 5, but I have no way of turning 5 back into 55. For modulo10(), the numbers 9, 23950829, 309 and 29 all return 9. In fact, infinitely many values could have gone through this hashing function and returned the same thing.
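To see those collisions for yourself, here is a minimal Python sketch that runs the numbers from the paragraph above through modulo10() and groups them by the hash they produce. The helper name group_by_hash is just for this illustration.

```python
from collections import defaultdict


def modulo10(egg: int) -> int:
    """Toy 'hash': keep only the remainder after dividing by 10."""
    return egg % 10


def group_by_hash(values):
    """Map each hash value to the list of inputs that produced it."""
    buckets = defaultdict(list)
    for value in values:
        buckets[modulo10(value)].append(value)
    return dict(buckets)


groups = group_by_hash([55, 9, 23950829, 309, 29])
print(groups)  # {5: [55], 9: [9, 23950829, 309, 29]}
```

Every entry in the list under 9 is a collision with every other entry: given only the 9, there is no way to tell which input it came from.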

When two different inputs have the same hash, it’s called a collision. In a cryptographic hashing function, it should be practically impossible to find two different values that have the same hash.

There are two broad types of hashing functions, used for different things: fast ones and slow ones.

The fast ones are used when you don’t care if anyone knows that 5 came from 25. They’re used in a few data structures where you need to look things up really fast. One example is a hash table, which is pretty neat (I wrote a whole article about them). Fast hashes are also used for verifying data integrity.
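As a rough illustration of why a fast hash helps with lookups, here is a minimal hash table sketch in Python using separate chaining. The class name TinyHashTable and the default of 8 buckets are arbitrary choices for this example, and it leans on Python’s built-in hash(); real implementations (including Python’s own dict) are considerably more sophisticated.

```python
class TinyHashTable:
    """A toy hash table: hash the key to pick a bucket, then search only that bucket."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, key):
        # A fast hash turns the key into an integer; modulo maps it onto a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))  # colliding keys simply share a bucket

    def get(self, key, default=None):
        # Only the one bucket this key hashes to needs to be searched.
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default


table = TinyHashTable()
table.put("apple", 3)
table.put("banana", 5)
table.put("apple", 4)
print(table.get("apple"))   # 4
print(table.get("cherry"))  # None
```

The speed comes from never scanning the whole table: the hash tells you which small bucket to look in.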

Data Integrity

Let’s say I torrent a piece of software, such as the ISO for a Linux distro. I might be unsure whether what I got is what I meant to download: I could have missed a piece in transit, it could be an older version, or someone may have tampered with it. Luckily, I can go to an authority (a trusted source such as the project’s official website) and find a checksum to compare my file against. A checksum is the value the developers got when they hashed the file they released. Since I can hash the file I got in exactly the same way and compare the two values, I can verify that I have the correct file.
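Here is a minimal sketch of that check in Python, assuming the project publishes a SHA-256 checksum (many do, but check which algorithm the download page actually lists). The function names and the file name in the usage note are hypothetical.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large ISO never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, published_checksum: str) -> bool:
    """Compare our hash with the published one (ignoring case and stray whitespace)."""
    return file_sha256(path) == published_checksum.strip().lower()
```

Usage would look like matches_checksum("distro.iso", "<checksum copied from the download page>"). If even one bit of the file differs, the hashes won’t match.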

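You could use slow hashes for data integrity too, but there’s no need: speed isn’t a weakness here, so a fast hash is fine. One caveat is that MD5 is no longer a good choice if you’re guarding against deliberate tampering, because attackers can craft two different files with the same MD5 hash; a stronger fast hash such as SHA-256 avoids that problem.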

Passwords

Slow hashes are for when you need to keep whatever you hashed a secret. Because they’re slow and take a lot of computing power, they’re harder to ‘crack’, that is, to work out what the original value was. That makes slow hashes perfect for passwords, and it’s why we talk about ‘cracking’ passwords.

On a well-designed site, when you enter a password, the site checks what you typed against what it has stored on the server, yet it never actually knows what your password is. When you sign up, the site generates a bit of random data (a salt), tacks it onto the password you chose, and puts the result through a hashing function. It then stores the resulting hash and the salt it used.

When you log in again, the site grabs the salt (which is usually stored alongside the password hash), runs the password you just typed through the same process, and compares the two results.
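Here is a minimal sketch of the sign-up and log-in steps in Python, using PBKDF2 from the standard library’s hashlib. Rather than literally gluing the salt onto the password, PBKDF2 takes the salt as a separate input, which serves the same purpose. The function names, the 16-byte salt and the iteration count are assumptions for this example; 600,000 iterations is roughly in line with recent OWASP guidance for PBKDF2-HMAC-SHA256, but check current recommendations before relying on it.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # work factor: higher means slower for attackers (and for you)


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Sign-up: generate a random salt, hash the password with it, store both."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Log-in: redo the same hash with the stored salt and compare the results."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest avoids leaking how many leading bytes matched via timing
    return hmac.compare_digest(digest, stored_digest)
```

With salt, digest = hash_password("pa$$word"), calling check_password("pa$$word", salt, digest) returns True, while check_password("password", salt, digest) returns False. The site only ever stores salt and digest.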

How to crack passwords

Remember, since it’s impossible to know for 100% sure what the original value of a hash was, we have to make educated guesses. Most of the time, this means taking a list of common passwords and trying each of them against each hash. Every guess has to be hashed before it can be compared, so the slower the hash, the more expensive it is for a hacker to guess passwords.
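A dictionary attack against unsalted MD5 hashes can be sketched in a few lines of Python. The word list here is a hypothetical stand-in for the much larger lists attackers use, and the “stolen” hash is generated on the spot rather than taken from any real leak.

```python
import hashlib

# A tiny, hypothetical stand-in for a real list of common passwords.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein", "pa$$word"]


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dictionary_attack(stolen_hash: str, wordlist):
    """Hash every guess and compare it with the stolen hash; return the match, if any."""
    for guess in wordlist:
        if md5_hex(guess) == stolen_hash.lower():
            return guess
    return None  # the password wasn't in the list


stolen = md5_hex("letmein")  # pretend this came from a leaked database
print(dictionary_attack(stolen, COMMON_PASSWORDS))  # letmein
```

Each loop iteration costs one hash computation, which is exactly why a slow hash makes this attack more expensive.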

A salt is important too.

Let’s say User1 and User2 both used pa$$word as their password. The MD5 hash for pa$$word is A61A78E492EE60C63ED8F2BB3A6A0072. Hackers already know the hashes for all the top passwords; in fact, you can even look up MD5 hashes on sites like crackstation.net. And if a password is less common, an attacker only has to crack it once to compromise the accounts of everyone else who used that password.

If I add a salt, then the hashes will be different. For example, using usernames as a salt (just an example, not a good idea in practice):

user1.pa$$word = 8CF41DEBA430F88EBC5DDA0936B3435B
user2.pa$$word = 5161758DEEF000FA5C190573574FAFB9 # <-- completely different hash

See? Completely different hashes. If we had used a slow hash instead of MD5, those user accounts would be as safe as they can be (which is not very, because ‘pa$$word’ is a terrible password).
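The effect of a salt can be sketched in Python. To stay close to the example above this uses MD5, but with a random per-user salt instead of the username (the 16-byte salt length is an arbitrary choice for illustration).

```python
import hashlib
import os


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


password = b"pa$$word"

# Without a salt, both users end up with exactly the same stored hash.
unsalted_user1 = md5_hex(password)
unsalted_user2 = md5_hex(password)
print(unsalted_user1 == unsalted_user2)  # True

# With a random per-user salt, the stored hashes no longer match, so an attacker
# has to crack each user's hash separately and can't use a lookup table of
# precomputed unsalted hashes.
salt1, salt2 = os.urandom(16), os.urandom(16)
salted_user1 = md5_hex(salt1 + password)
salted_user2 = md5_hex(salt2 + password)
print(salted_user1 == salted_user2)  # False (with overwhelming probability)
```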

Goodbye MD5

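I used a pretty bad example of a slow hash. MD5 was designed as a general-purpose cryptographic hash rather than specifically for passwords, but it was widely used for them and held up well enough until around 2005. Now it is considered broken and unsafe for passwords, mostly because it’s too fast: computers have become powerful enough to try enormous numbers of MD5 guesses very quickly, so we need deliberately slow hashes instead. (Researchers have also found practical ways to create MD5 collisions.) Some better alternatives nowadays are bcrypt and PBKDF2.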

When Technology Moves Too Fast

Unfortunately, MD5 is still widely used. If you look at HaveIBeenPwned.com and search for ‘MD5’, lots of results come up from sites that were hacked long after 2005. Why haven’t companies moved away from this highly insecure method?

Part of the problem is that overhauling software, much like cracking secure passwords, can be time-consuming and expensive. The other problem is the nature of hashing itself.

If you don’t actually know what anyone’s password is, you can’t just change the hashing method. Since you can’t turn a hash back into a password, you definitely can’t turn a hash into a different hash that works for the same password.

The most direct way to deal with this is to send out an email and force everyone to change their passwords. Users really hate this, so many companies have opted to re-hash each password with the new method the next time its owner logs in (the one moment the site sees the plaintext password), while still supporting the old method until every password has been replaced. That’s why you’ll see MD5 listed for some sites that also used another method.
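The re-hash-on-login approach can be sketched in Python. Everything about the storage format here is hypothetical: each user record is a dict with scheme, salt and hash keys, the legacy scheme is assumed to be salted MD5, and log_in is an illustrative name rather than any real framework’s API.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # work factor for the new scheme (illustrative value)


def pbkdf2(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def log_in(users: dict, username: str, password: str) -> bool:
    """Check a password, upgrading a legacy MD5 record to PBKDF2 on success."""
    record = users.get(username)
    if record is None:
        return False

    if record["scheme"] == "md5":
        # Old scheme (still supported during the migration): MD5 of salt + password.
        legacy = hashlib.md5(record["salt"] + password.encode("utf-8")).digest()
        if not hmac.compare_digest(legacy, record["hash"]):
            return False
        # The plaintext is only available right now, so re-hash it with the new scheme.
        new_salt = os.urandom(16)
        users[username] = {"scheme": "pbkdf2", "salt": new_salt, "hash": pbkdf2(password, new_salt)}
        return True

    # New scheme: a normal salted PBKDF2 check.
    return hmac.compare_digest(pbkdf2(password, record["salt"]), record["hash"])
```

A failed login leaves the old record alone; only a successful one upgrades it, which is why the old method has to stay supported until every user has logged in at least once.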

What Hashing Isn’t

Encoding and encryption are two things that are often confused with hashing. All three have one thing in common: they turn data into other data that looks different to a human.

Encrypting

Encryption differs from hashing because it lets you turn encrypted data back into its original form, that is, decrypt it. To do this, you need the right key.
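To show the “turn it back with a key” property using only Python’s standard library, here is a one-time pad: XOR the message with a random key of the same length. It is a real cipher, but an impractical one, because the key must be truly random, as long as the message, and never reused. Real software uses vetted algorithms such as AES through a well-maintained library instead. The message and function name are just for illustration.

```python
import os


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key."""
    return bytes(d ^ k for d, k in zip(data, key))


message = b"meet me at noon"
key = os.urandom(len(message))  # one-time pad: random, as long as the message, used once

ciphertext = xor_bytes(message, key)    # encrypt
recovered = xor_bytes(ciphertext, key)  # decrypt: XOR with the same key undoes it

print(recovered == message)  # True
```

Unlike a hash, nothing is lost here: with the key you get the exact original back, and without it the ciphertext is useless.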

You might sometimes hear bloggers or tech writers say “passwords are encrypted”, but that isn’t technically accurate. Passwords should always be ‘hashed’, with one exception: while they are in transit between your keyboard and the program that hashes them, where they should be encrypted (for example, by HTTPS).

Encoding

Students and novice programmers often confuse encoding with hashing or encrypting. This is a problem because encoding, like encryption, lets you turn encoded data back into its original form, except that you don’t need a key at all. Anyone can decode encoded data as long as they know which encoding was used. Encoding data does not protect it from prying eyes.
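Base64 is one common encoding, and Python’s standard library shows how little it protects. The “secret” string is made up for this example.

```python
import base64

secret = b"my password is hunter2"

encoded = base64.b64encode(secret)
print(encoded)  # looks scrambled to a human...

decoded = base64.b64decode(encoded)  # ...but no key is needed to reverse it
print(decoded == secret)  # True
```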

An example is JWTs: JSON Web Tokens.

An example JWT looks like the following: not legible to a human unless you can convert Base64 in your head (I doubt anyone could do that for a string this long).

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

JWTs are pretty cool! However, students and newbies often look at them and assume the data is secret because they can’t read it. In reality, JWTs are Base64Url encoded, not hashed or encrypted. This means anyone can read the first and second parts (in fact, there’s a handy tool for it; try it out). The signature at the end is proof that the token really came from where it claims to have come from and hasn’t been altered. You can encrypt JWTs if you want, but they are readable by default.
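Here is a minimal Python sketch that reads the example token above with nothing but Base64Url decoding and JSON parsing. The helper names are hypothetical, and this deliberately does not verify the signature: it only shows that the first two parts are readable by anyone.

```python
import base64
import json

TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ"
    ".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)


def b64url_decode(part: str) -> bytes:
    """Base64Url-decode a JWT segment, restoring the '=' padding JWTs leave off."""
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def read_jwt(token: str):
    """Return the header and payload of a JWT. This does NOT check the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


header, payload = read_jwt(TOKEN)
print(header)   # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)  # {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
```

No key was needed to read any of this; a key only comes into play when checking that the signature matches.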

Does this mean JWTs are insecure? No! This is by design. Just don’t put anything you don’t want the end user or a hacker to see in one.

Summary

Hashing is pretty cool. You can use it to:

  1. make hash tables that can store data in a way that makes it fast to retrieve
  2. store passwords in a way that keeps them super secret
  3. verify the integrity of data in case it was corrupted in transit or tampered with
  4. a whole bunch of other stuff I didn’t cover.

Hashing is not the same as encoding or encrypting, and it’s important to understand the difference between them.

The Startup
