If you’ve gotten sucked into the cryptocurrency rabbithole like me, you’ve probably heard of these guys pretty often — and for good reason. Cryptographic hash functions are essentially the backbone of data security within blockchain. As a result, it’s important for anyone thinking of jumping into the cryptocurrency frenzy to understand just how these algorithms will keep our money safe. Intimidated by that lengthy Wikipedia article full of words none of us can understand? Fear not! Here’s the breakdown you needed.
What’s a Cryptographic Hash Function?
Let’s start with the basics! Functions are essentially magic number machines: feed the machine a number, watch it perform a couple tasks that transform the number, and you get an output. Hash functions are no different, except you feed them data of any size and length you want, and the output (a.k.a your hash, or digest) is always only one fixed size no matter how big the input was. For example, the hash function used by Bitcoin is SHA-256, which has — you guessed it — a fixed hash size of 256 bits (bits are the smallest unit of data in a computer).
So what makes cryptographic hash functions so safe? Well, consider that since the input can be of any size, the number of possibilities of inputs is infinite.
As a basic example, think of all the different ways you can add any amount of single digit numbers up to make ten: 5 + 5; 3 + 7; 2 + 2 + 2 + 2 + 2…
There’s a lot, right?
Because of this property, it would be incredibly difficult to find the input of a function just by looking at the hash. On the other hand, we still want the hash function to be easy to verify. That’s why it should take very little time to compute the output given an input, but nearly impossible to compute the input given the output. Just like how adding the numbers to make ten is easy, but knowing which string of numbers was originally fed into the sum machine would be a lot harder!
The Must-Haves for Good Cryptographic Hashes
We’ve already talked about this, but for a cryptographic hash function to be good at what it does, it needs to be easy to compute (for a computer, at least). Meaning, if you know the input already, it should be a breeze to calculate the output. Think about it like this: if you already have the key to your front door, you shouldn’t have to put in much effort beyond inserting the key and turning it to get into your home. On the other hand, getting into your home without your key should remain extremely difficult.
We’re not talking about car crashes here: the phrase “collision resistance” actually refers to collisions between two inputs. Contrary to my previous example of adding numbers up to 10, we actually don’t want to have more than one possible input for any given output, because that would make your bitcoins very vulnerable to theft. For example, if we had input APPLE and input ORANGE, we don’t want the both of them to collide and have the output FRUIT PUNCH.
But wait, you’re thinking. If the possibilities of inputs are infinite, and the output can only be a fixed length, isn’t it obvious that there will eventually be two inputs that make the same output?
Well, you’re right. It’s basically impossible to have a function that’s completely collision-free, but let’s take out the example of the SHA-256 Bitcoin hash again. If a computer tried 2¹⁵⁰ randomly chosen inputs, there’s a 99.8% chance that the computer will find a collision of two inputs for this hash function. But that means the computer would have to check a random input 2¹⁵⁰ times! There are forty-six digits in that number. The whole process would take way too long, and can’t be done by any regular computer anyway.
So the moral of the story is, as long as it’s really really difficult to find a collision, we can say the function’s collision-free and just call it a day.
What the hiding property means is this: given the hash, it should be impossible to find the input (also called a message). But what this entails is that the input needs to be chosen from a very large spread of possibilities, and the function needs to hide any information (clues) about the input from the output. In other words, just by examining the output, it should be completely impossible to determine the length of the input, whether it was an even or odd number, whether it’s divisible by three… you get the deal. This is much like sealing a letter in an opaque envelope, and the only person having the power to open the envelope is the one with the “key”, or input. No matter how you peer at the envelope, it’s impossible to decipher anything written on the letter because it’s completely opaque.
That’s your breakdown on cryptographic hash functions: they’re essentially just functions with an output of fixed length that are easy to compute but hard to crack. I hope you came out of this article understanding a little bit more about them than you did before reading! If you have any feedback or questions, feel free to drop them in the comment box.