What is a Hash?

… simply explained

A hash is the result of taking in an arbitrary set of data/bits (be it a single character, a video, or the entire Library of Congress) and using an algorithm to transform the input into a fixed-size output of data/bits, usually written as a hexadecimal number.

So, regardless of the amount of information that is placed into the hashing algorithm, you will always get a hash of the same length.

Anything goes in and a fixed-length hash comes out.
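To make the fixed-length idea concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. SHA-256 is just one example of a hashing algorithm, and the helper name `sha256_hex` and the sample inputs are made up for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Three inputs of very different sizes (illustrative values only).
inputs = [b"a", b"hello world", b"x" * 1_000_000]

for data in inputs:
    digest = sha256_hex(data)
    # SHA-256 produces 256 bits, which is 64 hexadecimal characters.
    print(len(data), "bytes in ->", len(digest), "hex characters out")
```

Whether the input is one byte or a million bytes, the output length stays the same.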

More in depth, this algorithm is a one-way function: it takes in data in one direction and outputs a seemingly random string of hexadecimal digits in the other. Changing even one character of the input results in a wildly different output string.
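A small sketch of that "wildly different output" behavior, again assuming SHA-256 from Python's `hashlib` as the example algorithm. The helper names `sha256_hex` and `differing_bits` and the two sample strings are hypothetical, chosen only for this illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 hashes of a and b differ."""
    hash_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hash_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(hash_a ^ hash_b).count("1")

first = b"hello world"
second = b"hello worle"  # only the last character changed

print(sha256_hex(first))
print(sha256_hex(second))
print(differing_bits(first, second), "of 256 bits differ")
```

For a well-designed hash, roughly half of the output bits should flip after a one-character change, so the two printed strings look unrelated.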

It’s very unlikely that you can guess the output of a hash from its input without actually running the algorithm. It’s even less likely that you can work out the input from the fixed-size output.

The same data will always produce the same hash. However, collisions are unavoidable, because there are more possible inputs than possible hashes. When two different inputs have the same hash, the result is called a collision.

Not enough room for all the pigeons.

Why do collisions happen? This is explained by the pigeonhole principle. Simply put, the pigeonhole principle describes what happens when you try to place n + 1 items (or pigeons) into n spaces (pigeonholes): at least one space must end up holding at least 2 items (or pigeons). Try it with 3 crumpled-up pieces of paper (n + 1) and 2 wastebaskets (n). You will find that at least one wastebasket ends up with at least two pieces of paper.
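The pigeonhole principle can be sketched in code with a deliberately tiny hash. The example below assumes a made-up function, `tiny_hash`, that keeps only the first byte of SHA-256, so there are just 256 possible hashes (the pigeonholes). Hashing 257 different inputs (the pigeons) must then produce a collision. The function names and the choice of inputs are illustrative only.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """A deliberately tiny hash: the first byte of SHA-256 (256 possible values)."""
    return hashlib.sha256(data).digest()[0]

def first_collision(count: int):
    """Hash the inputs b"0", b"1", ... and return the first pair sharing a tiny hash."""
    seen = {}  # maps each tiny hash to the first input that produced it
    for i in range(count):
        data = str(i).encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    return None  # no collision among the first `count` inputs

# 257 pigeons, 256 pigeonholes: a collision is guaranteed.
result = first_collision(257)
print(result)
```

In practice the collision shows up long before input number 257, but the principle guarantees it by then at the latest.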

So, every hashing algorithm has collisions. However, good hashing algorithms make such collisions hard to find, a property known as collision resistance. Collision resistance does not mean the absence of collisions, just that they are hard to find.

Some people try to break hashing algorithms with brute-force attacks, which basically means a computer making guesses very quickly. When the goal is to find two inputs with the same hash, this is called a hash collision attack. As computing gets faster and cheaper, older hashing algorithms become more likely to be attacked successfully, as happened with the collision that researchers at Google and CWI Amsterdam published for the SHA-1 hashing algorithm in 2017.
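As a rough illustration of why longer hashes are harder to attack, the sketch below brute-forces collisions against an artificially weakened hash. It assumes a hypothetical `truncated_hash` that keeps only the top few bits of SHA-256. This is a toy for building intuition, not an attack on SHA-256 itself and not the method used against SHA-1.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256 to simulate a weak hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def brute_force_collision(bits: int):
    """Guess inputs b"0", b"1", ... until two share a truncated hash.

    Returns (first_input, second_input, number_of_guesses).
    """
    seen = {}  # maps each truncated hash to the input that produced it
    attempts = 0
    while True:
        data = str(attempts).encode()
        attempts += 1
        h = truncated_hash(data, bits)
        if h in seen:
            return seen[h], data, attempts
        seen[h] = data

for bits in (8, 16, 24, 32):
    a, b, attempts = brute_force_collision(bits)
    print(f"{bits}-bit hash: collision after {attempts} guesses")
```

The number of guesses climbs quickly as the hash gets longer, which is why full-length modern hashes are out of reach for this kind of guessing while short or outdated ones are not.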