Hash(ing) Everywhere — A Primer On Hash Functions

Till Antonio Mahler
Published in blockwhat?
Dec 20, 2018 · 9 min read
Photo by Nick Hillier

Hello there avid reader, great to have you back here at blockwhat?.

In this post we will explore one of the essential building blocks (pun intended) of blockchain technology: hashing. This concept is of utmost importance to many of the different technological aspects that together power blockchains under the hood.

We will start by getting to know the creator of hash functions, grasping what exactly they are and then understanding the important part they’re playing in our daily lives. Afterwards we’ll dive into fancy so-called cryptographic hash functions, explore what makes them so special and look into the many ways they are incorporated into blockchains.

It’s a super duper interesting topic and by the end of this post you’ll be a hash pro — are you ready?

Then let’s get going!

The Basics


This man, who is arduously studying a stack of sheets that came out of what appears to be a printer right out of the stone age, has played an incredibly important role in the history of computer science. His ideas and his work still play a huge role in our day-to-day lives.

His name is Hans Peter Luhn and he was born on July 1st 1896 (yes, you’ve read that right).

During his early days everything pointed toward him learning the trade of a printer and taking over his father’s printing business. Instead, the First World War led to him becoming a communications engineer for the German army.

Once the war was over, Luhn entered the textile industry (where he invented the Lunometer, which is still being used today) and emigrated to the United States. He was always a very curious spirit who continuously looked for innovative ways to solve the problems he encountered. This attitude earned him over 80 patents throughout his lifetime!

It was only by pure chance that Luhn got into the computer field in 1947, which back then was characterized by early computers that overwhelmingly used punch cards to execute computations.

Working for IBM, he played an essential role in developing information retrieval and storage solutions for libraries and documentation centers, and pioneered the use of data processing equipment in resolving these problems.

While at IBM, he also single-handedly came up with the concept of business intelligence and created one of the most widely used algorithms to this day: the so-called Luhn algorithm. This algorithm is used all over the place, for checking credit card numbers, IMEI numbers, Canadian social insurance numbers, etc. The Luhn algorithm protects against accidental errors, but not malicious attacks. If you want to read more about this fascinating algorithm, be my guest.
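For the curious, here is a minimal sketch of how the Luhn check works: starting from the rightmost digit, every second digit is doubled (subtracting 9 if the result exceeds 9), and the number is valid if the total is divisible by 10. The function name `luhn_is_valid` and the test numbers are just picked for illustration.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("79927398713"))  # True
print(luhn_is_valid("79927398710"))  # False (last digit mistyped)
```

A single mistyped digit always changes the total, which is exactly the kind of accidental error the check is meant to catch.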

In January 1953, Luhn became the first person to come up with the concept of hashing, as he was looking for a convenient and easy technique to search and validate data.

So what exactly is hashing?

It turns out that if broken down, it’s pretty straightforward. A hashing algorithm takes an input (this can be literally anything), runs it through some magic mathematical processes and then creates an output (called a hash) that is, for all practical purposes, unique to that input.


What makes this process special is that it relies on so-called one-way functions. It is very easy to compute the hash from a given input, but infeasible to recover the original input from the resulting hash (also known as a message digest).

There are a lot of different hashing algorithms and they all work somewhat differently, but most of them (there are some exceptions) have one thing in common. No matter how long (all the Harry Potter books) or short (just one word) the input is, the resulting hash always has the same fixed length. What makes hashing algorithms so incredibly useful is the property that the slightest change in the input completely alters the output.


This property is what makes hashing algorithms so incredibly helpful: they can be used to validate data and detect even the slightest change to it.
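Both properties are easy to see for yourself. The following is a small sketch using SHA-256 from Python’s standard library; the helper name `sha256_hex` and the example messages are made up for this illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash a string with SHA-256 and return the digest as hex text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_input = "hash"
long_input = "hash " * 100_000           # roughly half a megabyte of text
original = "Send 10 coins to Alice"
tampered = "Send 10 coins to Alicf"      # a single character changed

# The digest length does not depend on the input length:
# both are 64 hex characters (256 bits).
print(len(sha256_hex(short_input)), len(sha256_hex(long_input)))

# A one-character change yields a completely different digest.
print(sha256_hex(original))
print(sha256_hex(tampered))
print(sha256_hex(original) == sha256_hex(tampered))  # False
```

Comparing two short digests is all it takes to tell whether two pieces of data, however large, are identical.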

Hashing algorithms nowadays are essential for many applications such as online payments, textual tools, cloud services, data-intensive research and cryptography among numerous other uses.

One use case especially sticks out: almost every time you enter a password somewhere on the world wide web, the website runs your input (e.g. “Password123”) through a hashing algorithm (ideally one built for passwords, such as bcrypt or Argon2) and compares the result with the hash that it has saved in its own database. If the two hashes match, then you’re good to go and can log in!
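The login check described above can be sketched with nothing but Python’s standard library. Please treat it as an illustration and not as what any particular website does: the choice of PBKDF2, the random per-user salt, the iteration count and the function names are all assumptions made for this example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); this pair is what gets stored."""
    salt = os.urandom(16)  # random salt, so equal passwords get different hashes
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt and compare it with the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison


salt, stored = hash_password("Password123")
print(verify_password("Password123", salt, stored))  # True
print(verify_password("password123", salt, stored))  # False
```

Note that the password itself is never stored, only the salt and the digest.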

This leads to the interesting fact that even when you hear about hacks where passwords were stolen, this doesn’t mean that the hackers know right away what the passwords are, since most likely the company had only stored the corresponding hashes. Yet this is no salvation for bad and easy passwords, since hackers can access huge databases that map a lot of hashes to their corresponding “real” passwords.

So instead of taking random passwords, hashing them and hoping to find a match, they can now simply look the hashes up. If you happened to use a very easy and generic password (such as “Password123”), it’s almost guaranteed to be in their databases. These databases are also known as rainbow tables.
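To see why weak passwords fall so quickly, here is a toy version of such a lookup. It assumes the passwords were hashed with plain, unsalted SHA-256, and it uses a simple dictionary instead of the space-saving chain structure of a real rainbow table; the password list is made up.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash a string with SHA-256 and return the digest as hex text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Precompute hash -> password for a (tiny) list of common passwords.
common_passwords = ["123456", "qwerty", "Password123", "letmein"]
lookup = {sha256_hex(password): password for password in common_passwords}

# Hashes as they might appear in a leaked database.
leaked_hash = sha256_hex("Password123")
strong_hash = sha256_hex("c0rrect-h0rse-battery-staple-7391")

print(lookup.get(leaked_hash))  # Password123
print(lookup.get(strong_hash))  # None
```

The weak password is recovered with a single dictionary lookup, while the long and unusual one simply isn’t in the table.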

Photo by CMDR Shane

Now you might ask yourself what the big difference between hashing and encryption is.

Well, it turns out that hashing is only meant as a one-way function, while encryption serves a two-way purpose: it can be reversed, or decrypted, using a specific key.

To sum this first section up, we’ve learned that hashing algorithms are used to create practically unique hashes, with which you can detect the slightest alteration of the underlying data, or find matching data without needing to know what it actually consists of.

Next up we will explore the real badasses of this game, so-called cryptographic hash functions.

Cryptographic Hash Functions

Photo by Felix Mittermeier

Cryptographic hash functions are a special class of hash functions that sound quite intimidating at first, but they are actually quite easy to understand. These special hash functions serve a very important purpose in the field of cryptography and are widely used in online payments, the HTTPS protocol and, you guessed it, blockchain technology.

There are a couple of characteristics that this special class of hash functions all have in common and that make them extremely useful for cryptographic purposes.

First of all, they need to be deterministic. This means that for the same input you’ll always get the same output, even if you run it a googol times. It’s pretty much a no-brainer, since the whole purpose of hashes would be lost if the output changed for the same input.

Furthermore, they need to be computationally efficient. Computers shouldn’t take forever to calculate the hash from a given input. Fortunately, with today’s powerful computers, this isn’t really a big problem.

Another characteristic that they need to possess is known as pre-image resistance. This fancy sounding term means that it should be infeasible, though not impossible (you’ll see later why this is an important distinction!), to work out the input from the output. In other words, the hash function should be resistant to your efforts of guessing what the original input (for example a picture file) is.
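The difference between infeasible and impossible shows up as soon as the set of possible inputs is tiny. In this sketch the secret is assumed to be a 4-digit PIN, so trying all 10,000 candidates takes a fraction of a second; the function names are made up for the example.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """Hash a string with SHA-256 and return the digest as hex text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_pin(target_hash: str) -> Optional[str]:
    """Try every 4-digit PIN until one hashes to `target_hash`."""
    for candidate in range(10_000):
        pin = f"{candidate:04d}"
        if sha256_hex(pin) == target_hash:
            return pin
    return None


secret_hash = sha256_hex("4821")  # all an attacker would get to see
print(guess_pin(secret_hash))     # 4821
```

The hash function was never reversed here, the input was simply guessed. With a realistic input space (say, an arbitrary picture file) the same loop would not finish in any reasonable amount of time.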

Every change in the input should also change the resulting hash. This property, also known as the avalanche effect, is extremely important for the purpose of making records immutable.
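One way to make the avalanche effect tangible is to count how many of the 256 output bits differ between the SHA-256 hashes of two almost identical inputs. This is only a rough sketch with made-up helper names; for a well-behaved hash function the count should land somewhere around half of the bits.

```python
import hashlib


def sha256_int(text: str) -> int:
    """Return the SHA-256 digest of `text` as one big integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count the bit positions in which the two digests differ."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")


# One letter changed: expect roughly 128 of the 256 bits to flip.
print(differing_bits("blockchain", "blockchaim"))

# Identical inputs: no bit differs.
print(differing_bits("blockchain", "blockchain"))  # 0
```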

The fifth characteristic is collision resistance. This refers to the requirement that it should be infeasible to find two different inputs that produce the same output. Technically such collisions must exist, since every hashing algorithm can only produce a limited number of hashes while the number of possible inputs is unlimited. In the case of a 128-bit hash, the total number of possible hashes is 2^128 or, written out, 340,282,366,920,938,463,463,374,607,431,768,211,456 possible combinations. You would have better luck finding a particular grain of sand in the Sahara.
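Collisions become easy to find once the output is short enough. The sketch below cuts SHA-256 down to its first 20 bits (an artificial weakening chosen for this example, as are the function names) and then simply hashes the numbers 0, 1, 2, ... until two of them collide.

```python
import hashlib


def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the first `bits` bits of the SHA-256 digest."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int) -> tuple[bytes, bytes]:
    """Return two different inputs whose truncated hashes are equal."""
    seen = {}
    counter = 0
    while True:
        data = str(counter).encode("ascii")
        digest = truncated_hash(data, bits)
        if digest in seen:
            return seen[digest], data
        seen[digest] = data
        counter += 1


first, second = find_collision(20)
print(first, second)
print(truncated_hash(first, 20) == truncated_hash(second, 20))  # True

# The size of the output space for a 128-bit hash, for comparison.
print(2 ** 128)  # 340282366920938463463374607431768211456
```

With only 20 bits a collision typically shows up after a thousand or so attempts. Every additional output bit doubles the number of possible hashes, which is why the same search against a full-length digest is hopeless.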

Last but not least, they need to be puzzle friendly. Remember that we stressed the difference between impossible and infeasible above in the part about pre-image resistance? Well, this characteristic is of utmost importance for a special type of consensus mechanism known as Proof-of-Work. Computers that take part in the validation and consensus process of certain blockchains, such as Bitcoin or (at the time of writing) Ethereum, need to solve a cryptographic puzzle. More on this in an upcoming article though. For now it’s simply important to understand that there is no shortcut to the solution: an input that produces a hash with the required properties can only be found by trying one candidate after another, and given a certain amount of time and sufficient computations performed, one will eventually be found.
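To give a first impression of such a puzzle, here is a strongly simplified sketch. It assumes the task is to find a number (a nonce) that, appended to some block data, yields a SHA-256 hash starting with a given number of zero bits. The data format, the function names and the difficulty of 16 bits are assumptions for this example and not the actual rules of Bitcoin or Ethereum.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mine(block_data: str, difficulty_bits: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash has enough leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest.hex()
        nonce += 1


# 16 zero bits means the hex digest starts with "0000";
# on average this takes about 65,000 attempts.
nonce, digest_hex = mine("block 1: Alice pays Bob 5", 16)
print(nonce, digest_hex)
```

Finding the nonce takes tens of thousands of attempts, but anyone can verify the result with a single hash computation. Every extra zero bit doubles the expected amount of work.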

So much for the essential characteristics that these cryptographic hash functions need to possess. There are a couple of different popular cryptographic hash functions that see widespread usage in the blockchain context, namely SHA-256 (Secure Hash Algorithm) for Bitcoin and Keccak-256 in the case of Ethereum.

Now let’s see how these cryptographic hash functions are actually used in the context of blockchains.

Applications in Blockchains

Photo by Kaley Dykstra

In this section we will explore in what ways hash functions play a role in blockchain technology. Before we begin, it’s very important to recall the key characteristic that makes blockchains so unique: the reliability and integrity of the data that is agreed upon.

This reliability and integrity of blockchains is rooted in making it practically infeasible for any fraudulent data or transactions, such as a double spend, to be accepted or recorded.

A lot of the things that we will touch on here will be covered in-depth in future articles, so for now a brief overview shall suffice.

One of the first areas where hashes play a huge role is addresses (where you either receive money or send money from), which are all derived using hashing functions.

They also play an important role in creating and defining cryptographic signatures, which help to validate transactions. (More on this in the post on public key encryption.)

Another essential role of hashes can be found in the task of keeping track of transactions on the blockchain, as it is way easier to simply check for matching hashes than to look for the exact specifics of a certain transaction.

As already hinted at briefly in the previous section, hashing functions also play a crucial role in some consensus mechanisms, especially in Proof-of-Work, a process also known as crypto mining.

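In order to store, query and validate large amounts of data, a concept known as Merkle trees is employed. The data is hashed piece by piece, the resulting hashes are hashed together in pairs, and this is repeated until a single root hash remains, which can be time-stamped and kept for future reference. This makes permanent data storage less bulky and simply more economical.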
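The basic idea of a Merkle tree fits into a few lines. In this sketch every item is hashed, then neighbouring hashes are hashed together pair by pair until only one root hash is left. Using single SHA-256 and duplicating the last hash on levels with an odd number of entries are simplifying assumptions here (real blockchains differ in the details), and the transactions are made up.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of `data`."""
    return hashlib.sha256(data).digest()


def merkle_root(items: list[bytes]) -> bytes:
    """Hash all items pairwise, level by level, down to a single root."""
    if not items:
        raise ValueError("need at least one item")
    level = [sha256(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


transactions = [b"Alice->Bob:5", b"Bob->Carol:2", b"Carol->Dave:1"]
tampered = [b"Alice->Bob:50", b"Bob->Carol:2", b"Carol->Dave:1"]

root = merkle_root(transactions)
print(root.hex())
print(root == merkle_root(tampered))  # False
```

No matter how many transactions go in, the root stays 32 bytes long, and changing any single transaction changes the root.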

And last but not least, the hashrate is very important for adjusting the difficulty of the mining process. It simply refers to the overall number of hashes computed per second by all the participating computers.
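How the hashrate feeds into the difficulty can be sketched as follows. The example is loosely modelled on Bitcoin’s rule (a retarget every 2016 blocks, a target of ten minutes per block, and a change limited to a factor of four per adjustment); the function name and the simplified floating point arithmetic are assumptions for this illustration.

```python
def retarget(old_difficulty: float, actual_seconds: float,
             expected_seconds: float = 2016 * 600,
             max_factor: float = 4.0) -> float:
    """Scale the difficulty so blocks take the expected time again."""
    if actual_seconds <= 0:
        raise ValueError("actual_seconds must be positive")
    ratio = expected_seconds / actual_seconds
    # Limit how much the difficulty may change in a single step.
    ratio = max(1 / max_factor, min(max_factor, ratio))
    return old_difficulty * ratio


# More hashrate: blocks arrived twice as fast, so the difficulty doubles.
print(retarget(1000.0, 2016 * 300))   # 2000.0

# Less hashrate: blocks took twice as long, so the difficulty halves.
print(retarget(1000.0, 2016 * 1200))  # 500.0
```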

You’ve reached the end of this post — awesome!

In the upcoming articles we will cover all of these aspects in greater detail. For now I hope to have given you a comprehensible and compact primer on hash functions.

Thank you very much for the time that you’ve invested in reading this post and please make sure to let me know if you have any comments, questions or constructive feedback for me!

Yours truly

Till

PS: If you’re looking for helpful and great resources to learn more about blockchain’s paradigm shifting technological potential, check out these awesome resources.

