Blockchain Mechanics — Hash Functions

Technical overview of what hash functions are and why they are useful

Ciaran Mcveigh
blockchaintechnologies
5 min readJul 8, 2019

--

Hash functions are used extensively within blockchain technologies from ensuring the integrity of the entire blockchain to quickly verifying the validity of large data sets. This articles looks to explore the attributes of hash functions and how they relate to blockchains.

Hashing

A hash function is a function that takes and arbitrary amount of data and converts it into a set number of bits (256 bits for SHA-256) which can then be represented as hexidecimal string.

What does that sentence actually mean, well let me show you. Say I want to record some data, for example your certificate from university. Let’s say this is the data, “Bob Smith, Economics, 1st Class Honours”. If I put this through a hash function (SHA-256) this is what I get 41f93fab9e5178270a40cb5182e1e94ad1e4cd8fa4b0878dc372032fbd9c1e9b.

So why is this important, first let me show you what happens to the hash when I make a small change by adding a full stop to the previous example. “Bob Smith, Economics, 1st Class Honours.” yields 1dcbad0d283eadccd76d63f1a23d3100c5d0268f6c499be37eac91827a26fa7e which is a completely different hash. This means any change to the message (note this is at the bit level ie any bit flip) will result in a completely different hash. As a result we can use hashes to ensure the integrity of data. Since the hash is deterministic (it will always return the same output for a given input) we can be sure that the message has not changed if the resultant hash is the same.

While the data in the above example is only a sentence imagine if I had a fifty-page contract I needed to put on the blockchain, thats a lot of data I have to upload. Instead of me signing the contract I can sign the hash of the contract. Any change to the contract will result in a change to the hash which will invalidate the digital signature.

Blockchains ensure the integrity of the chain by linking the blocks together with the hash of the previous block in each new block. When each block is signed the previous block hash is included in that signature. If a block is changed that blocks hash will change, since the next block will reference that previous blocks hash the next blocks hash will also change. This will cascade upwards all the way to the current block meaning you can check the integrity of the entire blockchain through only the current block hash.

Hash Function Attributes

Hash functions must have certain attributes some of which I have already touched on. Firstly a hash function must be able to take an input of any size and convert it into a fixed size output. This process must be efficiently computable. This means hashing a message should not take a long time to compute for it to be useful in real applications.

There are 3 security attributes a hash function should have. Firstly it should be collision resistant. Hash functions produce a fixed size output and take an input of any size. Because of this there are an infinite number of inputs and only 2²⁵⁶ number of outputs (for SHA-256). As a result there are hash collisions, the question is can anyone find them. If no one can find two different values x and y (x != y) which yield the same hash (H(x) = H(y)) then the hash function is said to be collision resistant.

The next attribute is a hiding property, this means that given H(x) it is infeasible to determine what the value of x is. Since the value of x is sometimes from a set list, for example lets use exam results between 0 and 100. In this case it would be easy to determine which hash relates to which result by computing them ourselves since we only need to complete 100 computations. Because of this we need x to be chosen from a set that is large enough that the probability of x being any value is negligible.

To achieve achieve this we concatenate x with a random 256 bit value. The hiding property must now satisfy that given H(k|x) where k is the random 256 bit value and (k|x) is k concatenated with x, it is infeasible to determine x. The Hiding attribute is about stopping bad actors from accessing secrets.

The final attribute is puzzle friendliness. This attribute relates to forgery, given you are provided the values k and y in the equation H(k|x) = y find a value x. While this may sound similar to hiding there is a difference. In hiding k is not set therefore I am not constrained by it.

SHA-256

Since SHA-256 is the hash function used by bitcoin I thought it would be worth quickly explaining how this function works. First it takes the message and breaks it down into blocks of 512 bits. Because the message most likely isn’t a factor of 512 it also adds some padding to the final block. This padding consists of some number of zero bits followed by a one bit followed by a 64 bit length field. A random 256 bit value and the first block are put through a compression function to yield a new 256 bit value. This value is then combined with the next block and put through the compression function. This cycle continues all the way to the final block yielding the final 256 bit hash.

Hashing in Blockchains

As I already stated hashing enables us to ensure the integrity of a blockchain and verify huge amounts of data by checking a set length string rather than individually checking each byte of the data set. We can store the data off chain yet are still able to verify the data set was digitally signed. This a huge computational saving and enables the speed necessary for mass adoption of blockchain technology.

This article thanks Princeton University and the course Bitcoin and Cryptocurrencies for providing information on hash functions

--

--