The cooking behind Hashes

Tom Shnaider
Coinmonks
6 min readJul 4, 2022

--

After explaining what ECDSA and SHA256 are used for in Blockchain last week — respectively create pair of keys and create digital fingerprints, let’s talk a bit about the how.

We’ll focus on the creation of a Hash, for more details about elliptic curves you’ll be better off watching this video or this one for maths lovers.

Alexandre Debiève — https://unsplash.com/@alexkixa

In blockchain technology, Hashing is used for 3 main actions:

  • Summarise and represent your public key
  • Summarise and represent a previous block
  • Mine blocks by finding a Hash that fulfills some conditions

Different hashing algorithms can be used, we’ll focus on the one used in Bitcoin and other cryptocurrencies: SHA256. To understand how it works, let’s start with a refresh about binary code.

New to trading? Try crypto trading bots or copy trading

Bits

Bits stands for “Binary Digits”, 0 or 1.

The idea is to represent values in the form of an electrical signal that simply is or isn’t.

Computers work with what’s called “logical doors”, they’re like light switches: either on or off.

If the bit is 0 → no zap, it it’s a 1 a small resistor gets a zap. If you put them together you can create more complex signals. A little bit like morse code but only with dots and nothing.

Let’s start with the easy 4 bits language: _ _ _ _

4 bits means that we have 4 slots in which we can have either a 1 or a 0.

And as a reminder of your early maths classes, to compute the number of combinations you can make with 4 slots and 2 variable : it’s 2^4 = 16.

0000|0001|0010|0100|1000|0011|0101|1001|0110|0111|.…|1111

We only need 9 of these combinations to represent any number in the universe right ? A tenth for the 0.

So we have 6 slots left and some people decided to represent the 6 first letter of the alphabet: a | b | c | d | e | f | (in case you had a doubt).

And this my friends is the Hexadecimal code that you can represent with 4 bits ; Numbers and letters from a to f.

https://suchprogramming.com/beginning-logic-design-part-2/

Exponential growth

As you know, representing 16 values in computer is cool but we need a lot more, otherwise how would you encode that mysterious emoji to enrich your texts ?

Remember when we we got that little “?” instead of an emoji or character ? That was because your computer didn’t have enough room to store other values on its 16 or 32 bits processor.

So instead of 4 bits we could try:

  • 8 bits → 2^8 = 256 combinations or very retro gaming
  • 16 bits → 2^16 = 65'535 combinations
  • 32 bits → 2^32 = 4 294 967 295 combinations
  • 256 bits → 2^256 = 115'792'089'237'316'195'423'570'985'008'687'907'853'269'984'665'640'564'039'457'584'007'913'129'639'936 combinations

You get the idea, when we start computing numbers that are represented with 256 slots of 0 and 1 we get amazingly large strings of 0 and 1 to work with.

Deconstruction

So every number or text can be broken down into it binary form.

The SHA256 works this way: once it has broken down the data to its binary form it applies easy maths on it, it applies different functions : some move bits left or right, other compare bits to output another bit, another function add rows of bits and others are a combination of all these.

Shift values to the right, for example 0011 becomes 0001 (a 1 gets lots):

Here we see a shift of 32 on a 32 bits word, so all numbers are replaced by a 0

Rotate values to the right: 0011 become 1001 (the 1 comes back from the left):

Here we see a rotation of 32 on a 32 bits word, so all numbers come back to their initial position

XOR function, which is a comparison between 2 lines of bits: if only 1 of the two compared bits is a 1 the result of the function is 1, otherwise two 0 or two 1 outputs a 0:

Here we see a double XOR, the result between the first two rows is implicitly compared to the third one to get the end result

Add rows between themselves. The rule of binary addition are simple:

  • 0 + 0 = 0
  • 0 + 1 = 1
  • 1 + 1 = 10 | take 1 to add in the left column

Combined movements can look like the function sigma(x) hereafter: from the input x we apply a rotation right of 7, plus another one of 18, a shift right of 3 and and two XOR function on the last two rows to get the end result at the bottom.

Two other conditional functions are :

  • Choice — when comparing 3 rows x,y,z : if x = 0 you take the value in z and if x = 1 you take the value in y as the output
  • Majority — when comparing 3 rows : if there’s a majority of 0 or 1 that’s the output.

To add some randomness into all that, a bunch of constant are created: more precisely two sets of 64 numbers: cubic and square roots of prime numbers, from these numbers you get a real number, from this real number you keep the decimal part as an integer and from this integer you keep it binary form.

Padding

One very cool aspect of SHA256 is that it can take inputs of any size and always give back a 256 bits block.

It does so in two ways: if the message is smaller than 256 bits it adds enough zeros to fill up the free space in the bloc. Otherwise it breaks up the message in multiple 256 bits blocs, then applies the algorithm to the blocs one after the other.

Compression

After all these preparation steps (deconstructing the values, creating constants etc.) we get to the good part: the compression of the message.

To get from H0 to H1, or from your original data to your Hash, or your first hash to your second, etc.

You mix 32 bit Words W that have been created from the 256 bits initial bloc. With 32 bit Constants K that have been created from roots of prime numbers. By applying two functions T1 and T2 that are a combination the permutation function we saw earlier and Choices / Majority with additions.

The 256 bits bloc is separated into eight 32 bit words called a,b,c,d,e,f,g,h. Because 32 x 8 = 256.

After applying all the functions to all the words, you get a happy mess of 0 and 1 that is as different from your original message as possible.

But most importantly, this algorithm is deterministic, meaning that even if the end result looks random, if you put the same input it will always render the same output.

Final step

Once al the 8 final 32 bit values are computed, they are broken down into eight 4 bit words and represented in Hexadecimal code — remember earlier ?

Then, you just have to put them all together side by side. And this is how you get from a public key to a wallet address, from a whole bloc to a representative Hash or from a bloc + a random number to a Hash that wins you the right to mine a bloc.

Hope this helped you understand the idea of Hashing a bit better.

Thank you for reading and big thanks to the creator of the video from which the screenshots are taken.

Take care.

--

--