The Mathematics of Bitcoin — SHA256
Part 4 of a series looking behind the scenes of the worlds most popular cryptocurrency
4 — SHA-256
A cryptographic hash function is a special type of function that takes an input string of a given length and converts it into an alphanumeric string of fixed length. In the case of Bitcoin, a “Message” is inputted, and a hash function, known as SHA-256 (Secure Hashing Algorithm 256), gives an output known as a “Hash” or “Message Digest”. This means that however long the string of data (limit of 2²⁵⁶- 1 bits), the output will always be 256-bits in length. The process of hashing is not a method of encryption as it is only a one- way process and therefore cannot be reversed (decrypted). By running multiple outputs through SHA-256, we can see how different the output becomes, even when only changing a single character in the message. We can also see that despite having an input of longer length, the output length is the exact same (table 5.1). SHA-256 is also deterministic, meaning given the same input, the output will always remain the same.
To demonstrate how SHA-256 computes a message digest, I will be using the phrase ‘portsmouth’ (my old university), showing each step of the algorithm.
4.1 — Preprocessing Phase
To begin the hashing process, we take our message and convert it into binary using the ASCII (American Standard Code for Information Interchange) as shown below.
As we can see each letter is 8-bits long. We then make one long string from these and a ‘1’ is then added to the end:
We then find the phrase size which in this case is 80-bits (10 letters, each 8-bits long). The phrase size is then also converted to binary:
80 = 01010000
Now we take the phrase size and add it to the end of our message. In between these, we pad the code with zeros to get the block to the correct size (512-bits). If the message is larger than 512 bits, more blocks are created, and we run these through the process independently with the phrase size at the end of the last block.
The block is then divided into 16 words of 32-bits each:
We call this first block: block 1 = 𝑀(1).
When referring to a specific word in a certain block we use 𝑀(𝑖)(b) where b is the block number and 𝑖 refers to the specific word in the block.
𝑀(0)(3) = The 1st word in the 3rd block
where (𝑖 = 0, 1, 2,…, 15)
4.2 — Initial Hash Values
Prior to computing the remaining blocks, we must define the initial hash value (𝐻⁰). The initial hash value consists of 8 words, each 32-bits in length. For SHA-256 these are calculated from the first 8 primes. These always remain the same for any message. The primes are firstly square rooted and then taken to the modulus 1. The result is then multiplied by 16⁸ and rounded down to the nearest integer.
We convert the results from above into hexadecimal form; giving us the following 8 words assigned from a-h:
4.3 — Hash Computation Phase
Up until now, all the working has been part of the preprocessing phase. From now, we will look at the hash computation phase. In this phase, we will be using data from the preprocessing phase to compute an additional 48 words, totalling the number of words to 64. This then gives us 4 blocks in total. As calculating 48 more words would be too long, I will demonstrate the computation by computing the 17th word (𝑊(16) ). This is computed using the following equation:
Where for the 17th word (Just the t value changes for each word):
We start by finding the word 15 places back (𝑊 (1)). We then make two copies of this. Firstly, we take the first copy and right rotate by 7 places (𝑅𝑂𝑇𝑅⁷(𝑥)). This means each digit moves to the right one place 7 times and when the numbers fall off the end, they are moved to the front:
𝑾(𝟏) = 01110011011011010110111101110101
With the second copy, we right rotate by 18 ( 𝑅𝑂𝑇𝑅¹⁸(𝑥)):
Lastly, we take the original and right shift it by 3 (𝑆𝐻𝑅³(𝑥)). This means when a number falls off the end it is replaced by zeros at the beginning of the word:
We then take these 3 words and apply:
The bitwise exclusive-OR operation (XOR ⨁) takes two binary digits and returns 0 if both digits are 0 or 1 and returns 1 otherwise. As we have 3 words, we first use the XOR operation on 𝑅𝑂𝑇𝑅⁷(𝑥) and 𝑅𝑂𝑇𝑅¹⁸(𝑥) and then take this result and use the XOR operation with 𝑆𝐻𝑅³(𝑥).
Therefore, concluding that:
Converting this result into hexadecimal form we have:
We now take the word 2 places back (𝑊(14) ) and make 2 copies. We then take the first copy and right rotate 17 places (𝑅𝑂𝑇𝑅¹⁷(𝑥)):
𝑾(𝟏𝟒) = 00000000000000000000000000000000
Now, we take the 2nd copy and rotate right 19 places ( 𝑅𝑂𝑇𝑅¹⁹(𝑥)):
Lastly, we take the original and right shift by 10 (𝑆𝐻𝑅¹⁰(𝑥)):
We then take these 3 words and apply:
Therefore, concluding that:
As the whole word consists of only zeros, the hexadecimal value is 0. We then refer to our next word formula, finding the word 16 places back (𝑊 (0)) and the word 7 places back (𝑊 (9)).
𝑊(0) = 01110000011011110111001001110100 0
𝑊(9) = 00000000000000000000000000000000
We can now calculate 𝑊 (16). For this, we use binary addition (+) which follows the rules shown below:
The first column is ignored to maintain the 32-bit format. Now we have our 17th word (𝑊(16)):
𝑾𝟏𝟔 = 𝟎𝟎𝟏𝟎𝟏𝟏𝟏𝟏𝟏𝟏𝟎𝟎𝟎𝟏𝟎𝟏𝟏𝟎𝟎𝟏𝟏𝟏𝟏𝟎𝟎𝟏𝟎𝟏𝟏𝟏𝟏𝟏
Lastly, we convert this from binary into hexadecimal form:
This process is then repeated until we are left with 64 words. Computing the remaining words, we are left with the words shown below:
To compute the final hash, we must run 64 iterations of the equation below:
Using our 64 words (𝑊 ) from the table above and the initial hash values (𝒂 𝒕𝒐 𝒉), where:
(¬) is the NOT operation that returns 0 if the digit is 1 and 1 if the digit is 0. We also use (∧) which is the AND operation, this returns 1 if both digits are 1 and 0 otherwise.
Example: Computing 𝑡(0)
To compute 𝑡(0) we must compute our new 𝒂 𝒕𝒐 𝒉 hash values. Starting with a, where:
𝑒 = 510𝑒527 = 01010001000011100101001001111111
𝑓 = 9𝑏05688𝑐 = 10011011000001010110100010001100
𝑔 = 1𝑓83𝑑9𝑎𝑏 = 00011111100000111101100110101011
h = 5𝑏𝑒0𝑐𝑑19 = 01011011111000001100110100011001
Referring back to the final iteration formula we can now calculate T(1):
𝑻𝟏 = 01100011111001110101111111011100
𝑎 = 6𝑎09𝑒667 = 01101010000010011110011001100111
𝑏 = 𝑏𝑏67𝑎𝑒85 = 10111011011001111010111010000101
𝑐 = 3𝑐6𝑒𝑓372 = 00111100011011101111001101110010
We can now calculate 𝑇(2) :
𝑻𝟐 = 00001000100100001001101011100101
Now we can calculate 𝒂 from 𝑇(1) and 𝑇(2):
𝒂 = 01101100011101111111101011000001
Converting to hexadecimal, we have:
𝑎 = 6c77fac1
Referring back to the iteration algorithm, we can also now calculate 𝑒:
𝑑 = 𝑎54𝑓𝑓53𝑎 = 10100101010011111111010100111010
𝑇(1) = 01100011111001110101111111011100
𝑒 = 00001001001101110101010100010110
Converting to hexadecimal we have:
𝑒 = 09375516
Referring back to the iteration algorithm we also find:
𝑏 = 𝑎 = 6𝑎09𝑒667
𝑐 = 𝑏 = 𝑏𝑏67𝑎𝑒85
𝑑 = 𝑐 = 3𝑐6𝑒𝑓372
𝑓 = 𝑒 = 510𝑒527𝑓
𝑔 = 𝑓 = 9𝑏05688𝑐
h = 𝑔 = 1𝑓83𝑑9𝑎𝑏
Leaving us with the final values for the first iteration 𝑡(0):
We then perform an additional 63 iterations of the algorithm, using our 𝒂-𝒉 values from the previous iteration, updating every round. This leaves us with the 64th iteration 𝑡(63):
Using these values, we compute the ith intermediate hash value 𝐻(i) given by the following:
Where a-h are the values from our final iteration (𝑡(63)) and 𝐻(0)(0) to 𝐻(7)(0) are the initial hash values.
Finally, we can convert the intermediate hash values into hexadecimal form, giving us our parts for the final message digest:
Putting these values together in the following layout, we now have our final message digest (𝑀):
As noted at the beginning of the chapter, we can see that from our initial message ‘portsmouth’ there is no resemblance to the message digest. The long process of SHA-256 is necessary for retaining the security of Bitcoin, ensuring there is no possible way of reversing the process. If a method of reversal were possible, attackers would have the ability to alter transactions on the blockchain to their choosing.
And that concludes the hashing process. If you were wondering how I was able to get all of these results (I’m afraid I didn’t calculate every iteration myself) or wanted to try it yourself, I’d recommend checking out this google sheets document of the hashing process by David Rabahy.
If you enjoyed this series or had any burning questions, please don’t hesitate to drop a comment below.