SHA-256 Under the Hood

Look inside the popular hash function and learn what makes it work so well.

SHA-256 (Secure Hash Algorithm, 256-bit) is widely used in cryptocurrencies, digital security, and data integrity verification. (Image by Dalle3)

Introduction

SHA-256, or Secure Hash Algorithm 256-bit, is a cryptographic hash function designed by the National Security Agency (NSA) as a part of the SHA-2 family.

It transforms input data of any length into a fixed-size 256-bit (32-byte) hash, which serves as a digital fingerprint of the input. Because it is computationally infeasible to find two inputs with the same hash, the algorithm is widely used to verify the integrity of data in digital transactions and to detect tampering.

Q : What is a hash?
A : A hash is like a digital fingerprint for data: a hash function takes any piece of information, like a message or a file, and turns it into a fixed-size string of characters that, for all practical purposes, identifies only that specific data. This makes it useful for checking security and integrity online.

Key Uses of SHA-256

The SHA-256 function is extremely popular and is commonly used in :

  1. Cryptocurrency Transactions: It secures blockchain technology by validating transactions and mining new coins in cryptocurrencies like Bitcoin.
  2. Digital Signatures: SHA-256 generates unique signatures that verify the authenticity of digital documents and communications.
  3. Data Integrity Verification: It ensures that files transferred over the internet remain unchanged from their original state.
  4. Secure Password Storage: Some websites and applications use SHA-256 to hash passwords, so that the stored values do not reveal the passwords directly even if a data breach occurs.
  5. SSL Certificates: It plays a vital role in the creation and authentication of SSL certificates, ensuring secure web browsing and transactions.

Understanding Cryptographic Hash Functions

Put in a sequence of characters and you get a fixed-length sequence that represents the input. (Image by Dalle3)

Cryptographic hash functions are like complex digital blending machines. You put in any message or file, and it churns out a unique, fixed-length sequence of letters and numbers. No matter how many times you process the same message, you always get the same sequence, but if you change even a tiny bit of the original message, the sequence changes drastically. This makes it easy to check if data has been altered, without revealing the original data itself.
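The two properties described above (same input, same output; tiny change, very different output) can be tried out with Python's standard hashlib module. This is a small sketch: the helper names and the altered input “PicKez” are made up for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

first = sha256_hex("PicKey")
again = sha256_hex("PicKey")
changed = sha256_hex("PicKez")  # hypothetical input: only the last character differs

print(first == again)                  # the same input gives the same digest
print(differing_bits(first, changed))  # usually close to half of the 256 bits
```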

Why is SHA-256 a great Cryptographic Hash Function?

SHA-256 is the gold standard. (Image by Dalle3)

SHA-256 stands out among hash functions for its robust security properties. Its hashes are unpredictable, and no practical method is known either to recover the original data from a hash or to find two different inputs that produce the same hash. This makes it a gold standard for ensuring data integrity and security in a wide range of digital applications.

The Mechanics of SHA-256

Any input into the SHA-256 function undergoes three main phases to create the output. These three phases are :

1. Preprocessing Phase

Quick Summary : In this initial step, the input data is prepared for the hashing process. The data is first padded with a ‘1’ bit and then ‘0’ bits until its length is 64 bits short of a multiple of 512. Next, the original data length (in bits) is appended as a 64-bit value, so the result is an exact multiple of 512 bits, setting the stage for uniform processing in the subsequent phases.

2. Schedule Phase

Quick Summary : During the schedule phase, the preprocessed data is divided into 512-bit blocks. Each block is then expanded into a series of sixty-four 32-bit words. This expansion is designed to introduce complexity and ensure that minor changes in the input lead to significant differences in the output.

3. Compress Phase

Quick Summary : This is the heart of the algorithm, where the actual hashing takes place. Here, the expanded block from the schedule phase is compressed in a series of operations that repeatedly mix and transform the data with the algorithm’s current state (initial hash values or the result of the previous block’s compression). This process is designed to be one-way, so that the hash cannot feasibly be reversed to reveal the original data.

Each phase plays a crucial role in ensuring that SHA-256 hashes are unique, secure, and tamper-evident, making the algorithm a cornerstone of modern cryptographic security.

SHA-256 in Action

Let’s take an input string and follow it every step of the way within the SHA-256 function, right until the output.

As an example, we’ll take “PicKey” as the input string to this function. The SHA-256 output value of “PicKey” is :

0620d914f9a253c0603f95180619315ff694ba8eac1b04474d2b1c1bc3ca8219

In this section, we will go step by step through how exactly the SHA-256 function turns the input string “PicKey” into this value.

Phase 1 — Preprocessing

In this phase, the string “PicKey” will be :

  1. Converted into its binary form,
  2. Padded with a ‘1’ bit,
  3. Padded with additional ‘0’ bits till the length reaches 448 bits,
  4. Padded with another 64 bits that encode the length of the original string in bits.

After all these steps during the Preprocessing phase, the padded bits will be ready for the next phase — Schedule.

Let us follow the input “PicKey” through all the steps in the Preprocessing phase.

Step 1 — Convert to its binary form

SHA-256 input being converted to its binary form

Each character in the input is converted to its binary form. Look at an ASCII conversion table to see that each character in “PicKey” is replaced with its corresponding 8-bit binary value :

P > 01010000
i > 01101001
c > 01100011
K > 01001011
e > 01100101
y > 01111001

So “PicKey” turns into “010100000110100101100011010010110110010101111001” after being converted to binary. We will now use this binary representation for further processing.
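The character-to-binary conversion can be reproduced in a few lines of Python. This is only a sketch: it assumes the input is plain ASCII, and the helper name to_bit_string is made up for illustration.

```python
def to_bit_string(text: str) -> str:
    """Encode text as ASCII and write each byte as 8 binary digits."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))

# Print the per-character table, then the joined bit string.
for ch in "PicKey":
    print(ch, ">", to_bit_string(ch))

bits = to_bit_string("PicKey")
print(bits)
print(len(bits))  # 6 characters x 8 bits each
```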

Step 2 — Pad a ‘1’ bit

Add a ‘1’ bit to the binary value (notice the ‘1’ flying in).

We now append a single ‘1’ bit to this binary string. This is part of the padding used by the Merkle–Damgård construction: the ‘1’ bit marks exactly where the message ends, so the zero padding that follows can never be confused with zeros that belong to the message itself. That clear boundary is one of the things collision resistance depends on.

Step 3 — Pad ‘0’ bits for a total of 448 bits

Add ‘0’ bits till the total length reaches 448 bits.

In this step, we append ‘0’ bits until the total length is 448 bits. The final output of the preprocessing phase has 512 bits, and the last 64 of those come from step 4, so step 3 must stop at :

512 - 64 = 448

For “PicKey”, the message takes 48 bits and step 2 added one more, so step 3 appends 448 - 48 - 1 = 399 zero bits.

Why is this done?

  • SHA-256 processes its input in fixed-size 512-bit blocks, so the padded message must fill a whole number of blocks before it enters the “schedule” phase.
  • Zero padding lets a message of any length fill its last block, so even very short messages can be hashed.
  • The padding scheme is unambiguous: two different messages can never produce the same padded result, which is crucial for collision resistance.
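The number of ‘0’ bits needed for any message length can be worked out with modular arithmetic. The sketch below generalises the single-block case in this walkthrough to the rule that the length must land on 448 modulo 512; the function name is made up for illustration.

```python
def zero_padding_bits(message_bits: int) -> int:
    """Number of '0' bits needed after the message and the single '1' bit
    so that the running length lands on 448 modulo 512."""
    return (448 - (message_bits + 1)) % 512

print(zero_padding_bits(48))   # "PicKey": 48 message bits
print(zero_padding_bits(440))  # 55 bytes, the longest message that fits one block
print(zero_padding_bits(448))  # 56 bytes, which spills into a second block
```

With 448 message bits there is no room left for the ‘1’ bit and the 64-bit length in the same block, so the padding runs on into a second 512-bit block.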

Step 4 — Pad 64 bits from its length

The final 64 bits come from the length of the input itself, to make a total of 512 bits per block.

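The final 64 bits are added to the 448 bits created in step 3. They hold the length of the original message, in bits, written as a 64-bit big-endian number. For “PicKey” that length is 48, so the field is 58 zeros followed by 110000. Including the length ensures that two messages which differ only in length (for example, one that simply has extra zero bits at the end) still produce different padded blocks, and therefore different hashes.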

Think of it like adding a “signature” of how long your message is at the very end of your message. This “signature” is just a precise count of how many bits were in your original message before any processing was done.

It’s like telling someone, “Here’s my note, and by the way, it was exactly this long.” This helps make the hash even more unique because now, the hash represents not just the content of the message but also its size, adding an extra layer of distinctiveness.

Output of Preprocessing

After steps 1, 2, 3 and 4 of the preprocessing phase, our input “PicKey” turns into this 512 bit long binary representation :

010100000110100101100011010010110110010101111001100000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000110000

This is the preprocessed value that is now ready for Phase 2 — Schedule.
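The four preprocessing steps can be sketched in Python as follows. This is a minimal illustration that assumes the message is a whole number of bytes (so the ‘1’ bit and the first seven ‘0’ bits arrive together as the byte 0x80); the function name preprocess is made up for this example.

```python
def preprocess(message: bytes) -> bytes:
    """Pad a byte message following the four preprocessing steps."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                     # the '1' bit plus seven '0' bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zeros up to 448 bits (mod 512)
    padded += bit_length.to_bytes(8, "big")        # 64-bit big-endian message length
    return padded

block = preprocess(b"PicKey")
bits = "".join(f"{byte:08b}" for byte in block)

print(len(bits))   # total number of bits after preprocessing
print(bits[:49])   # the message bits followed by the appended '1'
print(bits[-64:])  # the 64-bit length field
```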

Note : While the SHA-256 function can work for any length of input, to simplify its explanation, we take an example of an input that creates a single block of 512 bits during the preprocessing phase. For longer inputs, multiple such blocks are processed by this function.

Phase 2 — Schedule

This phase uses the preprocessed 512 bits and creates a message schedule for the final phase (phase 3, compress).

512 bits to 16 Words

In the SHA terminology, a “word” is a sequence of 32 bits. This means that the incoming 512 bits can be broken into 16 words, each with 32 bits.

This is how it looks :

The preprocessed 512 bit block is broken into 16 words, each with 32 bits in length.
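Splitting the block into words is a simple slicing exercise. The sketch below rebuilds the padded “PicKey” block by hand and prints its sixteen words; the helper name split_into_words is made up for illustration.

```python
def split_into_words(block: bytes) -> list:
    """Split a 64-byte block into sixteen 32-bit big-endian words."""
    if len(block) != 64:
        raise ValueError("expected exactly one 512-bit block")
    return [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]

# The padded "PicKey" block: 6 message bytes, 0x80, 49 zero bytes,
# then the 64-bit length field holding 48.
block = b"PicKey" + b"\x80" + b"\x00" * 49 + (48).to_bytes(8, "big")
words = split_into_words(block)

for index, word in enumerate(words, start=1):
    print(f"{index:2d}. {word:032b}")
```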

Create Message Schedule

The message schedule is an array of 64 words. As before, each word will be 32 bits long.

To create the message schedule, we start with the 16 words in the table above, and fill in the remaining (17 through 64) words via this formula :

W[t] = σ1(W[t-2]) + W[t-7] + σ0(W[t-15]) + W[t-16]

Here the t’th word (W[t]) is computed from four earlier words (W[t-2], W[t-7], W[t-15] and W[t-16]) along with two special functions, namely σ1 and σ0. The addition is done modulo 2^32, so each result still fits in 32 bits.

To understand it simply, imagine you’re working on a complex puzzle, and to solve it, you’re given a series of steps to mix and match different pieces based on some specific rules. This equation is somewhat like a recipe for creating new pieces of the puzzle (W[t]) from the ones you already have (W[t-2], W[t-7], W[t-15], W[t-16]).

The σ0​ and σ1​ functions are like special mixers that shuffle the bits (0s and 1s in the words) in those pieces in a unique way.

  • σ0: This rotates the word right by 7 bits, rotates it right by 18 bits, shifts it right by 3 bits, and then XORs the three results together, ensuring that the data is scrambled in a predictable yet complex way.
σ0 in action, rotating and mixing the bits.
  • σ1: Similar to σ0, but with different amounts (right rotations by 17 and 19 bits and a right shift by 10 bits), further adding to the complexity.
σ1 in action, rotating and mixing the bits, different to σ0.
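In code, the two mixers come down to a few bit operations. The sketch below uses the rotation and shift amounts from the SHA-256 specification; the helper names (rotr, sigma0, sigma1) and the sample variable names are made up for this example.

```python
MASK = 0xFFFFFFFF  # keeps every value within 32 bits

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def sigma0(x: int) -> int:
    """Rotate right by 7 and by 18, shift right by 3, XOR the results."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sigma1(x: int) -> int:
    """Rotate right by 17 and by 19, shift right by 10, XOR the results."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

second_word = 0x65798000     # "ey" followed by the padding byte 0x80
sixteenth_word = 0x00000030  # the message length, 48

print(f"{sigma0(second_word):032b}")
print(f"{sigma1(sixteenth_word):032b}")
```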

To learn more about the low-level details of these functions, consider looking at this visual explanation.

Following this formula, we can now create a list of 64 words (the message schedule). The first 16 are taken directly from the preprocessed 512 bits, broken into 32-bit words, and the remaining words (17 through 64) are created from earlier words in the list (W[t-2], W[t-7], W[t-15], W[t-16]) and the special mixing functions (σ0 and σ1) :

The message schedule is created with 64 words.

After the message schedule creation, our input “PicKey” will be transformed into the following message schedule of 64 words, each word being 32 bits in length :

1. 01010000011010010110001101001011 2. 01100101011110011000000000000000
3. 00000000000000000000000000000000 4. 00000000000000000000000000000000
5. 00000000000000000000000000000000 6. 00000000000000000000000000000000
7. 00000000000000000000000000000000 8. 00000000000000000000000000000000
9. 00000000000000000000000000000000 10. 00000000000000000000000000000000
11. 00000000000000000000000000000000 12. 00000000000000000000000000000000
13. 00000000000000000000000000000000 14. 00000000000000000000000000000000
15. 00000000000000000000000000000000 16. 00000000000000000000000000110000
17. 10111100110011110011110110101001 18. 01100101100101111000000000000000
19. 01111001010011101101101000110001 20. 00110000000110010101101110011001
21. 10110110010000001100000000111000 22. 10000110101100111011100001011001
23. 01111000001101101101111000001000 24. 01100111110101011001110100001010
25. 00011010011100101011111010101010 26. 11110110100011000110010010101000
27. 00111000101000000010111001110001 28. 01110101001111011000011011000110
29. 10011001101011000100101110101000 30. 11101011110111011001100101100000
31. 01110100011010001101001000000001 32. 00100101101000110111010111001010
33. 00010011001100101111011101000011 34. 01001101111001000000010111001010
35. 01110110010111101101110111111011 36. 10100011101111110111000110100111
37. 10100101010010110000110111000010 38. 11111010011000001111110100111110
39. 00000100110000100010101000001111 40. 11010100110111110000001011000100
41. 00001110101101001100000011100110 42. 10111100111101000001010110010001
43. 10111000110010000110001001101110 44. 11110011110010110110000011110010
45. 10001100010101100011010100110100 46. 00000101111000010010111110110001
47. 01110010111101010011100000101011 48. 00100000000110010001000010000001
49. 10100111100010010101001100001001 50. 00000000010111101101011000010110
51. 01110101010011111011110010000110 52. 00110100011100101101110101011101
53. 00110010111100100100011000011001 54. 00110110011100000101010110001110
55. 11100100001000011010100110110000 56. 10011010110101100000100111100000
57. 00100010011011100000001110011110 58. 11001011001000011111000010000101
59. 11010010100100011101010011110010 60. 11100010111111011000010101101100
61. 10111100111101011100100100000010 62. 01110010110101111011110101101001
63. 10101101101111001100111011011000 64. 10111111000010001000100000100010
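The expansion from 16 to 64 words can be sketched as below. The first sixteen words are typed in from the table above, and the function and variable names are made up for illustration. Note that the code counts words from 0 while the table counts from 1.

```python
MASK = 0xFFFFFFFF  # keeps every value within 32 bits

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def expand_schedule(first_16):
    """Extend sixteen 32-bit words to the full 64-word message schedule."""
    w = list(first_16)
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    return w

# Words 1 to 16 for "PicKey": two message words, thirteen zero words,
# and the length word (48).
first_16 = [0x5069634B, 0x65798000] + [0] * 13 + [0x00000030]
schedule = expand_schedule(first_16)

# Print words 17 and 18 in the same layout as the table.
for t in (16, 17):
    print(f"{t + 1}. {schedule[t]:032b}")
```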

Phase 3 — Compress

The final and arguably the most important phase of the SHA-256 function is the compress phase. Here the 64-word schedule is “compressed” into the final fixed-length 256-bit value, which you see as the 64-character hexadecimal output.

The main steps involved in this phase are :

  1. We start with eight hash variables (a, b, c, d, e, f, g and h) and set their initial values to the first 32 bits of the fractional parts of the square roots of the first eight prime numbers.
  2. Then we start doing 64 rounds of compression. In each round, we calculate two temporary words, T1 and T2, as :
    T1 = Σ1(e) + Ch(e, f, g) + h + K[t] + W[t] and
    T2 = Σ0(a) + Maj(a, b, c)
    where W[t] is the t’th word in the message schedule, K[t] is the t’th of 64 constant words (the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers), and Σ1, Σ0, Ch and Maj are special functions that mix the bits together. All additions are done modulo 2^32.
  3. In each round of compression, we
    - set the new value of h to the current value of g.
    - set the new value of g to the current value of f.
    - set the new value of f to the current value of e.
    - set the new value of e to the current value of d + T1.
    - set the new value of d to the current value of c.
    - set the new value of c to the current value of b.
    - set the new value of b to the current value of a.
    - set the new value of a to T1 + T2.
  4. At the end of 64 rounds of “compression”, we have the final values of a, b, c, d, e, f, g, and h.
  5. We then add each of these values to its initial value (modulo 2^32) and convert the sums from binary to hexadecimal.
  6. Join these hexadecimal values to get the final output.
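The steps above can be sketched end to end in Python. This is a minimal illustration rather than a production implementation: all function names are made up for this example, the input is assumed to be whole bytes, and the constants are derived from the prime roots with exact integer arithmetic instead of being copied from a table. Python's hashlib is imported only to cross-check the result.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF  # keeps every value within 32 bits

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def icbrt(n):
    """Floor of the cube root of a non-negative integer, by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    """Return the first `count` primes using simple trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

PRIMES = first_primes(64)
# First 32 bits of the fractional parts of the square and cube roots.
H_INIT = [isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def pad(message):
    """Append the '1' bit, the zero bits and the 64-bit length field."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + (len(message) * 8).to_bytes(8, "big")

def schedule(block):
    """Expand one 64-byte block into the 64-word message schedule."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    return w

def compress(state, w):
    """Run the 64 rounds, then add the result to the incoming state."""
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (big_sigma1 + ch + h + K[t] + w[t]) & MASK
        big_sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_sigma0 + maj) & MASK
        # The right-hand side is evaluated first, so every "current" value
        # is read before any variable is overwritten.
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256_sketch(message):
    """Hash a byte string block by block and return 64 hex characters."""
    state = H_INIT
    data = pad(message)
    for i in range(0, len(data), 64):
        state = compress(state, schedule(data[i:i + 64]))
    return "".join(f"{word:08x}" for word in state)

digest = sha256_sketch(b"PicKey")
print(digest)
print(digest == hashlib.sha256(b"PicKey").hexdigest())  # cross-check against hashlib
```

For inputs longer than 55 bytes the loop in sha256_sketch simply runs compress once per 512-bit block, feeding each result in as the starting state of the next.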

Let us see this in action. This is how compression looks under the hood, creating the final values of a, b, c, d, e, f, g and h :

All of the 64 rounds of compression

The final values of a, b, c, d, e, f, g and h are then added to their initial values (H0, H1, H2, H3, H4, H5, H6 and H7, derived from the square roots of the first eight primes), with each sum taken modulo 2^32. We then convert each sum into its hexadecimal representation.

The final values of a, b, c, d, e, f, g and h being added to their initial values, with the result converted to hexadecimal.

The final values created after this addition are :

a + H0 = 00000110001000001101100100010100  ( 0620d914 in hexadecimal )
b + H1 = 11111001101000100101001111000000 ( f9a253c0 in hexadecimal )
c + H2 = 01100000001111111001010100011000 ( 603f9518 in hexadecimal )
d + H3 = 00000110000110010011000101011111 ( 0619315f in hexadecimal )
e + H4 = 11110110100101001011101010001110 ( f694ba8e in hexadecimal )
f + H5 = 10101100000110110000010001000111 ( ac1b0447 in hexadecimal )
g + H6 = 01001101001010110001110000011011 ( 4d2b1c1b in hexadecimal )
h + H7 = 11000011110010101000001000011001 ( c3ca8219 in hexadecimal )

In the end, we simply join these hexadecimal values to get our final output :

0620d914f9a253c0603f95180619315ff694ba8eac1b04474d2b1c1bc3ca8219

This is the final result of the SHA-256 hash of the input “PicKey”. The SHA-256 function will always create this exact value for the given input.
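The last conversion step is easy to check by hand or in code. The sketch below takes the eight 32-bit values listed above, renders each as eight hexadecimal digits and joins them; the function name words_to_hex is made up for illustration.

```python
# The eight final 32-bit values (a + H0 through h + H7), as binary strings.
final_words = [
    "00000110001000001101100100010100",
    "11111001101000100101001111000000",
    "01100000001111111001010100011000",
    "00000110000110010011000101011111",
    "11110110100101001011101010001110",
    "10101100000110110000010001000111",
    "01001101001010110001110000011011",
    "11000011110010101000001000011001",
]

def words_to_hex(bit_strings):
    """Render each 32-bit binary string as 8 hex digits and join them."""
    return "".join(f"{int(bits, 2):08x}" for bits in bit_strings)

digest = words_to_hex(final_words)
print(digest)
print(len(digest) * 4)  # 64 hex digits x 4 bits each
```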

Why SHA-256 Matters

In the digital realm where data flows endlessly, SHA-256 stands as a guardian, ensuring every bit of information remains untainted and true. It’s the backbone of digital trust, securing transactions, and safeguarding identities with unparalleled reliability.

To the casual internet user, it’s the invisible shield that protects the digital landscape. For technologists, it represents a pinnacle of cryptographic achievement, blending complexity and security seamlessly. SHA-256 matters because it embodies the essence of digital integrity and security, making our online world a safer place for everyone.

What about SHA-3?

Even with the arrival of the newer SHA-3, SHA-256 remains strong. (Image by Dalle3)

While SHA-256 has been the stalwart of cryptographic security, SHA-3 is the newest member of the Secure Hash Algorithm family. Both share the critical mission of safeguarding digital information, but SHA-3 is built on a different algorithm, Keccak, which uses a sponge construction rather than the Merkle–Damgård design behind SHA-256. SHA-3 was designed to complement rather than replace SHA-2: it adds diversity to the cryptographic toolkit, so a weakness found in one design need not affect the other. SHA-256 remains highly regarded for its proven reliability and widespread adoption across various technologies, and it continues to stand strong alongside SHA-3.

Conclusion

In the fast-changing world of online security, SHA-256 stands strong as a reliable protector of our digital lives. It shows just how powerful and important good security can be, keeping our data safe and sound.
