
Mastering Hash Functions in C: Unraveling the Mysteries of SHA-256 and MD5

Step-by-Step Guide to Implementing SHA-256 and MD5 in C

Jean-Baptiste Terrazzoni
6 min read · Jun 7, 2019

Ever wondered how your online transactions stay secure or how your passwords remain protected? The secret lies in the world of cryptographic hash functions. Today, we’re going to demystify two of the most widely used algorithms: MD5 and SHA-256. Buckle up, because we’re about to embark on a thrilling journey into the heart of digital security!

Not only will you gain a deep understanding of these algorithms, but you’ll also learn how to implement them yourself in C. And if you’re feeling adventurous, we’ve got a GitHub repository with a complete implementation waiting for you to explore. Ready to become a cryptography wizard? Let’s dive in!

🔗 Explore the magical realm of hash functions in our GitHub repository

🔐 Cryptographic Functions: The Unsung Heroes of Digital Security

Imagine cryptographic functions as the silent guardians of the digital world. They’re the vigilant sentinels that:

  • Ensure the integrity of your messages and files, making sure they haven’t been tampered with.
  • Verify passwords without ever storing them in plain text (sneaky, right?).
  • Create the foundation for “proof of work” in cryptocurrencies like Bitcoin and Ethereum.
  • Generate and verify digital signatures, the virtual equivalent of your handwritten signature.

These digital alchemists take a message of any size and transform it into a fixed-length string of characters, known as a hash. It’s like turning a novel into a unique fingerprint! For instance:

  • MD5 produces 128-bit hashes (16 bytes, printed as 32 hexadecimal characters)
  • SHA-256 creates 256-bit hashes (32 bytes, printed as 64 hexadecimal characters)

The Superpowers of Cryptographic Hash Functions

These functions possess some truly remarkable properties:

  1. Determinism: Like a loyal friend, they always produce the same output for the same input.
  2. Speed: They’re the Usain Bolt of algorithms, computing hashes quickly for inputs of any size.
  3. One-wayness: They’re master illusionists — you can’t determine the input from the hash.
  4. Collision resistance: It’s practically impossible to find two inputs that produce the same hash.
  5. Avalanche effect: A tiny change in input causes a dramatically different hash, like a butterfly effect for data.

These properties make hash functions the Swiss Army knife of cryptography, ready for a variety of security applications.

The Contenders: MD5 vs SHA-256

MD5: The Retired Veteran

MD5, once a stalwart in the cryptographic world, is now like a retired superhero. While it’s no longer considered secure for cryptographic purposes (practical collision attacks have been demonstrated, so two different inputs can be crafted to share the same hash), it still has its uses:

  • Checking file integrity against unintentional corruption
  • Non-security-critical applications where speed is more important than bulletproof security

Remember: MD5 should never be used for security-critical applications. It’s like using a plastic sword in a real duel — not a great idea!

SHA-256: The Modern Guardian

SHA-256, part of the SHA-2 family, is the cool younger cousin of MD5. It was developed to address the weaknesses of its predecessors (SHA-1 and SHA-0), which themselves were inspired by MD5. Talk about a family tree of algorithms!

SHA-256 is considered secure because:

  • It’s computationally infeasible to reverse-engineer the input from the hash.
  • It has strong collision resistance.

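However, plain SHA-256 is not recommended for password storage in databases. Why? It’s a bit too quick for its own good: an attacker can test a huge number of guesses per second, and unsalted hashes are also exposed to rainbow table attacks. Dedicated password-hashing functions such as bcrypt, scrypt, or Argon2 are deliberately slow and salted. In the world of password hashing, slower can sometimes be safer!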

The Inner Workings: A Peek Behind the Cryptographic Curtain

Step 1: Preparing the Stage (Formatting the Input)

Before the main show begins, we need to set the stage. The input message needs to be formatted to meet specific criteria:

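  1. A “1” bit is added at the end of the message.
  2. “0” bits are added until the length is 64 bits short of a multiple of 512 bits (that is, length ≡ 448 mod 512).
  3. The original message length, in bits, is stored in the last 64 bits (little-endian for MD5, big-endian for SHA-256).
  4. The padded message is then divided into 512-bit chunks.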
The formatted message size must be a multiple of 512 bits
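The formatting steps can be sketched in a few lines of C. This is a minimal illustration, not the repository's code: the names `padded_size` and `pad_message` are made up for this example, and it assumes the message is a whole number of bytes, so the “1” bit plus the first seven “0” bits become the single byte 0x80.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES  64u  /* one 512-bit chunk */
#define LENGTH_BYTES 8u   /* the trailing 64-bit length field */

/* Size in bytes of the padded message: room for the message, the 0x80 byte
 * and the 8-byte length, rounded up to a multiple of 64 bytes. */
size_t padded_size(size_t msg_len)
{
    size_t needed = msg_len + 1 + LENGTH_BYTES;
    return (needed + (CHUNK_BYTES - 1)) & ~(size_t)(CHUNK_BYTES - 1);
}

/* Write the padded message into `out`, which must hold padded_size(msg_len)
 * bytes. A non-zero `big_endian_length` gives the SHA-256 layout for the
 * length field, zero gives the MD5 layout. */
void pad_message(const uint8_t *msg, size_t msg_len, uint8_t *out,
                 int big_endian_length)
{
    size_t total = padded_size(msg_len);
    uint64_t bit_len = (uint64_t)msg_len * 8;

    memset(out, 0, total);            /* the "0" padding bits */
    if (msg_len > 0)
        memcpy(out, msg, msg_len);    /* the original message */
    out[msg_len] = 0x80;              /* the "1" bit, followed by seven "0" bits */

    for (size_t i = 0; i < LENGTH_BYTES; i++) {
        uint8_t byte = (uint8_t)(bit_len >> (8 * i)); /* i-th least significant byte */
        if (big_endian_length)
            out[total - 1 - i] = byte;            /* most significant byte first */
        else
            out[total - LENGTH_BYTES + i] = byte; /* least significant byte first */
    }
}
```

With this sketch, a 55-byte message still fits in one 64-byte chunk, while a 56-byte message spills into a second chunk because the length field no longer fits.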

To calculate the formatted message length, we use this magical formula:

aligned = (nb + (X-1)) & ~(X-1)

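This rounds our number ‘nb’ up to the next multiple of ‘X’ bytes, and it only works when ‘X’ is a power of two (here 64 bytes, i.e. 512 bits). Note that ‘nb’ must already count the message plus the padding byte and the 8-byte length field. It’s like making sure everyone in a marching band is in perfect rows!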
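Here is the formula as a tiny C helper, as a sketch. The name `align_up` is hypothetical, and the trick relies on `x` being a power of two: in that case `x - 1` is a mask of the low bits, so adding it and then clearing those bits rounds up.

```c
#include <stddef.h>

/* Round nb up to the next multiple of x.
 * Only valid when x is a power of two (1, 2, 4, ..., 64, ...). */
size_t align_up(size_t nb, size_t x)
{
    return (nb + (x - 1)) & ~(x - 1);
}
```

For example, with x = 64, a value of 70 becomes 70 + 63 = 133, and clearing the six low bits leaves 128. A value that is already aligned, such as 64, stays unchanged.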

The Endianness Enigma: Little & Big Endian

Before we dive deeper, let’s talk about endianness — the order in which bytes are arranged in computer memory. It’s like the difference between reading a book left-to-right or right-to-left.

  • Little-endian: The least significant byte comes first.
  • Big-endian: The most significant byte comes first.
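To make the two byte orders concrete, here is a small sketch in C. The helper names are invented for this example. The first function peeks at how the host stores a 32-bit value, and the other two write a word in a fixed byte order regardless of the host.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int is_little_endian(void)
{
    uint32_t value = 0x01020304;
    uint8_t first_byte;

    memcpy(&first_byte, &value, 1); /* read the byte at the lowest address */
    return first_byte == 0x04;
}

/* Store a 32-bit word with the most significant byte first. */
void store_be32(uint32_t w, uint8_t out[4])
{
    out[0] = (uint8_t)(w >> 24);
    out[1] = (uint8_t)(w >> 16);
    out[2] = (uint8_t)(w >> 8);
    out[3] = (uint8_t)w;
}

/* Store a 32-bit word with the least significant byte first. */
void store_le32(uint32_t w, uint8_t out[4])
{
    out[0] = (uint8_t)w;
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(w >> 24);
}
```

Writing bytes with shifts, as in the two store helpers, sidesteps the host's byte order entirely, which is often simpler than detecting it and swapping.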

🔗 Dive deeper into the endianness rabbit hole

To convert between these two, we use a bit of binary magic:

This shuffling algorithm works by swapping bytes, starting with small groups and progressively expanding: adjacent bytes first, then 16-bit pairs, then the two 32-bit halves. It’s like a choreographed dance of bytes!

bswap(uint64_t)
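One possible way to write such a byte swap for a 64-bit value is sketched below. The name `bswap_u64` and the exact masks are this example's choice and may differ from the repository's version.

```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value in three passes. */
uint64_t bswap_u64(uint64_t x)
{
    /* Pass 1: swap each pair of adjacent bytes. */
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x >> 8) & 0x00FF00FF00FF00FFULL);
    /* Pass 2: swap each pair of adjacent 16-bit groups. */
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    /* Pass 3: swap the two 32-bit halves. */
    x = (x << 32) | (x >> 32);
    return x;
}
```

Tracing 0x0102030405060708 through the passes gives 0x0201040306050807, then 0x0403020108070605, and finally 0x0807060504030201. Applying the function twice returns the original value.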

Step 2: The Grand Shuffle (Data Processing)

Now comes the main event! Both MD5 and SHA-256 divide the message into 512-bit chunks and perform a series of mathematical operations on them.

  • MD5 uses four 32-bit buffers (A, B, C, D)
  • SHA-256 uses eight 32-bit buffers (a through h)

These buffers are like jugglers, constantly tossing and catching data bits in a mesmerizing performance. The exact steps differ between MD5 and SHA-256, but both involve a complex dance of bitwise operations, modular additions, and nonlinear functions.
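To give a flavor of those operations, here is a sketch of the building blocks in C: the starting values of the buffers, the rotations, and a few of the nonlinear functions as defined in RFC 1321 (MD5) and FIPS 180-4 (SHA-256). The function and array names are this example's own, and the full round loops, per-round constants, and message schedule are left out.

```c
#include <stdint.h>

/* Initial buffer values from the two specifications. */
const uint32_t MD5_INIT[4] = {
    0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
};
const uint32_t SHA256_INIT[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

/* Rotations: bits pushed out on one side come back on the other.
 * n must be between 1 and 31. */
uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }
uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* MD5 nonlinear functions, one per group of 16 operations. */
uint32_t md5_f(uint32_t x, uint32_t y, uint32_t z) { return (x & y) | (~x & z); }
uint32_t md5_g(uint32_t x, uint32_t y, uint32_t z) { return (x & z) | (y & ~z); }
uint32_t md5_h(uint32_t x, uint32_t y, uint32_t z) { return x ^ y ^ z; }
uint32_t md5_i(uint32_t x, uint32_t y, uint32_t z) { return y ^ (x | ~z); }

/* SHA-256 "choose": each bit of x selects the matching bit of y or z. */
uint32_t sha256_ch(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

/* SHA-256 "majority": each output bit is the value held by at least two inputs. */
uint32_t sha256_maj(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/* SHA-256 big sigma functions, used on the buffers in every round. */
uint32_t sha256_bsig0(uint32_t x) { return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22); }
uint32_t sha256_bsig1(uint32_t x) { return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25); }
```

The modular additions mentioned above come for free in C: adding two `uint32_t` values wraps around at 2^32.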

For the brave souls who want to dive into the nitty-gritty details:

Or, if you prefer to see the algorithms in action, check out our implementations:

The Grand Finale: Building the Hash

After all the shuffling and juggling, we’re left with 4 (for MD5) or 8 (for SHA-256) buffers. These buffers are the raw ingredients for our final hash.

To create the hash, we simply concatenate these buffers in their hexadecimal form. It’s like assembling a puzzle where each piece is a part of the final secret code!

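One last twist for MD5: its buffers are defined as little-endian words, so the bytes of each buffer must be flipped before printing it as hexadecimal. SHA-256 buffers are big-endian and can be printed as they are. It’s like reading the last page of a book first!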
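The final concatenation could look like the sketch below. The function name `words_to_hex` and its `little_endian` flag are assumptions made for this example: pass a non-zero flag for MD5 buffers and zero for SHA-256 buffers.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write `count` 32-bit buffers as lowercase hexadecimal into `out`,
 * which must have room for count * 8 characters plus the final '\0'. */
void words_to_hex(const uint32_t *words, size_t count, int little_endian,
                  char *out)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t w = words[i];

        if (little_endian) {
            /* Flip the four bytes so the least significant one prints first. */
            w = ((w & 0xFF) << 24) | ((w & 0xFF00) << 8) |
                ((w >> 8) & 0xFF00) | (w >> 24);
        }
        /* Each buffer becomes exactly 8 hex digits; 9 leaves room for '\0'. */
        snprintf(out + i * 8, 9, "%08" PRIx32, w);
    }
}
```

As a quick check of the flip (not a real hash), feeding the four MD5 starting values 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 with the little-endian flag yields the string 0123456789abcdeffedcba9876543210.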

Need a Helping Hand?

If you’re feeling overwhelmed by SHA-256, fear not! We’ve got a step-by-step guide that breaks down the process into manageable chunks:

Congratulations! You’ve just taken a deep dive into the fascinating world of cryptographic hash functions. From understanding their crucial role in digital security to unraveling the inner workings of MD5 and SHA-256, you’re now equipped with knowledge that puts you ahead of the curve.

Remember, the world of cryptography is vast and ever-evolving. This tutorial is just the beginning of your journey. Keep exploring, keep learning, and who knows? You might be the one to develop the next groundbreaking hash function!

If you’re itching to see these concepts in action, don’t forget to check out our GitHub repository. It’s packed with complete implementations that you can tinker with and learn from.

Happy hashing, and may your digital adventures be secure and exciting!
