Blockchain Ep02: Unboxing the Block

Alabhya Mishra
Coinmonks
8 min read · Nov 20, 2021

In my previous post, we created a very primitive chain of blocks. We also defined a very simple structure for the block; in fact, without that structure we wouldn’t have made it very far. Today, we will put this structure under a microscope and reveal all the magic happening within. We will even attempt to build an almost real block from scratch.

A primitive block

Zooming Into the Address

Like every house on your street, each block too will have its own uniquely created address. And like the universal way of writing down your home address, we will also create a universal way of denoting the block address in our chain.

The Address System — with a real world analogy.

Current Block ID

Just as a house number uniquely identifies each individual house (or apartment unit), the Current Block ID is a unique address for each block. We are going to purposely ignore how this ID is assigned, because we are not yet equipped with the protocol to do that. For now, we will simply assume that there is a way to assign a unique ID.

Previous Block ID(s)

This is where the real-world analogy breaks down. The chain is a single-street universe, so it doesn’t need the full (Street Name, County, City, State, Country, Postal Code) set used in the real world. All we need on the chain is a landmark, and the neighbors will do: “We are right next to the two-storeyed red brick house.”

Streets in the real world (left) vs ‘one lane’ chain (right)

But our chain requires the address IDs of all preceding blocks to be stored in the right order. This becomes a problem as the chain gets longer, because there will be too many IDs to write down.

The 150th block on our primitive chain would need 149 rows of previous block IDs. What if the chain becomes 1,000,000 blocks long?
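To put numbers on this, here is a small sketch of the arithmetic. It assumes blocks are counted from 1 and that each block stores one ID per preceding block; the function names are made up for illustration.

```python
def ids_stored_in_block(position: int) -> int:
    """Previous block IDs that the block at 1-based `position` must store."""
    return position - 1


def ids_stored_in_chain(length: int) -> int:
    """Total previous block IDs stored across a chain of `length` blocks."""
    # 0 + 1 + 2 + ... + (length - 1)
    return length * (length - 1) // 2


print(ids_stored_in_block(150))        # 149
print(ids_stored_in_block(1_000_000))  # 999999
print(ids_stored_in_chain(1_000_000))  # 499999500000
```

So the millionth block alone would carry 999,999 IDs, and the chain as a whole would carry roughly half a trillion of them.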

Now you’d say, this isn’t practical. And I would agree with you. We need a method to compress this huge amount of information to a manageable size. Fortunately, there is a piece of magic that does this, and for a brief moment we are going to teleport to this magical land!

Author’s note: Data ID is an identifier for the data stored in the block. I will skip the paragraph on it, but I will show you how it works later.

The Magical world of Hashing

Hashing is so prevalent in the world today that it wouldn’t be an exaggeration to say our life pretty much depends on it. For that reason it is the subject of countless YouTube videos, blogs, college lectures and over-coffee discussions. But for our purposes today, I only want to highlight the major features of Hashing.

  1. Hashing is a one-way function that takes any input and produces an output; you cannot practically recreate the input from the output
  2. Irrespective of the size of the input, the output is always a random-looking alphanumeric string (a series of letters & numbers) of fixed length
  3. Hashing is extremely sensitive to input: even the slightest change to the input results in a wildly different output, while the same input always produces the same output
  4. The probability of two different inputs producing the same output is practically 0 for a good Hashing Function, so each output acts like a fingerprint of its input
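If you would rather poke at these features than take them on faith, here is a minimal sketch using MD5 from Python’s standard `hashlib` module (the same Hash Function used for the examples later in this post). The helper name `md5_hex` is my own.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of `text` as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Feature 2: the output length is fixed, whatever the input size.
print(len(md5_hex("a")))              # 32
print(len(md5_hex("a" * 1_000_000)))  # 32

# Feature 3: one extra period gives a completely different output...
print(md5_hex("This is the second block"))
print(md5_hex("This is the second block."))

# ...but the same input always gives the same output.
print(md5_hex("hello") == md5_hex("hello"))  # True
```

Feature 1 cannot be shown by running code, since the point is precisely that there is no `md5_hex` in reverse to call.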

Hashing is real, it exists, in fact there are many different types of Hashing Functions being used today! It might seem too convenient that something like this exists, because we will see in a moment how this solves almost every problem in the chain, even the ones we haven’t talked about yet. But today, I ask you to take these features of Hashing on faith. I promise I will explain how it all works in another post on another day.

However, before we leave this magical land, I will leave you with this nugget. Hashing gets its name for much the same reasons as hash browns got theirs!

Crispy potato hash browns and Hashing have much in common. How? Details in the next post…

Let’s Build Some Blocks!

Now that we have met Hashing and know about the existence of Hashing Functions, we have the recipe to create unique Address IDs. This is going to be slightly complicated, but it’s totally worth it. We are going to build a block from scratch! So exciting!

Let’s create the first block in the chain — the Genesis block. All we need to create this block is some Data. Any data will do, even the sentence — Let’s create the first block in the chain.

The steps to create a new block are:

  1. Get the Data that is to be stored in the new block
  2. Input the Data into the Hashing Function to get the first Hash Output — store this in the data ID field for that block
  3. Insert the previous block’s current block ID into the new block’s previous block ID field; this step creates the link between the blocks of the chain (this field is empty only for the first block of the chain, as there is no previous block; all subsequent blocks have some value)
  4. Combine data ID and previous block ID as one input to the Hashing Function to create a new Hash Output — this is the current block ID for the new block
  5. Rinse & repeat for any new block

Different steps of creating the Genesis block
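The steps above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: a block is a plain dictionary, the field names are invented, and “combine” in step 4 is taken to mean simple string concatenation of the data ID and the previous block ID.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of `text` as a 32-character hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_block(data: str, previous_block_id: str = "") -> dict:
    """Create a block following steps 1 to 4 (hypothetical layout)."""
    data_id = md5_hex(data)  # step 2: hash the data
    # Step 4. Assumption: "combine" means plain string concatenation.
    current_block_id = md5_hex(data_id + previous_block_id)
    return {
        "data": data,                            # step 1
        "data_id": data_id,
        "previous_block_id": previous_block_id,  # step 3 (empty for Genesis)
        "current_block_id": current_block_id,
    }


# The Genesis block has no previous block, so that field stays empty.
genesis = make_block("Let’s create the first block in the chain")
for field, value in genesis.items():
    print(f"{field}: {value!r}")
```

Because Hashing is so sensitive to input, the IDs you get depend on the exact characters fed in (a curly versus a straight apostrophe is enough to change them) and on how the two IDs are combined, so they may not match the values in the images.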

Author’s note: The hashes in the examples are created using the MD5 Hash Function, which is one of the many Hashing Functions available. I recommend that you verify the outputs for each example below using this MD5 Hash Generator, or any MD5 Hash Generator online. MD5 is no longer considered a secure Hashing Function: it is not reversible, but it is possible to construct two different inputs that produce the same output (a collision). It still serves as a good example even today.

Now let’s create a second block in the chain, containing the data: This is the second block. The image below illustrates the steps, which are similar to those for the Genesis block.

Steps to create the second Block, with first block’s ID in the mix

Repeat the process for a total of 4 blocks and we now have a chain that looks like this…

A chain is shaping up
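Here is a rough sketch of that “rinse & repeat” loop. The data for the third and fourth blocks is made up, and the block layout and the concatenation in `make_block` are assumptions of mine rather than a fixed standard.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_block(data: str, previous_block_id: str = "") -> dict:
    data_id = md5_hex(data)
    return {
        "data": data,
        "data_id": data_id,
        "previous_block_id": previous_block_id,
        # Assumption: the two IDs are combined by string concatenation.
        "current_block_id": md5_hex(data_id + previous_block_id),
    }


def build_chain(items: list) -> list:
    """Create one block per data item, linking each to the one before it."""
    chain, previous_id = [], ""  # the Genesis block has no previous block
    for data in items:
        chain.append(make_block(data, previous_id))
        previous_id = chain[-1]["current_block_id"]
    return chain


chain = build_chain([
    "Let’s create the first block in the chain",
    "This is the second block",
    "This is the third block",   # hypothetical data
    "This is the fourth block",  # hypothetical data
])
for block in chain:
    print(block["previous_block_id"] or "(empty)", "->", block["current_block_id"])
```

Each printed line shows one link: the ID on the left is always the ID on the right of the line before it.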

But wait, in this chain the previous block ID doesn’t contain all the preceding blocks’ IDs. Isn’t that what we wanted our chain to have? How is this even valid?

What has happened is that the Hashing Function has magically compressed the entire history of preceding blocks into a single line of ID: each current block ID depends on the previous block ID, which in turn depends on the one before it, all the way back to the Genesis block. To see that, let’s look at what happens if we try to insert a new block in the middle of the chain.

Inserting a New Block

If we insert a new block before the second block, we have to update the second block’s previous block ID to maintain the chain link. But doing so immediately changes the current block ID of the second block, because the previous block ID is an input to the Hashing Function that creates the current block ID, and we know that Hashing is extremely sensitive to input. Now the second block’s updated current block ID needs to be inserted into the third block’s previous block ID (or the chain verification will fail), which will immediately change the third block’s current block ID!

Every hash after the new block is inserted will have to be modified because the current block ID takes previous block ID as input. Modifications in orange, previous values in strike-out

So every block down the line will have to be manually updated or the chain verification protocol will reject this update. This is exactly what we wanted from our chain!
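We have not defined the chain verification protocol yet, so the check below is only a guess at its simplest form: every block must point at the block before it, and every current block ID must match a fresh hash of its inputs. The block layout, the function names and the block data are all assumptions for illustration.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_block(data: str, previous_block_id: str = "") -> dict:
    data_id = md5_hex(data)
    return {
        "data": data,
        "data_id": data_id,
        "previous_block_id": previous_block_id,
        # Assumption: the two IDs are combined by string concatenation.
        "current_block_id": md5_hex(data_id + previous_block_id),
    }


def build_chain(items: list) -> list:
    chain, previous_id = [], ""
    for data in items:
        chain.append(make_block(data, previous_id))
        previous_id = chain[-1]["current_block_id"]
    return chain


def first_invalid_block(chain: list) -> Optional[int]:
    """Return the index of the first block that fails the check, or None."""
    expected_previous = ""
    for index, block in enumerate(chain):
        recomputed = md5_hex(block["data_id"] + block["previous_block_id"])
        if (block["previous_block_id"] != expected_previous
                or block["current_block_id"] != recomputed):
            return index
        expected_previous = block["current_block_id"]
    return None


chain = build_chain(["first", "second", "third", "fourth"])
print(first_invalid_block(chain))  # None: the untouched chain is valid

# Insert a new block right after the Genesis block.
new_block = make_block("an inserted block", chain[0]["current_block_id"])
chain.insert(1, new_block)
print(first_invalid_block(chain))  # 2: the old second block still points at Genesis

# Re-point the old second block at the new block; its own ID changes too.
chain[2] = make_block(chain[2]["data"], new_block["current_block_id"])
print(first_invalid_block(chain))  # 3: the third block now holds a stale ID
```

Fixing one block only pushes the failure one step further down the chain, which is the domino effect described above.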

The Hashing Function has magically compressed the entire history of preceding blocks into a single line of ID. But it also solved another problem of ours…

Detecting Change in Data

If we change Data in any of the blocks even slightly, we change the Data ID of that block, as a result of Hashing Functions being sensitive to input. Data ID is one of the inputs to the current block ID, so changing Data changes Data ID which in turn changes current block ID. And the same effect as above propagates through the chain again, requiring change to every block downstream.

Even a small change of adding a period ‘.’ to the data of second block requires a wholesale change to the chain downstream
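The same effect can be sketched in code. As before, this assumes a dictionary-based block, concatenation as the way to combine the two IDs, and a very simple verification that re-hashes everything; none of this is a fixed protocol, and the block data is made up.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_block(data: str, previous_block_id: str = "") -> dict:
    data_id = md5_hex(data)
    return {
        "data": data,
        "data_id": data_id,
        "previous_block_id": previous_block_id,
        # Assumption: the two IDs are combined by string concatenation.
        "current_block_id": md5_hex(data_id + previous_block_id),
    }


def build_chain(items: list) -> list:
    chain, previous_id = [], ""
    for data in items:
        chain.append(make_block(data, previous_id))
        previous_id = chain[-1]["current_block_id"]
    return chain


def first_invalid_block(chain: list) -> Optional[int]:
    """Return the index of the first block that fails the check, or None."""
    expected_previous = ""
    for index, block in enumerate(chain):
        data_ok = block["data_id"] == md5_hex(block["data"])
        link_ok = block["previous_block_id"] == expected_previous
        id_ok = block["current_block_id"] == md5_hex(
            block["data_id"] + block["previous_block_id"])
        if not (data_ok and link_ok and id_ok):
            return index
        expected_previous = block["current_block_id"]
    return None


chain = build_chain(["first", "This is the second block", "third", "fourth"])

# Tamper with the second block by adding a single period to its data.
chain[1]["data"] += "."
print(first_invalid_block(chain))  # 1: the data no longer matches its data ID

# Recompute the second block's IDs to hide the change...
chain[1] = make_block(chain[1]["data"], chain[1]["previous_block_id"])
print(first_invalid_block(chain))  # 2: ...and the third block's link breaks instead
```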

In Conclusion

At the end of the previous post, our chain of blocks had 2 major problems. First, the address management system was not very efficient. Second, we had not even touched on the data stored, or on what happens if someone tries to tamper with the data.

Hashing magically solves both problems. Whether someone tries to tamper with the chain (by inserting a new block in the middle) or with the data within the blocks, our system ensures that every downstream block needs to be updated to pass the chain verification protocol. This is more difficult than just adding a new block at the end of the chain, because there is more work to be done.

It is still not impossible though! Hashing Functions are lightning fast: for any given input, getting the Hash Output is almost instantaneous. So making these updates would take mere seconds.
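To get a feel for how little work that is, here is a rough sketch that tampers with one block in a 10,000-block chain and then re-hashes everything downstream. The chain length, the block layout and the helper names are arbitrary choices of mine, and the timing will vary from machine to machine.

```python
import hashlib
import time


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_block(data: str, previous_block_id: str = "") -> dict:
    data_id = md5_hex(data)
    return {
        "data": data,
        "data_id": data_id,
        "previous_block_id": previous_block_id,
        # Assumption: the two IDs are combined by string concatenation.
        "current_block_id": md5_hex(data_id + previous_block_id),
    }


def build_chain(items: list) -> list:
    chain, previous_id = [], ""
    for data in items:
        chain.append(make_block(data, previous_id))
        previous_id = chain[-1]["current_block_id"]
    return chain


def is_valid(chain: list) -> bool:
    """Check every block against a chain freshly rebuilt from the same data."""
    return chain == build_chain([block["data"] for block in chain])


def relink_from(chain: list, start: int) -> None:
    """Recompute the IDs of every block from index `start` onward, in place."""
    previous_id = chain[start - 1]["current_block_id"] if start > 0 else ""
    for index in range(start, len(chain)):
        chain[index] = make_block(chain[index]["data"], previous_id)
        previous_id = chain[index]["current_block_id"]


chain = build_chain([f"block {n}" for n in range(10_000)])
chain[1]["data"] += "."  # tamper with the second block
print(is_valid(chain))   # False

started = time.perf_counter()
relink_from(chain, 1)
elapsed = time.perf_counter() - started

print(is_valid(chain))   # True: the tampered chain passes again
print(f"re-hashed {len(chain) - 1} blocks in {elapsed:.4f} s")
```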

So what really makes the Blockchain immutable then? That is the subject of my next post. See you then…

Cliffhanger

Hashing is fast, but it’s really unpredictable: we have no idea what the output is going to look like for a given input until we compute it. What is going to happen if I want my Hash Outputs to look a certain way?

Alabhya Mishra
Coinmonks

Working in finance, data science and analytics. Interested in learning Blockchain