Merkle Tree in Blockchain (Part 5- Blockchain Series)

Welcome to the fifth part of the 100 part series on Blockchain.

The data/transactions in a block are not stored as plain text; instead, they are stored in a data structure called the Merkle tree. In other words, the Merkel tree serves as a summary of all the transactions in a block. Every transaction in a block is uniquely hashed by hash functions like MD5, BLAKE2, SHA-1, SHA-256 to produce a digital fingerprint of the entire set of transactions. Each pair of hashed transactions is hashed together by the hash function, and this process continues until there is one hash for the entire block.

Structurally, the Merkle tree is a type of binary tree, where the hashes of the transactional data on the bottom row are referred to as “leaf nodes,” the intermediate hashes as “branches,” and the hash at the top as the “root.” Merkle tree is also called the Hash tree. Each block in the Blockchain has one Merkle root.

The Merkle tree of transactions A, B, C, and D; Hash ABCD is the Merkle root/root hash

A Basic Merkle tree

To make it easier, let’s understand the Merkle tree with the help of an example.

Suppose there are four transactions in a block, as shown in the above figure. Each of the four transactions (A, B, C, and D) has a unique hash (Hash A, Hash B, Hash C, and Hash D). Then every pair of these hashed transactions combine together to create another hash. In this case, hashed transaction A pairs with hashed transaction B to generate a new Hash AB. On the other hand, hashed transaction C pairs with hashed transaction D to create a new Hash CD. The Hash AB and Hash CD are the branches of the Merkle tree. The Hash AB and Hash CD are then grouped together by the hash function to produce the Hash ABCD, which is the Merkle root/root hash for this Merkle tree. Merkle root is then stored in the block header.

But now the question arises what is this block header?

A block is composed of a header and a body. The block header contains Merkel root, Timestamp, Block Version number (indicates which set of block validation rules to follow), Difficulty Target, Nonce, and Previous Hash. On the other hand, the block body contains all confirmed transactions within the block.

The Merkle root is stored in the block header of the block

An Unbalanced Merkle tree with an odd number of transactions

The above example illustrates the fundamental case of a Merkle tree. There are just the correct number of Merkle leaves at every level to form exact pairs. What happens if you have an odd number of transactions, leading to the formation of an odd number of Merkle leaves? For example, suppose there are five transactions in the block. Each of the five transactions (A, B, C, D, and E) will have a unique hash (Hash A, Hash B, Hash C, Hash D, and Hash E). The pair of hashed transactions A and B is hashed to produce the Hash AB. Similarly, the pair of hashed transactions C and D is hashed to produce the Hash CD. But hashed transaction E is left without a pair to hash into a new branch. Since the Merkle trees are binary in nature, they require leaf nodes to be even for them to work. Therefore, if there happens to be an unpaired hash, then that hash is copied and paired with itself. In this example, Hash E is duplicated to pair with itself to be hashed to form Hash EE. After this, the Hash AB and Hash CD are grouped and hashed to create a new Hash ABCD. On the other hand, Hash EE gets duplicated and pairs with itself to be hashed to form Hash EEEE. The hashes ABCD and EEEE are further grouped and hashed to produce one hash, i.e., the Merkle root.

The Merkle tree of transactions A, B, C, D, and E; Hash ABCDEEEE is the Merkle root/root hash

This root hash of a block is then used to find the block’s unique valid hash with leading 0’s. For example, the Merkle root/root hash of a block #72608 is a1c6d67c992a70fca66188e178d9ca7c20d5c775393948f19955be7f0952c0f. The root hash is then combined with other information of the block (block number, the previous block’s hash, the timestamp, the difficulty target, and the nonce) and run through a hash function to produce the block’s unique and valid hash with leading 0’s: 00001a03b7328189f5f7154681c5827a1b2fce2fcb5301a8e412e22ea9a7859. Once the block gets the valid hash, it becomes a part of the Blockchain, and further new blocks can be added to this block.

Advantages of Merkle tree

  1. Each block has a unique hash value, calculated from the Merkle root. The block also contains the previous block’s hash, thus linking one block to another in the Blockchain. If there is a change in any transaction, then the hash of that transaction changes. This change cascades up to the Merkle Root, changing the value of the Merkle root and thus making the block invalid. This then is reflected in the succeeding block, leading to a change in its hash, thus making the rest of the Blockchain invalid. Therefore, the Merkle tree makes an immutable record of transactions in the block.
  2. Blockchains are usually made up of hundreds of thousands of blocks, and each block can contain up to several thousand transactions, thereby making memory space and computing power the two big problems while validating the data. If the Blockchain did not have the concept of Merkle trees, then every single node on the network would have been required to keep a complete copy of every single transaction that has ever occurred on the Blockchain. While confirming a transaction, a node would have been required to compare each entry line by line to make sure its own records match exactly with the network records. If there was any discrepancy between the records, it could compromise the security of the network. Therefore, the computer used for validating the data would have required much higher processing power to compare the records to ensure that there had been no changes. On the other hand, Merkle trees solve this problem by considerably reducing the amount of data that has to be maintained for verification purposes. They hash all the records in the ledger, which effectively separates the proof of data from the data itself. Users can verify individual blocks and can also check transactions by using hashes. Thus, reducing the amount of computing power required to validate the transactions.
  3. Merkle trees help eliminate modifying transaction records and double-spending within a Blockchain. For example, if someone tries to hack an entry on the Blockchain to make it appear as if he has more cryptocurrency than what is available in reality, any such false entry would not be in line with the rest of the hashes in a Merkle tree and thus would be rejected by the network. This is because when a transaction happens on the Blockchain, it is not stored as such; instead, it gets verified. The transaction is hashed by the hash function, and then its hash is verified with every other hash on the Blockchain to prove that nothing has been altered or tampered with.

Since the hash function is said to be deterministic, it generates the same hash whenever the same input is passed through it. Therefore, if an individual tries to double-spend his digital currency, a hash will be generated for that transaction. If that hash matches with the existing records present on the Blockchain, that transaction is rejected. Hence, double-spending is prevented.

Merkle proofs

Merkle proofs allow for verification of a specific transaction/content in a large body of data. In Blockchain systems, there are two types of nodes:

(i) The nodes holding a complete history of Blockchain are called “full nodes.”

(ii) And other nodes are called “lightweight nodes.” A Lightweight Node contains only a partial list of Blockchain, which usually includes just the block headers (containing Merkle root) instead of its entire transaction history. The lightweight node can only validate transactions included in a block without the need to download the whole Blockchain.

When a lightweight node wants to verify a specific transaction, it uses the hash function to obtain the hash value of that transaction. The full node sends all the hash values to the lightweight node required for verification based on the structure of the Merkle tree. Next, the lightweight node repeats the hash operation to compute the root hash value. Then the root hash value obtained by the process is compared with the root hash value sent by the full node.

Suppose a lightweight node wants to verify whether transaction D has been lost or tampered with. For this, the full node sends the hash values Hash C, Hash AB, Hash EFGH, and Merkle root to the lightweight node. First, the lightweight node finds the hash of transaction D, i.e., Hash D using the hash function. Then Hash D, along with the values Hash C, Hash AB, and Hash EFGH, are recomputed to generate Merkle root/root hash. If the computed Merkle root and original Merkle root match, then it can be verified that the Hash D is a genuine leaf of the Merkle tree, and thus, transaction D is a part of the Merkle tree. On the contrary, if the computed Merkle root and original Merkle root do not match, then it can be confirmed that transaction D has been tampered with.

Merkle proof for transaction D

If you liked this article and want to know more about Blockchain, NFTs, Metaverse, and their applications, click the below link.

Happy learning!

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store