Merkle Trees — An Introduction to Concepts and Components

Concept, Applications, and Benefits — simply explained

Shubhi Tiwari

Published in Umbrella Network

4 min read · May 6, 2021

The Problem:

In a centralized network, data is usually accessed from a single source, so little work is needed to organize that single copy, and users have no choice but to trust the system.

In a decentralized network with multiple nodes, however, organizing data efficiently becomes important. In a blockchain, where the data is distributed, the challenge is twofold: first, to access the data efficiently, and second, to verify it while accurately distributing a copy of it to every node.

The Solution:

Decentralized systems can use Merkle trees to share and verify data.

Merkle trees lower costs by organizing data so that sharing and verifying it requires far less processing power and bandwidth than comparing full copies.

The Merkle Tree Concept

Bitcoin, Ethereum, IPFS, Git, Apache Cassandra, and BitTorrent all share one technique for organizing data: the Merkle tree, a fundamental component of a blockchain that allows efficient and secure verification of large data structures.

Merkle trees are used to store and organize all the transactions in a block of a blockchain and help verify the consistency of the data. Let’s explore the concept further with the help of the chart below.

A Merkle tree is a binary hash tree in which the value of each inner node is the hash of its child nodes. At the bottom of the tree are the hashes of the transactions, represented as H[A], H[B], H[C], and H[D] in the above chart and called the leaf nodes. In the left subtree, H[A] and H[B] are the hash values of data blocks L1 and L2 respectively, and in the right subtree, H[C] and H[D] are the hash values of L3 and L4 respectively. The inner node H[AB] is the hash of the concatenation of H[A] and H[B], and similarly, H[CD] is the hash of the concatenation of H[C] and H[D].

Every leaf node of the Merkle tree contains the hash of a transaction. Above the leaves are the intermediate nodes, each containing the hash of the combined hash values of its two children (H[AB] and H[CD]). At the top is the root node, which contains the hash of the combined values of its left and right subtrees (H[ABCD]) and is known as the Merkle Root, as shown above.
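The construction above can be sketched in a few lines of Python. This is a minimal illustration, not any particular blockchain's implementation: it assumes SHA-256 as the hash function, and it assumes that a level with an odd number of nodes pairs its last hash with itself. The names `sha256` and `merkle_root` are made up for this example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Compute the root of a binary Merkle tree over the given data blocks."""
    if not blocks:
        raise ValueError("need at least one data block")
    # Leaf level: H[A], H[B], ... are the hashes of the raw data blocks.
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        # Assumption: with an odd count, the last hash is paired with itself.
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Each parent is the hash of its two children concatenated.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Rebuild the four-leaf tree from the chart by hand and compare.
blocks = [b"L1", b"L2", b"L3", b"L4"]
h_a, h_b, h_c, h_d = (sha256(b) for b in blocks)
h_ab = sha256(h_a + h_b)
h_cd = sha256(h_c + h_d)
h_abcd = sha256(h_ab + h_cd)

assert merkle_root(blocks) == h_abcd
print(merkle_root(blocks).hex())
```

However many data blocks go in, the result is always a single 32-byte root.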

If any transaction is changed, its leaf hash changes, which changes every intermediate hash on the path above it and, in turn, the root hash. In a blockchain such as Bitcoin, each block header includes the Merkle root and each block references the hash of the previous block, so an altered root also invalidates the hashes of all subsequent blocks. To manipulate the data undetected, an attacker would have to recompute every later block in the chain and get the rest of the network to accept the result, which is computationally infeasible in practice. This is the beauty of the Merkle tree design: it makes the data structure tamper-evident.
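A small experiment illustrates this chain reaction. The sketch below assumes SHA-256 and a deliberately simplified, hypothetical block hash that commits only to the previous block's hash and the Merkle root; real block headers carry more fields. The transactions are made up for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Root of a binary Merkle tree; assumes a power-of-two number of leaves."""
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def block_hash(prev_hash: bytes, transactions: list[bytes]) -> bytes:
    """Hypothetical block hash: commits to the previous block and the Merkle root."""
    return sha256(prev_hash + merkle_root(transactions))


genesis = bytes(32)
txs_1 = [b"A pays B 5", b"B pays C 2", b"C pays D 1", b"D pays A 3"]
txs_2 = [b"A pays C 1", b"B pays D 4", b"C pays A 2", b"D pays B 6"]

hash_1 = block_hash(genesis, txs_1)
hash_2 = block_hash(hash_1, txs_2)  # block 2 stores hash_1 as its parent link

# Tamper with a single transaction in block 1.
tampered = list(txs_1)
tampered[1] = b"B pays C 200"

root_changed = merkle_root(tampered) != merkle_root(txs_1)
tampered_hash_1 = block_hash(genesis, tampered)
link_broken = tampered_hash_1 != hash_1
# Block 2 would have to be rebuilt as well to point at the new parent hash.
rebuilt_hash_2 = block_hash(tampered_hash_1, txs_2)

print(root_changed, link_broken, rebuilt_hash_2 != hash_2)  # True True True
```

One edited transaction changes the root of its block, and every block after it would need to be recomputed to keep the links consistent.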

Merkle trees protect the data’s integrity, so you don’t have to go through the entire set of transactions to verify one of them. To prove that a transaction belongs to the tree, it is enough to supply the sibling hashes along the path from its leaf to the root; the verifier rehashes up that path and compares the result with the known root. This concept is known as a Merkle Proof and is highly beneficial in decentralized systems. Instead of checking all the information in the whole tree, a Merkle Proof needs only about log2(n) hashes for a tree with n leaves.
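Here is a minimal sketch of how a Merkle Proof could be produced and checked. It assumes SHA-256 and the same odd-level rule as a simple tree builder (the last hash is paired with itself); the helper names `build_levels`, `make_proof`, and `verify_proof` are hypothetical and not taken from any library.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, leaves first, root last."""
    levels = [[sha256(b) for b in blocks]]
    while len(levels[-1]) > 1:
        level = list(levels[-1])
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: pair the last hash with itself
        levels.append([sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def make_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # flip the lowest bit to get the neighbour's position
        # On an odd-sized level the last node has no neighbour and is its own sibling.
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash, sibling < index))
        index //= 2
    return proof


def verify_proof(block: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Rehash from the leaf up to the root using only the sibling hashes."""
    current = sha256(block)
    for sibling_hash, sibling_is_left in proof:
        if sibling_is_left:
            current = sha256(sibling_hash + current)
        else:
            current = sha256(current + sibling_hash)
    return current == root


blocks = [b"L1", b"L2", b"L3", b"L4"]
levels = build_levels(blocks)
root = levels[-1][0]
proof = make_proof(levels, 2)  # prove that L3 is in the tree

print(len(proof), verify_proof(b"L3", proof, root), verify_proof(b"L9", proof, root))
# 2 True False
```

For the four-leaf tree, the proof for L3 consists of just two hashes, H[D] and H[AB]; the verifier never needs L1, L2, or L4 themselves.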

Applications

Merkle trees are used to synchronize data in decentralized and distributed systems where the same data should exist in multiple places.
They are also used to detect inconsistencies between replicas of a database without comparing the entire data set.
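As an illustration of replica comparison, the sketch below builds a tree for each replica and walks both from the root, descending only into subtrees whose hashes differ. It is a simplified model rather than any specific database's repair protocol: it assumes SHA-256, two replicas with the same power-of-two number of blocks, and hypothetical helper names.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """All tree levels, root first; assumes a power-of-two number of blocks."""
    levels = [[sha256(b) for b in blocks]]
    while len(levels[0]) > 1:
        below = levels[0]
        parents = [sha256(below[i] + below[i + 1]) for i in range(0, len(below), 2)]
        levels.insert(0, parents)
    return levels


def differing_blocks(levels_a, levels_b) -> list[int]:
    """Return the indices of leaf blocks that differ between two replicas."""
    suspects = [0]  # start at the root
    for depth in range(len(levels_a)):
        # Keep only the nodes whose hashes disagree at this depth.
        suspects = [i for i in suspects if levels_a[depth][i] != levels_b[depth][i]]
        if depth < len(levels_a) - 1:
            # Descend into both children of every mismatching node.
            suspects = [child for i in suspects for child in (2 * i, 2 * i + 1)]
    return suspects


replica_a = [f"row-{i}".encode() for i in range(8)]
replica_b = list(replica_a)
replica_b[5] = b"row-5-stale"

tree_a, tree_b = build_levels(replica_a), build_levels(replica_b)
print(differing_blocks(tree_a, tree_b))  # [5]
print(differing_blocks(tree_a, tree_a))  # []
```

When the roots match, the comparison ends after a single hash check; when they differ, only the mismatching branches are explored.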

Major Benefits

Merkle trees improve scalability, because the data can be split into small pieces that are verified independently.
They effectively validate the integrity of data.
Any amount of data can be stored in a Merkle tree, and it always reduces to a single root hash at the top.
They make it possible to verify both the data itself and its consistency across copies.
Merkle trees significantly lower verification costs with the help of Merkle Proofs.

How Merkle Trees benefit Umbrella Network

Because Merkle trees organize data efficiently and let the verification process use less computational power, Umbrella Network uses them to bring thousands of real-world data points on-chain at a low cost without sacrificing security or data integrity.

In Umbrella Network’s system, each leaf on the Merkle tree represents a data point from an oracle. The Merkle Root Hash serves as a unique identifier for the entire block of transactions within the Merkle tree. A ‘proof of stake’ consensus is conducted on the Merkle root hash that represents all the data in the tree, and the final validated set of transactions is written on-chain for the cost of one transaction fee.
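To make the batching idea concrete, here is a rough sketch of how many data points could be committed to with one root and later checked individually. It is not Umbrella Network's actual implementation: the leaf encoding, the key names, the values, and all helper names are assumptions made up for illustration, and the tree assumes SHA-256 with a power-of-two number of leaves.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(key: str, value: str) -> bytes:
    """Hypothetical leaf encoding: hash of 'key=value'."""
    return sha256(f"{key}={value}".encode())


def root_and_proof(leaves: list[bytes], index: int):
    """Return (root, proof); proof lists (sibling_hash, sibling_is_left) pairs.

    Assumes a power-of-two number of leaves to keep the sketch short.
    """
    proof, level = [], list(leaves)
    while len(level) > 1:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(key: str, value: str, proof, root: bytes) -> bool:
    """Check one data point against the committed root."""
    current = leaf_hash(key, value)
    for sibling, sibling_is_left in proof:
        current = sha256(sibling + current) if sibling_is_left else sha256(current + sibling)
    return current == root


# Made-up data points, for illustration only.
data_points = {"ETH-USD": "3500.12", "BTC-USD": "57000.40",
               "LINK-USD": "48.05", "DOT-USD": "38.90"}
keys = sorted(data_points)
leaves = [leaf_hash(k, data_points[k]) for k in keys]

# Only this 32-byte root would need to be written on-chain for the whole batch.
onchain_root, proof = root_and_proof(leaves, keys.index("ETH-USD"))

print(verify("ETH-USD", "3500.12", proof, onchain_root))  # True
print(verify("ETH-USD", "9999.99", proof, onchain_root))  # False
```

The amount of data committed on-chain stays at one hash regardless of how many data points the batch contains, while anyone holding a proof can still check a single value.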

Umbrella utilizes Merkle trees for batching transactions to address the scalability issues in oracles today. Merkle trees aggregate the data, allowing multiple transactions to be bundled into a single transaction for a single fee. Hence, the time and cost required to bring real-world data on-chain are significantly reduced, allowing Umbrella Network to be the lowest-cost oracle solution with the largest data set available in the market.