Skewed Merkle Trees

Seung Woo Kim
Jun 21, 2018 · 3 min read

Why for CodeChain’s transactions_root and invoices_root, which require data validation, Skewed Merkle trees are a more appropriate solution.

As explained in the previous article, Ethereum has 4 different usages of the Modified Merkle Patricia Trie. Among these, State Trie and Storage Trie are used to efficiently store and verify changing data. However, the Transaction Trie and Receipts Trie do not exist to support changing data. These two trie are volatile data used only for the verification of the transactions used in one block and the receipt generated from that block.

The reason Ethereum uses identical data structure for two different objectives is to share identical implementations. However, since the objective is different, the optimized code to create a Transaction Trie or a Receipts Trie is different from code that creates a State Trie or a Storage Trie. In reality, Parity uses different code for each objective. If that’s the case, then wouldn’t there be no reason to use identical Trie for a Transaction Trie or a Receipts Trie?

This kind of consideration is reflected upon when looking at CodeChain’s transactions_root and invoices_root, which is being developed at KodeBox. Thus, we decided to take a different approach from a Merkle Patricia Trie that stores UTXO. Then what kind of properties would transactions_root and invoices_root need?

The minimum requirement would be that it needs to verify data. This verification should not simply verify data sets but also check whether data order is in agreement. This is what you need in a typical blockchain, including Ethereum. In CodeChain, there are additional incorporated requirements.

Firstly, implementation should be simple. By simple, it means that not only the code has to be simple, but it should also require less memory usage and less time to execute. All of this is mentioned because CodeChain is keeping light clients into consideration.

Next, transactions should be entirely verifiable. This property only exists in CodeChain, and not in any other existing blockchain technology. In order for CodeChain to synchronize, it uses a technique called Snapshot Sync. Snapshot Sync is a way to ensure that only one reliable header is required for a block to synchronize as if it has received the entire history. For this, all transactions must be verifiable with only a single header.

Lastly, it must be incremental. Since CodeChain’s transactions_root must be able to verify the entire transaction list, if it’s not incremental, then as blocks stack on and on, it would require a longer time to verify. In order to prevent this delay, the incremental property is crucial.

In order to satisfy these conditions, Ethereum’s Merkle Patricia Trie was the first to be excluded. This can make the entire data incremental, but the overhead of constructing the Trie was too large.

Then we considered the traditional Merkle Tree. However, this did not satisfy the requirement that the overhead should be small. It also did not satisfy the incremental property.

Thus, we decided to use the Skewed Merkle Tree which leans toward the left. In a Skewed Merkle Tree, you can create a new tree using the transactions of the current block on the root of the entire block, and quickly and lightly create a hash that contains a list of all transactions executed so far. That is, you can make the entire transaction list incremental, and the time and memory it takes to create is linearly proportional to the number of transactions. Furthermore, since we are only interested in root values for verification, we can optimize the used memory to use an amount times n(where n is a constant number).

https://gist.github.com/sgkim126/66a077b914263768d479c842bae7e30b

Of course, the Skewed Merkle Tree is not good at everything. The Skewed Merkle Tree is inefficient in creating the Merkle path, a basic feature of the Merkle Tree. Thus, if you need to create a Merkle path, the classic balanced Merkle Tree is more appropriate. However, given the use of CodeChain’s transactions_root and invoices_root, the Merkle-proof path is not used. We decided that the Skewed Merkle Tree is more appropriate because it was used solely for data validation.

Check out the source code at our github repo. If you would like to chat with the CodeChain developers community, join our gitter.

(For the original Korean version, click here)

CodeChain

Programmable multi-asset chain

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store