CodeChain Merkle Trie

Geunwoo Kim
CodeChain
Published in
4 min readAug 3, 2018
Ralph Merkle, the inventor of Merkle Trees

The Merkle Trie is an authenticated data structure to store (key, value) bindings. It is fully deterministic, meaning that the Merkle Trie, which has the same set of (key, value) binding, is guaranteed to be exactly the same down to the last byte and, therefore, have the same root hash. It has traits of both a Merkle tree and Trie. Due to a feature of the Merkle tree, each and every node contains hash values of its children. So, any changes down the root node will influence the hash value of the root node. In contrast, Tries contain nodes that do not contain the entire key of value. Instead, keys are divided into pieces. Each piece has a specific location in the tree, and you can get the entire key by looking at every node that contains pieces of the key.

So why do we use Merkle Tries? How can we use this data structure in blockchains? The Trie enables us to find a node with its alphanumeric key. However, what we have to look at closely is the merkling operation of the Merkle tree. This provides a form of cryptographic authentication to the data structure. If the root hash of a given trie is publicly known, then anyone can prove the validity and the integrity of a data structure by providing the nodes that are contained in the path. To be specific, if you want to prove the validity of a node, you also need the hash value of the nodes in the same level that also have the same parent. Accordingly, it is impossible for an attacker to prove a data structure that has been manipulated, since the hash of the root node is ultimately based on all of the hash below it. Therefore, any modification would make it totally different hash.

Codechain’s Merkle Trie is a simpler and an optimized version of the Ethereum’s Merkle Trie. The Ethereum’s Merkle Trie uses three kinds of nodes, which include the Leaf, Branch and Extension. The Leaf is for terminating nodes which have value. The Branch is for branching each node with a common path with each other and also for storing a value for the node that has a path ending in the middle of another node’s path. The extension is for storing a common path and hash value of the corresponding Branch to make it possible to access the Branch. Ethereum uses this data structure in all of the Trie it uses, including the state Trie. The problem arises from here. In state Tries, the only case we have to consider is a fixed size key because the key will be an account address. Therefore, we can make it simpler by removing the Extension while replacing the value in the Branch with a common path in the Extension. The reason is that if all the nodes share the same path length, the value will not be stored in the middle of another one’s path.

For instance, here are two paths 0x123456 and 0x1234 with some arbitrary values. Since they have a common path(0x1234), while going down to 0x1234 both paths use the same route in the tree, which means they are stored in the same node. But at the end of the common route, the second node has to store its value because its path is over. Thus, we needed the Extension node to store a common path and the hash value, which links the Branch that has a value. But if all the path length is fixed, then we only need two kinds of nodes: one for extension and one for terminating. So, in CodeChain, we only use two kinds of nodes: Leaf and Branch. We always use a blake hash of the inserted key as the path used to store the node. The path is a fixed size 256-bit data. (More precisely, blake256(key))

By using a blake hash of the key as the path, many advantages can be gained. First of all, the operation of the Merkle Trie has become simpler, since we can remove one kind of node. More specifically, search, insert and remove are much easier than Ethereum’s Merkle Trie, because the value is always placed in the terminating node. The insertion and removal operations can be defined roughly as follows:

Second, CodeChain’s Merkle Trie uses less memory because it doesn’t have to store one additional Extension node, which is eliminated in CodeChain’s Merkle Trie. Lastly, we can expect a tree to be balanced by the hash distribution of blake. We don’t have to do additional balancing operations to make the tree structure have a normal form.

To sum up, CodeChain uses optimized Merkle Trie only for state Tries. The simplified version of Merkle Trie will perform more efficiently in terms of memory usage and speed.

Check out the source code at our github repo. If you would like to chat with the CodeChain developers' community, join our gitter channel.

--

--