Data structure in Ethereum. Episode 3: Patricia trie.

Patricia trie is the main trie used in Ethereum to store data. It is a mixture of Radix trie and Merkle trie.

In order to make it easy to understand, I will take us to an inefficient Patricia trie first, then move to “improved” Patricia trie and finally, the “real” Patricia trie. This way will help us know the reason why some structures have been generated.
It means about motivation and intuition 🏃.

Dataset


Terminologies

A node.

A node in Ethereum is stored as a key-value with key is hash of node. Value is an array with 17 elements. In 17 elements, first 16 elements are indexed by hex number from 0 to f and the last one is data contained by that node.

key vs path

Please note that, key is used for “database lookup” (it means to find a node by database mechanism) and path is used for “trie lookup” (it means to find data by path descending as Radix trie).

If it is still ambiguous, it would be more easy with an example below.

Inefficient Patricia trie

We build a trie that represents our dataset.

Building trie by dataset.

To test it again, let’s try searching the value of 395 path step by step.

Notations: “this is database lookup” — tdl, “this is trie lookup” — ttl;
  1. We descend 395 to 3 parts 3, 9, 5 and use them in sequence.
  2. Starting at rootHash, we have to find rootNode corresponding with rootHash (tdl).
  3. First part of path is 3, so we get the element indexed by 3 of rootNode and that is hashA (ttl).
  4. Looking for hashA (tdl), getting element indexed by 9 (ttl) and value of element is hashC.
  5. Looking for hashC (tdl), getting element indexed by 5 (ttl) and value of element is hashD.
  6. At this time, we descend entire the path so we will get value contained in data element (the last element) of node corresponding with hashD (ttl), and the result is duck.

You can see that, we used path to find the value (an attribute of Radix trie) and if a value in trie is changed, it will cause the rootHash of trie changed (an attribute of Merkle trie).

Moreover, trie has so many nodes with null value in data element and we have to improve it for more efficiency.


“Improved” Patricia trie

How we can improve it? The best answer is in Ethereum wiki, I won’t be able to explain better, so I will cite it below 💅💅💅.

The holy answer.

Finally, chose one 💆💆💆.

I have been mentioned about leaf node and extension node in Episode 1+, but we don’t know what they are. We met some situations that make our trie become degraded and those are the long paths without branch (no divergent path).

For example: 56f0, to get horse value, you need to descend too many empty-value nodes.

But, these lead to 2 sub-problems.

  1. No divergent path points to a data at the end. For example: 56f0.
  2. No divergent path is branched in the middle. For example: cab of {cabe, cab8}.

To solve the first one they introduced leaf node and solve the second one they introduced extension node. They are nodes in form of an array with 2 elements, the first element is partialPath that helps to reduce empty-value node, the second one contains value that is data if leaf or merkleHash if extension.

Using leaf node.

hashE now becomes a leaf node, to get value of 56f0 path we can do like:

  1. Getting element with index 5 of rootHash and the value is hashE.
  2. Because hashE may be a leaf or extension node, we have to compare the rest of path (remainder) to partialPath of hashE. Remainder is 6f0 and partialPath is 6f0 (they are the same), so this node is leaf node. We return data field and result is horse.
Using leaf node and extension node.

hashB now becomes a extension node. For example, getting data of cab8 path:

  1. Getting element indexed by c of rootHash, value is hashB.
  2. We can see hashB is a node as an array with 2 values, so we keep comparing remainder with partialPath to know which one node is. Remainder is ab8 and partialPath is ab, it means node is extension node. We remove partialPath from the current rest of path and we get the new rest of path is 8 and the next hash is hashJ.
  3. Finding node corresponding with hashJ, getting element indexed by 8 and the value is hashK. Remainder is empty now.
  4. Finding hashK and we are received a node with empty partialPath (leaf node because remainder is equal to partialPath), return dog.

In addition, we can see hashK and hashL are leaf nodes as well. And actually, our trie is still not optimized completely.

Optimized trie.

Now, this trie seems to be optimized fully by leaf node and extension node.

Two examples above help us understand why and how Patricia trie was built and improved. Now we will complete it and get final Patricia trie which is used by Ethereum. 🚀🚀🚀

Patricia trie

Some additional rules:

  1. Every partialPath will be HP encoded before hand.
  2. Every elements in a node will be RLP encoded.
  3. Value (Node) will be RLP encoded before stored down.
Patricia trie.

Ethereum Data Structure

In Ethereum, Patricia trie is used in 4 trie:

  1. stateRoot
  2. storageRoot
  3. transactionRoot
  4. receiptRoot

In them, stateRoot, transactionRoot and receiptRoot are contained in block header. Especially, storageRoot is a sub-trie and contained in data of state trie.

The link below is a perfect explanation of them.


Conclusion

Everything I shown from episode 1 to episode 3 is all about some theoretical stuff, and it seems to be not enough to understand how Patricia trie works in practice. In the next episode, we will do an example to connect to database and show how trie organized. That’s really cool.

Something you need to go first is about geth, web3 and levelDB, just for getting smooth in reading episode 4.

References

Documents from wiki Ethereum:

Like what you read? Give Phan Sơn Tự a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.