Data structure in Ethereum. Episode 3: Patricia trie.
Patricia trie is the main trie used in Ethereum to store data. It is a mixture of Radix trie and Merkle trie.
In order to make it easy to understand, I will take us to an inefficient Patricia trie first, then move to “improved” Patricia trie and finally, the “real” Patricia trie. This way will help us know the reason why some structures have been generated.
It means about motivation and intuition 🏃.
A node in Ethereum is stored as a key-value with key is hash of node. Value is an array with 17 elements. In 17 elements, first 16 elements are indexed by hex number from
f and the last one is data contained by that node.
key vs path
Please note that,
key is used for “database lookup” (it means to find a node by database mechanism) and
path is used for “trie lookup” (it means to find data by path descending as Radix trie).
If it is still ambiguous, it would be more easy with an example below.
Inefficient Patricia trie
We build a trie that represents our dataset.
To test it again, let’s try searching the value of
395 path step by step.
Notations: “this is database lookup” — tdl, “this is trie lookup” — ttl;
- We descend
395to 3 parts
5and use them in sequence.
- Starting at
rootHash, we have to find
- First part of path is
3, so we get the element indexed by
rootNodeand that is
- Looking for
hashA(tdl), getting element indexed by
9(ttl) and value of element is
- Looking for
hashC(tdl), getting element indexed by
5(ttl) and value of element is
- At this time, we descend entire the path so we will get value contained in data element (the last element) of node corresponding with
hashD(ttl), and the result is
You can see that, we used path to find the value (an attribute of Radix trie) and if a value in trie is changed, it will cause the
rootHash of trie changed (an attribute of Merkle trie).
Moreover, trie has so many nodes with null value in data element and we have to improve it for more efficiency.
“Improved” Patricia trie
How we can improve it? The best answer is in Ethereum wiki, I won’t be able to explain better, so I will cite it below 💅💅💅.
I have been mentioned about
leaf node and
extension node in Episode 1+, but we don’t know what they are. We met some situations that make our trie become degraded and those are the long paths without branch (no divergent path).
56f0, to get
horse value, you need to descend too many empty-value nodes.
But, these lead to 2 sub-problems.
- No divergent path points to a data at the end. For example:
- No divergent path is branched in the middle. For example:
To solve the first one they introduced
leaf node and solve the second one they introduced
extension node. They are nodes in form of an array with 2 elements, the first element is
partialPath that helps to reduce empty-value node, the second one contains value that is
hashE now becomes a
leaf node, to get value of
56f0 path we can do like:
- Getting element with index
rootHashand the value is
hashEmay be a
extensionnode, we have to compare the rest of path (remainder) to
hashE. Remainder is
6f0(they are the same), so this node is
leafnode. We return
datafield and result is
hashB now becomes a
extension node. For example, getting
- Getting element indexed by
rootHash, value is
- We can see
hashBis a node as an array with 2 values, so we keep comparing remainder with
partialPathto know which one node is. Remainder is
ab, it means node is
extensionnode. We remove
partialPathfrom the current rest of path and we get the new rest of path is
8and the next hash is
- Finding node corresponding with
hashJ, getting element indexed by
8and the value is
hashK. Remainder is empty now.
hashKand we are received a node with empty
partialPath(leaf node because remainder is equal to
In addition, we can see
leaf nodes as well. And actually, our trie is still not optimized completely.
Now, this trie seems to be optimized fully by
leaf node and
Two examples above help us understand why and how Patricia trie was built and improved. Now we will complete it and get final Patricia trie which is used by Ethereum. 🚀🚀🚀
Some additional rules:
Ethereum Data Structure
In Ethereum, Patricia trie is used in 4 trie:
receiptRoot are contained in block header. Especially,
storageRoot is a sub-trie and contained in data of state trie.
The link below is a perfect explanation of them.
Everything I shown from episode 1 to episode 3 is all about some theoretical stuff, and it seems to be not enough to understand how Patricia trie works in practice. In the next episode, we will do an example to connect to database and show how trie organized. That’s really cool.
Something you need to go first is about geth, web3 and levelDB, just for getting smooth in reading episode 4.
Documents from wiki Ethereum: