# Data structure in Ethereum. Episode 3: Patricia trie.

Patricia trie is the main trie used in Ethereum to store data. It is a mixture of Radix trie and Merkle trie.

In order to make it easy to understand, I will take us to an inefficient Patricia trie first, then move to “improved” Patricia trie and finally, the “real” Patricia trie. This way will help us know the reason why some structures have been generated.

It means about motivation and intuition🏃.

*Dataset*

*Terminologies*

A node in Ethereum is stored as a key-value with key is hash of node. Value is an array with 17 elements. In 17 elements, first 16 elements are indexed by hex number from `0`

to `f`

and the last one is data contained by that node.

key vs path

Please note that, `key`

is used for “database lookup” (it means to find a node by database mechanism) and `path`

is used for “trie lookup” (it means to find data by path descending as Radix trie).

If it is still ambiguous, it would be more easy with an example below.

*Inefficient Patricia trie*

We build a trie that represents our dataset.

To test it again, let’s try searching the value of `395`

path step by step.

Notations: “this is database lookup” —tdl, “this is trie lookup” —ttl;

- We descend
`395`

to 3 parts`3`

,`9`

,`5`

and use them in sequence. - Starting at
`rootHash`

, we have to find`rootNode`

corresponding with`rootHash`

(**tdl**). - First part of path is
`3`

, so we get the element indexed by`3`

of`rootNode`

and that is`hashA`

(**ttl**). - Looking for
`hashA`

(**tdl**), getting element indexed by`9`

(**ttl**) and value of element is`hashC`

. - Looking for
`hashC`

(**tdl**), getting element indexed by`5`

(**ttl**) and value of element is`hashD`

. - At this time, we descend entire the path so we will get value contained in data element (the last element) of node corresponding with
`hashD`

(**ttl**), and the result is`duck`

.

You can see that, we used path to find the value (an attribute of Radix trie) and if a value in trie is changed, it will cause the `rootHash`

of trie changed (an attribute of Merkle trie).

Moreover, trie has so many nodes with null value in data element and we have to improve it for more efficiency.

*“Improved” Patricia trie*

How we can improve it? The best answer is in Ethereum wiki, I won’t be able to explain better, so I will cite it below💅💅💅.

I have been mentioned about `leaf`

node and `extension`

node in Episode 1+, but we don’t know what they are. We met some situations that make our trie become degraded and those are the long paths without branch (no divergent path).

For example: `56f0`

, to get `horse`

value, you need to descend too many empty-value nodes.

But, these lead to 2 sub-problems.

- No divergent path points to a data at the end. For example:
`56f0`

. - No divergent path is branched in the middle. For example:
`cab`

of`{cabe, cab8}`

.

To solve the first one they introduced `leaf`

node and solve the second one they introduced `extension`

node. They are nodes in form of an array with 2 elements, the first element is `partialPath`

that helps to reduce empty-value node, the second one contains value that is `data`

if `leaf`

or `merkleHash`

if `extension`

.

`hashE`

now becomes a `leaf`

node, to get value of `56f0`

path we can do like:

- Getting element with index
`5`

of`rootHash`

and the value is`hashE`

. - Because
`hashE`

may be a`leaf`

or`extension`

node, we have to compare the rest of path (remainder) to`partialPath`

of`hashE`

. Remainder is`6f0`

and`partialPath`

is`6f0`

(they are the same), so this node is`leaf`

node. We return`data`

field and result is`horse`

.

`hashB`

now becomes a `extension`

node. For example, getting `data`

of `cab8`

path:

- Getting element indexed by
`c`

of`rootHash`

, value is`hashB`

. - We can see
`hashB`

is a node as an array with 2 values, so we keep comparing remainder with`partialPath`

to know which one node is. Remainder is`ab8`

and`partialPath`

is`ab`

, it means node is`extension`

node. We remove`partialPath`

from the current rest of path and we get the new rest of path is`8`

and the next hash is`hashJ`

. - Finding node corresponding with
`hashJ`

, getting element indexed by`8`

and the value is`hashK`

. Remainder is empty now. - Finding
`hashK`

and we are received a node with empty`partialPath`

(leaf node because remainder is equal to`partialPath`

), return`dog`

.

In addition, we can see `hashK`

and `hashL`

are `leaf`

nodes as well. And actually, our trie is still not optimized completely.

Now, this trie seems to be optimized fully by `leaf`

node and `extension`

node.

Two examples above help us understand why and how Patricia trie was built and improved. Now we will complete it and get final Patricia trie which is used by Ethereum.🚀🚀🚀

*Patricia trie*

Some additional rules:

- Every
`partialPath`

will be HP encoded before hand. - Every elements in a node will be RLP encoded.
- Value (Node) will be RLP encoded before stored down.

*Ethereum Data Structure*

In Ethereum, Patricia trie is used in 4 trie:

- stateRoot
- storageRoot
- transactionRoot
- receiptRoot

In them, `stateRoot`

, `transactionRoot`

and `receiptRoot`

are contained in block header. Especially, `storageRoot`

is a sub-trie and contained in data of state trie.

The link below is a perfect explanation of them.

*Conclusion*

Everything I shown from episode 1 to episode 3 is all about some theoretical stuff, and it seems to be not enough to understand how Patricia trie works in practice. In the next episode, we will do an example to connect to database and show how trie organized. That’s really cool.

Something you need to go first is about geth, web3 and levelDB, just for getting smooth in reading episode 4.

*References*

Documents from wiki Ethereum: