Leaving aside the network part, we could say that Ethereum is a state machine where transactions modify states on the Ethereum network. A state can be expressed as a key-value pair. Although there are several ways of representing a key-value pair, the Ethereum specification defines the Modified Merkle Patricia Trie (a.k.a MPT) as the method to save states.
Basically, MPT is a combination of Patricia trie and Merkle tree, with few additional optimizations that fit the characteristics of Ethereum. Thus, an understanding of the Patricia trie and Merkle tree should precede the understanding of MPT.
Patricia trie is a data structure which is also called Prefix tree, radix tree or trie. Trie uses a key as a path so the nodes that share the same prefix can also share the same path. This structure is fastest at finding common prefixes, simple to implement, and requires small memory. Thereby, it is commonly used for implementing routing tables, systems that are used in low specification machines like the router.
Merkle tree is a tree of hashes. Leaf nodes store data. Parent nodes contain their children’s hash as well as the hashed value of the sum of their children’s hashes. Since all the nodes except for leaf nodes contain a hash, the Merkle tree is also known as a hash tree.
Finding out whether two different nodes have the same data or not can be efficiently done with the Merkle tree. You first have to compare the Top Hash value of the two nodes. If they are the same, then the two nodes have same data. For example, if you look at the picture above, when there are four nodes (L1, L2, L3, L4), you only need to check whether they have the same Top Hash or not. If the Top Hash is different and you want to know which data is different, you should compare Hash 0 with Hash1 and check which branch is different. By doing so, you will eventually find out which data is different.
Merkle Patricia Trie
In the MPT, as well as in the Merkle tree, every node has a hash value. Each node’s hash is decided by the sha3 hash value of its contents. This hash is also used as the key that refers to the node. Go-ethereum uses levelDB, and parity uses rocksDB to store states. They are key-value storages. Keys and values saved in the storage are not the key-values of the Ethereum state. The value that is stored in the storage is the content of MPT node while the key is the hash of this node.
Key-values of the Ethereum state are used as paths on the MPT. Nibble is the unit used to distinguish key values in the MPT, so each node can have up to 16 branches. Additionally, since a node has its own value, a branch node is an array of 17 items composed of 1 node value and 16 branches.
A Node that does not have a child node is called a leaf node. A leaf node consists of two items: its path and value. For example, let’s say the key “0xBEA” contains 1000 and the key “0xBEE” contains 2000. Then, there should be a branch node with the “0xBE” path, and, under that node, two leaf nodes with two paths (“0xA” and “0xE”) will be attached.
In the MPT, there is one more type of nodes apart from the branch nodes and the leaf nodes. They are extension nodes. An extension node is an optimized node of the branch node. In the Ethereum state, quite frequently, there are branch nodes that have only one child node. This is the reason why the MPT compresses branch nodes that contain only one child into extension nodes that have a path and the hash of the child.
Since both the leaf node and the extension node are an array of two items there should be a way to distinguish these two different nodes. In order to make such distinction, the MPT adds a prefix to the path. If the node is a leaf and the path consists of even number of nibbles, you add 0x20 as a prefix. If the path consists of odd number of nibbles, you should add 0x3 as a prefix. If the node is an extension node and the path consists of even number of nibbles, you add 0x00 as a prefix. If it consists of odd number of nibbles, you should add 0x1 as a prefix. Because the path that consists of an odd number of nibbles gets a nibble as prefix and the path that consists of an even number of nibbles gets two nibbles as a prefix, a path is always expressed as a byte.
(For the original Korean version, click here)