Data structure in Ethereum | Episode 1+: Compact (Hex-prefix) encoding.
In the episode 1, we discussed RLP Encoding/Decoding, however Ethereum still has another encoding called Compact encoding or Hex-prefix (HP) encoding. So, what are the differences between RLP and HP encoding? and How does HP encoding work? This article will help us to get it.
Moreover, I am going to explain some confused terminologies that really helpful for understanding this article, also the next episode about the trie in Ethereum.
Some terminologies
Before we go to HP encoding, I will try to introduce some terminologies in advance. However, this part relates to Trie in Ethereum and perhaps you are not familiar with it. Don’t worry about that, I will try to skip that things and let us approach it by an intuitional way.
Let’s play a game. I will give you a map and a path, you have to find the animal which I am implying.
The path is: 3–2–3
In order to solve this game, we do step by step following:
- Start at Root and the first element in path is 3, it means we must get the element of Root which labeled by number 3 and we can see that the value of that element is A.
- Find node A, the next element in path is 2 so we get the value in node A is D.
- Find node D, the last element in path is 3, obviously the value we find out is ‘Whale’.
So, the animal of the game is whale. Let’s try another path: 3–1–3–2 (the result is ‘Chicken’).
By this game we get familiar with some terminologies:
- Key: Root, A, B, C, D and E in game are keys
- Node: the content corresponding with the key in the right part of each row. For example:
{1: ‘Dog’, 2: B, 3: A}
. - Path: path is like 2–2–3 in game.
- Value: you can see it has some elements in all nodes, every element is a key-value pair. Value is the right part of element, value can be a key or a name of animal.
- Nibble: hex form of 4 bits is a nibble. For example:
0x1
,0x4
,0xf
…
Differentiate RLD from HP encoding
It’s really easy to differentiate RLP from HP encoding, RLP is used for encoding/decoding Value and HP encoding is used for encoding/decoding Path.
HP encoding goals
We will meet 2 more terminologies which are leaf
and extension
in this part, but we can blur them and discuss the details in the episode 3.
Leaf
and extension
are 2 kinds of node, however path of leaf
has terminator
and extension
does not. Terminator
is the last byte of the path and has value of 16
in dec or 0x10
in hex.
As we can see, it’s possible to the path has odd length but odd length is not friendly to machines. So we have to convert all odd-length paths to even-length paths.
In summary, HP encoding goals are:
- Distinguish
leaf
andextension
from each other withoutterminator
. - Convert the path to even length.
HP encoding specification
For now, we call the path inputed in HP encoding as input for convenience.
- If input has
terminator
, removeterminator
from input. - Create the prefix into input which has value as following table:
- If the prefix is
0x0
or0x2
, add a padding nibble0
follow the prefix, so the prefix is0x00
and0x20
. The main reason to do that is we are trying to maintain the even-length attribute of the path. Convince yourself :). - Add prefix to path.
For example:
Conclusion
Now, I believe that we got all of weapons to make a battle with Trie in Ethereum. In the next episode, we will get deep into Trie that is the most excited part I can wait to discuss with you.
References
Wiki Ethereum
HP encoding on original Golang code:
HP encoding on my github with some comments of explanation.
sontuphan/debug-geth
Contribute to debug-geth development by creating an account on GitHub.
github.com