Structure of a binary state tree — part 1

Guillaume Ballet
8 min readOct 5, 2020

--

In the past few months, I have been working on transitioning the trie from a hexary structure to a binary one. I have already written about the conversion method, in which the structure wasn’t fully specified. This article is the first in a series that goes over the design trade-offs to be considered when designing a new structure.

Some design choices that sounded like a good idea when the hexary trie was designed, turned out to be a source of complexity over its 5 years of faithful service. As eth1x contemplates a switch to binary tries, there is a great opportunity to take another look at how the state is stored.

The root of the problem

There are at least 5 things that could be improved when redesigning the storage format.

  • Merging the accounts and storage tries: maintaining several structures creates complexity, and the typical use case is to go through the account trie first, get the root of the storage trie, and then go to the storage trie to get to the data.
  • Extension nodes: this is a special node used to prefix all keys in a given subtree. They are useful because they limit the number of nodes that need to be hashed, however they also introduce complexity because the section of the keys that they cover has to be packed in a specific way.
  • The RLP encoding format is meant to encode arbitrary structures rather efficienctly. It is also the biggest source of pain due to its complexity: it also requires going through hoops to pack key chunks. Furthermore, due to the fairly constant structures of each node, the size gains made by using RLP can be made using a simpler serialization method.
  • Connected to the previous two problems, the hex prefix is also a huge source of complexity and confusion.
  • The proofs (a.k.a “witness”) for hexary tries are about 4 times larger than that of a binary trie (see the binary overlay article for a visualization).

RLP — Ramble, Lose yourself and Pester?

Let’s tackle the RLP question in this article. The vast majority of core devs wouldn’t be sad to see RLP go. This is due to its huge complexity. In fact, the only argument I’ve heard in favor of keeping it was: “RLP is so complex, let’s not take the risk of re-living the same nightmares by picking a new format”.

The basic principle is that it follows a simple size + data format. This is where the simplicity stops. First, the size can be any integer (in practice, it’s unlikely to be more than 2 bytes, but it could be, so you have to make sure this is supported). How do you tell where the size part ends and where the data begins?

  • In most cases, a header is added in front. The header is composed of a value h, to which the size is added: the RLP encoding is therefore [length(data)+h] [data]
  • The previous case works if length(data)+h < 256. What if the data is so large that it doesn’t fit into a single byte with the header? Bingo, you need another byte to store the size! And another h’ value to notify that you are using this second storage scheme. The RLP encoding is therefore: [h'+length(length(data))] [length(data)] [data]. h' has “conveniently” been chosen to be 56.
  • If the data is a single byte, one finds oneself having to add a byte in front of a single byte. Is it possible to not do that? Sometimes yes, and only at the price of an extra complication! If data[0]< h, then no header is necessary. Conversely, any byte payload starting with a value that is less than h is necessarily a single byte.

Can we make it more fun? Of course we can! There are two types of “objects” that RLP knows of: lists of structures and byte arrays.

  • Byte arrays have header = 128 and overflow_header=183,
  • Lists have header=192 and overflow_header=247.

So note that the maximal size for the data size is 8 bytes, which is a 64-byte number. 18014398509481984 KB ought to be enough for anybody.

Although there’s plenty more tricky details to be looked into, the point is clear: RLP is a beast. Let’s now see how it interacts with the state trie.

Merkelization rule

How do we apply both RLP and its tortured logic to our Merkelization rule? Let’s start at the bottom with the leaves.

Leaves and the hex prefix

Leaves contain a value, and then there’s whatever bits of the key leading to them, that is not common to other leaves. So we have a tuple (key, value), which is a 2-element list, containingkey and value that are byte arrays. RLP should be able to let us handle that right? Not so fast.

Keys in a hexary trie are nibble-based, so if we take a chunk from a key it could end up in the middle of a byte and the question comes as to where the alignment should take place. So a decision has to be made for this, and that key chunk is packed using a function called the hex prefix method. This adds a header to tell if the chunk length is even or odd.

The hex prefix encoding uses the first nibble of the first byte to store whether the length of the key chunk is even or odd.

The hex prefix require twiddling with nibbles. For example, if the chunk length is odd, then the first nibble is packed in the lowest nibble of the header.

Also, if the length is odd but is not byte-aligned, each nibble will have to be shifted to make it fit in one byte less.

In case the chunk length is even but is odd-aligned, all nibbles need to be shifted by 4 bits.

So the leaf node is finally encoded as RLP(HP(key_chunk), value).

Branch nodes and Small children

A branch node has 16 children, one for each nibble. RLP-wise it’s simply a list of the hashes of its children, so a list of 16 byte arrays. If one of the children is missing, it will simply have an empty byte array (just a standalone 128 to mark an array with a size of 0). There is actually a 17th entry in the list, for the value at the branch node, but it isn’t used in practice, so the last entry in the list is always 128.

As you should expect by now, there’s a bit of a catch here, too. In order to not create database entries for something too small, a node whose RLP is too small will not be hashed. So in that case, the RLP is stored directly as a raw list, instead of its hash. The consequence is that instead of finding a start-of-array marker (128) in the list, one will find another start-of-list marker (192) which gives rise to a lot of fun problems to solve.

If the total size of the RLP of a branch node is more than 32, then it will be hashed. Which happens most of the time because for that not to happen, the RLP size of a child would have to be 16 bytes max.

extension nodes

There is a third type of nodes called extension nodes, which will be discussed extensively in the next article. Luckily, they don’t add any woes from the RLP standpoint.

Is RLP really needed?

Are all these optimization useful? Not really.

For instance, this is how much the space would increase if the rlp < 32 bytes rule wasn’t used:

To be compared to the ~300GB that a full sync needs these days.

Likewise, not using the hex prefix would cause an increase of ~100MB. A mere 0.03% increase, which can be countered still, by carefully choosing the encoding format of the binary trie.

Binary tries — the last nail in the RLP coffin

With binary tries, branch nodes become much simpler: a node has a maximum of two children, that are pointed at by their hashes. The size of a Keccak256/Sha256 hash is fixed to 32 bytes.

At this stage, it becomes obvious that a general-purpose encoding scheme that has values of any length isn’t warranted. For example, assume that each node in the new binary trie has the following fields:

  • left child hash, used as a pointer;
  • right child hash, used as a pointer;
  • value (optional), the value that is stored at the key used to get to this node;
  • prefix (optional) which is intended as a replacement of the hexary trie’s extension node.
Example trie storing (key, value) pairs (0xcafe, 0x00) and (0xcaff, 0x01): the root node is prefixed by the key chunk “0xcafe” with a number to indicate that only 7 bits out of the last byte are used for the prefix. Then it has a pointer to each of its children, and no value. Each child has an empty prefix (marked by bit count 0x00), no child and a value.

A client could encode the node data with a simple two-byte header:

Intermediate node serialization header proposal

This model only requires two extra bytes, as opposed to at least 5 for the same model using RLP. The first byte contains the following flags:

  • If bit #7 is set, the first 32 bytes of the payload following this header contain the hash of the left child. If the bit is not set, then the left child hash is the empty hash;
  • If bit #6 is set, the subsequent 32 bytes represent the right child hash. If the bit is not set, then the right child hash is the empty hash;
  • If bit #4 is set, then the header has an extra byte giving the number of prefix bits following the left/right child hashes;
  • If bit #5 is set, then the remaining bytes in the payload represent the value

An important point is that this approach also covers the functionalities of the hex prefix, which can thus be deprecated.

Conclusion

I confess that with this post, I have as much been seeking to explain how RLP works as venting the frustration I accumulated working with RLP.

To be fair, the design is far from bad, and it has no doubt served its purpose over the last 5 years. It has become apparent over time, that its complexity is too hefty a price to pay, and I hope to have convinced you that RLP can advantageously be replaced in the context of storage tries.

The storage trie format proposed in this first post is far from complete, and more aspects will be covered in further installments.

Many thanks to Sina Mahmoodi and Martin H. Swende for their feedback.

--

--

Guillaume Ballet

#geth core dev @ #Ethereum. Interested in scalability, privacy, and decentralized organizations. Seeking to bring more fairness in human interactions.