Statelessness series — Part4: Exploring the Verkle Trie structure

Chaisomsri
12 min readFeb 15, 2024

--

[Table of Contents]

  1. What is a Verkle Tree?
  2. Verkle Trie Structure
  3. Merkle Trie VS Verkle Trie

The Verkle Trie is one of the key changes for the introduction of statelessness, and the Ethereum Foundation has been continuing the development and research of the Verkle Trie to replace the MPT (Merkle Patricia Trie) currently used for state storage. The Verkle Trie documentation has been updated until recently, and currently, implementations of the Verkle Trie can be found in (Python, Go, Rust). Following the verification of the Verkle Trie, this article will focus on the structure of the Verkle Trie to provide a detailed understanding of it.

What is a Verkle Trie?

The Verkle Trie is a combination of Vector and Merkle tree, using Vector Commitment instead of the keccak256 hash function found in traditional Merkle trees. The Vector Commitment (VC) scheme used in the Verkle Trie is based on the Pedersen Commitment (also known as Pedersen Hash), which is based on polynomial commitment.

The use of commitments eliminates the need for sibling nodes to generate a proof, and since it is based on elliptic curves, operations can be performed. This allows for the addition (multiproof) of the proofs required at each level of the tree, reducing the size of the proofs. A constant-size proof can be included in the block header, enabling weak statelessness.

(source: SalomonCrypto)

Verkle Trie Structure

2-layer structure(source: ethereum yellow paper, ethereum reddit)

Ethereum has a total of four trees: the World State Trie, Receipts Trie, Transaction Trie, and Account Storage Trie. In this structure, the leaf nodes of the State Trie contain data for each account such as nonce, balance, codeHash, and the root of the storage tree (StorageRoot). This structure, where a tree contains another tree, is referred to as a 2-layer structure (Tree-inside-a-tree). The 2-layer structure has the following problems:

1. Complexity

2. Imbalance

3. Difficulty in understanding interactions between mechanisms such as state expiration

Because the tree contains another tree, it requires additional processing. Similarly, when attempting to access account header items (i.e., nonce, balance, code) that are not storage, complex processing is needed in various areas such as database reads/writes, Merkle proof construction, Merkle proof verification, synchronization, and caching. Additionally, while the World State Trie only contains the storage root, this actually signifies a large amount of data, making this imbalance a significant nuisance in the state sync protocol.

state trie와 storage trie (soure: cryptoauxiliary)

For example, in the existing state tree, the data mapped to the key for account 0x58…ae84 would include nonce, balance, storageRoot, and codeHash. Here, the storageRoot only holds a root value, not the actual data. To access a storage slot, the account 0x58…ae84 and the storage slot are used as keys to read values in the Storage Tree (Account Storage Trie). If the storage tree is updated, the storageRoot value in the state tree that was holding it gets updated according to the changed tree content. Thus, the 2-layer structure requires separate logic for account-level and storage-slot-level manipulation, and similar issues arise in state expiry, which is intended for statelessness. Due to these inefficient issues, Vitalik has proposed a single-layer structure.

single layer structure (source: ethereum reddit)

This structure maps data to a 32-byte single key at all locations within the state, resulting in a single key:value structure. This means that access to nonce, balance, as well as storage slots is possible using the same key structure.

eg. (address, storage_slot), (address, NONCE), (address, balance) ..

As a result, each account’s storage is located in another part of the state tree rather than being a subtree within the state tree, eliminating the tree-inside-tree structure. For a single-layer structure, values sharing the first 31 bytes of the key are included in the same bottom-layer commitment. This can save space for witnesses needed for verifying many header fields, code chunks, or adjacent storage slots. The Verkle Trie uses Tree Key to represent this and save witness space.

Tree Keys

Tree Key’s Structure(source: ethereum foundation youtube)

The Tree Key of a Verkle Trie is 32 bytes, consisting of a 31-byte stem and a 1-byte suffix. The suffix allows for distinguishing the state information (account header data, code, storage) stored by the Tree Key.

Parameters of the tree key generation function (source: Verkle Tree EIP)
Interpretation based on suffix value (source: Verkle Block Sample)

In the context mentioned, *_LEAF_KEY represents the suffix of the Tree Key, and *_OFFSET indicates the position of a child node within 32 bytes (256 bits). For example, if the last 1 byte of a Tree Key is 00, it represents the account version; 01 represents the account’s balance; 02 represents the account’s nonce. The nonce and balance previously held in the state tree are stored with Tree Keys having suffixes 01 and 02, while storage and code hash are stored based on the offset values mentioned.

def get_tree_key(address: Address32, tree_index: int, sub_index: int):
# Asssumes VERKLE_NODE_WIDTH = 256
return (
pedersen_hash(address + tree_index.to_bytes(32, 'little'))[:31] +
bytes([sub_index])
)

This explanation introduces the tree key generation function featured in the Verkle Trie EIP.

def get_tree_key_for_version(address: Address32):
return get_tree_key(address, 0, VERSION_LEAF_KEY)

def get_tree_key_for_balance(address: Address32):
return get_tree_key(address, 0, BALANCE_LEAF_KEY)

def get_tree_key_for_nonce(address: Address32):
return get_tree_key(address, 0, NONCE_LEAF_KEY)

# Backwards compatibility for EXTCODEHASH
def get_tree_key_for_code_keccak(address: Address32):
return get_tree_key(address, 0, CODE_KECCAK_LEAF_KEY)

# Backwards compatibility for EXTCODESIZE
def get_tree_key_for_code_size(address: Address32):
return get_tree_key(address, 0, CODE_SIZE_LEAF_KEY)

The keys for nonce and balance are generated using *_LEAF_KEY values as suffixes.

def get_tree_key_for_code_chunk(address: Address32, chunk_id: int):
return get_tree_key(
address,
(CODE_OFFSET + chunk_id) // VERKLE_NODE_WIDTH,
(CODE_OFFSET + chunk_id) % VERKLE_NODE_WIDTH
)
def get_tree_key_for_storage_slot(address: Address32, storage_key: int):
if storage_key < (CODE_OFFSET - HEADER_STORAGE_OFFSET):
pos = HEADER_STORAGE_OFFSET + storage_key
else:
pos = MAIN_STORAGE_OFFSET + storage_key
return get_tree_key(
address,
pos // VERKLE_NODE_WIDTH,
pos % VERKLE_NODE_WIDTH
)

On the other hand, the code hash and storage are generated based on offsets, where VERKLE_NODE_WIDTH is 256, indicating the size of a tree’s child nodes. Unlike the MPT, which had 16 child nodes, the Verkle Trie has 256 child nodes.

Let’s examine some examples of tree keys.

tree keys (source: verkle block sample)

These tree keys belong to an account, showing different suffixes. Each Tree Key holds a different value, and the stored value corresponds to what the suffix indicates. The data stored under the Tree Key is also represented in 32 bytes.

For instance, if the keys are for the address 0x71562b71999873DB5b286dF957af199Ec94617f7, the balance for that address would be stored under 0x274cde18dd9dbb04caf16ad5ee969c19fe6ca764d5688b5e1d419f4ac6cd1601.

If there are no tree keys with suffixes within the range of 128256 (0x800xff) for this address, it indicates that it is an Externally Owned Account (EOA).

tree keys (source: verkle block sample)

For the tree keys mentioned, you can see they have suffixes within the range of 0x80~0xff. Therefore, it can be understood that the address holding these tree keys is a Contract Account (CA). The balance and code owned by the contract are stored in the data area under the tree key according to the suffix.

tree key 와 vekle tree (source: ethereum foundation youtube)

As shown in the description, having the tree key alone allows direct access to the leaf node (=suffix node). This means that data with the same stem’s tree keys all converge to the same leaf node, and these data are stored under one commitment of the suffix.

Merkle Patricia Tree Structure (source: Leo Zhang)

This is in contrast to the original Merkle Patricia Tree, where only one piece of data was stored in each leaf node. In the Verkle Trie, leaf nodes are more akin to branch nodes rather than the leaf nodes of the MPT.

Inner Node & Suffix Node(extension node)

The Verkle Trie consists of two types of nodes: inner nodes and suffix nodes.

verkle tree structure (source: ethereum foundation blog)

The dashed box represents the leaf node, that is, the suffix node. The use of Vector commitment allows for an expansion from the original 16 child nodes to 256 child nodes.

Suffix Node

The leaf node (= suffix node, referred to as leaf node for ease of understanding) holds a commitment, which is structured as follows:

Leaf node structure (source: Verkle Tree Dev)
  • 1: A marker for the suffix node, which is 1 on the elliptic curve but does not literally mean the number 1.
  • Stem: The stem refers to the stem in the tree key.
  • C1, C2: Are Pedersen Commitments.
(source: ethereum foundation blog)

Ultimately, a commitment value that commits these four values is included as the commitment in the leaf node. So, what form does the commitment of data take?

C1, C2 Structure (source: Verkle Tree Dev)

Values sharing the same stem converge into one leaf node. The leaf node reached by a stem includes data of different suffixes, where Pedersen commitment C1 is for data with suffixes 00 ~ 7F, and Pedersen commitment C2 is for data with suffixes 80 ~ ff. In other words, account version, balance, nonce… code hash are stored in C1, and the rest of the data are stored in C2. The reason for this division is that the creation of Pedersen Commitment is limited to committing up to 256 values of maximum 253-bit size, and for 256-bit values, data loss occurs.

When storing 32-byte data under a tree key, the following process is undertaken (this description is based on the Go implementation of the Verkle Trie):

  1. Depending on the suffix, the data become v0, v1… v255. If the suffix is 20, the value is located at v32.
  2. To calculate the leaf node’s commitment, C1 and C2 are computed. Here, v0v127 are included in C1, and v128v255 are included in C2.
  3. For C1, each 32-byte value of v0~v127 is divided into the upper 16 bytes (v1,0) and the lower 16 bytes (v1, 1) to serve as coefficients in a polynomial. This allows for the creation of a 256-degree polynomial from 128 pieces of data, with each coefficient’s data being 16 bytes (128-bit). C2 is constructed similarly.
  4. The 256-degree polynomial is committed to compute C1 = commit([(v0,0), (v0,1), (v1,0), (v1,1)…(v127,0),(v127,1)]) and C2 = commit([(v128,0), (v128,1), (v129,0), (v129,1) … (v255,0),(v255,1)]). Since each coefficient is data of 128 bits from 256 pieces, no data loss occurs.
  5. C1, C2, 1, and the stem value are committed to compute Cc = commit([1, stem, C1, C2]). This C becomes the commitment for the leaf node.

Therefore, the commitment of each node includes its child nodes, serving a similar role to hashes in the Merkle Tree.

Inner Node

The inner node holds the stem value of the tree key and stores 256 pointers to sub-nodes. Simply put, it represents the path from the root to the leaf node, based on the stem. This is similar to the branch node in the MPT.

inner node structure (source: verkle tree dev)

In the illustration, C0, C1 … C255 represent the commitments of sub-nodes, and the inner node contains these commitments. Let’s understand the structure of inner nodes and the Verkle Trie by looking at the changes in inner nodes when inserting tree key-value pairs into the Verkle Trie.

  1. tree key: 0x00..20 [insert empty]
Refer to the go-verkle implementation

Since there is no node with the path “00,” an empty insertion creates a leaf node directly under the root node “00.” Since the suffix is 20, the value is included in the leaf node’s C1.

2. tree key: 0xdefe…64 [insert empty]

Refer to the go-verkle implementation

Before the insertion of the 3rd tree key, similar to case 1, “de” is in an empty state, so a leaf node is created under the root node “de.”

3. tree key: 0xde03a8..02 [insert leaf node]

Refer to the go-verkle implementation

Upon the 3rd insertion, a leaf node is already positioned at “de.” In this case, new inner nodes are inserted until a different path from the existing leaf node’s tree key appears. The inserted inner node holds pointers to sub-nodes. The previously existing 2nd node is stored at “fe,” and the 3rd node is stored at “03.”

4. tree key: 0xde03a8..02 [insert inner node]

Refer to the go-verkle implementation

Similarly, upon the 4th insertion, due to the new inner node inserted at “de” from case 3, an inner node is positioned at “de.” The insertion location moves down to the sub-node (the new inner node inserted in case 3), and the path becomes “03.” Since a leaf node is located at “03,” inner nodes are inserted until a different path appears, as described in case 3. The Verkle Trie with the 1st, 2nd, 3rd, and 4th tree key-value inserted ultimately has the structure described above.

Through the process of inserting tree keys, we have explored the role of inner nodes, the structure of the Verkle Trie, and the form of leaf nodes. If example values are inserted into the Verkle Trie, it would be as follows:

Verkle Trie Structure (source: Verkle Block Sample, you can view it larger by clicking on the picture)

In the illustration, I represents Inner node, L represents Leaf node, and C represents Commitment. In summary:

  • The Verkle Trie consists of two types of nodes: leaf nodes and inner nodes.
  • A tree key contains a stem and a suffix.
  • The same stem corresponds to the same leaf node.
  • Data is stored differentiated by the suffix of the tree key.
  • The tree key is encoded byte by byte along the path from the root to the leaf node.
  • Data is included in the commitment of the leaf node.

Merkle Tree (MPT) VS Verkle Trie

Before we begin, this article focuses on the Verkle Trie, so we will skip over detailed explanations of the Merkle Tree.

Similarities

  • Data elements are stored as key-value pairs.
  • Keys are encoded byte by byte along the path from the root to the node containing the value.
  • Each node has a cryptographic commitment that commits to the value and position of sub-nodes.
  • The position of nodes and cryptographic commitments depend solely on the content of the data, not on the order in which data is created and updated.

If you are familiar with the structure of the Merkle Tree and have understood the Verkle Trie’s tree key, inner node, and leaf node as described above, it should be easy to grasp the similarities between the two tree structures.

Differences

  • The Merkle Tree has a nested tree structure (trie-inside-trie), while the Verkle Trie has a single flat trie structure.
  • Instead of the keccak256 hash function used in the Merkle Tree, the Verkle Trie generates cryptographic commitments using Pedersen commitments.
  • The Merkle Tree uses 20-byte keys, while the Verkle Trie uses 32-byte keys.
  • The Merkle Tree has 16 sub-nodes, but the Verkle Trie has 256 sub-nodes.

Pedersen commitments are based on polynomials and generate proofs and commitments using evaluations at specific positions, eliminating the need for sibling nodes and uniquely determined by specific values (for more details, see IPA). This allows for maintaining a constant proof size even as the tree’s width increases, solving a major issue Ethereum faced in transitioning to a stateless-client model: the size problem of witnesses.

--

--