Statelessness series — Part3: Verification of Verkle Tree

Chaisomsri
11 min readFeb 13, 2024

--

[Table of Contents]

1. Reasons for Introducing Verkle Trees

2. Why is Proof of the Tree Necessary?

3. Verification Method of Merkle Trees

4. Proof of Verkle Trees

5. Verification Method of Verkle Trees — KZG Commitment

6. Verification Method of Verkle Trees — Pedersen Vector Commitment

7. Conclusion

1. Reasons for Introducing Verkle Trees

Vitalik proposed two methods for achieving statelessness in the “A state expiry and statelessness roadmap”:

  • State expiry
  • Weak statelessness

State expiry and Weak statelessness were covered in detail in the first part. To briefly recap, Weak statelessness requires only block proposers to store the state, allowing nodes verifying the state to do so without needing to store it themselves. If the block proposer submits a witness indicating whether a specific value is included at a specific location, the verifying nodes do not need to store the state.

The reduction in the size of the witness is essential for Weak statelessness. This is because a witness must be created and transmitted each time a block is generated. The witness size created by the current Merkle tree increases as the tree’s depth and breadth expand, making it unsuitable. Verkle trees, which can create witnesses of constant size, are a promising candidate for adoption.

2. Why is Proof of the Tree Necessary?

A witness is a proof that indicates whether a specific value is included at a specific location. The introduction of Verkle trees to reduce the size of witnesses for state expiry underscores the importance of witnesses. So, does this mean the concept of proof is unnecessary for Ethereum, given that full nodes store the entire state?

Ethereum nodes can be categorized into three types: archive nodes, full nodes, and light nodes. Archive nodes store all data. Full nodes have all the data from the genesis block but do not keep historical states, though they can provide the current state. Light nodes only have block headers.

Light nodes cannot provide data to other nodes or clients and must receive proof from full nodes to verify that the data they have received is correct. Light nodes verify the data using the proof and the state root in the block headers they possess.

Therefore, archive and full nodes do not need proofs. These nodes can verify requests they receive because they possess the data. Proofs are meant for light nodes, which cannot independently verify the requests they receive.

However, proofs can be critically important for users of Ethereum. Let’s consider an example introduced in “Light Clients After the Merge” presented at Devcon 6.

Source: Light Clients After the Merge by Etan Kissling | Devcon Bogota

MetaMask, one of the most widely used wallets, requires entering an RPC URL provided by a mainnet node to send requests to the mainnet. If users do not run their own full node, they must use a node URL provided by someone else. But what if the trusted node turns out to be malicious?

Source: Light Clients After the Merge by Etan Kissling | Devcon Bogota

MetaMask does not independently verify the data it retrieves. It queries the node RPC URL to obtain information about the Ether and token balances of the address currently logged into the wallet. Therefore, if the node intentionally reports incorrect Ether amounts, MetaMask cannot detect this and displays the information to the user as is. For example, even if a user sends 1 Ether in a transaction, the node could falsely report that the Ether balance has not decreased, leading the user to believe the transaction failed and possibly sending another 1 Ether.

Source: Light Clients After the Merge by Etan Kissling | Devcon Bogota

To use MetaMask safely, it is advisable to use a proxy that retrieves and verifies proofs in front of MetaMask. Even if the node provides incorrect data, having the proof allows for the verification of this data.

Therefore, proofs are not only important for light nodes but also play a crucial role for end-users.

3. Verification Method of Merkle Trees

Source: merkle-proofs-for-offline-data-integrity

Trees are data structures that can store data. The top node of a tree is called the root node, a node with child nodes is called an internal node, and a node without child nodes is called a leaf node.

Merkle trees store data in leaf nodes, and internal nodes contain values obtained by inputting child nodes into a hash function. The reason for including hash values in the tree, not just data, is to generate proofs.

If Ethereum’s state is stored in a Merkle tree, light nodes can know whether a specific value exists in the Merkle tree by having just the value of the root node (Merkle root) and through proof.

For example, to demonstrate that C exists in the Merkle tree in the above figure, light nodes need to submit D, H(A-B), and H(E-H) as proof. Light nodes can calculate H(C-D) using C and the received D as proof, and then H(A-D) using H(C-D) and the received H(A-B) as proof. Finally, by calculating the hash value using H(E-H) and H(A-B) as inputs and comparing it with the Merkle root they have, they can verify that C exists in the Merkle tree.

Ethereum uses a Merkle Patricia tree, which satisfies the properties of both Merkle trees and Patricia trees. A Patricia tree is a tree that organizes internal and leaf nodes using the prefix of strings used as keys. Using a Patricia tree has the advantage of not needing to move all nodes to maintain the tree’s order when data is added or removed based on specific keys.

The tree used in Ethereum is a hexary Merkle Patricia tree, which, unlike the example, has a maximum of 16 child nodes. Vitalik has stated in his writings that Merkle trees are most efficient when they have two child nodes.

However, Ethereum uses 16 child nodes because it stores “state” in the tree. Ethereum has a tree where the key is the address and the value is the balance, nonce, code, etc. The problem with this tree is that values like balance and nonce associated with addresses change, and new addresses are generated with user influx. That is, modifications and insertions occur frequently, so it’s necessary to reduce these costs. Therefore, Ethereum optimized by encoding keys in hexary, resulting in 16 child nodes. Details on optimization are beyond the scope of this article, but those interested can refer to the following articles.

The drawback of the hexary Merkle Patricia tree is the large size of proofs. When generating a proof, it must include all sibling nodes at the same height. While a binary tree has only one sibling node, a hexary tree has 15, making the size much larger. Introducing statelessness, by eliminating state in exchange for frequent use of proofs, means the size of proofs becomes a significant obstacle.

Verkle trees provide the same functionality as Merkle trees while solving the issue of proof size. In the following paragraphs, we will explore how Verkle trees verify and reduce the size of proofs.

4. Proof of Verkle Trees

Source: why_verkle_trees

Verkle trees differ from Merkle trees in that the size of the proof is constant. This is possible because it’s not necessary to include sibling nodes in the proof.

In a Merkle tree, to prove the existence of the value HORSE for key 4ce, inner nodes must include their hash value along with all their children (colored in red).

However, Verkle trees do not operate this way. Verkle trees address the proof size issue using vector commitments. Simply put, a vector commitment scheme is a special type of hash function. In Verkle trees, the proof size at each level is independent of the number of sibling nodes, making it easier to increase width. Indeed, discussions are ongoing about setting the width between 256 and 1024.

Verkle Proof. Source: Verkle tries for Ethereum state with Dankrad Feist

Merkle trees stored values in internal nodes for the proof of existence. Verkle trees also have values in internal nodes, known as commitments. Unlike Merkle trees, commitments are not merely obtained by hashing but require the values of child nodes. How commitments are calculated changes the verification method for Verkle trees, as commitments affect the creation of proofs.

The formula used to generate proofs in Verkle trees is called an Opening. An Opening, Open(C, x, i, v), proves that x_i = v when the Commitment is C.

Proofs generated through Openings can be compressed into one. Therefore, instead of having one proof per level, totaling log(n) proofs, they can ultimately be compressed into a single proof, achieving a size of O(1).

How can the size of the proof remain constant regardless of the number of child nodes? A full understanding requires advanced mathematical knowledge. However, this article will avoid deep mathematical proofs to make it easier to grasp the bigger picture. First, we’ll look at KZG commitments mentioned in the early discussions of Verkle trees, and then we’ll explore Pedersen commitments, which are currently considered a promising option for adoption.

5. Verification Method of Verkle Trees — KZG Commitment

Consider a d-ary Verkle tree that can have up to d children. Each internal node holds a commitment calculated through its child nodes and can generate a proof to demonstrate the presence of a specific child node.

KZG commitments use polynomials to create commitments. The process of creating a commitment using a polynomial is as follows:

  1. Choose a w such that w^d = 1 and for 0≤i<d, w^i≠1.
  2. Use Lagrange interpolation to uniquely define a polynomial f of degree d-1 such that f(w^i)=v_i for 0≤i<d.
  3. Calculate f(s) by substituting a specific value s into f.
  4. Map f(s) to an elliptic curve to compute [f(s)] (the explanation of elliptic curves is deferred to writings by Vitalik Buterin).
  5. Use [f(s)] as the commitment.

This means using the value v_i of the ith child node to create a polynomial f and using it to calculate the commitment. There’s a precondition that the value s used to create the Commitment must be unknown to anyone. If s is exposed, it’s possible to create a different polynomial f’(x) = f(x)-x+s with the same commitment. Hence, forging a child node’s value and creating an opening that proves its existence is possible, so s must never be disclosed.

When f(w^i) = v_i, the proof is [(f(s)-v_i)/(s-w^i)]. Explained further, since f(w^i) = v_i is considered, q(x) = {f(x)-v_i}/(x-w^i) is also a polynomial. This polynomial is the Opening, and the value obtained by substituting s into the Opening is the Proof, i.e., proof is q(s). Calculating the proof this way satisfies the following equation:

Here, e represents the concept of pairing on elliptic curves.

What does this equation imply? Suppose we are a node holding a Verkle tree. We know the Commitment, i.e., f(s). A user claims that the value of the ith child node of an internal node is v_i and submits a proof. We already know f(s) and, from the user, receive v_i, w^i, and the proof [q(s)], making us aware of all elements of the equation. We can now calculate directly to check if the equality holds. If it does, it confirms that the value of the ith child node is indeed v_i as claimed by the user.

In summary, each layer of a Verkle tree has a Commitment, and submitting the corresponding proof can verify the existence of a child node’s value.

However, this does not satisfy the condition of statelessness, which demands proofs of constant size. Since a proof is needed for each layer, the total number of proofs is O(logn). To achieve constant size, these proofs must be compressed again. Simplifying greatly, the method seen above is extended to create a Polynomial for proofs. For a detailed explanation, please refer to Dankrad Feist’s article (PCS multiproofs).

If you’ve followed thus far, you’ve briefly understood how KZG commitments are used to reduce the size of Verkle tree proofs to a constant. However, the KZG commitment method has a critical drawback: creating an unknown s is very challenging. An attacker who knows s could create a forged opening to prove the existence of a non-existent value. The next section will explore Pederson vector commitments, which do not use s.

6. Verification Method of Verkle Trees — Pedersen Vector Commitment

Comparison of Pedersen+IPA and KZG. Source: inner-product-arguments

Pedersen vector commitment (hereinafter referred to as Pedersen) is a scheme for creating commitments. Inner Product Arguments (IPA) is a scheme for creating Openings on top of Pedersen. By using Pedersen and IPA together, it is possible to create proofs without the need for a trusted setup that was required with KZG.

The downside, as seen in the table above, is that the size of the proof increases and the verification time becomes longer. Although the proof size may be manageable due to its logarithmic scale, the increase in verification time to linear scale could be critical for applications.

To mitigate this disadvantage, multiproof compression and the Halo protocol, briefly mentioned earlier, are used. (1) Gathering several proofs into one proof and (2) Using Halo allows for the combined Opening to be verified in less time. The Halo protocol divides the verification step into k parts, creating and combining proofs at each step, allowing for verification in nearly O(k) time.

Applying Halo to IPA shows that, although the proofs are larger than those of KZG, the verification time is reduced to a constant. Therefore, it’s suitable for application in Verkle trees as it lacks a trusted setup while keeping the proof size small and reducing verification time.

Verge Roadmap. Source: Vitalik Twitter

Halo is a protocol applied to Zcash and is predicated on working with SNARK-based chains. That means, it must be an IPA-based SNARK block for Halo to be applicable. Indeed, looking at Verge’s roadmap, the ultimate goal is a fully SNARKed Ethereum. Creating Verkle proofs for SNARKs could reduce the proof size and verification time, preparing for statelessness.

For those who want to learn more about the operation of Pedersen, IPA, and Halo, please refer to articles by Dankrad Feist and Vitalik Buterin.

7. Conclusion

Verkle trees were proposed to reduce the size of proofs (witnesses) needed to prove the existence/non-existence of specific values at specific locations within a tree. With the application of statelessness, it’s crucial because nodes and users will frequently exchange witnesses. There are various methods to implement Verkle trees, with the Pedersen vector commitment method being a promising approach.

This article briefly covered the reasons for the introduction of Verkle trees and their verification methods. In the next article, we will look at how the structure of Merkle trees and Verkle trees differ.

--

--