Explainer & Practical Implementation Guide
What is a sparse Merkle tree?
Sparse Merkle tree is a strategy of calculating a Merkle root in a reduced memory space. The computation is constrained but by a clever algorithm that allows the root calculation to be a single contiguous calculation. Overall computation necessary for root calculation is increased but, it is an incremental root calculation distributed across multiple
deposit method invocations.
For a more academic explanation of a sparse Merkle tree read the Efficient Sparse Merkle Trees white paper.
Why a sparse Merkle tree for Ethereum 2.0 deposits?
Memory is reduced by using a single array with a size the same as the tree order, in the case of the Ethereum 2.0 deposit tree it is a 32 element array. A traditional Merkle tree calculation of size 2³² SHA2 hashes would require
360,777,252,864 gas far exceeding the current gas limit of
8,000,000 for a single block (storage costs not included). Calculations demonstrated with the following values below:¹
-- SHA256BASE 60 Base cost of SHA256.
-- SHA256WORD 12 Cost of SHA256 per word.
To be fair the previous calculation is a trivial implementation that doesn’t incorporate the efficiencies of empty leaves. Regardless, each depositor would need to pay gas cost that grows exponentially at each increment of the total deposit count.
By calculating the root incrementally as described in the previous section the sparse tree algorithm allows for the gas costs to be distributed equitably among the depositors. Clean storage is only requested once at contract deployment and does not grow as the size of the tree increases.
How does a sparse Merkle tree work?
A sparse Merkle tree relies on two key principles. The first being zero hashes and the second is that leafs are inserted in numeric order by index. Below is a diagram of zero hash tree.
Zero hashes are the default value null value in the tree at a particular level. In this particular instance (an empty tree) the root is calculated as the hash of
H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0))))root = H(zerohashes,zerohashes)
Below is an example of inserting out first leaf. For example, if a leaf of value
0xA were to be inserted into index
0 the proof path labeled in pink would be the only values recalculated.
H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0))))root = H(H(H(0xA,0x0),H(0xA,0x0)),H(H(0xA,0x0),H(0xA,0x0)), zerohashes)
The zero hashes are used to rebalance the tree where
n%2!=0 at any particular order. An observation to make is that the zero hash values are standing in to allow a leaf or node to be hashed when there isn’t a value to pair with.
The implementation of
__init__ allows only 32 elements to be stored in the array
zerohashes. Notice from the example above that zero hashes remain the same across nodes comprised of zero hashes. This is the first part of the strategy for reducing memory with a sparse Merkle tree.
Insert By Index
As mentioned previously, the second principle to reducing memory size in a sparse Merkle tree is to insert leaves in numeric order of index.
validator_registration.v.py deposit contract the
zerohashes array is then copied over to another array named
branches of the same size. The
branches array is data structure that stores a single proof path for the sparse Merkle tree of the most recently inserted leaf. Each time a new leaf is inserted into the tree
branches array is updated to reflect the proof path of that leaf.
The first step when the
deposit method of the contract is called is to calculate
i. Example below:
The branch index
i will be used to calculate the depth of the tree to be hashed. The deposit parameters are hashed into a variable named
value and inserted into the
In this previous gist it can be seen that the
value is hashed with
branch[j] and the final calculated value is inserted at
branch[i]. Using this insertion algorithm allows
branches to maintain a proof path of the most recent insertion.
get_deposit_root method the contract is able to calculate a
root from the
branches array once the deposit threshold has been met. The
root calculation will match the standard calculation described in the implementation section.
To learn more about how the sparse Merkle tree is implemented in the deposit contract read the formal verification of the deposit contract.
Client Side Merkle Proof Path Calculation
Since the client needs to calculate a proof path for each individual
deposit for the root found in the
Eth2Genesis event, calculation on the client side is not computed as a sparse Merkle tree. There are several approaches that can be used to calculate a Merkle tree proof path; trading off between the coefficients of memory, computation, and storage.
Proof calculation must satisfy the
verify_merkle_branch method documented in the Ethereum 2.0 specification repository.
proof the proof path, an array of values that will be hashed together with the value
leaf to prove the inclusion of the
leaf in the
Artemis’ implementation of the same
verify_merkle_branch method can be seen below
Below you can find the implementation of the
MerkleTree class from the Artemis client, afterwards there will be a break down of each of the methods.
The constructor method is invoked when an instance of the
MerkleTree is created. The parameter
height provided is used to create an array for each level of the tree. Additionally, the
zeroHashes array is created.
generateZeroHashes method is used to fill the
zeroHashes array. The
generateZeroHashes method works identical to the method
__init__ in the
validator_registration.v.py deposit contract.
Below is an example of the
zeroHashes array when implemented properly for a Merkle tree with a height of 32.
add method will be used to add new leaves to the
MerkleTree. Each time a new leaf is added the
MerkleTree instance is marked as
dirty. Doing so allows the instance to know when it is necessary to recompute branch values.
Below is an implementation used to calculate all the branch and root values. The algorithm is iterating through each level of the tree. Each iteration has an array
child (values to be hashed) and an array
parent (place to store the hashed values). During the last iteration the
parent array is the root of the
MerkleTree. If the
child level is unbalanced the statement
Bytes32 rightNode = (j + 1 < child.size()) ? child.get(j + 1) : zeroHashes.get(i) replaces the
rightNode to be replaced with an element from
zeroHashes of the same order. The index to store the resulting hash in
parent is calculated using
Once the branches have been calculated the value
dirty is set to
false to indicated the
MerkleTree instance has an up to date calculation of the branch values.
Now to generate the proof path the method
getProofTreeByIndex is used to iterate through the
tree object and get the proof values pertinent to the deposit index provided.The algorithm employed iterates through each order of the
index value is non-final and is divided by two at each iteration. Initially if the value of
index is odd
index is decremented, otherwise increment using the statement
index = index % 2 == 1 ? index — 1 : index + 1.
A proof path of the same size as the
height will be created and returned as the array