Ethereum 2.0 Deposit Merkle Tree
Explainer & Practical Implementation Guide
What is a sparse Merkle tree?
Sparse Merkle tree is a strategy of calculating a Merkle root in a reduced memory space. The computation is constrained but by a clever algorithm that allows the root calculation to be a single contiguous calculation. Overall computation necessary for root calculation is increased but, it is an incremental root calculation distributed across multiple deposit
method invocations.
For a more academic explanation of a sparse Merkle tree read the Efficient Sparse Merkle Trees white paper.
Why a sparse Merkle tree for Ethereum 2.0 deposits?
Memory is reduced by using a single array with a size the same as the tree order, in the case of the Ethereum 2.0 deposit tree it is a 32 element array. A traditional Merkle tree calculation of size 2³² SHA2 hashes would require 360,777,252,864
gas far exceeding the current gas limit of 8,000,000
for a single block (storage costs not included). Calculations demonstrated with the following values below:¹
-- SHA256BASE 60 Base cost of SHA256.
-- SHA256WORD 12 Cost of SHA256 per word.
²³²(60+12+12)=360,777,252,864
To be fair the previous calculation is a trivial implementation that doesn’t incorporate the efficiencies of empty leaves. Regardless, each depositor would need to pay gas cost that grows exponentially at each increment of the total deposit count.
By calculating the root incrementally as described in the previous section the sparse tree algorithm allows for the gas costs to be distributed equitably among the depositors. Clean storage is only requested once at contract deployment and does not grow as the size of the tree increases.
How does a sparse Merkle tree work?
A sparse Merkle tree relies on two key principles. The first being zero hashes and the second is that leafs are inserted in numeric order by index. Below is a diagram of zero hash tree.
Zero Hashes
zerohashes[0]='0x000...'=0x0
zerohashes[1]='0xf5a...'=H(0x0,0x0)
zerohashes[2]='0xdb5...'=H(H(0x0,0x0),H(0x0,0x0))
zerohashes[3]='0xc78...'=H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0)))
Zero hashes are the default value null value in the tree at a particular level. In this particular instance (an empty tree) the root is calculated as the hash of zerohashes[3]
.
root =
H(H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0))),
H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0))))root = H(zerohashes[3],zerohashes[3])
Below is an example of inserting out first leaf. For example, if a leaf of value 0xA
were to be inserted into index 0
the proof path labeled in pink would be the only values recalculated.
root =
H(H(H(0xA,0x0),H(0xA,0x0)),H(H(0xA,0x0),H(0xA,0x0))),
H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0))))root = H(H(H(0xA,0x0),H(0xA,0x0)),H(H(0xA,0x0),H(0xA,0x0)), zerohashes[3])
The zero hashes are used to rebalance the tree where n%2!=0
at any particular order. An observation to make is that the zero hash values are standing in to allow a leaf or node to be hashed when there isn’t a value to pair with.
Next, if the an example of how the validator_registration.v.py
contract from the Ethereum 2.0 specification generates zero hashes.
The implementation of __init__
allows only 32 elements to be stored in the array zerohashes
. Notice from the example above that zero hashes remain the same across nodes comprised of zero hashes. This is the first part of the strategy for reducing memory with a sparse Merkle tree.
Insert By Index
As mentioned previously, the second principle to reducing memory size in a sparse Merkle tree is to insert leaves in numeric order of index.
In the validator_registration.v.py
deposit contract the zerohashes[]
array is then copied over to another array named branches[]
of the same size. The branches[]
array is data structure that stores a single proof path for the sparse Merkle tree of the most recently inserted leaf. Each time a new leaf is inserted into the tree branches[]
array is updated to reflect the proof path of that leaf.
The first step when the deposit
method of the contract is called is to calculate i
. Example below:
The branch index i
will be used to calculate the depth of the tree to be hashed. The deposit parameters are hashed into a variable named value
and inserted into the branch[]
array.
In this previous gist it can be seen that the value
is hashed with branch[j]
and the final calculated value is inserted at branch[i]
. Using this insertion algorithm allows branches[]
to maintain a proof path of the most recent insertion.
Using the get_deposit_root
method the contract is able to calculate a root
from the branches[]
array once the deposit threshold has been met. The root
calculation will match the standard calculation described in the implementation section.
To learn more about how the sparse Merkle tree is implemented in the deposit contract read the formal verification of the deposit contract.
Client Side Merkle Proof Path Calculation
Since the client needs to calculate a proof path for each individual deposit
for the root found in the Eth2Genesis
event, calculation on the client side is not computed as a sparse Merkle tree. There are several approaches that can be used to calculate a Merkle tree proof path; trading off between the coefficients of memory, computation, and storage.
Proof calculation must satisfy the verify_merkle_branch
method documented in the Ethereum 2.0 specification repository. proof
the proof path, an array of values that will be hashed together with the valueleaf
to prove the inclusion of the leaf
in the Eth2Genesis
root.
Artemis’ implementation of the same verify_merkle_branch
method can be seen below
Below you can find the implementation of the MerkleTree
class from the Artemis client, afterwards there will be a break down of each of the methods.
The constructor method is invoked when an instance of the MerkleTree
is created. The parameterheight
provided is used to create an array for each level of the tree. Additionally, the zeroHashes
array is created.
The generateZeroHashes
method is used to fill the zeroHashes
array. The generateZeroHashes
method works identical to the method __init__
in the validator_registration.v.py
deposit contract.
Below is an example of the zeroHashes
array when implemented properly for a Merkle tree with a height of 32.
[ '0x0000000000000000000000000000000000000000000000000000000000000000',
'0xf5a5fd42d16a20302798ef6ed309979b43003d2320d9f0e8ea9831a92759fb4b',
'0xdb56114e00fdd4c1f85c892bf35ac9a89289aaecb1ebd0a96cde606a748b5d71',
'0xc78009fdf07fc56a11f122370658a353aaa542ed63e44c4bc15ff4cd105ab33c',
'0x536d98837f2dd165a55d5eeae91485954472d56f246df256bf3cae19352a123c',
'0x9efde052aa15429fae05bad4d0b1d7c64da64d03d7a1854a588c2cb8430c0d30',
'0xd88ddfeed400a8755596b21942c1497e114c302e6118290f91e6772976041fa1',
'0x87eb0ddba57e35f6d286673802a4af5975e22506c7cf4c64bb6be5ee11527f2c',
'0x26846476fd5fc54a5d43385167c95144f2643f533cc85bb9d16b782f8d7db193',
'0x506d86582d252405b840018792cad2bf1259f1ef5aa5f887e13cb2f0094f51e1',
'0xffff0ad7e659772f9534c195c815efc4014ef1e1daed4404c06385d11192e92b',
'0x6cf04127db05441cd833107a52be852868890e4317e6a02ab47683aa75964220',
'0xb7d05f875f140027ef5118a2247bbb84ce8f2f0f1123623085daf7960c329f5f',
'0xdf6af5f5bbdb6be9ef8aa618e4bf8073960867171e29676f8b284dea6a08a85e',
'0xb58d900f5e182e3c50ef74969ea16c7726c549757cc23523c369587da7293784',
'0xd49a7502ffcfb0340b1d7885688500ca308161a7f96b62df9d083b71fcc8f2bb',
'0x8fe6b1689256c0d385f42f5bbe2027a22c1996e110ba97c171d3e5948de92beb',
'0x8d0d63c39ebade8509e0ae3c9c3876fb5fa112be18f905ecacfecb92057603ab',
'0x95eec8b2e541cad4e91de38385f2e046619f54496c2382cb6cacd5b98c26f5a4',
'0xf893e908917775b62bff23294dbbe3a1cd8e6cc1c35b4801887b646a6f81f17f',
'0xcddba7b592e3133393c16194fac7431abf2f5485ed711db282183c819e08ebaa',
'0x8a8d7fe3af8caa085a7639a832001457dfb9128a8061142ad0335629ff23ff9c',
'0xfeb3c337d7a51a6fbf00b9e34c52e1c9195c969bd4e7a0bfd51d5c5bed9c1167',
'0xe71f0aa83cc32edfbefa9f4d3e0174ca85182eec9f3a09f6a6c0df6377a510d7',
'0x31206fa80a50bb6abe29085058f16212212a60eec8f049fecb92d8c8e0a84bc0',
'0x21352bfecbeddde993839f614c3dac0a3ee37543f9b412b16199dc158e23b544',
'0x619e312724bb6d7c3153ed9de791d764a366b389af13c58bf8a8d90481a46765',
'0x7cdd2986268250628d0c10e385c58c6191e6fbe05191bcc04f133f2cea72c1c4',
'0x848930bd7ba8cac54661072113fb278869e07bb8587f91392933374d017bcbe1',
'0x8869ff2c22b28cc10510d9853292803328be4fb0e80495e8bb8d271f5b889636',
'0xb5fe28e79f1b850f8658246ce9b6a1e7b49fc06db7143e8fe0b4f2b0c5523a5c',
'0x985e929f70af28d0bdd1a90a808f977f597c7c778c489e98d3bd8910d31ac0f7' ]
Next, the add
method will be used to add new leaves to the MerkleTree
. Each time a new leaf is added the MerkleTree
instance is marked as dirty
. Doing so allows the instance to know when it is necessary to recompute branch values.
Below is an implementation used to calculate all the branch and root values. The algorithm is iterating through each level of the tree. Each iteration has an arraychild
(values to be hashed) and an arrayparent
(place to store the hashed values). During the last iteration theparent
array is the root of the MerkleTree
. If the child
level is unbalanced the statement Bytes32 rightNode = (j + 1 < child.size()) ? child.get(j + 1) : zeroHashes.get(i)
replaces the rightNode
to be replaced with an element from zeroHashes
of the same order. The index to store the resulting hash in parent
is calculated using j/2
.
Once the branches have been calculated the value dirty
is set to false
to indicated the MerkleTree
instance has an up to date calculation of the branch values.
Now to generate the proof path the method getProofTreeByIndex
is used to iterate through the tree
object and get the proof values pertinent to the deposit index provided.The algorithm employed iterates through each order of the tree
. The index
value is non-final and is divided by two at each iteration. Initially if the value of index
is odd index
is decremented, otherwise increment using the statement index = index % 2 == 1 ? index — 1 : index + 1
.
A proof path of the same size as the MerkleTree
height
will be created and returned as the array proof
.