Ethereum 2.0 Deposit Merkle Tree

Joseph Delong
6 min readJun 11, 2019

--

Explainer & Practical Implementation Guide

What is a sparse Merkle tree?

Sparse Merkle tree is a strategy of calculating a Merkle root in a reduced memory space. The computation is constrained but by a clever algorithm that allows the root calculation to be a single contiguous calculation. Overall computation necessary for root calculation is increased but, it is an incremental root calculation distributed across multiple deposit method invocations.

For a more academic explanation of a sparse Merkle tree read the Efficient Sparse Merkle Trees white paper.

Why a sparse Merkle tree for Ethereum 2.0 deposits?

Memory is reduced by using a single array with a size the same as the tree order, in the case of the Ethereum 2.0 deposit tree it is a 32 element array. A traditional Merkle tree calculation of size 2³² SHA2 hashes would require 360,777,252,864 gas far exceeding the current gas limit of 8,000,000 for a single block (storage costs not included). Calculations demonstrated with the following values below:¹

-- SHA256BASE 60 Base cost of SHA256.
-- SHA256WORD 12 Cost of SHA256 per word.

²³²(60+12+12)=360,777,252,864

To be fair the previous calculation is a trivial implementation that doesn’t incorporate the efficiencies of empty leaves. Regardless, each depositor would need to pay gas cost that grows exponentially at each increment of the total deposit count.

By calculating the root incrementally as described in the previous section the sparse tree algorithm allows for the gas costs to be distributed equitably among the depositors. Clean storage is only requested once at contract deployment and does not grow as the size of the tree increases.

How does a sparse Merkle tree work?

A sparse Merkle tree relies on two key principles. The first being zero hashes and the second is that leafs are inserted in numeric order by index. Below is a diagram of zero hash tree.

Zero Hashes

zerohashes[0]='0x000...'=0x0
zerohashes[1]='0xf5a...'=H(0x0,0x0)
zerohashes[2]='0xdb5...'=H(H(0x0,0x0),H(0x0,0x0))
zerohashes[3]='0xc78...'=H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0)))

Zero hashes are the default value null value in the tree at a particular level. In this particular instance (an empty tree) the root is calculated as the hash of zerohashes[3].

root =
H(H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0))),
H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0))))
root = H(zerohashes[3],zerohashes[3])

Below is an example of inserting out first leaf. For example, if a leaf of value 0xA were to be inserted into index 0 the proof path labeled in pink would be the only values recalculated.

root =
H(H(H(0xA,0x0),H(0xA,0x0)),H(H(0xA,0x0),H(0xA,0x0))),
H(H(H(0x0,0x0),H(0x0,0x0)),H(H(0x0,0x0),H(0x0,0x0))))
root = H(H(H(0xA,0x0),H(0xA,0x0)),H(H(0xA,0x0),H(0xA,0x0)), zerohashes[3])

The zero hashes are used to rebalance the tree where n%2!=0 at any particular order. An observation to make is that the zero hash values are standing in to allow a leaf or node to be hashed when there isn’t a value to pair with.

Next, if the an example of how the validator_registration.v.py contract from the Ethereum 2.0 specification generates zero hashes.

https://github.com/ethereum/deposit_contract/blob/ed9c81dd6788142d22106df93d5654578063eb32/deposit_contract/contracts/validator_registration.v.py#L21
https://github.com/ethereum/deposit_contract/blob/ed9c81dd6788142d22106df93d5654578063eb32/deposit_contract/contracts/validator_registration.v.py#L28-L31

The implementation of __init__ allows only 32 elements to be stored in the array zerohashes. Notice from the example above that zero hashes remain the same across nodes comprised of zero hashes. This is the first part of the strategy for reducing memory with a sparse Merkle tree.

Insert By Index

As mentioned previously, the second principle to reducing memory size in a sparse Merkle tree is to insert leaves in numeric order of index.

In the validator_registration.v.py deposit contract the zerohashes[] array is then copied over to another array named branches[] of the same size. The branches[] array is data structure that stores a single proof path for the sparse Merkle tree of the most recently inserted leaf. Each time a new leaf is inserted into the tree branches[] array is updated to reflect the proof path of that leaf.

The first step when the deposit method of the contract is called is to calculate i. Example below:

https://github.com/ethereum/deposit_contract/blob/ed9c81dd6788142d22106df93d5654578063eb32/deposit_contract/contracts/validator_registration.v.py#L89-L96

The branch index i will be used to calculate the depth of the tree to be hashed. The deposit parameters are hashed into a variable named value and inserted into the branch[] array.

https://github.com/ethereum/deposit_contract/blob/ed9c81dd6788142d22106df93d5654578063eb32/deposit_contract/contracts/validator_registration.v.py#L112-L117

In this previous gist it can be seen that the value is hashed with branch[j] and the final calculated value is inserted at branch[i]. Using this insertion algorithm allows branches[] to maintain a proof path of the most recent insertion.

Using the get_deposit_root method the contract is able to calculate a root from the branches[] array once the deposit threshold has been met. The root calculation will match the standard calculation described in the implementation section.

To learn more about how the sparse Merkle tree is implemented in the deposit contract read the formal verification of the deposit contract.

Client Side Merkle Proof Path Calculation

Since the client needs to calculate a proof path for each individual deposit for the root found in the Eth2Genesis event, calculation on the client side is not computed as a sparse Merkle tree. There are several approaches that can be used to calculate a Merkle tree proof path; trading off between the coefficients of memory, computation, and storage.

Proof calculation must satisfy the verify_merkle_branch method documented in the Ethereum 2.0 specification repository. proof the proof path, an array of values that will be hashed together with the valueleaf to prove the inclusion of the leaf in the Eth2Genesis root.

https://github.com/ethereum/eth2.0-specs/blob/v0.5.1/specs/core/0_beacon-chain.md#verify_merkle_branch

Artemis’ implementation of the same verify_merkle_branch method can be seen below

https://github.com/PegaSysEng/artemis/blob/00c678b010ee83e5c2a7d387f40407c669a3b1fc/ethereum/datastructures/src/main/java/tech/pegasys/artemis/datastructures/util/BeaconStateUtil.java#L211-L222

Below you can find the implementation of the MerkleTree class from the Artemis client, afterwards there will be a break down of each of the methods.

The constructor method is invoked when an instance of the MerkleTree is created. The parameterheight provided is used to create an array for each level of the tree. Additionally, the zeroHashes array is created.

The generateZeroHashes method is used to fill the zeroHashes array. The generateZeroHashes method works identical to the method __init__ in the validator_registration.v.py deposit contract.

Below is an example of the zeroHashes array when implemented properly for a Merkle tree with a height of 32.

[ '0x0000000000000000000000000000000000000000000000000000000000000000',
'0xf5a5fd42d16a20302798ef6ed309979b43003d2320d9f0e8ea9831a92759fb4b',
'0xdb56114e00fdd4c1f85c892bf35ac9a89289aaecb1ebd0a96cde606a748b5d71',
'0xc78009fdf07fc56a11f122370658a353aaa542ed63e44c4bc15ff4cd105ab33c',
'0x536d98837f2dd165a55d5eeae91485954472d56f246df256bf3cae19352a123c',
'0x9efde052aa15429fae05bad4d0b1d7c64da64d03d7a1854a588c2cb8430c0d30',
'0xd88ddfeed400a8755596b21942c1497e114c302e6118290f91e6772976041fa1',
'0x87eb0ddba57e35f6d286673802a4af5975e22506c7cf4c64bb6be5ee11527f2c',
'0x26846476fd5fc54a5d43385167c95144f2643f533cc85bb9d16b782f8d7db193',
'0x506d86582d252405b840018792cad2bf1259f1ef5aa5f887e13cb2f0094f51e1',
'0xffff0ad7e659772f9534c195c815efc4014ef1e1daed4404c06385d11192e92b',
'0x6cf04127db05441cd833107a52be852868890e4317e6a02ab47683aa75964220',
'0xb7d05f875f140027ef5118a2247bbb84ce8f2f0f1123623085daf7960c329f5f',
'0xdf6af5f5bbdb6be9ef8aa618e4bf8073960867171e29676f8b284dea6a08a85e',
'0xb58d900f5e182e3c50ef74969ea16c7726c549757cc23523c369587da7293784',
'0xd49a7502ffcfb0340b1d7885688500ca308161a7f96b62df9d083b71fcc8f2bb',
'0x8fe6b1689256c0d385f42f5bbe2027a22c1996e110ba97c171d3e5948de92beb',
'0x8d0d63c39ebade8509e0ae3c9c3876fb5fa112be18f905ecacfecb92057603ab',
'0x95eec8b2e541cad4e91de38385f2e046619f54496c2382cb6cacd5b98c26f5a4',
'0xf893e908917775b62bff23294dbbe3a1cd8e6cc1c35b4801887b646a6f81f17f',
'0xcddba7b592e3133393c16194fac7431abf2f5485ed711db282183c819e08ebaa',
'0x8a8d7fe3af8caa085a7639a832001457dfb9128a8061142ad0335629ff23ff9c',
'0xfeb3c337d7a51a6fbf00b9e34c52e1c9195c969bd4e7a0bfd51d5c5bed9c1167',
'0xe71f0aa83cc32edfbefa9f4d3e0174ca85182eec9f3a09f6a6c0df6377a510d7',
'0x31206fa80a50bb6abe29085058f16212212a60eec8f049fecb92d8c8e0a84bc0',
'0x21352bfecbeddde993839f614c3dac0a3ee37543f9b412b16199dc158e23b544',
'0x619e312724bb6d7c3153ed9de791d764a366b389af13c58bf8a8d90481a46765',
'0x7cdd2986268250628d0c10e385c58c6191e6fbe05191bcc04f133f2cea72c1c4',
'0x848930bd7ba8cac54661072113fb278869e07bb8587f91392933374d017bcbe1',
'0x8869ff2c22b28cc10510d9853292803328be4fb0e80495e8bb8d271f5b889636',
'0xb5fe28e79f1b850f8658246ce9b6a1e7b49fc06db7143e8fe0b4f2b0c5523a5c',
'0x985e929f70af28d0bdd1a90a808f977f597c7c778c489e98d3bd8910d31ac0f7' ]

Next, the add method will be used to add new leaves to the MerkleTree. Each time a new leaf is added the MerkleTree instance is marked as dirty. Doing so allows the instance to know when it is necessary to recompute branch values.

Below is an implementation used to calculate all the branch and root values. The algorithm is iterating through each level of the tree. Each iteration has an arraychild (values to be hashed) and an arrayparent (place to store the hashed values). During the last iteration theparent array is the root of the MerkleTree. If the child level is unbalanced the statement Bytes32 rightNode = (j + 1 < child.size()) ? child.get(j + 1) : zeroHashes.get(i) replaces the rightNode to be replaced with an element from zeroHashes of the same order. The index to store the resulting hash in parent is calculated using j/2.

Once the branches have been calculated the value dirty is set to false to indicated the MerkleTree instance has an up to date calculation of the branch values.

Now to generate the proof path the method getProofTreeByIndex is used to iterate through the tree object and get the proof values pertinent to the deposit index provided.The algorithm employed iterates through each order of the tree. The index value is non-final and is divided by two at each iteration. Initially if the value of index is odd index is decremented, otherwise increment using the statement index = index % 2 == 1 ? index — 1 : index + 1.

A proof path of the same size as the MerkleTree height will be created and returned as the array proof.

  1. https://ethgastable.info/

--

--