EVM Bytecode Merklization

Sina Mahmoodi
Published in ewasm
Apr 7, 2020

TL;DR Stateless clients need the code of every contract invoked in a block to be sent as part of the block witness. Contract code is the second biggest contributor to the stateless block bandwidth overhead. Code merklization has been proposed as a way to reduce this overhead. This article explains in detail how we can split code into chunks, merklize those chunks, and transfer only the chunks necessary for a transaction. We’ll then see some initial data showing 40–60% savings on the total amount of code transferred, based on an experiment done on recent mainnet blocks.

Stateless blocks are big

Code merklization has been around as an idea for a while, mainly for the purpose of code de-duplication, though it was never fully explored. It recently regained traction for a different purpose, namely reducing bandwidth requirements for stateless clients (e.g. in this post). If you want to know the motivations behind stateless clients, I encourage you to check out this recent summary or Alexey Akhunov’s post which shares data from his experiments. I won’t go into the details of the model in this post, but for the sake of completeness I’ll provide a short summary of the relevant details (feel free to skip the next paragraph if you’re familiar).

Under the stateless model, (at least some) nodes wouldn’t need to store the state and would depend on other nodes (e.g. miners) to include all the necessary state parts (including contract code) in the block, along with merkle proofs that attest to their validity. This implies significantly higher network bandwidth requirements. Alexey Akhunov and the turbo-geth team have been running experiments to measure the size of block witnesses for historical mainnet blocks. Below you can see such a measurement for 50,000 recent blocks. The red line tracks the amount of contract code that needs to be sent in a stateless block, which is the second biggest contributor to block witness size. If Ethereum migrates from the current hexary trie to a binary one, the hash component of these witnesses will shrink by roughly 3x, making code the biggest contributor to witness size.

Data from https://github.com/mandrigin/ethereum-mainnet-bin-tries-data. Chart shows the breakdown of stateless block witnesses for 50,000 relatively recent mainnet blocks. The values are a moving average with a window of 128 blocks.

Don’t send the whole code

Intuitively we can assume that a given transaction touches only part of the contract code it invokes (say 2 out of 4 functions). The goal is therefore to split the code into chunks and only send the chunks necessary for the given transaction (plus proofs that the chunks are valid) in the block witness. If our assumption is right and transactions do indeed use a small-ish part of their bytecode (spoiler alert: early data suggests this is the case), the contract code component of the block witness in the chart above could be reduced significantly.

To see exactly how this would work, let’s imagine a new contract is being deployed. We pass over the code and identify its basic blocks (see the algorithm; a minimal sketch follows the list below). Note that clients already do a single pass over the code for JUMPDEST analysis, so this doesn’t add significant overhead. These basic blocks have two characteristics:

Basic blocks of an imaginary bytecode.
  • Each block starts either at index 0 or at a JUMPDEST. This is to allow stateless clients to perform JUMPDEST analysis safely (more on this later).
  • There’s no control-flow change within a block (e.g. no jump). Therefore we can be certain that once we start executing a block, it will either run to the end or run out of gas. We hypothesize this is more efficient, but haven’t yet tested it against alternatives.
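
To make the boundary rule concrete, here is a minimal Python sketch of the splitting step. It is my own illustration, not the prototype’s code: it only shows the JUMPDEST-boundary part and leaves the control-flow handling to the linked algorithm.

```python
# Sketch: split EVM bytecode into basic blocks that start at index 0 or at a
# JUMPDEST. PUSH data is skipped so that a 0x5b byte inside push data is not
# mistaken for a JUMPDEST.

PUSH1, PUSH32, JUMPDEST = 0x60, 0x7F, 0x5B

def basic_blocks(code: bytes) -> list[tuple[int, bytes]]:
    """Return (start_offset, block_bytes) pairs covering the whole bytecode."""
    starts = [0]
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST and pc != 0:
            starts.append(pc)          # a new block begins at every JUMPDEST
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1       # skip the push operand bytes
        pc += 1
    starts.append(len(code))
    return [(s, code[s:e]) for s, e in zip(starts, starts[1:])]
```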

For efficiency reasons, neighboring blocks are merged until each chunk has a minimum length of 128 bytes (this is a configurable parameter). The chunks are then inserted into a trie, using the index of their first byte as the key. The client finally stores the root of this trie in the newly created account record for that contract. As you can see below, the code trie effectively becomes a sub-trie of the state trie (similar to the storage trie).

Merklized code would become a sub-tree of the state trie. To simplify the diagram I used a binary trie (instead of the Merkle Patricia Tree used in Ethereum). The paths are also inaccurate and don’t really represent the keys.
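
The merging step could look roughly like the following sketch, which assumes the basic_blocks helper from the previous sketch; the function name and the plain integer keys are illustrative (a real implementation would encode the keys as trie paths).

```python
# Sketch of the merging step: neighboring basic blocks are merged until every
# chunk is at least MIN_CHUNK_SIZE bytes, and each chunk is keyed by the
# offset of its first byte.

MIN_CHUNK_SIZE = 128   # the configurable parameter from the article

def merklization_chunks(code: bytes) -> dict[int, bytes]:
    chunks: dict[int, bytes] = {}
    key, buf = 0, b""
    for start, block in basic_blocks(code):
        if not buf:
            key = start                # chunk key = offset of its first byte
        buf += block
        if len(buf) >= MIN_CHUNK_SIZE:
            chunks[key] = buf
            buf = b""
    if buf:                            # the last chunk may be shorter
        chunks[key] = buf
    return chunks

# Each (key, chunk) pair would then be inserted into the code trie, whose
# root hash is stored as codeRoot in the contract's account record.
```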

Let’s test the contract by submitting a transaction that invokes it. The miner executes the transaction and marks the chunks touched during execution (in this example chunks #1 and #3). When publishing the block, the miner includes a proof of the contract account within the state, and a turboproof of all the touched code chunks.

The touched chunks and all the hashes necessary to verify the codeRoot are sent as a turboproof.
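
As an illustration of the marking step (the names here are mine, not the prototype’s), a miner could map every executed program counter to its containing chunk and record that chunk’s key:

```python
import bisect

# Sketch: chunk_keys are the sorted first-byte offsets produced by the
# chunking step; touched accumulates the keys of chunks that must be proven.

def mark_touched(pc: int, chunk_keys: list[int], touched: set[int]) -> None:
    # the chunk containing pc is the one with the largest key <= pc
    i = bisect.bisect_right(chunk_keys, pc) - 1
    touched.add(chunk_keys[i])

# Called for every executed program counter inside the interpreter loop, this
# yields the set of chunks that go into the turboproof together with the
# sibling hashes needed to reach codeRoot.
```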

Upon receiving this block, a stateless client can verify that the contract is part of the state and has the correct attributes (balance, nonce, stateRoot and codeRoot). It can then verify the code chunks and their keys against codeRoot. This information is enough for the client to reconstruct a partial bytecode from the chunks, leaving every missing chunk zeroed out. Note that, due to our chunking algorithm, the client knows every chunk (except the first one) starts with a JUMPDEST and can therefore perform jumps safely.

From the turboproof we can reconstruct the bytecode. The chunks which are not needed for the given transaction are left zeroed out.
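
A minimal sketch of this reconstruction step, assuming the verifier learns the full code length from the witness (that assumption, and the function name, are my own):

```python
# Sketch: verified chunks are written at their keyed offsets and every
# missing chunk stays zeroed out (0x00 bytes decode as STOP, but with a
# valid witness execution never reaches an un-sent chunk).

def rebuild_partial_code(chunks: dict[int, bytes], code_length: int) -> bytes:
    code = bytearray(code_length)
    for offset, chunk in chunks.items():   # offset = key = first-byte index
        code[offset:offset + len(chunk)] = chunk
    return bytes(code)
```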

The experiment

To test this out we wrote a prototype which fetches mainnet blocks, as well as their prestate, from Geth’s RPC endpoint. It then runs the transactions in those blocks and, whenever it encounters a new contract, splits its code into chunks and marks the chunks touched during execution. After every transaction in the block has been processed, it generates turboproofs for the touched chunks. We then run the transactions again on a fresh prestate, replacing only the contract codes with the reconstructed code computed from the proofs. To check for correctness we compare the amount of gas used and the logs bloom of the block. Running this process for 50 recent blocks, we see somewhere between a 40% and 60% reduction in code size.

Caveat: The data seems promising, but please keep in mind that we’d need numbers from tens of thousands of blocks for a compelling argument, and the prototype is in an early phase and therefore possibly buggy!

Where to go from here

As you might remember, the minimum length of each chunk is a configurable parameter (which was set to 128 bytes for the data above). Modifying this parameter influences the chunk witness size in two opposing ways. Decreasing it, e.g. to 32 bytes, makes the chunks more granular, thereby decreasing the total amount of code that has to be sent. But at the same time it increases the trie depth, which in turn increases the number of hashes we need for the proof. A next step would be a more thorough analysis of minimum chunk sizes to see if there’s a value for which we get the most savings. Regardless of the minimum chunk size, switching to a binary trie instead of the hexary trie would reduce the number of hashes needed in the proof to one fourth (see here), thereby reducing the chunk witness size even further.
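
To get a feel for the trade-off, here is a back-of-the-envelope sketch for a single touched chunk of a maximum-size (24,576-byte) contract in a binary code trie. It ignores proof nodes shared between multiple touched chunks, so it overstates the hash overhead of real witnesses; the numbers are illustrative only.

```python
import math

CODE_SIZE, HASH_SIZE = 24_576, 32      # bytes

for min_chunk in (32, 64, 128, 256):
    leaves = math.ceil(CODE_SIZE / min_chunk)
    depth = math.ceil(math.log2(leaves))        # sibling hashes along one path
    per_chunk_witness = min_chunk + depth * HASH_SIZE
    print(f"min chunk {min_chunk:>3}B: ~{leaves:>3} leaves, "
          f"~{depth} hashes, ~{per_chunk_witness}B per touched chunk")
```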

For this prototype we opted to split code into basic blocks. There exist, however, various other chunking algorithms, some simpler, some more complicated. The simplest is to divide the code into fixed-size chunks (e.g. every 32 bytes becomes a chunk). Currently the only problem we see with this approach concerns push data and JUMPDEST analysis.

To expand on that: if we split the bytecode at arbitrary boundaries, the operand of a PUSH opcode, or of any other multi-byte opcode introduced in the future, can be mistaken for a JUMPDEST (0x5b) by the client receiving the chunk. As you can see below, a client which has the full code can see the jump is invalid, because the 0x5b is an operand of PUSH1, and will halt execution; but a client receiving chunks #6 and #8 but not #7 will jump to position 41, thereby interpreting the contract differently. Later in the article we’ll briefly mention approaches with arbitrary boundaries that avoid this issue.
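
Here is a small, made-up illustration of that failure mode (the bytecode and offsets are mine and don’t correspond to the figure): a naive JUMPDEST analysis that starts inside push data reports a jump destination that the full code doesn’t have.

```python
# A 0x5b byte that is really PUSH1 operand data gets misread as a JUMPDEST
# when analysis starts at an arbitrary boundary inside the push data.

def jumpdests(code: bytes, start_pc: int = 0) -> set[int]:
    """Naive JUMPDEST analysis; only sound if start_pc is a real code boundary."""
    found, pc = set(), start_pc
    while pc < len(code):
        op = code[pc]
        if op == 0x5B:                   # JUMPDEST
            found.add(pc)
        if 0x60 <= op <= 0x7F:           # PUSH1..PUSH32: skip operand bytes
            pc += op - 0x5F
        pc += 1
    return found

code = bytes([0x60, 0x5B,                # PUSH1 0x5b  (0x5b is push data here)
              0x60, 0x00,                # PUSH1 0x00
              0x00])                     # STOP
print(jumpdests(code))                   # set(): the 0x5b is never a JUMPDEST
print(jumpdests(code, start_pc=1))       # {1}: starting mid-push misreads it
```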

To work around this issue, Martin Holst Swende proposed adding metadata to each chunk specifying how many of its first bytes are push data. The verifier can then skip those bytes during JUMPDEST analysis. An alternative route that Alexey is exploring is disallowing dynamic jumps in the EVM (at least for contracts that wish to use the code merklization feature), allowing us to statically analyse jumps once at deploy time instead of during every execution of the code. Alex Beregszaszi proposed using the control flow graph of a contract to better guide merklization. Simultaneously, Christian Reitwiessner came up with a proof-of-execution approach in which a merklized DAG is created from the control flow graph of a contract. I can’t do justice to his idea in this post and can only hope he writes about it at some point.
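
To show the shape of the metadata idea, here is a sketch under my own (made-up) encoding, in which each chunk is prefixed with one byte counting the leading push-data bytes; this is not a proposed specification.

```python
# Sketch: the first byte of a chunk says how many of its leading bytes are
# push data spilled over from the previous chunk, so a verifier can skip
# them and run JUMPDEST analysis on the chunk in isolation.

def chunk_jumpdests(chunk_with_meta: bytes) -> set[int]:
    skip = chunk_with_meta[0]            # number of leading push-data bytes
    body = chunk_with_meta[1:]
    found, pc = set(), skip              # start right after the spilled operand
    while pc < len(body):
        op = body[pc]
        if op == 0x5B:                   # JUMPDEST
            found.add(pc)
        if 0x60 <= op <= 0x7F:           # PUSH1..PUSH32: skip operand bytes
            pc += op - 0x5F
        pc += 1
    return found                         # offsets relative to the chunk body
```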

It might turn out that the different chunking algorithms offer only negligible efficiency differences, in which case the least complex algorithm would be the most sensible choice. The good news is that we have at least one algorithm which, based on early data, seems to significantly reduce the amount of code that needs to be transmitted in a stateless block.

This article was about merklizing EVM bytecode in particular, but the general idea is not limited to the EVM. In fact, other Ewasm team members are simultaneously experimenting with merklizing Wasm code, which has its own set of challenges. These mainly arise from the fact that Wasm code is composed of multiple sections and undergoes strict validation prior to execution, which means the reconstructed bytecode must still pass validation. Stay tuned for more on this.

Acknowledgements: Many thanks to Guillaume Ballet, Alex Beregszaszi, and Casey Detrio from Ewasm for reviewing the post and feedback.
