I wanted to look into the stateless clients for a long time, since this. Of course, during the intervening 15 months my understanding of the Ethereum state, the protocol, the software and the network has changed. For example, I no longer believe that the concept of Stateless clients can be introduced without a hard fork. But I am still glad that I was finally able to collect some data on it and to suggest couple of ways forward.
Idea of stateless clients
To illustrate the idea of stateless clients, lets start from looking at how transaction processing is usually done in Ethereum client software:
Execution of transactions in a block requires transaction data (blue rectangles) and the current state (yellow rectangle). After the execution, the current state becomes a historical state, and the new current state is created. There are some other bits of data that transactions might require (like block timestamp, or one of the 256 previous block hashes), but we won’t look at them because they are constant in size and insignificant. Execution can also produce receipts (green stacks), but they are not important in the context of stateless clients.
The main idea of stateless clients is to remove the requirement of access to the entire state when executing transaction in the block. The creator of the block supplements the block with a separate data structure, containing all the extracts from the state that are necessary to execute all the transactions in the block. In order for the executor to be sure that those extracts were indeed taken from the current state, the Merkle proofs are also supplied. This is possible because every block header carries so called “state root”, which is the root of the Merkle tree encompassing all the values in the state that existed after the block’s execution. This would lead to possibility of this kind of execution:
It is important to note that the supplemental information (let’s call it Block Proofs) is also very useful for the stateful clients (those that want to keep the entire state and possibly history). Currently, the so-called “full nodes” execute blocks against their copy of the state and then verify that the Merkle root of the state tree that they computed matched the one in the block’s header. The computation of that Merkle tree root is a resource intensive operation, and requires either large memory or lots of I/O, as described here. If there is a supplemental information, it becomes possible for the full nodes to verify the correctness of the execution very easily, most likely without any I/O at all, and then simply update its current and historical state databases based on the execution results. They don’t need to have a cache of the state trie in memory, and their interaction with the state and state history database becomes predominantly write-only. It is illustrated here:
How large are the block proofs?
Now to the biggest downside of the stateless client — the requirement to move some extra data around the network, together with the blocks. How much data are we taking about?
In order to compute the sizes of the block proofs, I have created a prototype of the stateless client. The prototype is based on Turbo-Geth, and this is what it does:
- Retrace blocks from the Ethereum mainnet, one by one
- For each block, capture the extracts of the state accessed by the transactions, as well as the byte codes of the contracts being accessed.
- Select the minimal set of hashes that are enough to verify (by recomputing the state merkle root) that the captured state extracts belong to the state
- Add some structural information to encode the proofs and the state extracts as a tree. This is likely not the best encoding, but as it turns out, its size is very insignificant compared to other pieces, so it is OK for now.
- Encode the block proof as a byte array
- Decode the byte array back into the block proof (the tree where some nodes are parts of the state, and some — hashes of the irrelevant parts)
- Execute the block again, but this time, without any access to the current state, only with the help of the block proof
- Verify that State root is correct, as well as all the storage roots.
It took a while to make it all work, but I wanted the data to be pretty accurate. As I expected, there were a lot of edge cases to handle (because the hexary Merkle Patricia trie has things like leaf keys, extensions, embedded nodes). There are still some edge cases what make my prototype fail (currently on the blocks 5340939, 5361803, 5480357, 5480507, 5480722, 5632299, 5707052, 5769636 and more…). But given that it only affects tens of blocks out of 6.7 million, I am confident that it will not distort the data enough to alter the analysis (of course, I will debug and fix those edge cases).
So far I only have the data until the block 6757045, but hope to move past that point soon.
Here is the first chart, which shows the total sizes of the block proofs (note that all the charts show moving average with window 1024):
Before the Spurios Dragon hard fork (block 2675000), it was possible with modest gas usage to generate blocks with rather large block proofs, which was done back in autumn of 2016. Lets only look at the history past the Spurios Dragon now:
The “tower” right after the Spurious Dragon hard fork is from the “state clearing”, where lots of accounts with zero balance and nonce were “touched” in order to get them removed from the state. That level of “block proof” intensity has seen been surpassed by the blocks of the second half of 2017, and then later in 2018.
Lets first break down the total size of the block proofs into 3 components: anything related to the “main” trie, anything related to the contract storage tries, and the contract bytecodes.
It is a bit hard to see, but the 2016 transaction spam caused both byte codes (red line) and “main” trie block proofs (yellow line) to soar. Some may remember that there were no limit on the size of the contract byte code (which is now 24k), that allowed for deployment of very large contracts and querying them via EXTCODESIZE opcodes or similar.
Post Spurious Dragon chart shows that main trie block proofs would have dominated for most of the history, but then the contract storage proofs (blue line) and byte codes started to catch up.
Break down of the “main” trie block proofs
Here, we will split further into 4 components. First two are the keys and the values that are being read by the transactions, or are required so satisfy some weirdness of the Patricia tree:
It looks like this state clearing operation was hard to beat in terms of sheer number of keys and values accessed. Also note that the values (usually around 80 bytes) are usually slightly bigger than keys (usually less than 64 bytes — I have just realised this could be compressed because I have been using one byte per hexadecimal digit 🤦)
Next, we look at the hashes that constitute Merkle Proofs, and those structural information that I call “masks”:
It is easy to see that the size of the structural information is negligible compared to the hashes. It is also easy to see that hashes occupy the majority of space in the block proofs (this will also be the case for the contract storage proofs).
Breakdown of the contract storage proofs
The same components as before. First, keys and values:
It is interesting to see that values here are usually much smaller (they are at most 32 bytes) than the keys.
And now the hashes and the masks:
Very similar picture to what we saw for the main trie.
Relation to the State Fees (Rent) research
One of the ideas I wanted to test with this analysis is whether it is possible to use the stateless clients approach to avoid some parts of the rent. The most challenging part is the rent for contract storage, which makes lots of existing contracts vulnerable to griefing attacks. In the version 3 of the “State Fees” proposal there is a concept of the rent pre-payment, which is aimed to temporarily protect contracts from such griefing, and to allow some limited time to migrate to a new non-vulnerable code.
The idea of using stateless client approach, at least for the contract storage part (therefore the breakdown above), is an attempt to instead create a permanent, albeit costly protection against griefing.
It is inevitable, that if some form of stateless client mechanism is introduced, the transaction senders will need to pay extra gas proportional to the size of the block proof their transactions are generating. That would make operations with the contract storage (SLOAD, SSTORE) potentially much more costly than now. Incidentally, Martin Swende’s recent analysis shows that SLOAD seems to be severely underpriced at the moment.
It is possible to apply stateless clients model to the existing contracts, and make the new contract opt-in to the contract storage with rent, which I expect would be much cheaper to use than the the one with the “block proofs”, but it will have a weird property of being able to disappear. This effectively means that we do not force the migration of the existing contract to the rent-compatible implementations within certain timeframe, but instead, incentivising them to migrate.
Suggested next steps
Obviously, extend the data collection to the more recent blocks, and fix the existing edge cases.
The size of the block proof is a big concern, even if we take contract storage component. I see two potential ways to bringing it down:
- Introducing a bit more “statefullness” into the stateless clients. What if the most of the clients are keeping N last block proofs (N could be 1, 2, 3…). Then, if you (as in an Ethereum node) know that your peer keeps that many past block proofs, you may choose only to send the part of the block proof that is not contained in those N recent ones. Your peer will be able to reconstruct the full block proof on its own. There needs to be more analysis on how beneficial this strategy is.
- Use of STARK proofs to compress the now variable “hashes” part of the block proof to a proof of the constant size (the “state of the art” STARKs have the proof size of around 60k as far as I know). This would require more research and development though, like creation of STARK circuits for Keccak256, for example, seeing how much overhead the prover will have, would we need a GPU to accelerate the prover, to seal the block quickly enough…
Perhaps, the part 2 is coming, with updates and more thoughts.