This is a half-baked continuation of the research into stateless clients that has been published here. But what I have so far is potentially quite promising, so I am excited to share it. I will also write a more detailed explanation of what is going on, so don’t worry if you don’t get the point of the charts below.
At the end of the previous article, I suggested two possible ways to reduce the size of the block proofs, and therefore, the bandwidth consumption of the stateless clients. One of them is adding some “statefulness” by assuming that most of the peers already saw the block proof from the previous block and you can simply show them the difference between the block proof for the current block, and the block proof for the previous block.
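As a rough illustration of the 1-block idea (not the prototype's actual code), a block proof can be modelled as the set of trie items a block touches. The sketch assumes that a peer who processed the previous block still holds everything that block's proof carried, so only the set difference needs to be sent. The function name `proof_delta` and the sample trie paths are hypothetical.

```python
def proof_delta(current_items, previous_items):
    """Items needed for the current block that the previous proof did not carry."""
    return set(current_items) - set(previous_items)


# Hypothetical trie paths touched by two consecutive blocks.
previous_block = {"0a1", "0a1f", "3b7"}
current_block = {"0a1", "3b7", "c03"}

# Only the items the peer has not already seen need to be transmitted.
delta = proof_delta(current_block, previous_block)
print(sorted(delta))
```

A real proof also carries hashes and structure masks, which this toy model folds into the notion of an "item".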
I have implemented that and it is currently running through the blocks (at the same time I have fixed some flaws of the previous prototype, like not using compact keys and other bugs causing failures). It is currently around block 4.7m and producing stats for the block proofs with 1 block of “statefulness”.
But then I realised, mostly intuitively, that I will have to generalise this and make the degree of statefulness adjustable. So I wrote another prototype where, instead of making the client 1-block stateful, I made it 256-blocks stateful. 256 blocks is about 1 hour. Effectively, block proofs would omit the information that was accessed by transactions within the last 256 blocks. At the moment, the implementation is very inefficient, but I managed to get some early “sneak preview” numbers from it. Let’s see. First, total sizes of the block proofs:
We see a substantial reduction. Why? Let’s do the breakdown the same way as before:
Next, key-values for the main trie:
Keys and values for contract storage:
And now, the 2 most interesting parts. Hashes (and structure — masks) for the main trie:
And, finally, hashes (and structure — masks) for contract storage:
In order to have a chance of getting data for more than a few thousand blocks, I will need to implement a special data structure that can produce these 256-block stateful proofs efficiently. Incidentally, this would also be the data structure that would be needed in the client implementation, and it will have to be formally specified as a part of the stateless clients proposal.
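One possible shape for such a structure, as a minimal sketch under stated assumptions rather than the eventual design: keep the set of items touched by each of the last N blocks, plus a reference count per item, so that producing a proof and sliding the window both cost time proportional to the items touched. The class `RecentAccessWindow` and its methods are hypothetical names, and "items" again stand in for trie nodes, keys and values.

```python
from collections import deque


class RecentAccessWindow:
    """Tracks which trie items were touched in the last `window` blocks."""

    def __init__(self, window=256):
        self.window = window
        self.blocks = deque()  # one set of touched items per retained block
        self.refcount = {}     # item -> number of retained blocks touching it

    def proof_for(self, touched):
        """Items that must be sent: touched now, but not within the window."""
        return {item for item in touched if item not in self.refcount}

    def advance(self, touched):
        """Record a processed block and drop the one that leaves the window."""
        touched = set(touched)
        self.blocks.append(touched)
        for item in touched:
            self.refcount[item] = self.refcount.get(item, 0) + 1
        if len(self.blocks) > self.window:
            for item in self.blocks.popleft():
                self.refcount[item] -= 1
                if self.refcount[item] == 0:
                    del self.refcount[item]


# Tiny synthetic trace with a 2-block window, purely for illustration.
tracker = RecentAccessWindow(window=2)
trace = [{"a", "b"}, {"b", "c"}, {"a"}, {"a", "d"}, {"b"}]
proofs = []
for touched in trace:
    proofs.append(sorted(tracker.proof_for(touched)))
    tracker.advance(touched)
print(proofs)
```

In the last block of the trace, item `b` has to be sent again because its most recent access has already slid out of the 2-block window.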
What are the consequences of this?
If the next, fuller batch of data looks encouraging (it also needs to include the memory footprint of different modes with various shades of statefulness), then potentially the nodes in the Ethereum network would be able to flexibly decide at which point of the “tradeoff curve” they want to be:
- More stateful (currently all full nodes are assumed to be fully stateful) — consume more memory and potentially do disk I/O, but less bandwidth (because peers will be able to send you smaller proofs)
- More stateless — consume less memory, but more bandwidth (because peers will need to send you larger proofs).
When a node joins the network, it would start in a fully stateless mode, and as it “soaks” up the block proofs, it can become more and more stateful to the desired tradeoff point. Also, it might be possible for the nodes to dynamically shift along the tradeoff curve, for example, to limit memory consumption.
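The tradeoff curve can be sketched with a toy simulation, assuming a synthetic access trace and counting items rather than bytes. The node starts with no state, and for each window size the function returns the total number of proof items it must receive (a stand-in for bandwidth) and the peak number of items it retains (a stand-in for memory). The function `simulate` and the trace are hypothetical, and the numbers say nothing about real Ethereum blocks.

```python
def simulate(trace, window):
    """Return (total proof items, peak retained items) for one window size."""
    last_seen = {}  # item -> index of the last block that touched it
    total_proof = 0
    peak_retained = 0
    for n, touched in enumerate(trace):
        # Forget items whose last access has fallen out of the window.
        last_seen = {k: b for k, b in last_seen.items() if n - b <= window}
        # Anything touched now that the node no longer holds must be sent.
        total_proof += sum(1 for item in touched if item not in last_seen)
        for item in touched:
            last_seen[item] = n
        peak_retained = max(peak_retained, len(last_seen))
    return total_proof, peak_retained


# Synthetic trace: each set holds the items touched by one block.
trace = [{"a", "b"}, {"b", "c"}, {"a"}, {"a", "d"}, {"b"}, {"c", "d"}]

# Window 0 is fully stateless; len(trace) never forgets anything.
results = {w: simulate(trace, w) for w in (0, 1, 2, len(trace))}
for w, (proof_items, retained) in results.items():
    print(f"window={w}: proof items={proof_items}, peak retained={retained}")
```

In this model, widening the window can only shrink the total proof size and can only grow the retained set, which is the shape of the tradeoff described above.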