Statelessness series, Part 1: State Expiry & History Expiry

Chaisomsri
Feb 7, 2024

[Table of Contents]

1. Introduction

2. What is State Expiry?

3. What is History Expiry?

4. In Conclusion

1. Introduction

In blockchain research, the Blockchain Trilemma is a well-known idea: the three goals a base-layer chain strives for (decentralization, security, and scalability) trade off against one another, so it is difficult to achieve all three at once. The Ethereum community treats decentralization as its most important value and has explored various ways to improve scalability without compromising decentralization or security, even publishing a roadmap for these plans.

Two topics that Ethereum researchers are focusing on, MEV (originally Miner Extractable Value, now usually read as Maximal Extractable Value) and statelessness, may seem unrelated, but both share the goal of strengthening Ethereum’s decentralization. For instance, making block production accessible to everyone, so that MEV does not become a benefit that only a few centralized nodes can capture, is an effort to prevent power from concentrating in a handful of nodes.

Similarly, as Ethereum’s chain data grows, the minimum hardware requirements for running a node rise. Efforts to keep the chain as light as possible, so that node operation does not concentrate in a few well-resourced operators, are therefore also efforts towards decentralization. Although recent discussions on the Ethereum research forum have centered more on MEV than on statelessness, the growing chain size is a problem that must eventually be addressed.

Hardware Requirements for Operating Ethereum Full Nodes and Archive Nodes (Source: https://blog.woodstockfund.com/2022/04/21/deep-dive-into-eip-4444/)

In this article, we will discuss State Expiry and History Expiry, which are being mentioned as methods to reduce the hardware requirements needed for node operation.

To summarize briefly, State Expiry means deleting parts of the State Trie maintained by full nodes at regular intervals so that the Trie does not grow indefinitely. History Expiry, on the other hand, means that ordinary Ethereum full nodes delete block data older than a certain age.

2. What is State Expiry?

To understand the bigger picture, let’s revisit the concept of statelessness. When a new block is created on the Ethereum chain, full nodes execute the transactions it contains to verify their validity. Determining whether a transaction is valid requires the current State Trie, which stores the state of every account at that moment and lets a node check whether the transaction can be executed against it. For example, if address A broadcasts a transaction transferring 1 ETH to B, but A’s current balance is less than 1 ETH, the transaction is deemed invalid.
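The balance check in this example can be sketched as follows. This is only a minimal illustration, not how a client is implemented: the `Account` and `Transaction` types and the `is_valid` function are hypothetical names, the state is a plain dictionary rather than a trie, and gas fees, nonces, and signatures are ignored.

```python
from dataclasses import dataclass

WEI_PER_ETH = 10**18


@dataclass
class Account:
    balance: int  # in wei


@dataclass
class Transaction:
    sender: str
    recipient: str
    value: int  # in wei


def is_valid(state: dict[str, Account], tx: Transaction) -> bool:
    """Return True if the sender exists and can cover the transferred value."""
    sender = state.get(tx.sender)
    return sender is not None and sender.balance >= tx.value


# A holds 0.5 ETH and tries to send 1 ETH to B.
state = {"A": Account(balance=WEI_PER_ETH // 2), "B": Account(balance=0)}
tx = Transaction(sender="A", recipient="B", value=WEI_PER_ETH)
print(is_valid(state, tx))  # False
```

The point is that the check cannot be answered from the transaction alone; it needs A’s current balance, which is why a verifying node needs access to the state.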

The State Trie contains the state of every account that has appeared in the more than 16 million blocks from Ethereum’s genesis block to the present. Its size is nearing 50GB and keeps growing, which raises not only storage requirements but also the cost of searching and modifying the Trie. It is also considered inefficient: over 90% of these accounts have not been used for a long time, yet they are still stored.

Statelessness aims to reduce the size of the State Trie held by Ethereum full nodes, and it can be divided into two directions: 1) Weak Statelessness and 2) State Expiry.

1) Weak Statelessness

Weak Statelessness is the idea that only block proposers need to store the state, while the other nodes that verify transactions do not maintain the State Trie. Currently, the block proposer broadcasts only the block containing the transactions. If the proposer also broadcast a proof (a witness) that those transactions are valid against the state, the verifying nodes would not need to hold the entire State Trie. While Weak Statelessness lightens the load on the many verifying nodes, it places a heavier burden on block proposers, who must generate the witnesses, and this could accelerate centralization. Therefore, Weak Statelessness alone is insufficient, and State Expiry is also needed.
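To make the witness idea concrete, here is a minimal sketch using a plain binary Merkle tree over SHA-256. This is an assumption made for illustration: Ethereum’s actual state uses a Merkle Patricia Trie (with Verkle trees proposed for statelessness), and the function names and leaf encoding below are hypothetical. What it shows is that a verifier holding only the root can check one account’s data from a short list of sibling hashes.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, from the hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:  # pad odd levels by duplicating the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_witness(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Proposer side: collect the sibling hashes on the path from leaf to root."""
    witness = []
    for level in levels[:-1]:
        sibling = index ^ 1
        witness.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return witness


def verify(root: bytes, leaf: bytes, witness: list[tuple[bytes, bool]]) -> bool:
    """Verifier side: recompute the root from the leaf and the witness only."""
    node = h(leaf)
    for sibling, sibling_is_left in witness:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


accounts = [b"A:balance=5", b"B:balance=0", b"C:balance=7", b"D:balance=1"]
levels = build_levels(accounts)
root = levels[-1][0]
witness = make_witness(levels, 2)  # proof for account C

print(verify(root, b"C:balance=7", witness))    # True
print(verify(root, b"C:balance=700", witness))  # False
```

Note the asymmetry: `make_witness` needs every level of the tree, while `verify` needs only the root. That is the same split of work between proposers and verifiers described above.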

2) State Expiry

State Expiry is the idea of periodically deleting and reconstructing the State Trie maintained by Ethereum full nodes.

The period currently under discussion is one year. By resetting the State Trie every year and building it anew, nodes would no longer need to store the state of seldom-used (dormant) accounts, which shrinks the Trie and keeps it from growing indefinitely. However, simply deleting the Trie would make it impossible to validate a new transaction whose sending account is no longer in the State Trie. To prevent this, a method for recovering the state of accounts that existed in an old State Trie is essential.

Source: https://notes.ethereum.org/@vbuterin/verkle_and_state_expiry_proposal

In the design discussed on ethresear.ch, the State Trie is reset every year. Full nodes keep the complete State Tries of the current and previous periods, and only the root values of State Tries older than that. If the current period is T, nodes hold the State Tries for (T-1) and (T), and only the (T) State Trie can be modified.
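The rotation described here can be sketched as a small class. This is a toy model under stated assumptions: tries are plain dictionaries, `trie_root` is a stand-in that hashes the sorted contents rather than computing a real trie root, and `ExpiringState` and `advance_epoch` are hypothetical names.

```python
import hashlib
import json


def trie_root(trie: dict[str, int]) -> str:
    """Stand-in for a real trie root: a hash of the sorted contents."""
    encoded = json.dumps(sorted(trie.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


class ExpiringState:
    def __init__(self) -> None:
        self.epoch = 0
        self.current: dict[str, int] = {}    # S_T, readable and writable
        self.previous: dict[str, int] = {}   # S_{T-1}, read-only
        self.old_roots: dict[int, str] = {}  # period -> root, for older tries

    def advance_epoch(self) -> None:
        """Start a new period: keep only the root of the old S_{T-1}, freeze S_T."""
        if self.epoch >= 1:
            self.old_roots[self.epoch - 1] = trie_root(self.previous)
        self.previous = self.current
        self.current = {}
        self.epoch += 1


state = ExpiringState()
state.current["A"] = 5   # written during period 0
state.advance_epoch()
state.current["B"] = 1   # written during period 1
state.advance_epoch()    # now in period 2

print(state.epoch)              # 2
print(state.previous)           # {'B': 1}
print(sorted(state.old_roots))  # [0]
```

After the second rotation, account A’s data is gone from the node; only the root of the period 0 trie remains, which is what later makes it possible to check a witness for A.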

Furthermore, to record which period (hereafter called an epoch) an account belongs to, an Extended Address Scheme replaces the current address format. Note that an epoch here means one such period (one year), which is different from the epoch in Ethereum’s consensus protocol, Gasper (1 epoch = 32 slots). In the Extended Address Scheme, the existing Ethereum address is prefixed with an epoch number, and the result is written (e,s).

(e,s): e is the epoch, and s is the current 160-bit address used in Ethereum.
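As a rough illustration of the (e,s) pair, the sketch below models an extended address as an epoch number prepended to the existing 20-byte address. The 4-byte width of the epoch prefix and the names `ExtendedAddress`, `encode`, and `decode` are assumptions made for this example; the actual byte layout is a matter for the address extension proposals and is not specified here.

```python
from dataclasses import dataclass

EPOCH_BYTES = 4  # assumed prefix width, for illustration only


@dataclass(frozen=True)
class ExtendedAddress:
    epoch: int      # e
    address: bytes  # s, the existing 160-bit (20-byte) address

    def __post_init__(self) -> None:
        if self.epoch < 0:
            raise ValueError("epoch must be non-negative")
        if len(self.address) != 20:
            raise ValueError("s must be a 160-bit (20-byte) address")

    def encode(self) -> bytes:
        """Epoch prefix followed by the original address."""
        return self.epoch.to_bytes(EPOCH_BYTES, "big") + self.address

    @classmethod
    def decode(cls, raw: bytes) -> "ExtendedAddress":
        return cls(int.from_bytes(raw[:EPOCH_BYTES], "big"), raw[EPOCH_BYTES:])


addr = ExtendedAddress(epoch=3, address=bytes.fromhex("00" * 19 + "ab"))
raw = addr.encode()
print(len(raw))                             # 24
print(ExtendedAddress.decode(raw) == addr)  # True
```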

(e,s) is an address belonging to epoch e, so it cannot be accessed while the current epoch is earlier than e. For brevity, let S_e denote the State Trie of epoch e. Accesses to the State Trie fall into four scenarios; the last two both occur in an epoch f > e+1 and are covered together under (3) below. For each one, we examine what information is needed to load the state correctly.

(1) If the state of (e,s) changes during Epoch e

This is similar to how Ethereum’s full nodes access the State Trie today. Full nodes hold S_e, the State Trie of epoch e, and it is still writable, so they can read the current state of (e,s) and execute transactions that modify it.

(2) If the state of (e,s) changes during Epoch e+1

The full node holds both the previous and current State Tries, S_e and S_{e+1}, but S_e is now read-only. If S_{e+1} does not yet contain the account, its state is read from S_e, and the updated state is written to S_{e+1}.

(3) If the state of (e,s) changes during Epoch f > e+1

Full nodes store only S_f and S_{f-1}. If (e,s) is present in S_f or S_{f-1}, it was accessed during epoch f-1 or f, and it can be modified as in (1) or (2).

However, if (e,s) is in neither S_f nor S_{f-1}, whoever wants to change (e,s) must broadcast a witness showing that (e,s) existed in S_e, together with absence proofs showing that (e,s) was not included in any of S_{e+1}, …, S_{f-2}.
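The scenarios above can be summarized in a short lookup routine. This is a sketch under simplifying assumptions: tries are dictionaries keyed by (e,s), an account’s state is just an integer balance, and the witness is reduced to a claimed state plus a caller-supplied `verify` callback that stands in for checking the membership proof and the absence proofs against the stored roots. All names here are hypothetical.

```python
from typing import Callable, Optional

Key = tuple[int, bytes]  # (e, s)
Trie = dict[Key, int]    # stand-in for a State Trie: key -> balance


class WitnessRequired(Exception):
    """The account is in neither stored trie and no valid proof was supplied."""


def read_account(
    key: Key,
    current_epoch: int,
    s_cur: Trie,   # S_f, writable
    s_prev: Trie,  # S_{f-1}, read-only
    claimed_state: Optional[int] = None,
    verify: Callable[[Key, int], bool] = lambda key, state: False,
) -> int:
    e, _ = key
    if e > current_epoch:
        raise ValueError("address belongs to a future epoch")
    if key in s_cur:            # already touched in the current epoch
        return s_cur[key]
    if key in s_prev:           # touched in the previous epoch: read S_{f-1}
        return s_prev[key]
    if current_epoch <= e + 1:  # cannot have expired yet: a fresh, empty account
        return 0
    # Epoch f > e+1 and absent from both stored tries: a proof is mandatory.
    if claimed_state is None or not verify(key, claimed_state):
        raise WitnessRequired(key)
    return claimed_state


def write_account(key: Key, state: int, s_cur: Trie) -> None:
    """All writes go to the current trie; older tries are never modified."""
    s_cur[key] = state


s_prev: Trie = {(3, b"alice"): 10}
s_cur: Trie = {}

# Touched in the previous epoch: read from S_{f-1}, write to S_f.
balance = read_account((3, b"alice"), 4, s_cur, s_prev)
write_account((3, b"alice"), balance - 1, s_cur)
print(s_cur[(3, b"alice")])  # 9

# Dormant since epoch 1: rejected unless a witness is supplied and verifies.
try:
    read_account((1, b"bob"), 4, s_cur, s_prev)
except WitnessRequired:
    print("witness required")
```

The default `verify` rejects everything, so an expired account can never be revived by accident; a real verifier would check the proofs against the roots kept for the expired epochs.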

In summary, with State Expiry, full nodes store only the State Tries of the most recent one to two years and keep just the roots of earlier periods’ State Tries. To revive an account from an earlier period, the account owner submits a witness proving its state in the last period in which it was used, plus absence proofs showing that it was not used in any later expired period. This scheme keeps the size of the State Trie from growing indefinitely.

3. What is History Expiry?

Nodes store all block data, transaction data, and some receipt data from the genesis block to the current block (approximately 16 million blocks), amounting to about 500GB. History Expiry is the idea that full nodes delete block and transaction data older than a certain age and keep only the most recent blocks. The proposal is known as EIP-4444, and the retention period under discussion is about one year.

Historical data is used for 1) syncing and 2) the maintenance of dApps, among other things. When an Ethereum client such as Geth runs for the first time, it connects to peer nodes to request block data and executes every block and transaction from the genesis block up to the current block.

However, if every Ethereum node deleted all block data older than a year, a new node could no longer execute all transactions from the genesis block itself. Instead, a checkpoint can be designated and periodically updated, and then used like the genesis block, so that a syncing node receives only the block and transaction data after the checkpoint. A node syncing for the first time has to trust the first checkpoint it receives. Proposals also include storing the deleted historical data on distributed systems such as IPFS or BitTorrent.
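A retention window with a moving checkpoint might look like the sketch below. It rests on assumptions made for illustration: the `Block` type and `prune_history` function are hypothetical, blocks are assumed to be ordered by number, and the checkpoint is taken to be the newest block that falls outside the window. Real clients would define the checkpoint and the pruning boundary in their own way.

```python
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # the "about one year" retention window


@dataclass
class Block:
    number: int
    timestamp: int  # Unix seconds
    block_hash: str


def prune_history(
    blocks: list[Block], now: int, retention: int = SECONDS_PER_YEAR
) -> tuple[Optional[Block], list[Block]]:
    """Split an ordered chain into (checkpoint, retained blocks).

    The checkpoint is the newest expired block; a node syncing from the
    retained blocks has to trust it much as it trusts the genesis block.
    """
    cutoff = now - retention
    expired = [b for b in blocks if b.timestamp < cutoff]
    retained = [b for b in blocks if b.timestamp >= cutoff]
    checkpoint = expired[-1] if expired else None
    return checkpoint, retained


# Five toy blocks spaced half a year apart; "now" is the time of the last one.
half_year = SECONDS_PER_YEAR // 2
chain = [Block(n, n * half_year, f"0x{n:02x}") for n in range(5)]
checkpoint, retained = prune_history(chain, now=4 * half_year)

print(checkpoint.number)             # 1
print([b.number for b in retained])  # [2, 3, 4]
```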

Moreover, if dApps need information from transactions in very old blocks, it could become difficult to request this data from other nodes. dApps would have to store the transaction and receipt data themselves or rely on a few archive nodes that hold all transaction data since the genesis block.

History Expiry is a change at the client level rather than at the Ethereum protocol level. It does not create new rules for Ethereum; instead, clients like Geth or Erigon would by default delete data older than a year. Deleting old block data might seem risky at first glance, but it does not significantly compromise safety.

In fact, Geth already allows users to delete past block data from their machines. Nodes are under no obligation to serve block data to peers that request it for syncing; they do so out of goodwill, and storing past block data is not a mandatory responsibility of full nodes. In theory, then, changing the client default so that past data is not stored changes nothing about what nodes are required to do. In practice, however, client defaults determine the behavior of the majority of full nodes, whose operators do not modify their clients, so the change could affect the entire Ethereum network.

Changing the storage default might seem risky, as if the blockchain were abandoning one of its functions. However, Ethereum is a state machine: it exists not to preserve historical data but to execute new transactions and move to a new state through consensus on the current one. If checkpoints can be trusted, a node can simply receive and execute the transactions after the checkpoint to sync up to the current state, and trusting a checkpoint is not much different from trusting the genesis block.

4. In Conclusion

This article discussed State Expiry and History Expiry, parts of the roadmap for Stateless Ethereum. In summary, the idea is to prevent the State Trie size from growing indefinitely and delete old block data that the node has been storing for a long time to lighten the client. The Address Space Extension (ASE) system in State Expiry, not covered in detail in this article, will be discussed in the next article.
