Optimistic State Prefetching (OSP)

Published in

RootstockLabs: Research & Technology

17 min readJan 30, 2023

A recent paper suggests that the EVM performance can be increased 6x with speculative execution. The idea is to use the information gathered by executing transactions while they are in a node’s mempool to optimistically choose execution paths when the transaction is executed within a block. Vitalik Buterin has questioned this research arguing that this speedup corresponds to the average case, but in the worst case it may be the opposite: a huge slowdown. Attackers can try to create worst-case transactions to delay block processing of competing block producers. While we agree with Vitalik, a new scheme presented in this article enables a comparable speedup (yet to be proven by benchmarks), without the original downsides and with almost no extra complexity. We propose that users creating transactions can specify which storage cells can be prefetched during transaction propagation but instead of giving them explicitly as EIP-2930 access list, we let network nodes compute the access list on-the-fly in the mempool. The users bet that the state addresses accessed will be exactly the same when executing the transaction inside a block. The benefit over access lists is that access lists are costly in terms of bandwidth, so the incentive to provide them is low. We increase the incentive to use implicit access lists. We also add an extra economic incentive not present in EIP-2930: we reward senders that can predict the full set of state objects accessed, including all objects read, and not only a proper subset. If the state objects required when the transaction is executed in the block equals the state objects predicted by the user, the user wins and he receives a discount on the transaction cost. If he errs, he must pay a penalty for the unused prefetched cells.

State Prefetcher

Ethereum’s state consists of key-value pairs cryptographically indexed by a Merkle-Patricia trie. The transaction execution on the critical path accesses the blockchain’s state through the StateDB objects, each corresponding to a snapshot of Ethereum’s state. Looking up the value associated with a key involves iteratively loading StateDB objects from the disk and decoding all the serialized intermediate nodes from the trie’s root down to the leaf node, which contains the value. The node containing the value and all the decoded intermediate nodes are cached to expedite future lookups.

When transactions are broadcast over the peer to peer network, the bytecode is not actually executed. Only the source account is checked for sufficient funds, the nonce is checked for correctness and digital signature is verified. However, some of the original Ethereum clients did in fact execute the transactions in the mempool, using a “pending” state that was updated with each new transaction processed from the mempool. The first versions of Ethereumj did exactly that (but not as an optimization, but because the coders thought this was not problematic). This idea was later discarded as it can lead to resource exhaustion attacks.

We propose the use of this type of pre-execution for certain transactions with protection measures to avoid DoS attacks. The pre-execution allows trie node data prefetching, enabling caching storage cells for immediate use. We call this technique Optimistic State Prefetching (OSP). We propose caching directly the trie nodes that will be used during block processing, which allows a simple caching strategy based on data expiration. Also, caching trie nodes enable optimizing storage writes, because storage writes require reading trie nodes along the trie path to the trie node to be written, and those complementary nodes can be stored in the cache. Caching higher-level objects such as storage cell values is also possible, but requires cache invalidation techniques that are more expensive, and cannot improve writing speed.

By reducing the number of state database lookups on the critical path of block propagation, we can speed up block processing or increase the block gas limit to process more transactions. This does not mean we do less processing, but we do pre-processing spread over the whole block interval, so that when a new block arrives, much less work is done.

In a nutshell, we execute the transaction in the current state context of the network node to pre-fetch optimistically the storage cells that we may need later on block processing.

If data is prefetched during transaction forwarding in the mempool, then in the optimistic case paying a high fee for storage I/O is not required because all data read will always be in memory. If blockchain synchronization is still trying to catch up with the latest block, then storage access is still necessary as there won’t be any prefetching. In the worst case, when optimistic data prefetch fails because a different set of storage cells is read during block processing, then the sender can be forced to pay a fee multiple times higher than normal, as a penalty. Therefore DoS attacks become very costly for the attacker.

In this article, we’ll assume that all storage access costs (SLOAD, EXT*, BALANCE, CALLs) are made of a fixed cost and a variable storage rent. Discounted fees are achieved by reducing the storage rent component, while the penalty fees come from an increased storage rent. The same can be achieved by altering the fixed cost without storage rent, but assuming the existence of a storage rent makes reasoning about incentives straightforward.

Formalization of an OSP Scheme

For a given transaction T, we define the state access set sas(T) as the list of unique addresses of all storage and account objects read or written during the processing of T in a certain specified context. For Ethereum, which maintains one trie per contract, the sas() can consist of pairs (account address, storage address) with nil storage address indicating the EOA record. For Rootstock which uses a single trie to store the world state, the set can also be built using the trie paths of the objects within the world state trie. The implementation does not change the properties of the proposal.

The user wallet can choose to tag a transaction T with certain context information X, so that:

When the transaction is being forwarded (from one peers’ mempool to another peers’ mempool), the network node executes T and collects sas(T). The trie nodes accessed during the execution of T are kept in a short term cache.
Optionally (see the “Attacks and Solutions” section) a network node will forward the transaction T only if sas(T) matches the context information X.
When the transaction T is executed inside a block, we charge storage rent at a discounted rate if sas(T) matches X, or we penalize the user with a higher storage rent rate if it does not.

If wallets can provide accurate tagging information for almost all transactions, then we’ll have a system that almost never charges storage rent.

Existing Solutions

Ethereum implemented access lists in EIP-2930. This EIP introduces a new field in the transaction which specifies X as a list of account addresses and storage cell addresses that the contract will read or write with high probability. Therefore nodes can prefetch the trie nodes required to access those storage cells when the transaction is still in the mempool, and store those trie nodes in a memory cache. During transaction execution in a block, the addresses specified in the access list are loaded normally, but if they are read or written by the EVM code, they get a gas cost discount (a warm read). The proposal does not impose that nodes should cache the values in the access list while transactions are in the mempool, but may be suggesting this strategy (“clients can pre-load the data from databases”). However, the geth client does not seem to be prefetching any trie node related to elements in the access list. Therefore this technique is currently not used as OSP.

The access list OSP solution has the drawback that it expands the size of the transaction significantly, and therefore a large part of the storage cost savings is instead spent in additional transaction data. More precisely, each storage key in the access list costs 1900 gas, which is almost as high as the COLD_SLOAD_COST (2100) defined in EIP-2929, leading to savings generally lower than 0.5% of the transaction cost per cell.

The transaction data cost cannot be arbitrarily lowered, as 32 bytes of calldata cost 512 gas in Ethereum and 2048 gas in RSK. Therefore the costs cannot be lowered to less than 25% of the cold access cost.

The Proposed OSP Solution

We propose a new solution that provides benefits similar to access lists but without the need to give this list explicitly. Instead, we let the transaction execution be used to build the access list. We make X equal to the cryptographic hash on sas(T). The state access set is specified as the list of all trie keys read or written in a specific order (i.e lexicographic), and therefore the construction is deterministic. We call this hash summary(T). At the end of transaction execution we compute Y=summary(T), and charge storage rent at a discounted rate if X ==Y. Otherwise, we charge rent using a penalty factor. To prevent MEV, wallets that are unsure which storage addresses will be read or written should refrain from adding the context information X.

Consensus Changes

This proposal does not require storage rent, and incentives can be defined based on fixed gas cost. However, we present our proposal based on storage rent because it becomes easier to reason about gas costs.

This proposal requires the following consensus changes:

Storage rent
Reduction of the cost of state reading opcodes and increase the cost of storage rent so that storage rent represents the majority of the cost.
Addition of a new field to the transaction called Predicted Accessed Keys Hash or pakHash for short (the field addition is done according to EIP-2718 transaction envelopes). This field pakHash represents what we’ve been calling the context information X. Similar to the set sas(T), the elements listed in pakHash can be pairs of addresses (for Ethereum or Rootstock) or unique trie paths (only for Rootstock).
Addition of a new field to the transaction called storageRentGasLimit. For transactions having this field, the storage rent will always be paid from this field.
50% of the normal transaction gas limit is paid unconditionally. This is to prevent an attack where a broadcast transaction consumes a lot of resources but it consumes nothing when included in a block.
50% of the storageRentGasLimit is paid unconditionally.
If the pakHash field is not present in the transaction, then charge 2X storage rent (a part of it can be burned).
If the pakHash field is present then:
(a) Before storage rent is charged, compute Y=summary(T) as the hash digest of the sorted serialization of all accessed cell addresses.
(b) If the Y is different from pakHash, charge 2.5X storage rent (a part can be burned)
(c ) If Y is equal to pakHash, then charge 1X storage rent (a part can be burned)

To clarify, the rent timestamp update is still based on actual storage rent logic, but the rent collected is scaled up (penalty) or down (discount). While the penalty/discount factors chosen are somehow arbitrary and could be better selected, the ascending relation between the costs in the cases (a), (b) and (c ) must be preserved so that he highest incentive is to use OSP, and the lowest is to DoS-attack OSP.

As specified in the list of consensus changes, part of the storage rent can be burned. The burned part can be zero, be a specific percentage or be equal to the remainder of setting an equal payment for miners in all three possibilities.

If we fix the percentage paid to miners (i.e. always 1X), and the remainder is burned, then miners have no preference for certain types of transactions over the others. This may be desired to provide use case neutrality and increase fairness.

However, costs if (a), (b) and (c ) must selected so that the user creating the transaction will prefer to issue transactions in the following prioritized order:

Successful prediction: Transactions with correct pakHash (pays 1X). However, the user’s wallet will need to overestimate the storage rent to prevent transaction failure due to low storageRentGasLimit, which incurs an additional cost.
No prediction: Transactions without pakHash (pays 2X).
Failed prediction: Transaction with incorrect pakHash (pays 2.5X) plus payment of storageRentGasLimit overestimation.

Note that storage rent gas limit overestimation does not mean overpayment in case of successful prediction. It only means that wallets must be willing to commit to pay a higher fee in case of a failed prediction.

An alternative proposal to this proposal is to use EIP-2929 and add prefetching of access lists of block N when processing block (N-1). However, due to low incentive to add access lists, this technique currently does not provide a significant speedup.

Resources Required

Prefetching doesn’t seem to require too much RAM. If RSK could perform 100 tps, and the average time for a transaction to be included in a block is 10 minutes, and each transaction reads 4 storage slots, then the cache needs to be 100*60*10*4= 240k slots, which is a modest requirement. The cache would consume not more than 24 Megabytes.

One strategy to reduce the amount of resources required for prefetching is to pre-execute only a subset of all transactions reaching the mempool. For example:

Those consuming less than G gas (G=100k ?)
Those paying 10% over the minimum gas price.

In fact, we already have such a scoring system to defend from network DoS attacks. But if we want to charge storage rent or not based on certain conditions, then these conditions would need to be hardcoded in consensus rules. Therefore it’s not suggested that these types of resource optimizations are implemented.

Instead of just not pre-executing certain transactions, transactions could be enqueued and prioritized depending on the probability that they will be included in the following blocks, so that all transactions are pre-executed, although not immediately. This can help to keep a certain predefined resource consumption for pre-fetching and avoid I/O congestion.

Deprecation

One of the benefits of this proposal is that it can be deprecated if it is no longer used. Simply the new type of transaction having the pakHash field is deactivated in a network upgrade.

Attacks And Solutions

We present several potential attacks and we show how the current design defends from them, or how it can be defended by taking additional protective measures.

Overall the attack surface is increased by this proposal as transactions need to be processed while in the mempool, even if we can defend against the proposed attacks. We suggest that transactions of the new type initially have a CPU gas limit below the block gas limit (i.e. only 1M gas), and are not broadcast if having higher gas limits, to prevent unforeseen attacks.

DoS Attacks

A DoS attack to this scheme would consist of a transaction that reads a number of very old storage cells (implying a high storage rent) while it is propagated, but that reads no storage cell when executed within a block. However, when the transaction is propagated, the transaction gas limit will need to be higher than the gas limit specified for storage rent (consumed by accessing the old cells), and therefore the user will always pay at least 50% this amount (due to rule 3). This means that the actual cost of storage access can only double the amount paid. However, if there is mismatch with the pakHash, the storage rent cost is actually doubled, so the cost paid by the user is close to the total cost paid by the network, enough to prevent abuse. Only 50% of the computation may be unpaid for. Note that the additional computation cost imposed by a malicious transaction is only paid by nodes which are synchronizing the tip of the blockchain. Historical transactions cannot enforce this unpaid cost on nodes, because those transactions are not pre-executed in the mempool.

Finally, the cost of hashing to obtain pakHash is negligible compared to the time saved by not performing an I/O operation, so we estimate this cannot lead to a DoS attack vector.

Diverted Prefetch Attack

A different, more problematic attack is if an attacker builds a transaction that accesses a very different set of storage cells in mempool and block execution modes, but provides the pakHash that matches the address set accessed in block execution mode. This means that nodes in the network will load into RAM a set of cells that will never be required during block execution. This also means that other cells that may be required will be displaced from RAM caches, causing blocks to execute even slower.

We give one example of the attack: an attacker creates 100 transactions Z(i) that, depending on the state of a storage cell W, execute a code path A or a code path B. The attacker sends these 100 transactions over the network, and the paths A is executed for each and a certain set of storage cells are prefetched. Then the attacker sends a single transaction T with a high gas price that modifies the storage cell W. Block producers will insert T before the other 100 transactions Z(i). After the value of W has changed, all 100 transactions execute the path B, which accesses a completely different set of storage cells.

This attack is not easy to protect from, yet we found two potential defenses. The first, we can use hints (described in the following section) that can help most nodes avoid prefetching incorrect cells. However, we couldn’t find a solution to the problem that blocks will execute the attacking transactions Z(i) without any state prefetch, even if the attacker will be paying the discounted rent price.

The second potential solution is simply that a transaction with invalid pakHash does not propagate over the network, even if miners can include such transactions in blocks. This solution requires that nodes send the current state (blockchain tip hash) together with transactions so that peers can detect if they are or not in the same state, and therefore the transaction execution would lead to the same outcome. Network nodes should not attempt to execute the transaction in a state different from the one hinted. If they do, and they detect an invalid pakHash, they would need to drop the transaction from the mempool and they would have wasted CPU and I/O resources, which would allow peers to perform resource exhaustion or resource abuse attacks.

Therefore, if a peer receives a transaction with a hint that indicates it was executed in a different blockchain state, the node would need to drop the transaction, and not forward it. This implies that a transaction will be forwarded to a miner only as long as there is a route from its source to the destination miner sharing a similar blockchain state in a short period, which is a strong assumption. This period corresponds to the average block interval, which is 12 seconds on average in Ethereum and 33 seconds on average in RSK.

Rent for CPU Swap Attack

The storageRentGasLimit field was added to prevent a specific attack. Let’s suppose this field is not added to the transaction. An attacker creates a transaction T that, when executed in the mempool, spends all the gas allowed in storage rent, by loading many very old cells.

When the transaction is executed in the block, a different execution path is taken: the transaction spends all the gas in CPU-intensive instructions, not I/O. Therefore the network has wasted a high amount of I/O resources that were unpaid for.

To fix this problem we split the transaction gas limit field into CPU and storage rent payments, and unconditionally charge 50% of each one. Now the transaction T will consume half of the specified rent gas even if it does not access any cell. This separation of gas fields may complicate the design of wallets, but reduces the attack surface.

Transaction Execution Hints

If each network node knows the blockchain state of its peers (using STATUS messages), a node that executes an OSP transaction in the mempool knows if the peer will accept it or not. Therefore, it can refrain from forwarding the transaction to peers that will reject it. For the remainder peers, it can attach to the transaction information to indicate the root state in which the transaction was executed, and a boolean flag that alerts of a state prefetch success or failure (a mismatch with the pakHash). The root state can be given by state hash or by block hash.

For a certain transaction T, a peer can collect hints from its peers before executing or broadcasting T. If the majority of peers suggest that T’s pakHash field is incorrect, the peer can avoid local execution and state prefetching and just forward the transaction with a negative hint. If the majority says the pakHash is correct, a node could also skip execution and forward the transaction with a positive hint. However, it won’t be benefited by the state prefetching. Therefore, in this case the node would execute the transaction anyway.

An alternative is that nodes cache the cell addresses of transactions with positive hints, and we add a wire command to request the cell changeset for a specific recent transaction.

Therefore nodes could re-use these changesets and avoid local execution in most cases, at the expense of a certain level of trust in peers.

To avoid being misinformed by bad or malicious hints, a node can periodically and randomly disobey the majority of hints and execute the transaction anyway. The weight assigned to hints from peers that provided incorrect hints in the past can be decreased to avoid relying on them for future evaluations.

With such a voting system, the capability of malicious actors to cause network disruption based on bad state hashes is reduced to near zero. However, it’s difficult to balance the weight of peer hints because those nodes can also be targets of attacks to incorrectly vote on hints. Nodes that forward votes will be vulnerable to malicious hints.

The Side Channel Attack

A potential drawback of the proposed solution is that users creating transactions are incentivized to negotiate directly with miners in which particular state the transaction will be executed, so that they can specify the context information X with 100% certainty of correctness, and always get the fee discount. This incentive may push miners to accept transactions over a side channel that ensures a certain execution state. The user then can pay the miner a tip for using the side channel, and if the tip is lower than the penalty of a state mismatch, both the user and the miner benefit from this deal. This is a direct incentive to create side channels to talk directly with the miners instead of going through the peer to peer network. If full nodes do not see the transaction beforehand, they won’t be able to prefetch its state. The end result is that transactions are cheaper for users that negotiate directly with miners, but the network as a whole must still perform the same amount of I/O during block propagation. Therefore we created an incentive for side channels with no additional benefit to the community.

However it seems to be very unlikely that this side channel communication system emerges because the incentives are very moderate, and because it’s complex to design and develop. It requires synchronous negotiation with all transaction senders simultaneously during the transaction signing process to assure users that all states specified X(i) are the correct ones. A single non-collaborating user can ruin the protocol and force all other users to miss the opportunity of transaction inclusion. Note that this is not the same problem faced by the zkRollups operators, as operators enqueue received transactions, and the sequencer orders them much later. The user has no guarantee of the state prior transaction execution, and ordering is an asynchronous process.

Therefore we consider this “side channel” a non-problem.

Summary

We’ve presented a proposal that can provide a speedup to transaction processing (possibly up to 6x) based on Optimistic State Prefetching (OSP). To implement it, we defined a new type of transaction (OSP) with two additional fields: pakHash and storageRentGasLimit. The first is a hash of the addresses of storage cells that the user predicts will be loaded or stored during execution of the transaction in a block, and the second is a separate storage rent gas limit. Nodes will execute these transactions when they are in the mempool and before broadcast to prefetch the nodes in the trie corresponding to the cells that will be accessed. When wallets use OSP transactions and the prediction is correct, the sender account gets a discount of transaction fees. Also, as less gas is consumed by transactions, the block gas limit can be increased and more transactions can be processed. Our proposal specifies incentives so that wallets will prefer to use OSP transactions whenever possible, and incentives to miners to include them. We have identified several potential problems, but all of them seem to have practical mitigations. We have not identified any security vulnerability of this scheme, although the attack surface increases as some transactions are executed when in the mempool, which is not standard.