Fraud Proof Wars

Luca Donno
L2BEAT
Aug 22, 2024

Thanks to Gabriel Coutinho de Paula (Cartesi), Ed Felten (Arbitrum), and Kelvin Fichter (Optimism) for reviewing the article.

Introduction

When thinking about optimistic rollups, we originally envisioned systems with fixed challenge periods that any single honest challenger could secure. Because we took these properties for granted, we started focusing on other issues like the Verifiers’ Dilemma: if a fraud proof system works, no one will ever submit invalid roots, so no one is incentivized to check the chain, which in turn creates an incentive to submit invalid roots. The issue can be solved either with elaborate techniques or by simply dismissing the problem: explorers, RPC providers, bridges, and other infra providers will have to run full nodes anyway, and since any of them can be the single honest challenger, in practice the chain will always be secure.

We went even further, claiming that economic incentives are not even really necessary to protect the chain: anyone can be the challenger, and assuming they can bear the gas cost of performing the challenges, everything is fine. As it turns out, this statement greatly underestimates the resources that defenders need. In April 2024, an ethresear.ch forum post titled “Fraud proofs are broken” by Gabriel (from Cartesi) argued that the vision of fixed-delay proof systems with a 1-out-of-N security assumption seems fundamentally impossible. The reason behind this claim is found in resource exhaustion attacks: an attacker can try to make the protocol confirm an invalid state root by economically outlasting honest defenders, who would then have no more funds to protect the system. If defenders need a lot of funds, the N in “1 out of N” might become dangerously small. Techniques aimed at solving this type of attack seem to greatly affect other desirable properties of the system, like fixed challenge periods. This trade-off space will be presented as the fraud proof trilemma, and, as we will see, each project has taken a different approach with its own distinct balance. The goal of this blog post is to allow users, developers, and protocol designers to better understand these systems and to spark a conversation around optimistic rollup security and its incentives.

In this article, we will first introduce fraud proof systems and their variants, namely single-round vs multi-round and onchain vs offchain protocols. Then, we’ll present a basic example of a resource exhaustion attack and show how different concurrency approaches can address the issue while also affecting settlement delays. The trilemma will then be defined, highlighting the trade-off between safety, promptness, and decentralization. The four major multi-round fraud proof systems will be described and compared in detail, namely the (old) Arbitrum Classic protocol, Arbitrum’s (new) BoLD protocol, Optimism’s fault proofs (OPFP), and the Cartesi protocol. The challenge period length will be discussed, as well as a possible attack that takes advantage of imperfect information. Finally, we will discuss whether fraud proofs make sense at all if we have ZK.

Background

A blockchain state is computed by applying its state transition function (STF) to an ordering of its transactions. Given that full nodes should converge on the same view of the network, such an STF must be deterministic. Rollups, instead of gossiping blocks filled with transactions on their own p2p network and running their own consensus to define the ordering, delegate these tasks to a separate network called the base layer, or L1. While a rollup can also share blocks over p2p and run a separate consensus for sequencing, the final source of truth is always L1, since the L2 can equivocate (i.e. post different data, or the same data in a different order, on L1). Once L1 collects the transaction batches and finalizes an ordering, the rollup full nodes are able to compute the final state that results from them.

If users want to bridge assets between the two networks, they need a way to tell the rollup that some tokens are locked on L1 so it can mint them on L2, and to withdraw they need to tell L1 that tokens have been burned on L2 so it can release them. To do this, both systems need to know about each other’s state. L2 full nodes have to fetch the data and its ordering from L1, implying that they run L1 full nodes themselves and can already read its state trustlessly. Since L1 nodes do not run L2 nodes, we need to convince them about the L2 state by other means. Moreover, since the full L2 state cannot be entirely published on L1, a cryptographic commitment to it is published in the form of state roots.

Computing such commitments by executing all L2 transactions on L1 is generally infeasible, so we must use sublinear strategies such as ZK proofs or fraud proofs that allow the bridge to accept only correct state roots. The former succinctly proves that all state roots are correct, while the latter just proves incorrect state roots wrong, accepting the others “optimistically”. Since accepting an invalid state root can allow an attacker to steal all funds in the bridge, we must design these protocols very carefully. In this article, we’ll focus on optimistic rollups.

Transitions of 1024 steps from the initial state (S_0) to the next state (S_1024). The S_0 state is already mutually accepted, as is the STF used.
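As a toy illustration of this model, here is a minimal Python sketch; the payment STF and the hash-based commitment are made up for the example, while real rollups commit with Merkle trees so that parts of the state can be proven individually:

```python
import hashlib
import json

def state_root(state: dict) -> str:
    # Toy commitment: a hash of the serialized state. Real rollups use
    # Merkle(-Patricia) trees so individual accounts can be proven.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def apply_stf(state: dict, tx: dict) -> dict:
    # Deterministic toy STF: move `amount` from sender to recipient.
    new = dict(state)
    new[tx["from"]] = new.get(tx["from"], 0) - tx["amount"]
    new[tx["to"]] = new.get(tx["to"], 0) + tx["amount"]
    return new

# Any full node applying the same STF to the same L1-ordered batch
# computes the same root; only this root gets posted to L1.
state = {"alice": 100, "bob": 0}
for tx in [{"from": "alice", "to": "bob", "amount": 30}]:
    state = apply_stf(state, tx)
print(state_root(state))
```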

Different types of fraud proofs

Single-round vs Multi-round

How do we prove that a state transition is incorrect? The trivial solution is to execute all the L2 transactions past the previously confirmed state on L1, either to entirely calculate the next correct state root or to prove specific faults. This is the approach taken by Fuel v1, the first optimistic rollup to be live on mainnet, which only supported simple payments. While this might work for simple app-specific rollups, re-executing an entire general-purpose VM inside another VM is far more challenging. The initial version of Optimism, called OVMv1, tried implementing single-round fraud proofs for the EVM, but the complexities of purely simulating the EVM inside itself (e.g. see the optimistic time-travel attack) made them abandon this approach. Celestia implements two types of single-round fraud proofs: one to prove incorrectly generated extended data, which has to do with the Reed-Solomon erasure coding of the data, and one for state transitions. The latter is not live yet on mainnet, and, given its minimal VM, Celestia is thinking about moving directly to ZK. Another single-round approach is the one planned by Taiko: nodes can propose state roots of blocks using SGX proofs, which can later be challenged and proven wrong by a single ZK proof.

While ZK can enable single-round proofs for computations that cannot simply be re-executed on the base layer, very long computations and very high-throughput chains can be challenging to prove even with ZK. In such systems, the asserter and the challenger, who agree on an initial state and disagree on the final one, work together to restrict the space of disagreement and reduce what needs to be re-executed on L1, possibly down to one single simple instruction, or to something that is feasible to prove with ZK. At the time of writing, the only project using multi-round fraud proofs with the final step proven with ZK is Kroma. As their fraud proof system is still a work in progress, it will not be described in this article.

The trivial technique to find the single instruction of disagreement consists of repeatedly publishing onchain the state root produced after executing one step at a time from the previously agreed one: in other words, a linear search. For the nerds, a linear search takes O(n) time, where n is the number of steps in the trace. Since in the worst case the protocol needs one interaction for each step, the onchain cost ends up similar to executing all the steps onchain, making it impractical.

An example of interactive fraud proof with linear search.

A faster search (O(log n)) consists of performing a bisection over the execution trace, where the asserter asks the challenger whether they agree on the state after executing half of the steps: if they agree on the midpoint, they surely disagree on the second half; otherwise, they surely disagree on the first half. The protocol continues recursively by providing the midpoint of either the first or the second half of the remaining execution trace until a single instruction is found, where both parties agree on a state and disagree right after executing one step. This process is also known as a bisection game. Finally, given the data published for data availability (DA), the single instruction is executed on L1.

An example of interactive fraud proof with binary search (bisection). The step found this way is then executed onchain.
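To make the mechanics concrete, below is a small Python sketch of the bisection search itself, ignoring bonds, clocks, and the onchain one-step execution (the traces are made up for the example):

```python
def bisection_game(honest_trace, claimed_trace):
    """Binary search for the single step of disagreement.

    Both parties agree on index 0 (the last confirmed state) and disagree
    on the final index; each round halves the window, so the game ends in
    O(log n) moves instead of O(n).
    """
    lo, hi = 0, len(honest_trace) - 1  # agree at lo, disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if claimed_trace[mid] == honest_trace[mid]:
            lo = mid  # agreement at the midpoint: the fault is in the second half
        else:
            hi = mid  # disagreement at the midpoint: the fault is in the first half
    return lo  # both agree on state lo and disagree right after one step

# The L1 contract then only re-executes the single step lo -> lo + 1.
honest = [f"S_{i}" for i in range(1025)]
claimed = honest[:700] + [s + "'" for s in honest[700:]]  # fault from step 700
assert bisection_game(honest, claimed) == 699
```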

Multi-round proof systems based on bisection games are the approach taken by Arbitrum, Cartesi, and Optimism, which will be discussed in more detail later. In these protocols, the final onchain one-step executor is implemented as a simplified minimal VM to which the original code gets compiled. Specifically, Arbitrum implements an onchain WASM VM, Optimism a MIPS VM, and Cartesi a RISC-V VM, each able to execute single steps given the appropriate state witnesses.

What happens after a multi-round proof system adjudicates a winner will be discussed later in the article, as designs vary across projects.

Offchain vs Onchain

Optimistic rollups that want a trust-minimized bridge with L1 must implement the fraud proof system as a series of smart contracts on L1, so the base layer can adjudicate the winner of a dispute and correctly release the funds from an escrow. In other cases, such as sovereign rollups or standalone L1s, fraud proof systems can be set up as an offchain mechanism with proofs relayed via the p2p network to each node, the major advantage being that the challenge period can be reduced to the network latency, given appropriate assumptions. As far as we know, the only offchain fraud proof systems practically proposed are single-round, and it’s unclear whether it is feasible at all to implement offchain multi-round proof systems. In principle, each light client should act as a “referee” that receives both the asserter’s and the challenger’s responses and ultimately adjudicates the winner for itself.

In the rest of the article, we’ll focus on multi-round onchain fraud proof systems.

The fraud proof trilemma

Let’s start to dive into the details of the trilemma with a strawman challenge protocol. Imagine a simple system where anyone can submit state roots. In principle, we want one single honest challenger to be able to secure the chain. Let’s also assume that no bonds are needed to propose state roots or to challenge them. Since the bisection game is performed onchain on L1, each move costs gas for both the asserter and the challenger. In this setting, a malicious asserter can attack the protocol by repeatedly proposing state roots until the honest challenger has no more funds or computational resources to fight back. The protocol is secure only if the honest party has more funds and resources than the adversary. Let’s say an optimistic rollup bridge escrows $10B in funds: an attacker might be willing to spend $9B in gas fees to attack the chain, and the defender needs at least the same amount of funds to spend on challenges. If the protocol doesn’t refund each match independently but instead refunds everything to the final standing root, the attack surface is even larger, since the attacker can spend $9B to attack a $1M bridge. This attack is known as a proof-of-whale attack or resource exhaustion attack, and it is a form of sybil attack. Since not everyone has $9B sitting on L1 to spend on fraud proofs, this protocol greatly affects decentralization, as only a few actors can participate. If we relax the single honest challenger assumption, the protocol needs honest challengers to collectively have more funds than the attackers collectively have. If this is not the case, the safety of the system is compromised.

A resource exhaustion attack without bonds.

Now let’s try to add bonds to refund winners at the end of a challenge. The bond size must cover at least the gas fees spent in a challenge, plus an optional reward for the defender. Both the asserter and the challenger need to stake bonds, since an honest asserter needs to be refunded in case of malicious challengers and honest challengers need to be refunded in case of malicious asserters. Since the cost occurs in each challenge, let’s design the protocol such that asserters stake a bond for each proposed state root, and challengers stake a bond for each state root that they challenge. Note that this is not the only possible bond design, and others will be discussed when describing the protocols proposed by the different teams. At the end of all challenges, the honest party is guaranteed to profit and attackers are guaranteed to lose their bonds.

Full vs Partial concurrency

There are two ways the 1v1 challenges can occur: either with full concurrency or with partial concurrency. In the first case, the honest parties need to challenge all the adversaries at the same time, while in the second case, each party is allowed to engage in at most one challenge instance at a time, but different pairs can play in parallel. The advantage of the first approach is that state roots can be finalized after the fixed amount of time of one “challenge period”, while the second approach requires multiple sequential challenge periods based on the number of challenges that a party needs to go through.

Full concurrency vs partial concurrency. In the first example, a root will be finalized after one challenge period (7d), while in the second it needs two (14d) because the second honest asserter has to engage with two malicious challengers, one at a time.

Full concurrency is vulnerable to a similar exhaustion attack as the protocol without bonds: if the adversaries have more funds than the honest parties, they can create lots of state roots to drain their resources. If the honest parties manage to gather beforehand the liquidity and computational resources needed to win all of the challenges at the same time, they are guaranteed to eventually profit. In partial concurrency, given that the honest parties play one challenge at a time, each of them just needs the liquidity for one match, which is returned with a profit after winning and can be reused for the next match, at the expense of multiple challenge periods of delay.

Parallel vs Sequential proposals

Protocols can decide whether to use an already confirmed state root as the initial state of a proposal, or an unconfirmed state root that precedes the newly proposed one. If the already confirmed state root is used as the initial state, proposals are effectively independent of one another, meaning that the outcome of a preceding root doesn’t affect the subsequent ones. Bonds have to be staked for every root proposed and are returned only when the roots are confirmed. Assuming a single honest proposer, state roots proposed on average every hour, and a challenge period of 7 days, a total of 168 bonds will be staked at any given time. With the other option, the confirmation of a state root is conditional on the confirmation of all the previous ones: if a state root is proven incorrect, it gets pruned along with all the state roots following it. This pruning mechanism is crucial, since a child might be correct (i.e. impossible to prove wrong) relative to its invalid parent, yet should be discarded nonetheless. Given this dependence, bonds can be moved from an unconfirmed state root to the next one, which implicitly attests to the validity of all the unconfirmed ancestors too. If only one bond supports a branch, any successful challenge on any of its roots would prune the entire branch, since no bond would be supporting it anymore. As we will see, this design choice also affects the ability to protect against certain types of attacks aimed at delaying confirmations.

Parallel vs Sequential disputes. Green checkmarks represent state roots that have been confirmed, while clocks represent state roots still in the challenge period. In the parallel case, the malicious challenge on S_300 doesn’t affect in any way the resolution of the subsequent S_400 state, while in the sequential case, it can. On the other hand, the challenge on S_300’ is independent from S_400.

Safety vs Decentralization vs Promptness

Let’s try to specify the trilemma more precisely. We define a resource function that takes the amount of funds an attacker is willing to spend as input and determines how much funds the defenders need to protect the chain. Similarly, we define a delay function that also takes the attacker’s funds as input and calculates the amount of delay introduced in the system through challenges. Ideally, we want both of these functions to grow as slowly as possible. The resource function represents how safe our system is: the slower it grows, the easier it is to protect the chain from a well-funded adversary.

Partial concurrency enables one single stake to defend the protocol independently of how many resources the attacker has, at the expense of a linear number of delays: the resource function is constant, while the delay function is linear. On the other hand, full concurrency gives a linear resource function with a slope of 1 and a constant delay function. As we will see, intermediate solutions exist. Decentralization refers to the barrier to entry: how much does it cost to propose one state root? It turns out that, in certain protocols, decentralization can be sacrificed to improve either safety or promptness.
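In code, the two extremes sketched so far might look as follows; the constants are illustrative and not taken from any live protocol:

```python
CHALLENGE_PERIOD_DAYS = 7
BOND = 1.0  # illustrative cost of funding (and winning) one challenge

def full_concurrency(attacker_funds: float) -> tuple[float, float]:
    # Defenders must fight every challenge simultaneously:
    # linear resource function (slope 1), constant delay function.
    return attacker_funds, CHALLENGE_PERIOD_DAYS

def partial_concurrency(attacker_funds: float) -> tuple[float, float]:
    # One match at a time: a single recycled bond suffices, but every
    # adversarial bond buys roughly one extra challenge period of delay.
    extra_challenges = attacker_funds / BOND
    return BOND, CHALLENGE_PERIOD_DAYS * (1 + extra_challenges)
```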

While trade-offs exist between these 3 properties, it’s not clear whether this is truly a trilemma that applies universally. It should be noted that, at the time of writing, no solution is known that can obtain constant resources, constant delays, and high levels of decentralization at the same time.

The (old) Arbitrum Classic protocol

The Arbitrum Classic protocol, first described in the Arbitrum Nitro whitepaper, has been deployed on every version of Arbitrum since 2020. One improvement it introduces compared to the simple protocols described so far is branching. Instead of just challenging assertions, the protocol forces actors to submit alternative claims, each requiring a stake. The collection of claims and counter-claims can be seen as a directed graph. Since the protocol makes use of sequential proposals, bonds can be moved from one state root to a child. Conflicting claims (siblings) compete against each other in 1v1 challenges to prune losing branches.

What happens when a contestant loses a challenge? As it turns out, the state root they were staked on cannot be immediately deleted, because a staker can lose on purpose even while sitting on a correct root. The issue arises because providing correct bisections is not enforced: given only the final state root, L1 has no way of knowing the correct value of the state root after half of the execution trace. For this reason, a state root is pruned only when all parties staking on it have lost a challenge, implying that all honest actors have to simultaneously stake on the correct one, since they cannot detect and trust each other.

To prevent resource exhaustion attacks, the protocol uses partial concurrency, meaning that actors can only engage in one challenge at a time. As shown already, this approach, while achieving great levels of safety, allows sybils to mount delay attacks if they’re willing to burn funds for it. The attack works as follows: the attacker creates many sybils that stake on the correct assertion and one that stakes on an incorrect one. Assuming the attacker also has control over transaction ordering (but not over inclusion, i.e. it has limited censoring abilities), it sets up challenges so that all sybils first play against the one staked on the incorrect assertion: those staked on the correct assertion will lose on purpose by bisecting incorrectly, and as slowly as possible to waste the full challenge period. The procedure repeats until all sybils staked on the correct assertion are eliminated, and the honest asserter can finally eliminate the last one staked on the incorrect root. Intuitively, a larger bond size (less decentralization) reduces the number of sybils the attacker can fund, and therefore the number of delays (more promptness).

A delay attack on Arbitrum Classic. The attacker can spend one stake to induce one delay (7d). The protocol forces the malicious asserter staked on the incorrect root to play challenges sequentially.

Arbitrum mitigates delay attacks by implementing a small whitelist of addresses that can propose state roots and challenge them. It currently contains 14 trusted actors, meaning that the maximum delay is effectively capped and the probability of it happening is minimized. On the other hand, if all of those 14 actors collude, an invalid state transition can be pushed through the protocol, significantly impacting safety.

The Arbitrum Classic protocol moves maximally away from Promptness, achieving great Safety and Decentralization properties.

The new Arbitrum protocol: BoLD

To solve delay attacks on the Arbitrum Classic protocol, Offchain Labs has been developing BoLD, a completely new challenge protocol. A short paper was released in August 2023, while the full paper was published in April 2024. The contract code can be found here.

In the previous protocol, honest actors struggled to trust and identify each other due to the risk of incorrect bisections. BoLD improves upon this by incorporating a commitment to the entire execution history (i.e., all intermediate state roots) in addition to the final state root. This enhancement allows bisections to be verified against the history commitment. As a result, if the initial claim is correct, the protocol guarantees that no challenges will be lost. Honest actors can now recognize valid claims and let anyone perform bisections. Consequently, they only need to place a single stake on one assertion, and can even pool their funds to cover this stake. This improvement in capital efficiency enables the protocol to operate with full concurrency. Since bisections are tied to their respective root claims, a bisection tree can be reused across multiple challenges, eliminating the need to repeatedly publish the same midpoints.

Correct assertion challenged by two dishonest assertions. Midpoints can be shared across challenges, forming a graph. Since bisections are checked against the original claim, anyone can perform bisections anywhere.
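The following sketch shows the idea in miniature: the claim commits to all intermediate state roots via a Merkle tree, and any claimed midpoint must carry a membership proof against that commitment (helper names are made up; real implementations differ in many details):

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(level: list[bytes]) -> bytes:
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_prove(level: list[bytes], index: int) -> list[bytes]:
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append(level[index ^ 1])  # sibling at the current level
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def merkle_verify(root: bytes, leaf: bytes, index: int, proof: list[bytes]) -> bool:
    for sibling in proof:
        leaf = h(leaf, sibling) if index % 2 == 0 else h(sibling, leaf)
        index //= 2
    return leaf == root

# The asserter commits to ALL intermediate state roots, not just the final one.
leaves = [h(f"S_{i}".encode()) for i in range(8)]
history_commitment = merkle_root(list(leaves))

# A bisection to the midpoint must carry a membership proof, so an invalid
# midpoint on top of a valid claim is rejected immediately.
mid = 4
proof = merkle_prove(list(leaves), mid)
assert merkle_verify(history_commitment, leaves[mid], mid, proof)
```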

In this protocol, one additional invalid state root can only induce one additional challenge. Bonds can now be designed to always be a multiple of the cost of one challenge, or in other words, such that the adversaries, in aggregate, always need to spend a multiple of what honest parties have to spend in response, making resource attacks much less effective. In the paper, this multiplier is called the “resource ratio” of the protocol, since the resource function is linear with a slope that can be arbitrarily set. Higher initial bonds improve the ratio, but also increase the stake requirements for the honest parties, hurting decentralization.

The disadvantage of history commitments is that calculating the Merkle root of each intermediate state, and then the Merkle root of all these Merkle roots, is absurdly expensive. To give a concrete number, Offchain Labs estimates the number of steps between two state roots at around 2⁷⁰. Instead, BoLD uses a multi-level approach: it calculates the history commitment over big chunks of steps, and once the bisection gets down to the transition between two of these chunks, it recursively applies the full protocol between the first and the last root of the chunk. The number of levels can be arbitrary, and BoLD currently uses 3.

A three-level challenge protocol. The execution of a single step of a higher level is the execution of the full protocol with a tree made of smaller chunks. The last layer bisects over all the steps of the resulting segment, and finally executes one instruction onchain.

Given that the protocol repeats itself at a lower level, we need to ask for new bonds there as well. How big should this bond be? As it turns out, if the same bond size is chosen, a new exhaustion attack becomes possible. The attacker can stake N bonds on the top level to generate N top-level challenges, which result in N additional bonds that the honest parties have to stake at the lower level, meaning that the resource ratio gets close to 1.

A resource exhaustion attack between levels 1 and 2 given bonds of the same size. The attacker is not forced to play in the sub-challenges, but the honest player is.

This attack vector is prevented by requiring a bond size on the lower level that is scaled down by the desired resource ratio compared to the one required on the higher level. Looking at it from another perspective, the bond size at the lowest level should cover at least the cost of performing one full bisection for the honest party, and it should then be scaled up by a factor corresponding to the desired resource ratio when going up. If there are 3 levels, the lowest-level bond size is 1 ETH, and the desired resource ratio is 10%, then the top-level bond size should be 100 ETH. Both the resource ratio and the number of levels represent a trade-off with the initial bond size and therefore affect decentralization, i.e. the requirements to participate in the system.
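Following the example above, the bond schedule can be computed bottom-up; the parameters are the illustrative ones from the text, not those of any deployment:

```python
LEVELS = 3
LOWEST_BOND = 1.0  # ETH: must at least cover one full bisection at the leaf level
SCALE = 10         # = 1 / resource ratio (10%)

# Each level going up multiplies the bond by 1 / resource_ratio.
bonds = [LOWEST_BOND * SCALE**i for i in range(LEVELS)]
print(bonds)  # [1.0, 10.0, 100.0]: from the lowest level up to the 100 ETH top level
```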

The intuition (simplification alert!) behind claim confirmation can be explained as follows. Each claim (i.e. initial state root) is given a 7-day “chess clock” that ticks down but stops when a descendant is challenged by a conflicting midpoint. A claim gets confirmed when its clock reaches zero. To make the clock tick again, the corresponding party needs to bisect its own challenged point, until this new point is rivaled too. When the final step is reached, the invalid claim gets deleted, while the correct claim continues to tick down. Given that correct claims’ clocks can always be made to tick, while incorrect claims’ clocks can always be stopped (which might require going down to the single step), the protocol guarantees eventual correct resolution.
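A heavily simplified sketch of this clock logic, with made-up names and without the claim graph, could look like this:

```python
from dataclasses import dataclass

CHALLENGE_PERIOD = 7 * 24 * 3600  # seconds

@dataclass
class Claim:
    remaining: int = CHALLENGE_PERIOD
    rivaled: bool = False  # is the claim's current frontier challenged?

    def tick(self, dt: int) -> None:
        # The clock only runs while the frontier is unrivaled.
        if not self.rivaled:
            self.remaining = max(0, self.remaining - dt)

    @property
    def confirmed(self) -> bool:
        return self.remaining == 0

day = 24 * 3600
claim = Claim()
claim.tick(3 * day)    # three quiet days tick down
claim.rivaled = True   # attacker posts a conflicting midpoint: clock stops
claim.tick(1 * day)    # frozen while rivaled
claim.rivaled = False  # honest party bisects: a new unrivaled point exists
claim.tick(4 * day)    # four more days bring the clock to zero
assert claim.confirmed
```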

The challenge period in BoLD is set to 7 days: if a state root doesn’t get challenged, its associated clock reaches zero in 7 days and it is confirmed. In case of a challenge where the honest party responds as fast as possible, the delay is still 7 days, because it is always possible to make an honest state root’s clock tick again as soon as it is challenged. If the attacker censors the honest party’s transactions, the correct state root’s clock is halted. BoLD assumes censorship attacks last no more than 7 days, so settlement can take up to 14 days if the attacker manages to censor. Thanks to sequential proposals, the protocol allows stakers to move bonds from one claim to the next, implicitly asserting the correctness of all the unconfirmed state roots that the latest assertion builds upon, as in Arbitrum Classic.

The Economics of Disputes in Arbitrum BoLD document and the BoLD AIP proposal discuss bond sizes. The initial bond is not calculated bottom-up, with the cost of a full bisection scaled up by the resource ratio at each level, but rather estimated with a top-down approach: its size must be high enough to deter delaying settlement from 7 days to 14 days through censorship. Given the amount of funds in the bridge, Arbitrum calculates the opportunity cost of not being able to move the funds for one week. Assuming a 5% APY, this results in a huge top-level bond size of 3600 ETH. The second-level bond size is then calculated to be 555 ETH, and the third-level bond size 79 ETH. Taking the ratios between these bonds, the resulting resource ratio is around 15%, meaning that the attacker needs more than 6.5x the amount of funds the honest parties have (in aggregate) to perform a successful resource exhaustion attack. While these huge bond sizes affect decentralization, since not everyone has that many funds to defend the protocol alone, it’s important to remember that people can pool funds together and are guaranteed to profit in the end.
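These figures are easy to sanity-check; the bond sizes below come from the documents above, while the arithmetic is ours:

```python
bonds = [3600, 555, 79]  # ETH: level 1 (top), level 2, level 3

ratios = [bonds[i + 1] / bonds[i] for i in range(len(bonds) - 1)]
print([round(r, 3) for r in ratios])  # [0.154, 0.142]: resource ratio around 15%
print(round(1 / max(ratios), 1))      # ~6.5x: attacker funds needed vs defenders'
```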

The BoLD protocol tries to maximize Safety while achieving a fixed maximum settlement delay, sacrificing Decentralization.

The Cartesi protocol

Cartesi describes their fraud proof system (called Dave) in the Permissionless Refereed Tournaments paper published in December 2022, eight months before the publication of the BoLD short paper. Both protocols independently converged on the same solution against invalid bisections, namely history commitments, and both solved the big-Merkle-tree problem with multiple levels. The same reasoning also applies to the timers of single challenges.

In contrast to BoLD, Cartesi decided to take the other side of the promptness trade-off and build a protocol with partial concurrency, so that honest parties have a better-than-linear advantage over the adversaries. To solve delay attacks, where an attacker can spend N stakes to delay the protocol for N challenge periods, the protocol is set up as a bracket tournament. Attackers will mostly eliminate each other, and the honest parties will only play a logarithmic number of matches relative to the number of sybils. Thanks to this setting, an attacker’s spending only induces logarithmic gas spending on the honest parties, plus a logarithmic delay.

The honest parties, in aggregate, playing against 7 sybils, participate in just 3 matches (log2 of 8). The current implementation only refunds the winning party at the end of the tournament, but this is expected to be fixed in the next iteration of the protocol.
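A sketch of the honest side’s workload under these bracket rules, assuming a clean single-elimination bracket in which the honest claim plays at most one match per round:

```python
import math

def honest_matches(sybils: int) -> int:
    # Sybils mostly eliminate each other; the honest claim only plays
    # one match per round of the bracket.
    return math.ceil(math.log2(sybils + 1))

print(honest_matches(7))   # 3 matches, as in the figure above
print(honest_matches(91))  # 7 rounds; compare the ~6.5-week single-level estimate below
```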

Note that in this protocol, history commitments are not just an optimization: if players that agree on the same state root cannot join forces and instead have to enter the tournament independently, they will eventually be matched against each other, and it becomes impossible to eliminate either party if they both play honestly (even though either might defect later).

Honest parties cannot be matched together, therefore the protocol must allow them to collaborate on the same claim.

As we’ve seen in BoLD, multi-level challenges require additional care when thinking about bonds, given that one additional top-level stake can induce one additional lower-level stake for the honest parties and worsen exhaustion attacks. In the Cartesi protocol, since an additional stake only induces a logarithmic number of additional challenges at the same level, and therefore only a logarithmic number of additional stakes at the lower level, bond sizes don’t need to be scaled down. Despite this, each additional level causes exponentially more delay, because each sybil can participate again in each lower-level tournament.

Let’s say the challenge period is 7 days, there are 7 sybils, and the protocol makes use of three levels. The top level causes 3 weeks of delay for the cost of 7 adversarial bonds. The three matches that the honest parties play at the top level cause three more tournaments at level 2. All those tournaments are composed of 3 matches, causing a total of 9 more weeks of delay and 9 more tournaments at level 3, for the cost of 21 bonds. At level 3, the 9 tournaments, for a total of 27 matches, cause 189 more weeks of delay for the cost of 63 bonds. The total is 201 weeks of delay, which is almost 4 years, for the cost of 91 bonds. If all 91 bonds were used in a protocol with one single level, the delay would “just” be 6.5 weeks.

There is a strong incentive to use as few levels as possible, or to remove them entirely. One solution is to significantly limit the number of steps allowed in one transition, which means either posting roots more frequently or outright reducing the computational capacity of the system. Another approach is to allow larger single steps using ZK, which results in cheaper history commitments. Cartesi is currently in the process of adopting RISC Zero to prove larger steps over RISC-V and remove the multiple levels entirely. To reduce the impact of delays even more, one can imagine BoLD-style games with a maximum number of players greater than 2 played over a Cartesi-style tournament.

A hybrid protocol where the delay is reduced to a log8 of the number of sybils instead of a log2, times one challenge period. The bond sizes need to be higher than in a standard Cartesi tournament, but lower than in the standard BoLD protocol.

The bond sizes in Cartesi are expected to be around 1 ETH, allowing very high levels of decentralization. If multiple subsequent state roots are allowed to be published, the economic requirements increase as well: for example, one state root every hour with a 7-day challenge period implies 168 pending state roots at any given time, each of which could have an independent tournament, and therefore gas expenses, requiring higher bonds in total. Cartesi solves this issue by allowing only one tournament at any given time, meaning that a new state root is published only when the previous one gets confirmed.

If S_100 takes 28d to finalize because of an active tournament, the next state root will also be published after 28 days. The next root will take into account the time delay with the previous one.

As in Arbitrum Classic, higher initial bonds can reduce the delay an attacker can induce, at the expense of decentralization. Moreover, with one level, if bonds are cleverly redistributed (with Account Abstraction! still in R&D) right after each 1v1 match, the logarithmic resource function can be improved to a constant one, since those funds can be used to pay for gas in the next match (but cannot simply be taken away).

The Cartesi protocol presents the same trade-offs as the Arbitrum Classic protocol, but with a strict improvement in Promptness thanks to the tournament structure of the games.

The Optimism protocol: OPFP

OP Mainnet released its fraud proof system into production in June 2024. No paper has been released, but a specification of the system, first published in September 2023, can be found here.

The protocol is set up to be fully concurrent with a fixed delay. Interestingly, in contrast to BoLD and Cartesi, it doesn’t make use of execution history commitments to prevent incorrect bisections on valid initial claims. Challenges share the same graph, similar to what was shown for BoLD, but since invalid bisections can be published, the protocol cannot rely on top-level bonds alone and requires a bond for each bisection move. With this approach, honest players can still play as a team even if they cannot identify each other, as anyone can identify correct and incorrect graph nodes. In OPFP, a claim is considered “countered” if at least one of its attacking children is not itself countered, recursively. The final single step is the base case of this recursion, since it can objectively adjudicate whether the claim above it is correct or not.

An example of a running OPFP game, including invalid bisections of correct claims (S_256 → S_384’) that must be countered (S_384’ → S_320).

A major advantage of this approach is that posting the initial claim is much cheaper than in BoLD, since it doesn’t need to account for the cost of the full bisections. The Bond Incentives spec gives bond sizes in terms of gas rather than ETH, but the current implementation live on Mainnet assumes a fixed gas price of 200 Gwei, so the ETH size of bonds is fixed as well. The cost of posting the initial claim is set to 0.08 ETH, and it increases by a factor of 1.09493 at each depth. Since the maximum depth in OPFP is 73, the total amount of ETH required to complete one challenge down to the single-step execution is 691 ETH. As it turns out, since the bond increases with each move, a resource exhaustion attack can be performed by creating lots of top-level assertions at 0.08 ETH each, which the honest parties need to attack at 0.0875944 ETH per move. The resource ratio in this case is exactly the factor used to scale bonds, i.e. 109.493%, economically favoring the attacker.

A resource exhaustion attack on OPFP. The defenders need to have at least 109% of the attacker’s funds to protect the chain.
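The numbers above follow directly from the bond curve; the parameters come from the spec and the deployed configuration, while the arithmetic is ours:

```python
INITIAL_BOND = 0.08  # ETH for a top-level claim
GROWTH = 1.09493     # bond scaling factor per depth
MAX_DEPTH = 73

bond_at = lambda depth: INITIAL_BOND * GROWTH**depth
print(round(bond_at(1), 7))  # 0.0875944 ETH: the first counter-move

# Total ETH staked along one branch from the root down to the single step:
print(round(sum(bond_at(d) for d in range(MAX_DEPTH + 1))))  # ~691 ETH
```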

The clocks in OPFP are set to 3.5 days, half of what BoLD and Cartesi use; the consequences of this choice are discussed in the next section. These 3.5 days can be extended to 7 days in the same way that BoLD can be extended from 7 to 14 days. Additionally, the protocol grants clock extensions when very little time is left on a claim before it expires. The number of delays a root can be subject to is bounded by the maximum depth: generally 3-hour extensions, with a 6-hour extension at the “split depth” and 1 day at the last depth, resulting in up to 10 additional days. Thanks to the exponential bond size curve, the cost of inducing additional delays is also exponential. Since the protocol uses parallel proposals, meaning that it requires separate bonds for each initial claim, clock extensions on one state root don’t affect the others, so users can decide to exit with the next root instead. It’s important to note that parallel proposals don’t protect against delays induced by a censorship attack, since it would affect all state roots at the same time.
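A rough reconstruction of that worst-case extension budget; the per-depth extension values come from the text, while the split depth position is a placeholder:

```python
MAX_DEPTH = 73
SPLIT_DEPTH = 30  # hypothetical position; the exact value depends on the deployment

extra_hours = 0
for depth in range(1, MAX_DEPTH + 1):
    if depth == MAX_DEPTH:
        extra_hours += 24  # 1 day at the last depth
    elif depth == SPLIT_DEPTH:
        extra_hours += 6   # 6 hours at the split depth
    else:
        extra_hours += 3   # 3 hours everywhere else

print(extra_hours / 24)    # ~10 additional days in the worst case
```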

OPFP seems to optimize Promptness and Decentralization over Safety, but it’s not clear whether the protocol really fits in the trade-off space of the trilemma: even if the bond sizes are increased, the resource function (i.e. Safety) doesn’t get any better.

Challenge period length

A fraud proof system needs to provide enough time for defenders to challenge invalid state roots and play the bisection game. Assuming that honest challengers are always live and ready to act as fast as possible, the main threat is their transactions failing to be included on L1 because of a censorship attack. There are two types of censorship attacks: weak and strong censorship.

Weak censorship takes place when a certain percentage of builders (and relayers) censor certain transactions, but a majority of attesters do not. A popular example is builders censoring non-OFAC-compliant transactions. As long as this percentage is not 100%, a censored transaction is only delayed and eventually gets included. Techniques to further improve Ethereum’s weak censorship resistance are currently being studied.

Even with 95% of blocks built by censoring builders, a transaction gets included with 99.99% probability after 36 minutes. At the time of writing, the percentage of OFAC-compliant blocks is 35%.
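The 36-minute figure follows from treating each 12-second slot as an independent chance of reaching a non-censoring builder, a simplification of how block building actually works:

```python
import math

SLOT_SECONDS = 12
censoring_share = 0.95  # fraction of blocks built by censoring builders
target = 0.9999         # desired inclusion probability

# P(still excluded after n slots) = censoring_share ** n
slots = math.ceil(math.log(1 - target) / math.log(censoring_share))
print(slots, slots * SLOT_SECONDS / 60)  # 180 slots, i.e. 36.0 minutes
```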

Even if these results look promising for reducing the default challenge period from 7 days to less than one day, the major threat comes from strong censorship attacks. In this case, a majority of attesters can decide not to vote for blocks containing the transactions to be censored, which will then never be included. On Ethereum, strong censorship attacks mostly result in liveness failures, as in funds sitting idle in an EOA, unless time-sensitive applications are involved, like DeFi protocols or optimistic rollups. To prevent loss of funds, the Ethereum community can decide to hard fork the censoring validators out of the protocol and restore its strong censorship resistance. Arguably, coordinating a hard fork is not trivial, and a while ago we all agreed that 7 days is a safe margin for the community to agree on a course of action. In this regard, since Optimism decided to use 3.5 days as their challenge period, they might not be able to protect users in case of a strong censorship attack. The rationale for this decision is that, during Stage 1, an additional withdrawal delay of 3.5 days is applied after root confirmation to resolve any bugs that could surface at the last minute of a challenge, keeping the total confirmation delay at 7 days. The challenge period is expected to be extended to 7 days in iterations where the Security Council is removed.

Some “semi-automated” techniques to reduce the social coordination effort needed to hard fork in case of censorship are being discussed (not very actively), but they require changes to Ethereum itself and are nowhere near being considered for inclusion in a future upgrade.

Transactions that are weakly censored will eventually be included, while strong censorship prevents inclusion entirely. Note that the diagram is greatly simplified. Proposers can be connected to multiple relayers, and some of them can be censoring as well.

A similar argument applies to the security of optimistic L3s built on top of L2s that can delay transactions. At the time of writing, OP Mainnet can delay each L1 → L2 transaction by up to 12 hours and Arbitrum One by up to 24 hours. If we assume an optimistic L3 with a 7d challenge period, this means that the protocol cannot take more than 14 or 7 moves respectively, ignoring L1 censorship. In practice, fraud proof systems require around 70 to 90 moves to resolve a bisection game, meaning that L3s on top of OP Mainnet or Arbitrum One should have challenge periods of around 3 months to be secure. The other solution is, of course, to reduce the censorship delays that the L2 can inflict on users, or to make the L2 state transition function more explicitly aware of sequencer delays so that L3s can act on them.
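As a back-of-the-envelope check, with a move count assumed in the 70-90 range mentioned above:

```python
CHALLENGE_MOVES = 80  # assumed; real bisection games need ~70-90 moves
MAX_DELAY_OP = 12     # hours of delay per L1 -> L2 message on OP Mainnet
MAX_DELAY_ARB = 24    # hours of delay per message on Arbitrum One

# If the L2 maximally delays every challenge move, the L3 challenge period
# must be long enough to fit all of them:
print(CHALLENGE_MOVES * MAX_DELAY_OP / 24)   # 40.0 days on top of OP Mainnet
print(CHALLENGE_MOVES * MAX_DELAY_ARB / 24)  # 80.0 days (~3 months) on Arbitrum One
```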

Imperfect information

Resource exhaustion attacks only affect honest actors before the conclusion of the challenge period. Even if the protocol requires a lot of funds to be pooled to protect it, one can argue that finding liquidity is not difficult, since it eventually guarantees very high profits, assuming that the proof system works correctly. We argue that this assumption shouldn’t be taken lightly. Let’s say that an attacker actually spends billions of dollars to attack a protocol, and then signals on social media or with an onchain message that they found a bug in the challenge protocol that guarantees defenders will lose their funds. No one knows if the bug actually exists or if it’s just a bluff, but it can act as an effective deterrent that prevents defenders from pooling the target amount of funds needed to save the chain. Better resource functions decrease the capital at risk needed to defend the chain: risking 1 ETH in total in the Cartesi protocol is much better than risking 15% of the attacker’s funds in BoLD, which is, in turn, better than risking 109% as in OPFP.

Summary of the findings

Wait, why are we even discussing fraud proof systems if we have ZK proofs?

We started developing optimistic rollups years ago thinking they would be a medium-term solution until ZK technology was mature enough to build a zkEVM, which was projected to be many years away. We thought challenge protocols were faster and easier to implement than ZK rollups, but time proved this sentiment wrong: the first universal ZK rollup, Starknet, was released in May 2022, the first zkEVM, ZKsync Era, in February 2023, and we can count at least 3 other independent zkEVM implementations with Linea, Scroll, and Polygon zkEVM. The first permissionless fraud proof system, OPFP, was only released in June 2024, and no others have been deployed yet. While this is true, it’s also important to note that, at the time of writing, no ZK rollup is mature enough to function in a permissionless setting, though they are arguably very close.

The main advantage of ZK rollups is the settlement delay, which can be reduced to hours or even less as the technology advances, while the >7d delay of optimistic rollups seems quite fundamental. On the other hand, ZK rollups cannot have lower operating costs than optimistic rollups, since in the happy case the latter need to prove nothing onchain. ZK verification costs can be amortized across multiple projects with universal proof aggregators, but proving cost and time remain the biggest limitation: the throughput of a ZK rollup must be capped by the prover throughput, otherwise the head of the chain will increasingly fall behind what is proven on L1. Parallelization techniques can be used to improve proving time, while the proving cost seems fundamental. Optimistic rollups can trivially increase throughput at any time: doubling the number of steps per second translates into just one more bisection, which needs to be performed only in the rare case of a dispute. Given the fastest ZK prover, which implies the ZK rollup with the highest throughput, adding one single bisection before the final ZK proof can trivially produce a chain with double the throughput. The same chain with just 10 extra bisection steps would have >1000 times the original throughput. The biggest trade-off is that if the hardware requirements to run a node increase, fewer people can run a challenger.
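The throughput argument is just powers of two; the 2⁷⁰ baseline is Offchain Labs’ estimate quoted earlier, the rest is arithmetic:

```python
BASE_STEPS = 2**70  # Offchain Labs' estimate of steps between two state roots

# Each extra bisection halves the disputed segment, so it absorbs a doubling
# of the step count at negligible cost in the (rare) dispute path.
for extra in (1, 10):
    print(f"{extra} extra bisection(s) -> {2**extra}x steps ({BASE_STEPS * 2**extra:.2e})")
```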

In summary, as things stand today, optimistic rollups are suited for use cases that require many orders of magnitude more scalability than ZK rollups can offer and that can bear the additional settlement delays, like onchain games or generally isolated environments.
