Towards Secure, Trust-minimized Optimistic Rollups

12 min readSep 28, 2022

A few weeks ago we announced Specular—an optimistic rollup with a novel EVM-native design—and were delighted to see the excitement and discussion stimulated by our Twitter thread around how best to design such systems. In this post, we’d like to clarify what we think are some of the key advantages of the EVM-native approach over other optimistic rollup approaches, including ongoing work by Optimism and Arbitrum pursuing EVM-equivalence through a fraud proof system defined over a lower-level VM.

Before we begin, this post assumes a basic understanding of what optimistic rollups (ORUs) are; and how modern Ethereum ORUs rely on an on-chain interactive fraud proof mechanism responsible for refereeing disputes. A broad survey of relevant prior work can be found here.

Background

Optimism and Arbitrum are in the process of rolling out a new architecture that leverages existing Ethereum client infrastructure, in order to gain full compatibility with existing Ethereum tooling and smart contracts at L2 (a property referred to as EVM-equivalence). Disputes over state transitions resulting from the client’s execution are resolved using an interactive fraud proof (IFP), verified on-chain. In order to support IFPs, the client program is compiled down to a lower-level target VM (such as MIPS or WASM), for which state transition proofs can be constructed at (MIPS/WASM) instruction-level. The resulting binary is then committed to on-chain, allowing the on-chain verifier to enforce its execution at the granularity of the lower-level VM’s semantics.

Crucially, this lower-level-VM approach is completely unaware of the EVM’s existence and its semantics. Therefore, instead of enforcing EVM semantics [1] as is the intention, it enforces lower-level semantics for the execution of a specific binary that purportedly implements the EVM. As a result, this technique by design (1) precludes the permissionless and trust-minimized participation of multiple Ethereum client programs [2]; (2) results in a bloated, excessively large trusted computing base (TCB) that is difficult to independently audit and infeasible to formally verify; and (3) suffers from a frequently-triggered, yet opaque upgrade process — both further increasing security audit overhead and hindering the forfeiture of upgrade keys.

To solve these problems, we argue for what we believe to be a simpler, more natural EVM-native approach to IFPs. That is, we propose enforcing EVM semantics explicitly on-chain, at the level of a single EVM instruction. Nothing more and nothing less. Below we describe some of the advantages of this design over the lower-level-VM approach.

Client Diversity

While the Ethereum network is run by a diverse set of clients including Geth, Besu and Erigon (albeit presently with insufficient adoption), no current L2 solution is close to supporting client diversity. This is important because we want to prevent invalid ORU state transitions induced by software bugs/vulnerabilities from slipping by undetected and undisputed.

The Optimism team has proposed one potential solution, which extends the lower-level-VM approach to the multi-client setting by including an on-chain binary commitment for each of a number of whitelisted client programs. Validators must be prepared to participate in challenge phases involving any one of them, only winning a dispute overall if they manage to win a threshold of the challenge phases invoked.

This approach only provides extrinsic client diversity, not intrinsic to the proof system itself. Each client has its own independent proof system associated with it. In fact, this is just a general defense-in-depth technique that can be used by any rollup in a scheme that combines different proof systems using a majority vote (e.g. even combining an optimistic rollup and ZK-rollup into one multi-proof system). While it may offer some semblance of client diversity, it introduces multiple problems when utilized in isolation:

Participation of a client program is permissioned. That is, a client program’s compiled binary must be committed to on-chain as a whitelisted program for it to be allowed to participate in dispute resolution.
• Moreover, since the binary commitment changes from version-to-version, Optimism’s governance process is also on the critical path for any upgrade to any client program (or even just its compilation toolchain).
• Ideally: a client program can participate and be upgraded without going through the ORU’s governance.
There is a strong trust assumption that a majority of client programs implement the EVM correctly. This can be interpreted as K-of-N-version, or N-of-N-version programming.
• As we note in the next section, this is particularly problematic because client programs are difficult to audit.
• Ideally: it is sufficient to require just a single client program to implement the EVM correctly (i.e. using classical N-version programming).
Validators have no choice but to be prepared to run every whitelisted client program to resolve a dispute.
• Not only is this expensive and slow due to the additional redundancy in the dispute process, it is also more operationally complex, as a validator must learn how to configure and run each program.
• The operational complexity grows with the number of clients, placing practical limitations on how many client programs may be included.

Altogether we believe this design does not effectively distribute trust— client program inclusion is permissioned, a majority must be trusted to implement EVM semantics correctly AND the number supported is practically going to be limited by both trust, devops and general operational complexity concerns.

Our approach provides intrinsic client diversity, because the on-chain verifier is agnostic to the client program. All clients interact using the same proof system. Any program that supports EVM semantics can participate permissionlessly, and only one must implement the correct semantics for faults to be detectable/disputed. Validators are also afforded a true choice in running whatever client program they’re most comfortable with, providing much simpler devops.

An EVM-native ORU therefore provides support for permissionless, trust-minimized client diversity. While both approaches enable detection of maliciously injected state transitions (e.g. caused by invalid transactions), only the EVM-native approach enables robust detection and resolution of bugs/vulnerabilities.

Auditability

The TCB of an ORU must be auditable and/or formally verifiable to ensure its trustworthiness. Existing ORUs fall short of this objective.

Client program trust assumptions

In the lower-level-VM approach: because the on-chain verifier enforces the execution of the client program at the target ISA instruction-level (rather than higher-level EVM semantics), inspecting the verifier alone is not sufficient to determine the equivalence of the enforced semantics to that of the EVM. The enforced semantics of the ORU VM are determined implicitly by the whitelisted client binary. It is impossible to determine whether these semantics are equivalent to EVM semantics a priori without auditing the entire client program and compiler. And in the extrinsic client diversity case, this problem is exacerbated further, since semantics are determined by majority consensus (manifesting through the dispute mechanism), requiring audits for all, or at least a majority, of client programs and compilers.

The TCB therefore includes not only the verifier, but also every client program, along with the compiler toolchain and binary commitment generator associated with each program. The bloated size and complexity of the TCB creates a pronounced risk of software bugs/vulnerabilities that can impact the security of the ORU, and commensurately increases auditing overheads.

We emphasize that it isn’t as simple as piggybacking off of security audits conducted by the maintainers of the upstream client program (i.e. Geth) because the program must be custom-tailored to support one-step proof generation (see Geth forks here and here). Furthermore, since the on-chain verifier cannot have access to the Ethereum database—which stores blocks, transactions, and account/storage state—they must plug in a preimage oracle component to allow MIPS/WASM code to query the database.

In the EVM-native approach, EVM semantics are explicitly enforced by the on-chain verifier. With permissionless and trust-minimized client diversity, the overall security of the ORU does not rely significantly on the correctness of any individual client program. For this reason, the TCB of an EVM-native ORU includes only the on-chain verifier. This is great because the limited scope of the TCB allows it to be both more easily auditable and entirely formally verifiable with respect to a formal EVM specification, as we touch on below.

Verifier trust assumptions

Both the EVM-native and lower-level-VM approaches include a one-step proof verifier in the TCB. While a lower-level VM verifier may be slightly simpler to implement and audit (owing to a smaller instruction set), this provides little comfort considering the difficulty of implementing and/or auditing the rest of the system end-to-end—particularly the client and compiler.

Furthermore, formal verification—and in particular, KEVM—levels the playing field. KEVM is an executable formal specification of the EVM implemented using the K-Framework that supports the formal verification of Solidity smart contracts. KEVM is therefore perfectly complementary to Specular because it provides both a formal specification and a suite of verification tools, enabling us to formally verify our on-chain verifier. We’re actively engaged in this research direction and are currently exploring various ways to leverage KEVM in collaboration with the KEVM team.

On the other hand, it’s infeasible to formally verify the TCB of ORUs that take the lower-level-VM approach. While it is possible to formally verify a MIPS verifier, this doesn’t provide any assurances regarding the enforcement of EVM semantics. To achieve such assurances, Geth itself — along with its compilation toolchain — would have to be formally verified; however, given the complex, computationally unbounded and concurrent nature of these programs, this is effectively impossible.

An EVM-native ORU’s TCB is therefore formally verifiable with respect to an existing, well-tested formal specification, while that of lower-level-VM approaches is not.

Compiler trust assumptions

In the lower-level-VM approach, a heavily customized compilation toolchain is introduced to support one-step proof generation. This toolchain, however, is not mature. Both the compiler and target ISA are customized (see here and here). Nitro has added several custom WASM opcodes for the preimage oracle and inbox, while Cannon has added several custom MIPS syscalls. The target runtime models are also modified and bound by several conventions or constraints. For example, both Nitro & Cannon merklize the memory to enable a succinct one-step proof without revealing the whole memory space. Cannon also requires the compiled program and input/output to be loaded at specific positions of the memory. Additionally, both Nitro and Cannon replace floating point instructions with soft-float implementations (see here and here). @pepyakin has an excellent blog post that touches on other complexities specific to the use of WASM as a target ISA.

Any of these customizations may unintentionally change the behavior of the compiled Geth binary, and must therefore undergo extensive auditing. Given the immaturity of the compiler toolchain, consistency between x86 Geth and MIPS/WASM Geth can’t be taken as a given.

The EVM-native approach does not have any of these problems — normal execution and dispute resolution operate at the same abstraction level and in practice closely share code paths.

TCB Upgrades

Frequency

While Ethereum client programs are frequently upgraded, the Ethereum protocol itself tends to hard-fork only a couple times a year. Hard-forks that actually modify EVM semantics are even less common, at around once-a-year. Moreover, many of these changes (e.g. consensus) do not affect execution semantics. As a result, the TCB of an EVM-native ORU needs relatively infrequent upgrades compared to that of ORUs taking the lower-level-VM approach. Comparatively, the lower-level-VM approach requires an L1 contract upgrade every time the client upgrades, since the changes must be reflected in a new binary commitment on-chain.

Furthermore, we expect the Ethereum protocol and EVM specification to eventually stabilize in some form. This has two implications:

In the long-term, the frequency of upgrades to the TCB (or equivalently, the on-chain verifier) of EVM-native ORUs will decrease — eventually tending to zero. On the other hand, individual client programs will likely continue to commonly experience upgrades that intend to improve performance and fix bugs, triggering frequently necessary TCB upgrades in the lower-level-VM approach.
An EVM-native ORU can stabilize in tandem with Ethereum. This is because the community responsible for developing Specular could forfeit its L1 contract upgrade keys without forfeiting the ability to ship performance upgrades and bug-fixes to clients. On the other hand, once the Optimism team throws out its keys (as they state they intend to do), it will be impossible to carry out such upgrades in a timely manner (if at all). This makes doing so a much higher-risk endeavor, even with the minor safeguards afforded from an extrinsic client diversity scheme — to the point that it may never be safe enough to get rid of them.

We therefore argue that the safest and most practical path to relinquishing (fast) upgrade keys is through an EVM-native ORU design.

Transparency

The size and complexity of the TCB in the lower-level-VM approach results in an opaque upgrade process. For example, clients are upgraded in a less transparent manner than the Ethereum specification, which undergoes a deliberate proposal (EIP) process. Upgrades to components of the toolchain specific to the ORU, such as the Golang-to-ISA compiler, are even less transparent still.

In our approach, there is a clear separation between verification of semantics and the client program implementing those semantics. Therefore, it is easier to discern whether or not an upgrade can potentially affect the interpretation of semantics — auditors need only look at the diff in the source code in the L1 contracts.

Conclusion

For the reasons listed above, we believe an EVM-native design provides superior security and trust properties. You can read more about our approach on our website, or follow us on Twitter @SpecularL2 to stay up-to-date with ongoing progress!

Thanks to Patrick McCorry for his feedback and input on this post.

Appendix

In this section, we respond to specific excerpts from @kelvinfichter’s thoughtful blog post in which he summarizes (his perspective of) the trade-offs between the two approaches. While the post acknowledges much of the points above, it also raises some interesting questions.

Upgrades

Optimism can basically freely modify client code without touching any smart contracts

The binary commitment must still change. There should be no meaningful difference between how modifications to security-critical contract code and modifications to security-critical client code, compiled and committed to in the contract are treated in the governance process. Modifications on either side can adversely affect the security of the rollup in the lower-level-VM approach.

Moreover, while it’s true anyone can compute a diff between versions of the client-side code to see what changed in an upgrade, it’s much more challenging in practice to audit changes to a codebase that spans several tens to hundreds of thousands of LoC because the impact of any given modification is less clear.

Additionally, since changes to the client don’t need to be propagated to the proof, EVM updates don’t trigger a new smart contract audit.
A native EVM fault proof is necessarily at the mercy of changes to the upstream EVM. Ethereum hardforks approximately twice per year and at least one of these hardforks will typically introduce EVM tweaks (some more than others, like the introduction of EIP-1559 in the London hardfork). If a native EVM proof system wants to maintain upstream EVM equivalence, the proof contracts must be upgraded.

The same hard-forks that will require audits in the EVM-native approach will also likely trigger audits in the lower-level-VM approach. In our view, the difference in engineering/auditing effort for a single client isn’t substantial — and as argued earlier, the EVM-native approach pulls ahead in the multi-client case.

Introducing new features

A system with a native EVM proof may struggle further when attempting to introduce new features outside of the EVM. Take for example calldata compression […]

We agree that certain features outside the purview of the EVM — such as calldata compression — may be useful for achieving low transaction costs for users (although with recent developments in data availability layers that leverage existing trust in Ethereum, this specifically may not be immediately necessary). The lower-level VM approach does have some more flexibility here. However, all hope is not lost.

By separating L2 extensions from EVM semantics, we can focus on ensuring the security of the on-chain verifier. A distinct verifier can be introduced to support L2 extensions, keeping the core EVM verifier separate and intact. We currently plan to plug in zero-knowledge proof techniques for verification of such extensions. While executing EVM semantics in a SNARK may be complex, L2 extensions such as data compression are simpler to handle, since the specification of semantics is completely under our control (and can therefore be designed to be SNARK-friendly). We note that this is still an ongoing area of research.

Engineering effort

The native EVM proof contract is significantly more complex than the low-level proof, sufficiently so that even building a single native EVM proof will be a massive engineering effort. However, the existence of KEVM means that it might be possible to formally verify the correctness of the EVM aspect of the proof.

We believe this is an overestimation, given the amount of engineering resources we’ve dedicated thus far to building Specular. Furthermore, as discussed above and acknowledged in the post, many of the concerns can be practically addressed with formal methods.

Footnotes

[1] We use the term EVM-semantics inclusively of both the inter- and intra-transaction semantics defined in the Ethereum yellow paper.

[2] For disambiguation, we refer to software that implements Ethereum as client programs and their instantiations (i.e. actual nodes running the software) as clients.