How to Measure Layer 2 Performance and Scalability

Ed Felten
Offchain Labs
Sep 12, 2019

One way to compare smart contract scaling solutions is by performance. Vendors like to throw around Transactions Per Second numbers — we have done this ourselves at Offchain Labs. But performance, besides being only one of the things developers and end users care about, is more complicated than a single number. As the saying goes, your mileage may vary — and is mileage even the best way to guesstimate your cost?

As Ben Jones said in his well-attended talk at EthBoston, the community needs to get beyond simplistic Transactions Per Second claims and look at more robust ways to measure performance. In this post I’ll draw on the long experience of the computer systems community in understanding how best to measure and characterize system performance and scalability.

Measuring Performance

As a starting point, we want to measure both throughput (how much work gets finished per second on average) and latency (how long it takes something to get finished). Driving a truck full of hard drives from New York to San Francisco has pretty high average throughput — maybe better than an Internet transfer — but a latency of several days will be problematic. In the blockchain space, latency is measured as finality time: the elapsed time from when a client submits a transaction until the result is fully finalized (so that the result will not be lost even if systems crash and the main chain reorgs).
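
To make this concrete, here is a minimal sketch of measuring both quantities at once. The submit_transaction and wait_for_finality functions are hypothetical stand-ins for whatever client calls the system under test actually exposes; a real harness would replace them, and would submit transactions concurrently so that measured throughput is not capped by per-transaction latency.

```python
import statistics
import time

# Hypothetical stand-ins: replace with real calls to the system under test,
# e.g. an RPC submission followed by polling until the result is final.
def submit_transaction(i):
    time.sleep(0.001)   # simulated submission cost
    return i            # simulated transaction handle

def wait_for_finality(handle):
    time.sleep(0.002)   # simulated finality delay

def measure(n_txs=100):
    latencies = []
    start = time.monotonic()
    for i in range(n_txs):
        t0 = time.monotonic()
        handle = submit_transaction(i)
        wait_for_finality(handle)
        latencies.append(time.monotonic() - t0)
    elapsed = time.monotonic() - start
    # Serial submission makes throughput roughly 1/latency; a real benchmark
    # would pipeline submissions to measure the two independently.
    print(f"throughput: {n_txs / elapsed:.1f} tx/s")
    print(f"finality latency p50: {statistics.median(latencies) * 1000:.1f} ms, "
          f"p95: {statistics.quantiles(latencies, n=20)[18] * 1000:.1f} ms")

if __name__ == "__main__":
    measure()
```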

Measuring throughput is more complicated. Transactions per second measures how many (typically trivial) transactions can be completed. For applications with more complex transactions, computation per second may matter more, because the time spent executing a transaction's computation can dominate the fixed per-transaction processing overhead. (For Arbitrum we measure computation in Ethereum-gas-equivalent computation per second.) In some applications, storage accesses per second may be the most important factor.
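
As an illustration of why transaction counts and computation are different axes, the sketch below aggregates a hypothetical gas_used field from transaction receipts; two workloads with identical transaction rates can differ enormously in gas per second. The 900,000-gas figure for a heavy contract call is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Receipt:
    gas_used: int   # stand-in for per-transaction gas accounting

def gas_throughput(receipts, window_seconds):
    """Computation throughput: total gas executed per second of wall time."""
    return sum(r.gas_used for r in receipts) / window_seconds

# Same transaction rate, very different computational load:
transfers = [Receipt(21_000) for _ in range(100)]      # simple ETH transfers
heavy_calls = [Receipt(900_000) for _ in range(100)]   # illustrative heavy contract calls

print(gas_throughput(transfers, 1.0))    # 2,100,000 gas/s
print(gas_throughput(heavy_calls, 1.0))  # 90,000,000 gas/s
```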

Measuring Scalability

All of this, so far, measures the performance of a single contract doing simple things. We also want to measure scalability: how performance changes as we increase the demands on a system. Several dimensions of scalability matter. What happens as the number of contracts increases? What happens as the number of users per contract increases? What happens as the amount of storage in use increases? In each case we can measure performance curves, or simply look for the limit point — what is the maximum number of contracts, users, or storage cells before the system fails?
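
One way to put this into practice is to sweep a single load parameter and record throughput at each point. The run_benchmark function below is a toy stand-in whose numbers are invented, but the shape of the loop is what a real scalability harness looks like; the same loop could sweep users per contract or storage in use instead of contract count.

```python
# Toy stand-in: pretend throughput holds steady up to some capacity and
# degrades beyond it. A real harness would deploy `num_contracts` contracts
# and drive load against the actual system.
def run_benchmark(num_contracts):
    capacity = 1_000
    return min(5_000.0, 5_000.0 * capacity / max(num_contracts, capacity))

def scalability_curve(loads):
    return [(load, run_benchmark(load)) for load in loads]

for load, throughput in scalability_curve([10, 100, 1_000, 10_000, 100_000]):
    print(f"{load:>7} contracts -> {throughput:8.1f} tx/s")
```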

Measuring Off-Chain Protocols

For systems that rely on off-chain protocols, there is a fourth dimension to consider, beyond latency, throughput, and scalability: performance can differ substantially depending on the behavior of the nodes participating in the system. Typically we can break this down into three cases. In the optimistic case, all nodes are available and acting cooperatively. In the unavailable case, some nodes are unavailable but the available nodes are all cooperating. And the worst case is the malicious case, where some nodes are acting maliciously. In addition to the usual concerns about correctness in these cases, we also care about how each case affects performance. Many systems are fast in the optimistic case, but how much performance do they lose in the unavailable case or the malicious case? And does the mere possibility of the malicious case impose limits on scalability?
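
A benchmark harness for such systems can treat node behavior as just another parameter. The sketch below is illustrative: NodeBehavior and run_scenario are hypothetical names, and a real harness would configure the off-chain nodes to actually exhibit each behavior before running the same workload.

```python
from enum import Enum, auto

class NodeBehavior(Enum):
    OPTIMISTIC = auto()   # all nodes online and cooperating
    UNAVAILABLE = auto()  # some nodes offline, the rest cooperating
    MALICIOUS = auto()    # some nodes actively misbehaving

def run_scenario(behavior: NodeBehavior) -> None:
    """Stand-in: configure the off-chain nodes for `behavior`, then run the
    same throughput/latency workload and record the results."""
    print(f"running workload under {behavior.name} conditions")

for behavior in NodeBehavior:
    run_scenario(behavior)
```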

Making Sense of This

There’s a lot to think about. What should you be doing if you’re developing one of these systems? Or using one?

For now, we can take it one step at a time. We can move past simple (and sometimes contrived) transactions-per-second numbers and start asking about other measures. For off-chain solutions, we can ask about the unavailable and malicious cases.

In the longer run, the community can follow what happened in the computer systems community. First, single numbers. Then multiple numbers with each vendor reporting results on their own custom (but open source) benchmark tools. Then agreement on benchmark tools. Then, finally, benchmark suites based on real applications, to measure performance in realistic application scenarios and boil down the results to one number.

The blockchain world is young and has a long way to go. But we’ll all do better — and most importantly our end users will be happiest — if we all improve our measurement and our performance.

We’ll try to lead by example. Expect to see our team at Offchain Labs up our game and publish more benchmark data, more transparently, as our product, Arbitrum, nears full commercial release.

Co-founder, Offchain Labs. Kahn Professor of Computer Science and Public Affairs at Princeton. Former Deputy U.S. CTO at the White House.