A Simplified Look at Ethereum’s Casper
There are multiple versions of Casper; the documentation is scattered and constantly evolving; it’s now being genetically spliced with sharding; and even if none of that were true, it’s heavy stuff. But if you set aside the proofs, I think what it looks like in operation (at least the essence of it) is pretty intuitive. Here’s a simplified sketch of how I understand Casper to work.
(There are various important caveats here, which I’ll list at the bottom. They come down to: some of this may be wrong; some of this is wrong; I am not a Casper expert, so take this with a grain of salt.)
A walkthrough of voting in Casper
Normally one block at a time is added to the chain, stakers all vote for it, it’s finalized, everything’s groovy.
But now and then competing blocks at the same height may split the chain. Due to lag, or temporary network partitions, an attack, whatever.
When a split occurs, it may take several rounds of voting before any block receives the 2/3 required to be nominated for finalization. And even when a block is nominated, it may fail to get the 2/3 finalization votes in the next round. (Stakers are meant to vote for a block each round, but including a finalization vote is optional — there can be 100 votes, but 0 finalization votes.)
Crucially, however, even a failed nomination has implications for its voters. A staker who voted to finalize a nominee cannot vote for a conflicting nominee — one not descended from the one they voted to finalize — until a new one gets nominated by other voters. A staker who violates this “no-reneging” rule gets slashed, ie, loses their stake!
Eventually, some nominee gets >1/3 finalization votes. This is a crucial turning point: the stakers who cast those votes can’t help nominate a conflicting block, so fewer than 2/3 of stakers remain free to do so, and no conflicting branch can win 2/3 nomination without some staker getting slashed. So once a block exceeds 1/3 finalization votes, it’s very unlikely that any branch conflicting with that block will ever get nominated, let alone finalized.
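The threshold arithmetic behind this turning point can be checked with a few lines of Python. This is only a sketch under my own assumptions: all stakers carry equal weight, nobody is willing to get slashed, and the function name is made up for illustration.

```python
from fractions import Fraction

TWO_THIRDS = Fraction(2, 3)

def conflicting_nomination_possible(finalization_share: Fraction) -> bool:
    """Can a block conflicting with the nominee still collect 2/3 votes,
    assuming every staker who cast a finalization vote honors no-reneging?"""
    # Stakers who voted to finalize are locked out of nominating a conflict,
    # so only the remaining share is available to a conflicting block.
    available = 1 - finalization_share
    return available >= TWO_THIRDS

# Exactly 1/3 finalization votes leaves exactly 2/3 free: a conflict can still win.
at_one_third = conflicting_nomination_possible(Fraction(1, 3))
# Anything above 1/3 leaves less than 2/3 free: a conflict needs a slashable vote.
above_one_third = conflicting_nomination_possible(Fraction(34, 100))
```

Exact fractions are used instead of floats so that the boundary case of exactly 1/3 is not blurred by rounding.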
The overall flow is:
- When there’s a split, stakers vote each round until a block N gets 2/3 votes and becomes the current nominee.
- In the round after N gets 2/3, stakers have the (optional) chance to vote to finalize N. One of three things happens:
a) N gets ≥2/3 finalization votes: N is finalized, and the chain is reunited.
b) N gets >1/3, <2/3 finalization votes: not finalized, but N’s branch is soft-finalized — the eventual finalized block is highly likely to be a descendant of N.
c) N gets ≤1/3 finalization votes: not finalized. Voting resumes until another nominee gets 2/3, and it may conflict with N.
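The three outcomes (a), (b) and (c) can also be written as a small function. This is a sketch of the simplified model in this post, not of any real Casper client: it assumes equal-weight stakers, and the function name and the outcome labels are my own.

```python
from fractions import Fraction

def finalization_outcome(finalize_votes: int, total_stakers: int) -> str:
    """Classify the round after nominee N received 2/3 votes."""
    share = Fraction(finalize_votes, total_stakers)
    if share >= Fraction(2, 3):
        # Case (a): N is finalized and the chain is reunited.
        return "finalized"
    if share > Fraction(1, 3):
        # Case (b): not finalized, but a conflicting nominee would now
        # require someone to get slashed.
        return "soft-finalized"
    # Case (c): voting resumes, and the next nominee may conflict with N.
    return "open"
```

For example, with 100 stakers, 67 finalization votes give "finalized", 34 give "soft-finalized", and 33 give "open".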
The flow is summarized by this state transition diagram:
The slashing conditions
As described (and proved) in the Casper FFG paper, the slashing conditions are there to ensure desirable properties:
- “Accountable safety”: basically, no conflicting blocks get finalized without ≥1/3 of stakers getting slashed
- “Plausible liveness”: finalization never gets “stuck” in a state where no block can achieve finalization without someone getting slashed
But another way of looking at the slashing conditions is, they enforce the nominee-voting logic above (no voting for a new nominee after voting to finalize another, etc), and that logic ensures these properties. In particular, while the FFG paper’s first slashing condition is common sense (you can only vote once per round), the second — the no-spanning rule — is more confusing: “no vote of yours can span another.”
In Casper FFG, every vote specifies both a target block (the one you’re voting for) and a source block (an ancestor of the target that 2/3 previously voted for). The no-spanning rule means that, for example, if you cast a vote for a target block at height 7 with source at height 5, you can’t cast another vote for a target at height ≥8 with source at height ≤4: the second vote spans the first.
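As a sketch, the no-spanning check for a pair of votes by the same staker comes down to two height comparisons. The `Vote` type and function names below are hypothetical, and votes are reduced to their source and target heights.

```python
from typing import NamedTuple

class Vote(NamedTuple):
    source: int  # height of the source block (an ancestor that got 2/3 votes)
    target: int  # height of the block being voted for

def spans(outer: Vote, inner: Vote) -> bool:
    """True if `outer` starts strictly below and ends strictly above `inner`."""
    return outer.source < inner.source and inner.target < outer.target

def violates_no_spanning(a: Vote, b: Vote) -> bool:
    """Two votes by one staker are slashable if either one spans the other."""
    return spans(a, b) or spans(b, a)

first = Vote(source=5, target=7)
# The example from the text: source <= 4 and target >= 8 spans the first vote.
slashable = violates_no_spanning(first, Vote(source=4, target=8))
# Keeping the same source and moving the target up is fine.
harmless = violates_no_spanning(first, Vote(source=5, target=8))
```

The check is symmetric on purpose: it doesn’t matter which of the two votes was seen first, only how their heights nest.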
The no-spanning rule is not quite the same as the no-reneging rule I invoked in the walkthrough above (in short, “no voting for a conflicting fresh nominee after voting to finalize”):
- No-reneging rule: as a staker, you get slashed if:
1. You voted (at height h) to finalize block A (at height h-1); and
2. Later (ie, at height h+k), you voted for block B, conflicting with A; and
3. No ancestor block of B between A and B (ie, at height ≥h, <h+k) received 2/3 votes.
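The three clauses translate fairly directly into code. The sketch below is my own framing, not something from the FFG paper: it takes A’s height, B’s height, whether B conflicts with A, and the heights of B’s ancestors that received 2/3 votes, all of which a real implementation would have to derive from the block tree.

```python
from typing import Set

def violates_no_reneging(
    a_height: int,                     # height of A, the block you voted to finalize
    b_height: int,                     # height of B, the block you voted for later
    b_conflicts_with_a: bool,          # True if B is not a descendant of A
    nominated_ancestor_heights: Set[int],  # heights of B's ancestors with 2/3 votes
) -> bool:
    # Clauses 1 and 2: a finalization vote for A, then a vote for a
    # conflicting block B at a greater height.
    if not b_conflicts_with_a or b_height <= a_height:
        return False
    # Clause 3: the vote is excused only if some ancestor of B strictly
    # between A and B was already nominated (by other voters).
    excused = any(a_height < x < b_height for x in nominated_ancestor_heights)
    return not excused
```

So a vote for a conflicting B at height 8 after finalizing A at height 4 is slashable if B’s branch has no nominee at heights 5 to 7, and fine if, say, its ancestor at height 6 already won 2/3.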
However, it turns out that the no-spanning rule implies the no-reneging rule. So, since Casper FFG enforces the no-spanning rule, the no-reneging rule also applies, though the two rules aren’t exactly identical.
I would argue that the no-reneging rule is more intuitive, and also preferable in that it eliminates the confusing source parameter in votes. However, clause 3 of the no-reneging definition above (“No ancestor block of B…”) is arguably clunkier to enforce than the no-spanning rule. One way to implement clause 3 is a multi-step slashing protocol: I post a challenge to your pair of votes (finalizing A, voting for B); you have some period (eg, a month) to respond with a block at height >A, <B that won 2/3 votes; if you don’t, you get slashed; if you do, I get slashed. Or there may be simpler ways.
In any case, either the no-spanning rule or the no-reneging rule leads to the nominee-voting logic described in the walkthrough above.
As Patrick Dugan noted, these rules, especially the no-reneging rule, have clear precedents in prior consensus protocols like Jae Kwon’s Tendermint and its locking rules: “Once a validator precommits a block, it is locked on that block. Then, 1. It must prevote for the block it is locked on 2. It can only unlock, and precommit for a new block, if there is a polka [2/3 vote] for that block in a later round.”
Proofs
By popular request, here are proofs that the no-reneging rule ensures accountable safety and plausible liveness, as FFG’s no-spanning rule does.
Safety: We want to show that if two conflicting blocks are both finalized, ≥1/3 of stakers get slashed. Let the two blocks be A at height h and B at height h+k. Let B’ at height h+j be the ancestor of B (or B itself) with the lowest height >h, which got 2/3 votes. (Since B itself got 2/3 votes, there is always at least one such block: B.) It follows directly from this definition that no ancestor of B’ between A and B’ got 2/3 votes, and that B’ conflicts with A. Therefore, by the no-reneging rule, every staker who both voted to finalize A and voted for B’ gets slashed. Since each of those two groups is ≥2/3 of stakers, their overlap is ≥1/3 of stakers.
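The ≥1/3 figure is just a counting argument: two groups that each make up 2/3 of the stakers must share at least 1/3 of them. A minimal check, assuming equal-weight stakers and using a helper name of my own:

```python
from fractions import Fraction

def min_overlap(share_a: Fraction, share_b: Fraction) -> Fraction:
    """Smallest possible share of stakers that belongs to both groups."""
    # Two groups can avoid each other only while their shares sum to <= 1;
    # anything beyond that must be double-counted stakers.
    return max(Fraction(0), share_a + share_b - 1)

two_thirds = Fraction(2, 3)
print(min_overlap(two_thirds, two_thirds))  # 1/3
```

With 100 stakers, for instance, 67 finalization votes for A and 67 votes for B’ must have at least 34 stakers in common.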
Liveness: We want to show that some new higher-height block can always be finalized without anyone getting slashed, as long as new blocks keep getting added. But this is simple: descendants of the current nominee (ie, the highest-height block to have got 2/3 votes) can always be voted for and then finalized by 100% of stakers without violating the no-reneging rule. (The rule only prevents stakers who have cast a finalization vote from joining in the nomination of a conflicting nominee: once it’s already received 2/3 votes, as the current nominee always has, they’re free to vote for it.)
No-spanning implies no-reneging: As a bonus, here’s a proof that any violation of no-reneging is also a violation of no-spanning. Suppose you violate no-reneging, meaning: 1. You voted at height h to finalize A at h-1, 2. Later (ie, at height h+k) you voted for conflicting block B, and 3. No ancestor block B’ of B between A and B got 2/3 votes. Since a vote’s source must be an ancestor of its target with 2/3 votes, and A itself is not an ancestor of B, it follows (barring double-voting at A’s own height) that your source in your vote for B must have been at some height h-1-j with j ≥ 1: ie, lower height than A. Therefore, your vote for B (height h+k, source height h-1-j) spans your vote to finalize A (height h, source height h-1).
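The last step of that argument can be sanity-checked by brute force over small heights. This is only an illustration of the height comparison, assuming k ≥ 1 and j ≥ 1 as in the proof; the function names and the (source, target) tuple layout are mine.

```python
from typing import Tuple

HeightPair = Tuple[int, int]  # (source height, target height)

def spans(outer: HeightPair, inner: HeightPair) -> bool:
    """True if `outer` starts strictly below and ends strictly above `inner`."""
    outer_source, outer_target = outer
    inner_source, inner_target = inner
    return outer_source < inner_source and inner_target < outer_target

def reneging_pair_is_spanning(h: int, k: int, j: int) -> bool:
    finalize_a = (h - 1, h)          # vote at height h finalizing A at h-1
    vote_for_b = (h - 1 - j, h + k)  # later vote for B, source below A
    return spans(vote_for_b, finalize_a)

# Every combination with k >= 1 and j >= 1 yields a spanning pair.
all_span = all(
    reneging_pair_is_spanning(h, k, j)
    for h in range(10, 20)
    for k in range(1, 6)
    for j in range(1, 6)
)
```

If j were 0, the two sources would coincide and the pair would not span, which is why the proof needs the source to sit strictly below A.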
Caveats
- This doc represents my understanding only and may contain errors — if so, let me know! I am not a Casper expert.
- This is intended to clarify Casper’s workings for people already vaguely familiar with it. If you’re coming in completely green you may find these helpful as side reading: the Ethereum proof-of-stake FAQs, Blockgeeks’ Casper crash course, jon choi’s Casper 101, and the Casper FFG paper.
- I’ve intentionally simplified some important aspects. Eg: Casper FFG operates on checkpoints, not on every block, because voting to finalize every block would produce a lot of overhead for little benefit. But there’s no conceptual reason the algorithm wouldn’t work on every block: a checkpoint is literally just every 100th block. So for simplicity I’ve presented everything in terms of blocks.
- Simplification #2: I talk about “before” and “after”, eg, “no voting for fresh conflicting nominee B after you vote to finalize A.” But in practice, time is hard to enforce: just because I detect your vote v1 before v2 doesn’t mean you sent them in that order. So when I use “after” in this post, I really mean it as shorthand for “at a higher height”. In other words, I’m assuming stakers cast votes in increasing order of height — well-behaved stakers should do this. But slashing rules can’t really enforce timing, only height.
- Simplification #3: I assume every staker votes every round, and the set of stakers never changes. I don’t think removing either of these assumptions would be fatal to any arguments above, but it would make them messier.
- In parts of the walkthrough, I come close to assuming no staker will behave (vote) in a way that would lead them to get slashed. Of course there are attacks where the attacker accepts the cost of being slashed, but these will be very rare, and my primary goal is to show what Casper looks like in operation “most of the time”.
- I’ve used some idiosyncratic terminology where I found it clearer or more concise — “nominee” rather than “justified”, “staker” rather than “validator”, etc. Apologies for any confusion this causes.
Most of this post is just fleshing out ideas I originally tried to fit into tweets. Thanks to Justin Drake, @matthew_d_green, Dan Robinson, Danny Ryan and others for very educational discussion and corrections. Also to Jane’s Addiction for the timeless second half of Ritual de lo Habitual, which helped me get this done.