Understanding the Stellar Consensus Protocol

Published in

Interstellar

22 min readMar 13, 2019

The Stellar Consensus Protocol was first described in a whitepaper by David Mazières in 2015. It is a “federated Byzantine agreement system” that allows decentralized, leaderless computing networks efficiently to reach a consensus outcome on some decision. The Stellar payment network uses SCP to provide a consistent view of the network’s transaction history to all participants.

Consensus protocols have a reputation for being difficult to understand. SCP is simpler than most but still shares that reputation — due in part to the mistaken idea that “federated voting,” which the whitepaper spends its first half describing, is SCP. But it’s not! Instead, it’s an essential building block used by the second half of the whitepaper to construct the actual Stellar Consensus Protocol.

In this article, we’ll give some brief background about what an “agreement system” even is, what can make one “Byzantine,” and why you’d want to make a Byzantine one “federated.” We’ll then explain the federated voting procedure described by the SCP whitepaper, and finally explain SCP itself.

Agreement systems

An agreement system allows a group of participants to reach the same decision about something — for example, what to order for lunch.

At the offices of Interstellar, we have implemented our own lunch-agreement system: we order whatever our operations manager, John, says. It’s a simple and effective agreement system. We all trust John to order something interesting and nutritious each day.

But what if John were to abuse that trust? He could unilaterally decide we must all become vegans. After a week or two of that we’d probably depose him and give his authority to Elizabeth, but maybe she’s on an avocado-and-anchovy-sandwich kick and thinks we all should be too. Power corrupts, we might decide, and so we would seek some more democratic method: some way to make sure different preferences are heard while still reaching a timely, unambiguous outcome, so that we don’t end up with no one ordering lunch, or five of us placing competing lunch orders, or no decision about what to order until 4pm.

It might seem that the solution is simple: just conduct a vote! But this is deceptive. Who gets to collect ballots and report the results? And why should the rest of us trust what they say? Perhaps we could first vote on a leader whom we all trust to run the vote — but then who gets to run that vote? What if we can’t agree on a single leader? Or, what if we can agree, but then that leader gets stuck in a meeting, or goes home sick?

Similar problems are common in distributed computing networks. All the participants, or nodes, must agree on some decision, such as whose turn it is to update a shared file or pull a task from a processing queue. In a cryptocurrency network, nodes repeatedly must decide what the complete history of the shared ledger looks like, from among multiple possible versions that occasionally conflict. This network-wide agreement allows the recipient of a crypto coin to have faith that the coin is both (a) valid (not counterfeit) and (b) not already spent elsewhere. It also assures them that they’ll be able to spend it in the future, because the new recipient will have the same faith in it, for the same reasons.

Any agreement system in a distributed computing network needs to be fault-tolerant: it must produce consistent results despite errors like slow communication links, unresponsive nodes, and misordered messages. A Byzantine agreement system is additionally tolerant of “Byzantine” faults: nodes that give false information, whether due to error or in a deliberate attempt to subvert the system or gain some advantage.¹ Consider the owner, Alice, of a crypto coin, who has to choose between buying a delicious gelato with it from Bob, or paying it to Carol to settle a debt. Alice might like to have it both ways by fraudulently paying the same coin to both Bob and Carol. To do so she must convince Bob’s computer that the coin was never paid to Carol, and she must convince Carol’s computer that the coin was never paid to Bob. A Byzantine agreement system can make this effectively impossible using a form of majority rule called a quorum. A node in such a network refuses to commit to a particular version of history until it sees that enough of its peers — a quorum — are also prepared to commit. Once that happens, they’ve formed a voting bloc large enough to force the remaining nodes in the network to agree with their decision. Alice might be able to cause some nodes to lie on her behalf, but if the network’s large enough, her attempt will be overwhelmed by the votes of honest nodes.

How many nodes does it take to form a quorum? At least a majority and more typically a supermajority to combat errors and fraud. But knowing when you have a majority means knowing how many total participants you have. In the Interstellar office, or in a county election, those numbers are easy to know. But if your collection of participants is a loosely defined network that members can join and leave at will, without needing to coordinate with any central authority, then you need a federated Byzantine agreement system: one that can determine quorums not from some predetermined roster of nodes, but dynamically, from an ever-changing and inevitably incomplete snapshot of membership at a point in time.

It might not seem possible to construct a quorum from only the limited perspective of a single node in a sprawling network, but it is. This quorum can even create confidence in the outcome of a decentralized vote. The SCP whitepaper shows how to do this using a procedure called federated voting.

For the impatient

The rest of this article describes federated voting and the Stellar Consensus Protocol in some detail. To serve as a guide to what follows — or if you don’t care about the detail and just want the tl;dr — here’s an overview of the process.

Nodes conduct rounds of federated voting on “nominees.” A round of federated voting means:
• A node casts a vote for some statement, such as “I nominate value V”;
• The node listens to votes from its peers until it finds one it can “accept”;
• The node seeks a “quorum” that also accepts the statement. This “confirms” the statement.
As soon as a node can confirm one or more nominees, it starts trying to “prepare” a “ballot” via more rounds of federated voting.
As soon as a node can verify that a ballot is prepared, it starts trying to “commit” the ballot via still more rounds of federated voting.
Once a node can confirm that a ballot is committed, it can “externalize” the value in that ballot, using it as the outcome of consensus.

These steps involve multiple rounds of federated voting that collectively form a single round of SCP. To understand what each step means, why so many are needed, how it all works, and what can go wrong, read on!

Federated voting

Federated voting is a procedure for discovering whether a network of participants can agree on a proposal. In a round of federated voting, each node must choose one of potentially many possible values as the outcome of that round. It cannot do so until it is sure that the other nodes in the network won’t choose any different outcome. To be sure of that, the nodes exchange a flurry of messages back and forth allowing each of them to confirm that a quorum of nodes accepts the same vote. The rest of this section explains the terms in that sentence and how such a confirmation can be achieved.

Quorums and quorum slices

Let’s start with identifying a quorum. As we discussed above, in a decentralized network with dynamic membership, it’s impossible to know ahead of time how many nodes there are, and therefore how many make up a majority. Federated voting solves this by introducing the novel idea of a quorum slice: a small collection of peers that a node trusts to convey information about the state of voting in the rest of the network. Every node defines its own quorum slice (of which it is also ipso facto a member).

To form a quorum, start with a quorum slice. For each member, add the members of its quorum slice. Then add the members of those members’ slices, and so on. As you continue you’ll encounter more and more nodes that you can’t add because they’re already included. When there are no more new nodes to add, stop: you have formed a quorum from a “transitive closure” of the starting node’s quorum slice.

…then add the members of those nodes’ slices.

Continue until there are no nodes to add.

No remaining nodes to add. This is a quorum.

In fact each node may have more than one quorum slice. To form a quorum, choose just one of the slices and add the members; then choose any one slice for each of the members and add those members, and so on. This means that each node is a member of many possible quorums.

Choose just one quorum slice at each step.

How does a node know the membership of another node’s quorum slices? The same way it knows anything else about other nodes: from the broadcasts that each node sends to the network whenever its voting state changes. Each broadcast includes the details of the sending node’s slices.²

Recall that in a non-federated Byzantine agreement system, a quorum is defined as a majority of all nodes.³ Once a proposal passes the quorum threshold, the rest of the network members are convinced that any competing proposals will fail. This is how the network converges on an outcome.

But in a federated Byzantine agreement system, not only can there be no majority (because no one knows the total size of the network), but the concept of majority is not even useful! If membership in the system is open, then someone could gain a majority simply by conducting a so-called Sybil attack: joining the network many times using multiple nodes. So what is it about the transitive closure of a node’s quorum slice that makes it into a quorum, and what makes that able to overwhelm competing proposals?

Technically, nothing! Imagine a network containing Alice, Bob, Carol, Dave, Elsie, and Frank. Alice has Bob and Carol in her quorum slice. Bob has Alice and Carol, Carol has Alice and Bob. Meanwhile, Dave, Elsie, and Frank all have one another in their respective quorum slices. The Alice-Bob-Carol subgroup can reach a decision that the Dave-Elsie-Frank group will never hear about, and vice versa. There is no way for this network to achieve consensus (except by accident).

So SCP requires that, in order for federated voting to work (and for the paper’s important theorems to apply), the network must enjoy a property called quorum intersection. In a network with this property, any two quorums you can construct always overlap in at least one node. For determining the prevailing sentiment of the network, this is as good as having a majority. Intuitively, it means that if any quorum agrees to statement X, no other quorum can ever agree to not-X, because it will necessarily include some node from the first quorum that has already voted for X.

If the network has “quorum intersection”…

…then any two quorums you might construct…

(Of course it could be that the overlapping nodes are all Byzantine — lying or otherwise misbehaving. In that case, having quorum intersection doesn’t help the network agree at all. For that reason, many of the results in the SCP whitepaper rely on explicitly stated assumptions, such as that the network enjoys quorum intersection even if misbehaving nodes are removed from the network. For the sake of clarity we’ll leave those assumptions implicit for the remainder of this article.)

It might seem unreasonable to expect a collection of independent nodes to organize their slices in such a way that the network will reliably enjoy quorum intersection. But there are two reasons why this isn’t so far-fetched.

The first reason is the existence of the Internet itself. The Internet is the perfect example of a network of independent nodes with quorum intersection. Most nodes on the Internet connect to just a few other local nodes, but those small sets overlap enough that every node is reachable from every other node by one route or another.

The second reason is specific to the Stellar payment network (the most widespread application of SCP). Each asset type in the Stellar network has an issuer, and Stellar best practices require each issuer to designate one or more nodes in the network for handling redemption requests. It’s in your interest to include those nodes in your quorum slices, directly or indirectly, for each asset you care about. The quorums for all nodes interested in a given asset will then overlap in at least those redemption nodes. Nodes interested in multiple assets will include in its quorum slices all the relevant issuers’ redemption nodes, and these will tend to bridge all assets together. Further, any assets that are not connected in this way to others on the network don’t need to be — it’s OK for the network to lack quorum intersection there. (Think of the way that banks operating in dollars sometimes want to trade with banks operating in euros and banks operating in pesos, so they are on a network together, but none of them care about the separate network of kids trading baseball cards.)

Of course, expecting that the network should enjoy quorum intersection is not the same as a guarantee. Other Byzantine agreement systems owe much of their complexity to making guarantees about quorums. An important innovation of SCP is that it removes the responsibility for making quorums from the consensus algorithm itself and pushes it into the application layer. This suggests that, although federated voting is general enough to work with any kind of value being voted on, in fact its robustness depends critically on the broader meaning of those values. Some hypothetical uses might not lend themselves as readily to producing well-connected networks as others.

Voting, accepting, and confirming

In a round of federated voting, a node optionally begins by casting a vote for some value V. This means broadcasting a message to the network saying, “I am node N, my quorum slices are Q, and I vote V.” When a node votes in this way, it promises that it has never voted against V and never will.

A node can see how its peers are voting from their broadcast messages. Once the node collects enough such messages, it can traverse the quorum slices in them to find quorums. If it can see a quorum of peers that all vote for V also, then it can move to accepting V, and it broadcasts this new message to the network. (“I am node N, my quorum slices are Q, and I accept V.”) Accepting provides a stronger guarantee than mere voting. When a node votes for V, it can never vote for not-V. But when a node accepts V, no node in the network will ever accept not-V. (Theorem 8 in the SCP whitepaper proves this.)

Of course, there’s a good chance that N won’t see a quorum of nodes agreeing with its V vote right off the bat. Other nodes may vote for other values. But there is another way for a node to advance from mere voting to accepting. N can accept a different value, W, even if N didn’t vote for it, and even if it doesn’t see a quorum voting for it, as long as it sees a blocking set accepting it. A blocking set is just one node chosen from each of N’s quorum slices. As its name suggests, it is capable of blocking any other value. If all nodes in such a set accept W, then (by Theorem 8) it will never be possible to form a quorum accepting not-W, and so it’s safe for N to accept W too.

B-D-F is a blocking set for N: it includes one node from each of N’s slices.

B-E is also a blocking set for N, because E appears in two of N’s slices.

But a blocking set is not a quorum. It would be too easy for someone to fool node N into accepting a value when it shouldn’t, if they can just subvert one node in each of N’s slices. So accepting a value is not the end of voting. Instead, N must confirm the value, meaning it sees a quorum of nodes all accepting it. If it gets this far, then as the SCP whitepaper proves (in Theorem 11), the rest of the network will also eventually confirm the same value, and so N has reached the end of federated voting with the value as its outcome.

The process of voting, accepting, and confirming makes up one complete round of federated voting. The Stellar Consensus Protocol combines many such rounds to create a complete consensus system.

The Stellar Consensus Protocol

The two most important properties of a consensus system are safety and liveness. A consensus algorithm is “safe” if it can never produce different results for different participants. (Bob’s copy of history will never disagree with Carol’s.) Meanwhile, “liveness” means the algorithm will never fail to produce a result — i.e., that it won’t get stuck.

The federated voting procedure just described is safe in the sense that if a node confirms value V, no other node will confirm a different value. But “won’t confirm different things” is not the same as “will confirm something.” There may be so many different values being voted on that nothing even reaches the “accept” threshold. This means that federated voting lacks liveness.

The Stellar Consensus Protocol uses federated voting in a way that guarantees safety and liveness.⁴ The idea, in a nutshell, is to conduct multiple federated votes on multiple values until one makes it all the way through SCP’s various voting phases, described below.

The values on which SCP seeks consensus may be Stellar ledgers or lunch orders or anything else, but it’s important to note that these are not the values that SCP’s federated-voting rounds vote on or accept or confirm. Instead, federated voting happens on statements about those values.

The first rounds of federated voting happen in the nomination phase, on a family of statements of the form, “I nominate V,” for possibly many different values of V. The goal of nomination is to find one or more such statements that can make it all the way through acceptance and confirmation.

After finding confirmable nominees, SCP moves to the balloting phase, where the goal is to find some ballot (which is a container for a nominated value) and a quorum that can commit to it. If a quorum commits to this ballot, its value is the outcome of this round of consensus. But before a node can vote to commit to a ballot, it must first confirm that all lesser ballots are aborted. These steps — aborting ballots in order to find one that can be confirmed committed — involve multiple rounds of federated voting on multiple statements about ballots.

The following sections describe nomination and balloting in more detail.

Nomination

At the start of the nomination phase, each node may spontaneously choose a value V and vote for the statement “I nominate V.” The goal at this stage is to confirm the nomination of some value, via federated voting.

It’s possible that enough nodes vote for enough different “I nominate” statements that it’s a free-for-all — no nomination can ever reach the “accept” threshold. So in addition to casting their own nomination votes, nodes “echo” the nominations of their peers. Echoing means that a node voting “I nominate V,” upon seeing a message from a peer nominating W, now votes both “I nominate V” and “I nominate W.”⁵ Conceptually these are separate federated votes happening in parallel, each separately able to reach acceptance or confirmation. In practice SCP protocol messages batch these separate votes together.

Although a vote to nominate V is a promise never to vote against nominating V, the application layer above federated voting — SCP in this case — gets to define what “against” means. SCP defines no statement that contradicts an “I nominate X” vote — there is no “I reject the nomination of X” — so a node can vote to nominate as many values as it likes. Many of those nominations may go nowhere, but eventually there will be one or more that a node can accept or confirm. Once a nominee is confirmed, it is called a candidate.

SCP nomination using federated voting. There may be many values “B” nominated by peers and “echoed” by the node.

Nomination may produce multiple confirmable candidates, so SCP requires the application layer to supply some method for combining candidates into a single composite. The combining method can be anything the application layer calls for, as long as that method is deterministic, meaning that every node combining the same two candidates produces the same composite. In a lunch-ordering system, “combining” might simply mean discarding one of the two candidates. (Deterministically, though: every node must choose the same value to discard. The alphabetically earlier choice, for example.) In the Stellar payment network, where nominees are ledgers, combining two proposed nominees involves taking the union of the transactions they contain, and the later of their two timestamps.

The SCP whitepaper proves (Theorem 12) that nomination causes the network eventually to converge on a single composite candidate by the time nomination ends. But there’s a problem: federated voting is an asynchronous protocol. (So is SCP.) In other words, nodes are not coordinated by time, only by the messages they send. From a node’s point of view it’s not clear when nomination has ended. And although all nodes will eventually arrive at the same composite candidate, they may take different routes to get there, producing different, valid composite candidates along the way, never able to tell which is the final one.

This is OK though. Nomination is only about producing — and limiting the number of — candidates for consensus. The rest of consensus is handled by a process called balloting.

Balloting

A ballot is a pair: <counter,value>, where counter is an integer starting at 1 and value is a candidate from the nomination phase. It could be the node’s own candidate, or some peer’s candidate that the node comes to accept. Roughly speaking, balloting makes repeated attempts to get the network to reach consensus on some candidate in some ballot by conducting potentially many federated votes on statements about ballots. The counters in ballots track the attempts made, and ballots with higher counters take precedence over ballots with lower counters. If ballot <counter,value> appears to be getting stuck, a new vote begins, now on ballot <counter+1,value>.

It’s important to distinguish between values (e.g., what the lunch order should be, such as pizza or salads), ballots (a counter-value pair), and statements about ballots. A round of SCP includes multiple rounds of federated voting on such statements, specifically these:

“I am prepared to commit to ballot B,” and
“I commit to ballot B.”

From the point of view of a given node, consensus is reached when it finds a ballot B for which it is able to confirm (find a quorum accepting) the statement “I commit to ballot B.” At this point it is safe to act on the value in B— e.g., to place that lunch order. This is called externalizing the value. Once a ballot is confirmed committed, a node can be sure that every other node has externalized, or inevitably will externalize, the same value.

Although conceptually many federated votes take place on statements about many different ballots, far fewer actual protocol messages are exchanged, because each one encapsulates a range of ballots. A single message thus advances the state of many federated votes at once, e.g.: “I accept that ballots in the range <min,V> to <max,V> are committed.”

So what do “prepared” and “committed” mean?

A node votes to commit to a ballot when it is convinced that other nodes won’t commit to ballots with different values. Becoming convinced of this is the purpose of the prepare statement. A vote that says “I am prepared to commit to ballot B” is a promise never to commit to any ballot less than B — i.e., one with a smaller counter.⁶ Those lesser ballots are said to be “aborted” by the prepare vote, while B is “prepared.”

Why does “I am prepared to commit to ballot B” mean “I promise never to commit to ballots less than B”? It’s because SCP defines “abort” to be the opposite of “commit.” A vote to prepare a ballot is implicitly also a vote to abort some other ballots, and as we discussed earlier, voting for something is a promise never to vote against it.

Before ever casting a “commit” vote, a node must first find a ballot that it can confirm is prepared. In other words, it conducts federated voting on “I am prepared to commit to ballot B” for possibly many different ballots until it finds one that a quorum accepts.

Where do ballots come from for the prepare votes? The first prepare vote that a node casts is for <1,C>, where C is the composite candidate produced by the nomination phase. However, even after prepare votes begin, nomination may produce additional candidates, so these become new ballots. Meanwhile, peers may have different candidates, and they could form a blocking set that accepts “I am prepared to commit to ballot B2,” which would convince the node to accept it too. Finally, there’s a timeout mechanism that causes new rounds of federated voting to begin on new ballots with higher counters if the current ballots seem to be stuck.

Once a node finds a ballot B that it can confirm is prepared, it casts a new vote for “I commit to ballot B.” This vote tells peers that the node will never abort B. In fact, if B is the ballot <N,C>, then “I commit to ballot <N,C>” is implicitly also a vote to prepare every ballot from <N,C> through <∞,C>. This extra meaning helps other nodes receiving this message to catch up, if they’re still in earlier phases of the protocol.

It’s worth re-emphasizing at this point that these are asynchronous protocols. Just because one node is sending commit votes doesn’t mean that its peers are, too. Some might still be voting on prepare statements, others might already have externalized a value. SCP explains how a node should handle each type of peer message no matter what phase it’s in.

If “I commit to <N,C>” can’t be accepted or confirmed, perhaps <N+1,C> can, or <N+2,C> — or, in any case, some ballot with C in it and not any other value, since the node has now promised never to abort <N,C>. By the time a node is casting commit votes, it’s C or bust as far as consensus goes. However, this isn’t enough for the node to externalize C yet. Some Byzantine peers (amounting to less than a quorum, based on our safety assumptions) could be lying to the node. Accepting and then confirming some ballot (or range of ballots) is what gives the node the confidence finally to externalize C.

SCP balloting using federated voting. Not depicted: a timer may fire at any point to increment the counter in the ballot (and perhaps produce a new composite from additional nomination candidates).

And that’s it! Once the network has come to consensus, it’s ready to do it all over again. In the Stellar payment network, this happens roughly once every 5 seconds: a feat that requires both the safety and liveness guaranteed by SCP.

SCP is able to accomplish this by relying on multiple rounds of federated voting. Federated voting is made possible by the concept of quorum slices: sets of peers that each node has chosen to trust as part of its own (subjective) quorum. This configuration means that it’s possible to come to consensus, even in a network with open membership and Byzantine failures.

Endnotes

“Byzantine” fault tolerance — the ability to trust a group decision even when some members of the group might be lying or otherwise not following the decision-making rules — is so named because of a parable about generals of the Byzantine Empire trying to coordinate an attack. Anthony Stevens has written a good description of it.
The SCP whitepaper does not specify a communication mechanism. Implementations will typically use a gossip protocol to ensure that messages propagate throughout the network.
A Byzantine agreement system is designed in terms of the question: How many misbehaving nodes should the system survive? In a system of N nodes intended to survive f failures, a node must be able to make progress after hearing from N−f peers, since f of them may have crashed. However, after hearing from N−f peers, it’s possible that all the remaining f peers (that the node hasn’t heard from) are honest, and up to f out of the N−f peers (that the node has heard from) are malicious. To ensure nodes all reach the same conclusion, then, a majority of the N−f nodes one hears from must be honest, meaning we need N−f > 2f, or N > 3f. So typically a system designed to survive f failures will have a total of N=3f+1 nodes and a quorum size of 2f+1.
The safety and liveness guarantees of SCP are subject to theoretical limits. The design of SCP trades a very strong safety guarantee for a slightly weaker liveness guarantee: given a sufficient time bound, consensus will be reached with high probability.
Not all votes from peers get echoed by a node during nomination, because this could lead to an explosion of different nominees. SCP includes a mechanism for throttling these echoed votes. In short, there is a formula for determining the “priority” of a peer from a node’s point of view, and only high-priority nodes get their votes echoed. The longer nomination takes, the lower the threshold gets, so a node expands the set of peers whose votes it will echo. The priority formula includes the slot number as one of its inputs, so a high-priority peer for one slot may be low-priority for the next, and vice versa.
SCP requires that the values in ballots have some order defined on them. So ballot <N1,V1> is less than <N2,V2> if N1<N2, but also if N1=N2 and V1<V2.

Acknowledgments

Many thanks to those who contributed feedback about, suggestions for, and content to this document: David Mazières; Tess Rinearson; Oleg Andreev; Vicki Niu; Cathie Yun; Adam Diehl; Lindsay Lin; Suzanne Glickstein.