Simply explained: Stellar Consensus Protocol

Nope, it’s not PoW nor PoS


This article is the second in a series of articles titled “Blockchain definition of the week”. Every Wednesday, I will publish an article addressing an important definition of the decentralized world. These definitions may be relevant to the technology as a whole, some individual aspect of blockchain, or be regarding a specific blockchain protocol.

The choice of “Stellar Consensus Protocol” as the topic of the second week was made to offer a comprehensive explanation of a different form of reaching consensus in decentralized systems, which underlies one of the most well-known distributed technology platforms, Stellar Network.

Additionally, if you speak Portuguese, you can check out my other series, titled “Smart Contracts”, with articles published every Tuesday.


IMPORTANT:

This article requires the reader to have some previous understanding of distributed networks and nodes.

Also, SCP is a complicated concept to grasp, and this article is meant to be an introductory step into understanding the protocol fully. Simplifications were used to limit the complexity of the article and hence may not be technically correct in its entirety. Mathematical proofs and theorems were not displayed for the same reason.


Introduction & Key Concepts

Before we dive deep into the technicalities of the SCP, there are a few terms we should cover:

Stellar Network

Stellar is a platform released in 2014 with the aim of utilizing blockchain to provide more accessible financial services to people worldwide. It has IBM, Deloitte and Stripe as partners, and its token, called Lumens (XLM), has the sixth largest market cap of all cryptocurrencies (as of this writing). The Stellar Network boasts very fast transactions with extremely low fees, and while its focus is on providing a platform for cross-border payments, it also permits the creation of smart contracts.

You can find more info about Stellar on their website www.stellar.org. Today, we will only discuss the consensus protocol utilized by its platform.

Byzantine Generals Problem (BGP)

This problem is inspired by the army of the Byzantine Empire, which was an important power for more than a thousand years

A large army, after defeating most enemy forces, now encircles the capital city, coming from every direction. The army is split into divisions, and each division is controlled by one general. The generals only communicate between each other through messengers. They all face a decision: should they attack or retreat? For either of these to be successful, they must do it with full force. In an ideal scenario, generals will submit their votes and the majority vote will determine the outcome to be followed. However, in reality, there are traitors among the army, messages can get lost or changed along the way, and communication is not the best even within the same division. A general can, for example, send one message to one peer, and a different message to another, if he is traitor. Hence, it is very difficult to reach consensus on what the army should do in order to succeed.

*The above is a simplified version of the BGP.

Decentralized networks such as blockchains are a large Byzantine Generals Problem. Security of these networks must come from an underlying principle that not all participants in the network have its best interests in mind, and some may want to attack it. How can we build a ledger accepted by presumably untrustworthy peers as fact and agree on what should be included in that ledger? To solve this problem, there needs to be consensus, which can be achieved in a variety of forms, one of which is presented in this article.

Consensus (in distributed networks)

“Consensus is the judgment arrived at by most of those concerned.” — Merriam-Webster dictionary

When we talk about ledgers, there must be an agreement that the information registered is factual. In the case of traditional centralized ledgers, that agreement is simple: the centralizing power, being the only one with access to the ledger, simply claims the information is true, and that must be accepted by users of the ledger, who trust the centralizing agent.

In the case of distributed networks, this process is a little more complicated. In public blockchains, anyone can add (or attempt to add) information to the ledger, and anyone has the right to verify if that information is or is not true. For that reason, the network must reach consensus, a general agreement that the data is correct. However, that is not an easy task, because the users and validators of the ledger are unknown, and hence cannot be trusted.

Hence, to achieve consensus among untrusted parties and confirm the validity of the data stored in the ledger, there must be a protocol in place to define how information is added to the registry. This is where consensus protocols come in. Famously, Bitcoin uses Proof-of-Work to guarantee validity of transactions. Along with requiring an agreement from the majority, PoW also requires the use of resources, namely computational effort, to provide extra security for the network, since attackers will have to spend resources (money) to attempt to harm the network. Alternatively, those who work in favor of the network are encouraged to continue to do so by financial incentives.

Proof-of-Work, however, has its disadvantages, which has led to the creation of alternatives that aim to be a better option. Proof-of-Stake is one of the most well-known alternatives for PoW, a category which also contains the Stellar Consensus Protocol.

Byzantine Fault Tolerance (BFT)

Byzantine Fault Tolerance is a quality of a system which withstands the inherent failures of the BGP. Distributed networks have many nodes which interact with each other to decide on the current state of the ledger. These nodes can be honest nodes, or they can be corrupt, and the network must be able to handle all the false information spread by the corrupt nodes and still reach a consensus on the data that is true.

When dealing with BFT, no restrictions are set regarding the possible behavior of a node. The network must be secure assuming nodes can act in any way they wish, which is what makes the problem very difficult to solve. A Byzantine Fault Tolerant network solves the double-spending problem, as described by Satoshi Nakamoto in the Bitcoin Whitepaper.

BFT is extremely important in the field of Distributed Ledger Technology (of which blockchain is a part of), but its applications are not limited only to DLT, also being used, for example, in the control systems of a Boeing 787.

Federated Byzantine Agreement

Image taken from: https://medium.com/a-stellar-journey/on-worldwide-consensus-359e9eb3e949

Stellar Consensus Protocol is the first implementation of a type of consensus protocol called Federated Byzantine Agreement (FBA). FBA was idealized in 2015 by Professor David Mazières of Stanford. It describes a way for nodes in a network to reach an agreement, and can be used for blockchains, yet its use is not limited to them. The key features of FBA are, as per the SCP Whitepaper:

  • Decentralized control: Anyone can be a validator in the network. There are no restrictions.
  • Low latency: Consensus can be reached in a few seconds.
  • Flexible trust: Participants are free to decide what other nodes they trust, without any restrictions.
  • Asymptotic security: Digital signatures and hash families are used in a way to protect the network even against immensely high computing power.

The Stellar Network uses SCP to reach consensus in its platform, however, unlike other common consensus protocols, SCP does not define a system of incentives for nodes, nor does it provide a system for coin emission (useful in the case of cryptocurrencies and public blockchains). Hence, Stellar Network’s emission of Lumens (XLM), is independent from its consensus protocol.

Quorums, slices and intersections

At the essence of the SCP are quorums and quorum slices. As denoted in the SCP Whitepaper, the definitions of those concepts are:

Quorum: A set of nodes sufficient to reach agreement.
Quorum slice: A set of nodes sufficient to convince another node of a statement’s validity. Also called ‘slices’ (definition slightly adapted).

Nodes in the Stellar Network decide, independently, what other nodes they trust for information. Hence, the set of nodes selected as “trusted” by node n and node n itself define the quorum slice of n. If n has l and m in its slice and both accept a certain piece of information, n will accept that information as well. Now, if l and m both have two other nodes in their respective slices, these nodes are also necessary to form the quorum. Hence, a quorum is a set of nodes that contains at least one slice from each of its members(David Mazières: “The Stellar Consensus Protocol” | Talks at Google).

Let’s visualize this:

Figure 1: Quorums and quorum slices

Quorum slices are determined by the arrows. Example: N5’s slice contains itself, N6 and N7.

Don’t worry if it looks a little complicated, let’s break it into parts.

Quorum 1

Composed of nodes: N1, N2, N3 and N4.

Composed of slices:

  • N1, N2, N3
  • N2, N4
  • N3, N4

Quorum 2

Composed of nodes: N5, N6 and N7.

Composed of slices:

  • N5, N6, N7
  • N6, N7

Hence, we have the following considerations:

  • N4 can determine the agreement of Quorum 1
  • N7 can determine the agreement of Quorum 2
  • One quorum’s agreement does not interfere with the other’s, since no nodes are shared between them.
  • We have two separate quorums that do not intersect, also called disjoint quorums.

Figure 2: Larger quorums

Alternatively, if there was no N7, and the slices were arranged as they are above, we would not have two separate quorums, but instead one larger quorum.

Figure 3: Quorum intersections

With a change of direction in some of the arrows of the diagram above, we can now introduce another concept, quorum intersections.

Quorum intersections happen when two quorums share a node essential to agreement, hence interfering in each other’s internal agreement. In Figure 2, N1 is at the point of intersection. N1 is needed for agreement in both quorums. A strong consensus protocol needs these intersections, because otherwise the network could reach separate agreements. In fact, every quorum needs at least one intersection with another for consensus to work.

From the diagrams above we can also notice that, naturally, various tiers of nodes will form, ranked in order of importance for consensus. In Figure 2, for example, N4 is the most important node of the quorum, followed by a second tier containing N2 and N3, and so on.

Slots

Slots are unique updates to the ledger, which all nodes must agree to. The SCP whitepaper notes:

“For instance, slots may be consecutively numbered positions in a sequentially applied log.”

Based on the definition above, we can derive that, in blockchains, these slots are, in fact, blocks. All participants in a blockchain network must agree on the validity of a block and update their personal ledgers to include it once it is published. Once that is done, consensus is achieved.

Node failures

By now, we discussed how quorums, slices and intersections are used in order for nodes to reach an agreement. However, consensus in the network cannot happen that easily because of node failures. In an ideal scenario, all nodes are well-intentioned and helpful for the network. However, as stated by BFT, to create a system that is truly fault-tolerant, no assumptions can be made about the behavior of nodes i.e. we must prepare for the worst. Take a look at the diagram below to understand the possible types of node failure.

Ill-behaved

Ill-behaved nodes are those that do not follow protocol rules and do not act in the network’s best interests. They do not choose sensible quorum slices, may have crashed, may be intentionally sending false messages or may be modifying the software. These nodes suffer byzantine failure, hence, by default, they have failed. Independently of its intentions, be they malicious or not, ill-behaved nodes cannot be trusted to maintain consensus in the network.

Well-behaved

Well-behaved nodes, on the other hand, are those that follow protocol and best practices. They not only are well-intentioned, but also take the necessary precautions to ensure safety and the proper functioning of the network. However, they can still fail due to external factors. There are two ways in which they can fail, listed below:

Blocked: Nodes that cannot output a value for agreement without relying on another failed node.

Divergent: Nodes that output a value that diverges from other nodes for the same input.

Inputs above refer to a set of data (slot) for which the node must provide an “answer”. As an analogy, it is equivalent to the block data (transaction data, previous block hash, difficulty, etc) that miners use to find a hash and solve the block in Proof of Work. Outputs refer to the “answer” given by the node regarding the set of data that is published to the network. In the same analogy from above, it is equivalent to the hash found by the miner for a block.

Example: Let’s assume the input is a transaction. The output must then denote if that transaction is valid or not. If a node’s “opinion” regarding the validity of the transaction relies on an ill-behaved node (because it is a part of its quorum slice), it is considered blocked. That is because its output is compromised by a faulty node. If a node, although well-behaved, outputs “valid”, when the transaction is in fact not valid as per the network consensus, that node can be considered divergent.

Correct nodes are those that did not fail in any way.

Federated Voting

To assure network consensus remains unharmed despite node failure, SCP introduces Federated Voting, which used with quorums and slices, allows for a network to achieve consensus without the need of all the nodes in the network. FV has three phases:

Vote → Accept → Confirm

Given the input “Are tomatoes fruits?”, nodes will vote on what they think is valid. In the diagram above, N1 can vote “Yes” or “No”. Once voted, nodes cannot change their vote, but they can accept another output.

Let’s say N1 votes “No”. N1 will then look at its quorum. If the quorum voted or accepted “Yes”, N1 will then accept “Yes”. These influence systems will then, ideally, push the network towards a consensus on the correct output.

Intact nodes, those whose output does not depend on any failed nodes, will always output the same value.

After accepting, nodes must then confirm the output value. This is done to guarantee that intact nodes that could not accept an output can confirm that that output was accepted, and also to add security to the protocol. The reasons for why this is true will not be treated in this article.

Essentially, after accepting an output value, the network of nodes will then check that “Hey, we all accepted this, right?”, and confirm the value, which will then be externalized by all nodes.

Ballots

Assuming a minimal amount of failed nodes, the system will be able to reach a consensus through quorum slices influencing each other. But all that is not enough. Depending on the arrangement and quantity of failed nodes, the system may get stuck. That is, quorums cannot reach agreement on what value should be confirmed.

Enter ballots. To solve the problem of the network getting stuck, SCP institutes a system whereby a referendum is held to commit or abort ballots. Ballots are associated with output values from nodes. Before they are eligible for voting, there is a nomination process that disqualifies invalid ballots. Nodes will then vote to either commit and accept a value (ballot), or to abort a ballot, rejecting that specific value from being accepted. Hence, if ill-behaved nodes are outputting different values, these can be disqualified by the well-behaved nodes.

There are various reasons for doing this in ballot form, rather than a simple vote to accept a value. The reasons and mathematical proof are provided by theorems that can be found on the SCP Whitepaper. One important consideration is that when voting to accept an output, you may have a split vote. However, with ballots, there is the possibility to remove (abort) values from consideration, re-starting the voting process and attempting to reach consensus with a more limited set of options.

A note on security

A few aspects cause the network to inherently tend towards more security, because of, yet not limited to:

  • Well-behaved nodes will not want to keep failed nodes in their slices, as it affects them negatively, creating a natural selection of “good” nodes.
  • The natural tiering of the network into nodes more important to consensus happens because participants independently choose some nodes as their “trusted parties” more often than others. That puts the nodes that the network regards as the most trustworthy on top.

Limitations

As stated by its creator, David Mazières, SCP has the following limitations:

  • It does not provide a system for minting digital coins
  • It does not provide an incentive scheme to encourage good behavior from nodes
  • It does not tell nodes who they can trust, meaning that nodes can choose bad quorum slices (although they have incentives not to) and possibly harm consensus

Conclusion & Final considerations

Stellar Consensus Protocol is the first implementation of Federated Byzantine Agreement and provides a new way for distributed networks to reach consensus. Although it is the protocol utilized by the Stellar Network to reach consensus, its applications go beyond blockchains.

If you would like to dig deeper, you can visit the links below, also used by me as references for this article:

Alternatively, if this article was already too complex, you can check out “Adventures in Galactic Consensus”, by Stellar.org.


Thank you for reading. Comment below if you still have any questions and also what definition you would like to hear about next.