Stellar FBAS intuition

A Stellar consensus building block

Pieterjan
Pieterjan
Nov 14, 2020 · 15 min read

Introduction

The Stellar network is an open and distributed network for storing and moving money. Every node has its own copy of the shared ledger that holds a list of all the accounts, balances,... In order for the network to be safe, it’s important that all the nodes have the same copy of that ledger at all times. Double spending should be impossible, the situation where Bob spends his money in a store connected to a specific node, and can spend that same money in another store that is connected to another node. The network should also be live and keep processing transactions timely even if parts of it are failing or taken over by malicious actors.

To have safety and liveness in an open network where anyone can spin up a node and participate is quite the challenge. To solve this, Stellar lets every node individually decide which other nodes in the network they trust. The network of trust that emerges between the nodes is called a federated byzantine agreement system, or FBAS in short. Every node can then run the SCP Stellar Consensus Protocol to update its ledger. Think of SCP as a series of steps to take, a contract to follow. However not all FBAS are created equal. If the trust network between nodes lacks certain properties, SCP will not be able to guarantee safety and liveness in the network.

In this blogpost we will try to gain an intuition about an FBAS and its properties by exploring a simple network, starting from a single node. To help solidify your knowledge you can experiment with the example network yourself. Not everything is handled in full detail, so if you want to dive deeper explore the links at the bottom.

A simple agreement system

An agreement system allows the participants of the system to agree on things. In our case they decide what set of valid transactions are applied to the shared ledger, for example the payments between users.

Let’s start our own Stellar network. Every node in the network will have its own copy of the shared ledger that contains all account balances and transactions.

We want a network that is both safe and has liveness. Safe meaning that only valid transactions are added to the ledger, and that all the ledgers on all the nodes are perfectly in sync. Liveness meaning that the network will keep processing new transactions, even if some parts of the network are failing or misbehaving.

We start by spinning up a node, SDF1, the only node in the network. All transactions are sent to this node and get recorded in a big pool of possible transactions. Every five seconds SDF1 picks valid transactions from the pool to record in the ledger. A transaction is valid if it passes the necessary checks on the node, for example if a payment is correctly initiated by an owner (or signer) of the account. Agreement with other nodes is straightforward as it’s the only player in town.

There are some obvious problems with liveness and safety in this setup however. If SDF1 can’t handle the traffic or goes down, no more transactions are possible. It’s also an intriguing target for hackers to try and corrupt the ledger. Let’s expand our network.

A federated agreement system

The federated part of an FBAS means everyone is free to join the network. There is no central authority that acts as a gatekeeper. Every node can also decide which other specific nodes they depend on to reach an agreement on the shared ledger.

In our example, SDF, the organization behind our node SDF1, gets to know Lobstr and SatoshiPay. The three organizations decide to collaborate and, to make the network more robust, they each supply two nodes. The network now consists of SDF1, SDF2, SatoshiPay1, SatoshiPay2, LOBSTR1 and LOBSTR2.

The nodes create peer to peer connections and exchange the transactions they receive. A common pool of transactions is built up. Now the nodes need only to agree on what valid transactions get applied to the ledger.

The validity of a transaction is checked on every node individually, but how do we decide which valid operations get applied to the ledger? We could for example let a majority of nodes decide. But remember, this is an open network and anyone can join. What’s stopping some malicious organization to spin up 10 nodes and hijack the majority? We want a better solution, a mechanism to only reach consensus with each other, the nodes they know and trust.

Define your trust connections through quorum slices

Every node can define its own quorum slices, sets of nodes that are required for agreement. A node can only add a set of transactions to the ledger, if all the nodes in one of his slices agree to this. Through these quorum slices the nodes create trust connections. If you look up the definition of an FBAS in the Stellar SCP paper, that’s what it actually is, a system of nodes and their quorum slices. Now comes the tricky part however, configuring the quorum slices in the right way to enable a safe and live SCP protocol to add new transactions to the ledger.

Note: Peer to peer connections can happen between every node in the network to exchange messages, trust connections only exist by explicit configuration on the node itself.

Let’s say for example that SDF1 defines the quorum slice {SDF1, SatoshiPay1, LOBSTR1}. Now SDF1 will only apply a set of transactions to the ledger, if SatoshiPay1 and LOBSTR1 also agree to apply this exact same set.

How the nodes land on a specific set of transactions and how they are sure that all nodes apply the same set is the territory of SCP and federated voting, which we’ll not get into here. We will focus solely on the FBAS, and the properties it needs to attain so SCP can work as intended.

Quorumsets, a compact format to define your quorum slices

Because a node wants to be able to continue validating, even when another node it trusts isn’t working properly, it should define multiple quorumslices. For a large network, there could be a lot of interesting slices and this could get unwieldy. Stellar uses the quorumset configuration format to mitigate this. Basically you group (and nest) the nodes you trust and define thresholds for every group. From these groups all combinations of nodes with the requested threshold sizes are generated and will act as quorum slices.

Note: the node itself is implicitly added to the quorumset, so it’s always present in every quorum slice.

Let’s explain with an example. For our network we configure SDF1 with the following quorumset:

SDF1 quorumset

The inner groups with threshold 1 are straight forward, combinations for the LOBSTR group with length 1 are the individual nodes {LOBSTR1} and {LOBSTR2}. The same goes for the SatoshiPay group {SatoshiPay1} and {SatoshiPay2}. The top group has threshold 2, this means take the combinations of size 2 of the provisional slices we found in the inner groups. For example the slice {LOBSTR1, SatoshiPay1}. Finally we add the SDF1 node to every slice, because it has to agree with itself. The resulting quorum slices for SDF1 are {SDF1, SatoshiPay1, LOBSTR1}, {SDF1, SatoshiPay2, LOBSTR1}, {SDF1, SatoshiPay1, LOBSTR2}, {SDF1, SatoshiPay2, LOBSTR2}. This quorumset translates to: SDF1 and at least one node of the other organizations, SatoshiPay and LOBSTR have to agree on a set of transactions to apply to the ledger.

In our example network we configure similar quorumsets for each of the other nodes. Here you can see this network live in action. Click on a node to see its quorumset and trust connections. In the tools section in the sidebar you have a button to inspect the quorum slices. You can also modify the quorumset yourself to inspect the resulting quorum slices.

Trust network
Click on a node to see the trust connections
View quorum slices and modify quorumset

So a quorum slice can convince a single node to include a set of transactions into the ledger. But how do we know that all nodes in the network will agree on the same set?

Quorums emerge

Let’s start with the single node again. It agrees to apply a set of transactions to its ledger if all the nodes in one of its quorum slices agree to apply that same set. But before one of those trusted nodes can agree to apply that set, they themselves need to have a quorum slice that agrees to apply that same transaction set. And this logic repeats for every node in those quorum slices.

So we see that all the nodes that are included in this process to convince the node have to agree on the same set of transactions. We say these nodes form a quorum. More formally: A quorum is a set of nodes that contain a quorum slice for each member.

Because a single node could have multiple quorum slices to choose from, a lot of quorums can possibly emerge. But only the first quorum that completes is responsible for giving the go ahead on a set of transactions. Conversely, if there would be no quorums available, no node would be able to reach consensus. The network would halt. We have found our first necessary property for a working SCP protocol, there need to be quorums.

If there are quorums we attain the ‘quorum availability’ property. And with this property SCP can ensure liveness. The network will not get stuck in choosing a new set of transactions to apply to the ledger.

Let’s do a simple example in our network. Let’s see a quorum that emerges from SDF 1. We first select a quorum slice of SDF1: {SDF1, LOBSTR1, SatoshiPay1}. We add these nodes to our provisional quorum: {SDF1, LOBSTR1, SatoshiPay1}.

Because we want LOBSTR1 to agree on a value, we add one of its quorum slices, for example {LOBSTR1, SDF2, SatoshiPay1}. SDF2, is not yet part of the quorum, so we add it: {SDF1, SDF2, SatoshiPay1, LOBSTR1}.

We repeat the process for SatoshiPay1 and add one of its quorum slices: {SatoshiPay1, SDF1, LOBSTR1}. This slice doesn’t introduce new nodes to the quorum. Now we have only SDF2 that needs a quorum slice, for example {SDF2, LOBSTR1, SatoshiPay1}. Also no new nodes are discovered.

The resulting quorum is {SDF1, SDF2, SatoshiPay1, LOBSTR1}. Great, our example network can ensure liveness and will not get stuck processing new transactions.

So all the nodes in a quorum agree on the same value. But there are many possible quorums, and not all quorums contain the same nodes. How do we know if all the nodes in the network, not just a single quorum, agree on the same statement?

Quorum intersection

Let’s start with an example on how two quorums in our network could agree on different statements.

Let’s again build a quorum for SDF1 with slice {SDF1, LOBSTR1, SatoshiPay1}. For LOBSTR1 we pick the slice {LOBSTR1, SDF1, SatoshiPay1}. For SatoshiPay1 we pick the slice: {SatoshiPay1, LOBSTR1, SDF1}. The emerging quorum is {SDF1, LOBSTR1, SatoshiPay1}

Note: if you look at the quorum from the previous example {SDF1, SDF2, LOBSTR1, SatoshiPay1}, we see that the quorum {SDF1, LOBSTR1, SatoshiPay1} is completely contained in that quorum, it is a subset. This means that every time the big quorum reaches agreement, the small quorum also gets there (faster). For analysis purposes we mostly look at the minimal quorums, the smallest possible subset quorums.

Let’s now build up a quorum for SDF2. We pick the slice {SDF2, LOBSTR2, SatoshiPay2}. For LOBSTR2 we pick a slice {LOBSTR2, SDF2, SatoshiPay2}. For SatoshiPay2 we pick the slice {SatoshiPay2, LOBSTR2, SDF2}. The emerging quorum is {SDF2, LOBSTR2, SatoshiPay2}.

There is no overlap in nodes between the quorums. It could happen that LOBSTR1, SDF1 and SatoshiPay1 agree on a set of transactions, but at the same time SDF2, LOBSTR2 and SatoshiPay2 agree on a different set. When this happens the network is forked, ledgers are no more in sync and funds can be double spent.

Because we know that all the members of a quorum need to agree unanimously, we know that all the members of every possible quorum will agree on the same thing when these quorums overlap and share nodes. We call this quorum intersection, and it’s another necessary property for SCP to work, to ensure safety.

So to answer the question how do we get all the nodes in the network to agree? We have to configure the quorum slices of our nodes in a specific way, so that the quorums they form overlap in at least one node.

As we showed above, our network doesn’t not have quorum intersection. If you look at the minimal quorums by running the network analysis tool, you can check this out for yourself. Follow the link and click on network analysis in the tools section.

Network analysis tool

As a second example network to experiment with that does have quorum intersection, you can follow this link.

network with quorum intersection

Transitive quorum set

If you look at our examples on stellarbeat.io you’ll see mention of the transitive quorum set. The nodes & organizations it contains are listed on the network overview, and the graph view shows it with a blue background. Without going in too much detail, you can think of a transitive quorum set as the most important trust cluster. In a trust cluster every node has a trust path to every other node in that cluster. In graph theory terms it’s called a strongly connected component.

There can exist other nodes and clusters that trust and rely on the transitive quorum set, but the transitive quorum set itself does not rely on other nodes or clusters. It’s the core of the network.

If there is no transitive quorum set, then there can be no quorum intersection. So it’s an interesting concept to see how the core of the network looks and relies on each other, but it’s also a useful heuristic for a correctly configured network.

A transitive quorum set doesn’t imply however that there is quorum intersection, as you can see in our first example. You can explore the concept for yourself by adding a new node in the tools menu, and adding some validators to its quorumset. You can also look at what happens if a node from the transitive quorum set decides to trust your new node back.

New node not in transitive quorum set
Simulate a new node in the tools menu.

Note: If you want to learn more about the transitive quorum set, check out this paper and the section on the greatest strongly connected component.

Byzantine faults

Just like other distributed databases, the system has to keep working in spite of node faults due to hardware issues, network issues,… But because of the open nature of the network, it also has to be able to handle malicious nodes. These nodes could try to disrupt the system by sending out inconsistent messages to different nodes, by not following the protocol. These nodes trigger what are called byzantine faults.

Consider the scenario where two quorums overlap in one node. If this node goes down, in a non-byzantine scenario, those quorums would never complete and agree. In the worst case there would be no more quorums left and the network would halt. In the byzantine scenario however, a malicious node could convince the one quorum to choose a certain transaction set, and the other quorum another set by sending different messages causing a network split. It could also ‘censor’ certain transactions by never agreeing to them.

If we want a safe and live network, we need to have quorums and quorum intersection despite malicious nodes. This is another way of saying that after removing all of the malicious nodes from the quorums, there should still be quorums left that overlap. However knowing which nodes are malicious before they act maliciously could be impossible. Therefore we should configure our nodes in a way that we are not dependent on single nodes or organizations, there should be adequate safety and liveness buffers.

Optimal network configuration

Because the configuration of the network is dependent on the configuration of every single node, and everyone is free to configure it the way he/she wants, it is a challenging problem to get right. Stellar core, the software on the nodes, helps the individual node owner by generating sane quorumset configurations.

Because a node sends its quorumset configuration along with every message, stellar core can check if the quorums that emerge from its slices overlap, and refuse to continue if there is no quorum intersection. After all, a safe network is more important than a (temporary) halted one.

To optimize their configs the node administrators can get together, for example in the keybase stellar validator channel, and discuss paths forward.

Next up we will go over some concepts to assess the risks to liveness and safety in a live network.

Liveness risks: Network blocking sets

In the network there are sets of nodes that intersect every quorum called blocking sets. If one of these sets would go down, the network would halt. Intersect means that at least one of the nodes in the blocking set is present in every quorum, thus without them, no quorums would complete or exist.

Let’s look at our second example network with quorum intersection, run the network analysis again and navigate to the liveness risk tab.

We see that LOBSTR1 and LOBSTR2 form a blocking set and intersect every quorum. If you look at the quorums in the quorum intersection tab, you can confirm that either LOBSTR1 or LOBSTR2 is present. Also if you navigate to both nodes and click the ‘stop validating’ button in the tools section, you can see what the impact is on the network.

It’s also interesting to note that the analysis shows that all the blocking sets are part of a single organization. This means that we are highly dependent on every individual organization for liveness in this example.

Liveness risks can change at every moment. If some nodes go down, the risk increases. It’s important to keep an eye on this and update your configuration if nodes you trust on seem unreliable.

Safety risks: Network splitting sets

If we look at the intersection of two quorums, we get the nodes that could, if malicious, cause the network to fork. We call this a splitting set. If we look at what the smallest possible splitting sets are in our network, we get an indication on how many nodes need to be malicious in order to endanger safety.

Let’s have another look at the network analysis tool and navigate to the safety risks tab.

We can see that every quorum intersects in at least three nodes across three organizations. For our network this is actually a pretty good result, because all the organizations would need to collude or have a node hacked to split the network. If we combine this with the liveness analysis, we could say we have configured a network that has good safety properties, but is in danger of losing liveness. A network split would be unlikely, but two nodes of the same organization could halt the network.

All configurations will have trade-offs between liveness and safety. You can check out the state of the Stellar public network here. With the clock button you can time travel to interesting points in time and inspect quorumsets and the validating status of the nodes.

Note: If you want to learn more about blocking and splitting sets, check out this paper, tool and accompanying website.

In conclusion

We saw how an FBAS is the sum of the configured quorum slices of the individual nodes. A node is convinced by its quorum slices to add specific transactions to the ledger. The nodes form quorums that agree on the same set of transactions and if these quorums intersect we know that all the nodes in the network are in sync. The network could be brittle however, so risks should be assessed and configurations optimized. With this foundation the nodes can run the Stellar Consensus Protocol with safety and liveness ensured.

Further information

Stellar consensus protocol: https://developers.stellar.org/docs/glossary/scp/

The Sum of Its Parts: Analysis of Federated Byzantine Agreement Systems: https://arxiv.org/abs/2002.08101 & https://trudi.weizenbaum-institut.de/stellar_analysis/ & https://github.com/wiberlin/fbas_analyzer

Mathematical Analysis and Algorithms for Federated Byzantine Agreement Systems: https://arxiv.org/abs/1912.01365 & https://github.com/andrenarchy/stellar-observatory

Stellarbeatio

Stellar network visibility

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store