Software Architecture. Consensus Problems in Distributed Systems. Part 1

Dmytro Nasyrov
Pharos Production
Published in
7 min readAug 27, 2024

When developers use cloud infrastructures and various databases and work in clusters of many nodes, they are confident that the data will be complete, safe, and always available. But where do the guarantees come from? In essence, the guarantees that we have are the guarantees of the supplier. They are described in the documentation as follows: “This service is quite reliable, it has a specified SLA, do not worry, everything will work distributedly, as you expect.”

We believe in the best because smart guys from big companies assure us everything will be fine. We do not ask ourselves: why can this work at all? Is there any formal justification for the correct operation of such systems? Most modern distributed systems use the Paxos consensus algorithm and its various modifications. The coolest thing is that the validity and, in principle, the very possibility of the existence of this algorithm can be proven simply with a pen and paper. In practice, the algorithm is used in large systems operating on many nodes in the cloud.

Distributed System

A distributed system is a group of computers (from now on called nodes) that can exchange messages. Each node is an autonomous entity. A node can process tasks independently, but it needs to send and receive messages to interact with other nodes. How exactly messages are implemented and what protocols are used—does not interest us in this context. It is essential that the nodes of a distributed system can exchange data with each other by sending messages.

The definition is simple, but we must consider that a distributed system has several important attributes.

Attributes of distributed systems

Concurrency. The possibility of simultaneous or concurrent events occurring in the system. Moreover, we will consider events occurring on two different nodes concurrently as long as we do not have a clear order of occurrence of these events. And, as a rule, we do not have it.

Absence of a global clock. We do not have a clear order of events due to the absence of a global clock. In the ordinary world, we are used to having a clock and absolute time. Everything changes when it comes to distributed systems. Even ultra-precise atomic clocks have drifted, and there may be situations when we cannot say which of two events occurred earlier. Therefore, we must rely on something other than time.

Independent failure of system nodes. There is another problem: something can go wrong simply because our nodes are not eternal. A hard drive can fail, a virtual machine in the cloud can reboot, the network can blink, and messages can be lost. Moreover, there are situations when nodes work but work against the system. The latter class of problems even received a separate name: the Byzantine generals' problem. The most famous example of a distributed system with such a problem is Blockchain. But today, we will not consider this particular class of issues. We will be interested in situations where just one or more nodes can fail.

Communication models (message exchange models) between nodes. We have already found out that nodes communicate by exchanging messages. There are two known message exchange models: synchronous and asynchronous.

Models of communication between nodes in distributed systems

Synchronous model. We know there is a finite known time delta during which a message is guaranteed to reach from one node to another. If this time has passed and the message has not arrived, we can safely say that the node has failed. In such a model, we have a predictable waiting time.

Asynchronous model. In asynchronous models, we believe the waiting time is finite, but there is no such time delta after which we can guarantee that the node has failed. That is, the waiting time for a message from a node can be arbitrarily long. This is an important definition, and we will talk about it further.

The concept of consensus in distributed systems

Before formally defining the concept of consensus, let’s consider a situation where we need it: State Machine Replication.

We have a specific distributed log. We want it consistent and contain identical data on all distributed system nodes. When one of the nodes learns a new value that it will write to the log, its task is to propose this value to all other nodes so that the log is updated on all nodes and the system moves to a new consistent state. It is essential that the nodes agree among themselves: all nodes agree that the proposed new value is correct, all nodes accept this value, and only in this case can everyone write a new value to the log.

In other words, all nodes objected that they had more relevant information and that the proposed value needed to be corrected. A consensus in a distributed system is an agreement between the nodes on a single correct accepted value. Further, we will discuss algorithms that allow a distributed system to reach a guaranteed consensus.

More formally, we can define a consensus algorithm (or simply a consensus algorithm) as some function that takes a distributed system from state A to state B. Moreover, this state is accepted by all nodes, and all nodes can confirm it. As it turns out, this task is more complex than it seems at first glance.

Properties of the consensus algorithm

A consensus algorithm must have three properties for the system to continue existing and progress in transitioning from one state to another.

Agreement. ​All correctly operating nodes must accept the same value (in articles, this property is also referred to as a safety property). All currently functioning nodes (have not failed or lost contact with the others) must agree and accept some final standard value. It is essential to understand that the nodes in the distributed system we are considering want to agree. We are now talking about systems where something can fail (for example, a node can fail). Still, in this system, no nodes deliberately work against others (the Byzantine general's problem). Due to this property, the system remains consistent.

Integrity. If all correctly operating nodes offer the same value, then each correctly operating node must accept this value.

Termination. All correctly working nodes will eventually accept some value (liveness property), which allows the algorithm to progress in the system. Each separate correctly working node must sooner or later accept and confirm the final value: “For me, this value is true; I agree with the whole system.”

An example of the consensus algorithm in operation

The properties of the algorithm may still need to be completely clear. Therefore, we will illustrate with an example what stages the most straightforward consensus algorithm goes through in a system with a synchronous messaging model, in which all nodes function as expected, messages are not lost, and nothing breaks down.

It all starts with a proposal. Let’s say a client has connected to a node called “Node 1” and started a transaction by sending the node a new value, O. From now on, we’ll call “Node 1” the proposer. As a proposer, “Node 1” now has to notify the entire system that it has fresh data and sends messages to all other nodes: “Look! I’ve received the value “O” and want to write it down! Please confirm that you will also write “O” to your log.”

The next stage is voting for the proposed value. What is it for? Other nodes may have received more recent information and have data on the same transaction.

When node “Node 1” sends its proposal, the other nodes check their logs for data on this event. If there are no contradictions, the nodes announce: “Yes, I have no other data on this event. The value “O” is the most recent information we deserve.”

In any other case, the nodes can reply to “Node 1”: “Listen! I have more recent data on this transaction. Not “O” but something better.”

At the voting stage, the nodes decide: either they all accept the same value or one of them votes against it, indicating that he has more recent data.

If the voting round is successful and everyone is “for”, the system moves to a new stage — accepting the value (Accept). “Node 1” collects all the responses from other nodes and reports: “Everyone agreed with the value “O”! Now I officially declare that “O” is our new value, the same for everyone! Please write it down in your notebook, don’t forget. Write it down in your log!”

The other nodes send a confirmation (Accepted) that they have written the value “0” to themselves. Nothing new has arrived (a kind of two-phase commit). After this significant event, we consider the distributed transaction to be completed.

Thus, the consensus algorithm consists of four steps in a simple case: propose, vote, accept, and confirm acceptance.

If we cannot reach an agreement at some point, the algorithm is restarted, considering the information provided by the nodes that refused to confirm the proposed value.

Feel free to drop a “Hi” at Pharos Production, where we bring software to life! 👋✨

https://pharosproduction.com

“Join our exciting journey with Ludo — the reputation system of the Web3 world! 🌍✨”

https://ludo.com

--

--

Dmytro Nasyrov
Pharos Production

We build high-load software. Pharos Production founder and CTO.