5 Things you need to know about Raft

hiimivantang · Published in The Startup · 4 min read · Jun 14, 2020

1. Raft is a consensus algorithm designed to be easy to understand.

A fundamental problem in running distributed systems is ensuring they remain reliable in the face of node failures. Failures come in many forms: CPUs can overheat, disks can become corrupted, networks can be unreliable, power blackouts can happen, and the list goes on. It is crucial to assume that failures will happen, and we need a way to ensure that distributed systems can endure them. Consensus algorithms make distributed systems fault-tolerant by ensuring that nodes agree on values. For instance, a cluster of 5 servers will remain operational even if 2 of the servers fail, and their states/values stay consistent.
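The fault-tolerance arithmetic above generalizes: a cluster of n nodes makes progress as long as a majority (a quorum) is alive. A minimal sketch (function names are illustrative, not from any Raft library):

```python
# A cluster of n nodes reaches consensus as long as a majority (a quorum)
# of nodes is alive, so it tolerates floor((n - 1) / 2) failures.

def quorum_size(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1

def max_failures_tolerated(n: int) -> int:
    """Failures the cluster can survive while a quorum remains."""
    return n - quorum_size(n)

print(max_failures_tolerated(5))  # → 2, matching the 5-server example
```

This is why production clusters typically use an odd number of nodes: a 4-node cluster tolerates no more failures than a 3-node one.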

Paxos, a consensus protocol created by Leslie Lamport, a Turing Award winner, has a reputation for being difficult to understand. Lamport’s descriptions are mostly about single-decree Paxos; he sketched possible approaches to multi-Paxos, but many details are missing. Even though there are notable systems built on multi-Paxos, such as Neo4j, Google Spanner, and Cassandra, their implementations bear little resemblance to the published algorithm.

Raft was created to be easy to understand while remaining performant. It was designed so that a large audience could understand the algorithm comfortably and develop intuitions about it, allowing system builders to make the extensions that are inevitable in real-world implementations. An experimental study with students at two universities showed that Raft is empirically easier to understand than Paxos. Diego Ongaro has also shown that the performance of Raft’s leader election algorithm is on par with Paxos.

2. Raft uses a leader-based approach.

The Raft consensus protocol takes a leader-based approach, as opposed to a peer-to-peer (P2P) approach like Paxos. Raft implements consensus by first electing a distinguished leader, then giving that leader complete responsibility for managing the replicated log. You may be wondering what a replicated log is. Let me explain.

https://raft.github.io/raft.pdf (figure 1)

Consensus algorithms are essentially implementations of replicated state machines, which are used to solve a variety of fault-tolerance problems in distributed systems. Replicated state machines are typically implemented using a replicated log, a sequence of commands, that exists on each of the servers. The consensus module’s job is to ensure that the replicated log is consistent throughout the cluster. Because the state machines are deterministic, each computes the same state and the same sequence of outputs.
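The idea above can be sketched in a few lines: apply the same log of commands, in the same order, to a deterministic state machine on every server, and every server ends up in the same state. This is an illustrative toy (a key-value state machine with invented names), not Raft itself:

```python
# Toy replicated state machine: a deterministic key-value store.
class KVStateMachine:
    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, value = command
        if op == "set":
            self.state[key] = value

# The same replicated log, applied independently on three "servers"...
log = [("set", "x", 1), ("set", "y", 2), ("set", "x", 3)]
servers = [KVStateMachine() for _ in range(3)]
for server in servers:
    for entry in log:
        server.apply(entry)

# ...yields identical states, because the state machine is deterministic.
assert all(s.state == {"x": 3, "y": 2} for s in servers)
```

The hard part, which the consensus module solves, is making sure every server sees the same `log` in the first place.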

3. Each node is in one of the three possible states at any given time.

https://raft.github.io/slides/uiuc2016.pdf (page 10)

In Raft, a node is in one of three possible states: leader, candidate, or follower.

The leader is responsible for log replication to the followers. It regularly informs the followers of its existence by sending a heartbeat message.

Each follower has a timeout (typically between 150 and 300 ms) in which it expects the heartbeat from the leader. The timeout is reset on receiving the heartbeat.

If no heartbeat is received, the follower changes its state to candidate and starts a leader election.
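The follower-side timeout logic can be sketched as follows. This is an illustrative fragment, not a full Raft implementation; the class and method names are invented for the example:

```python
import random

class Node:
    def __init__(self):
        self.state = "follower"
        self.current_term = 0
        # Randomized timeout (150-300 ms) makes split votes unlikely.
        self.election_timeout_ms = random.uniform(150, 300)
        self.ms_since_heartbeat = 0.0

    def on_heartbeat(self):
        # Receiving the leader's heartbeat resets the timer.
        self.ms_since_heartbeat = 0.0

    def tick(self, elapsed_ms):
        self.ms_since_heartbeat += elapsed_ms
        if self.state == "follower" and self.ms_since_heartbeat >= self.election_timeout_ms:
            # Timed out: become a candidate and start an election
            # in a new term.
            self.state = "candidate"
            self.current_term += 1

node = Node()
node.tick(350)  # silence longer than any possible timeout
assert node.state == "candidate" and node.current_term == 1
```

The randomized timeout is the key trick: followers rarely time out simultaneously, so usually a single candidate starts an election and wins it quickly.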

4. Raft’s guarantees on safety properties.

The following safety properties are guaranteed by Raft:

  • Election safety: at most one leader can be elected in a given term.
  • Leader append-only: a leader can only append new entries to its logs (it can neither overwrite nor delete entries).
  • Log matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index.
  • Leader completeness: if a log entry is committed in a given term, then it will be present in the logs of the leaders of all subsequent terms.
  • State machine safety: if a server has applied a particular log entry to its state machine, then no other server may apply a different command for the same log index.
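The log matching property above can be stated as a small check. This is an illustration of the guarantee, not code that enforces it; entries are modeled as (term, command) pairs:

```python
# Log Matching: if two logs agree on the index and term of an entry,
# Raft guarantees they are identical through that index.

def logs_match_up_to(log_a, log_b, index):
    """True if both logs are identical through the given 0-based index."""
    return log_a[: index + 1] == log_b[: index + 1]

log_a = [(1, "set x=1"), (1, "set y=2"), (2, "set x=3")]
log_b = [(1, "set x=1"), (1, "set y=2"), (2, "set x=3"), (3, "set z=4")]

# Both logs hold an entry with term 2 at index 2, so by Log Matching
# they must agree on everything up to index 2 (log_b may extend further).
assert log_a[2][0] == log_b[2][0]
assert logs_match_up_to(log_a, log_b, 2)
```

This property is what lets a leader bring a lagging follower up to date by finding the latest index where their logs agree and replicating everything after it.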

5. Kubernetes’ backing store is based on Raft

https://coreos.com/blog/kubernetes-cluster-federation.html

Kubernetes is backed by a distributed key-value store called etcd, which stores and replicates the state of the cluster. Under the hood, etcd uses Raft to ensure consistency and fault tolerance. Without etcd, Kubernetes would be unable to coordinate tasks such as configuration, deployment, service discovery, load balancing, job scheduling, and health monitoring across the cluster, which can run on multiple machines in multiple locations.

Hopefully, you have gained a high-level understanding of Raft’s inner workings and can also appreciate the benefits that Raft brings. If you are interested, do check out this page for a great visual simulation of Raft.
