Paxos made simple. For real
Paxos is a consensus protocol proposed by Leslie Lamport in 1989. It is notorious for being hard to understand for most computer scientists. This post is a reflection of my understanding of the protocol. I hope it presents a new way of thinking about how the protocol works.
In Paxos jargon, there are three roles in a distributed system (a bunch of computers that act as one) that is trying to agree on a single value:
What are they trying to do?
Trying to agree on a value. Example: Imagine the computer to be your bank’s servers. Every single server should reflect the same balance of your account. The problem is since these servers are different machines and could be geographically distant from each other, they need to communicate over a network connection which may or may not be reliable. So how do they all show the same value? Answer: They follow what is called a ‘consensus protocol’ to come to an agreement. Paxos is one such protocol. At a high level, these are the steps followed:
- A server (called a proposer) proposes what it thinks is the correct balance to the rest of the servers (called acceptors).
- Rest of the servers look at this proposal and either accept or reject it.
- If a majority of the servers accept, then the balance is chosen to be the actual balance and is broadcast to all the servers (called learners).
Sounds simple. What’s the problem then? The problem is many servers could be proposing what they think is the correct balance at the same time. If the proposed balances by different servers are different, then the system as a whole cannot come to a consensus of what the actual balance is and your bank can’t tell you how much money you have.
To prevent this situation, the Paxos protocol has well defined rules that all the servers should follow while proposing and accepting balances.
- Each proposal should be of the form <n, v> where n is a strictly increasing proposal number and v is the balance. Think of each proposal as a ‘retry’ and the proposal number represents the number of retries so far.
- A proposer sends a proposal of this form to other servers (acceptors). This is called a prepare request.
- When an acceptor receives a prepare request, it will respond with a promise that it will not accept any other requests with a lower proposal number. For example, if the proposal is <5, 100> then the acceptor will not accept any requests of the form <i, v> where i < 5. This also means that if the acceptor has already responded to a proposal of the form <i, v> where i > 5, then <5, 100> will be rejected or it may choose not to respond at all and simply ignore the request.
- If the acceptor has already accepted proposals in the past, then it will respond to the current proposal <5, 100> with <i, v> in addition to the promise above. Here <i, v> is the proposal where i is the greatest number less than 5. For example, if it has accepted <1, 20>, <2, 30> and <4, 40> in the past, it will respond with <4, 40>.
- When the proposer receives these responses from a majority of acceptors, then it will send an accept request to each of these acceptors with proposal number 5. The balance in the proposal could be either a) a new balance or b) the balance in the highest numbered proposal of all the responses. So if point 3 above applies, then the proposal will be <5, 100> or if point 4 applies, the proposal will be <5, 40>
- When an acceptor receives this accept request, it will accept it only if it has not responded to another prepare request in the mean time with a higher proposal number.
- If acceptor accepts the request, it then broadcasts it to all the other servers (learners) that it has accepted it. Since a majority of servers do this, this accepted value will be the agreed value of the whole system.
That’s it. I will discuss safety and liveness guarantees, fault tolerance, practical implementation and byzantine faults in a future post. More info can be found in this paper: https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
Other resources: http://harry.me/blog/2014/12/27/neat-algorithms-paxos/