Paxos as I understood it

4 min readMay 30, 2017

Paxos is a consensus algorithm that widely finds it’s use in lot of consensus systems. The google’s chubby lock service made paxos very famous. Paxos in general is considered as one of the tough algorithms to understand. However it’s also regarded as one of the beautifully crafted algorithms in the history of distributed systems.

The three phase commit protocol described previously handles the “fail stop model” of failures gracefully. However, with the other failure models like “crash recovery” model, where a node recovers after a crash,the 3pc protocol flounders. Let’s assume that coordinator crashes after sending the requests for precommit phase. Now a recovery coordinator takes over and it queries their status from the participant nodes and understands that previous coordinator has moved to the precommit phase. Now it shepherds the protocol to commit. All good so far.

However, what happens when the coordinator assumed to be dead in the first place kicks back alive. To understand recovering from a failure, we need to understand what could be failures in the first place. The various failures are as follows,

1. System crash

2. Delay in scheduling the process because of rise on computational load on the machine

3. Network disruptions leading to loss of messages.

The items 2 & 3 have the ability to auto correct and hence the node will be back alive. The recovered node will exhibhit the tendency to interfere with recovery coordinator. For example, the recovered node might not have received responses for the pre commit initiation it has sent before it’s failure and it might ask all the participants of a transaction to abort. However, the recovery coordinator might be guiding transaction to commit. It’s a possibility that some participants might receive abort message before commit message resulting in confusion.

The above example demonstrates that we need a more efficient protocol to handle crash recovery failures.

PAXOS PROTOCOL

This is the most important idea about paxos.

“A paxos system comprises of a set of processes that allows us to achieve consensus over the sequence of actions applied on a replicated state machine maintained by each of these processes.””

When reading a value from a paxos system, we ask from the multiple set of server processes and go by the majority value.

While writing a value the client can propose the value to any of the process in the paxos system. That’s the point about paxos.

No one process is sacred. Every process has a equal value in the system.

When Proposing a value to it’s peers, the proposing server process also proposes a sequence number that’s associated with the write.

The acceptors on receiving a proposal checks if this proposal has the highest sequence number it has ever seen. If so then it accepts the proposal gleefully. When an acceptor accepts the proposal then it makes a promise to the proposer that it will not accept any lower numbered proposal from other server processes. Once the proposer receives acceptance from N/2 + 1 processes then the proposer knows it has the quorum and asks the members to accept the change.

So paxos differs from 2PC in two different ways:

1. It adds ordering of proposal that can be used to choose the latest proposal in case of conflicts.

2. Paxos expects only quorum so that a single failure can’t hold the system to ransom.

The precise paxos protocol :

The below protocol defines one round of paxos. Assumes one round of paxos as one state change in a distributed state machine or one entry in a distributed log.

Prepare Phase

1. One among the servers becomes a proposer, chooses a value n as sequence number, sends message to acceptors soliciting a quorum.

2. The acceptors on receving the message checks whether the new proposal with sequence number n is the highest proposal they have seen so far. If yes, then they will respond with the highest number less than n that they have seen.

Acceptance Phase

1. The proposer on receiving acceptance from majority of acceptors now knows that it has a quorum.

2. The proposer now examines the values received from acceptors. If there are no values accepted then proposer can propose it’s value. Else the proser picks the higest number value among the responses and asks acceptors to commit that value.

Commit Phase

1. The acceptors on receiving the commit message checks if the value is same as any of the previously accepted proposols. If not it rejects the commit message. Else it commits the value

2. The proposer on receving success from majority of the acceptors deems the protocol as success and closes the round.

This appears simple. Now the failure cases :

1. What happens if majority of processes fail after accepting the proposal@ sequence number N, but before committing the propoal@N ?.

The system is in a weird state where only a minority of process have participated in the commit phase and hence the phase isn’t complete. Since Paxos follows a quorum on reading also the newly accepted value from the moniroty of process will not stand the quorum.

2. What if another proposer proposes a new value at a higher sequence number (say N + 1) after the acceptance of proposal at sequence number N but before it’s committed ?. This is really a significant question, because the promise returned by a process is not to accept any proposal less than N. So acceptance to proposal doesn’t signify that processes will not accept higher numbered proposals.

This is a dueling proposers scenario where two proposers are trying the gain the paxos round by keeping on proposing values at higher sequence numbers. This really affects the liveness of paxos protocol and it could be fixed only if one of the dueling proposer yields to another.