Two Phase Commit With Apportioned Queries (2PAQ)

Charlie Murphy
Princeton Systems Course
6 min read · May 16, 2017

By: Charlie Murphy and Divyarthi Mohan

Background:

In the words of Tupac Shakur, “you grew up, we grew up B.C. — before [CRAQ]. That’s just saying it all.” This project is inspired by CRAQ, which extends chain replication with apportioned queries [Terrace and Freedman, 2009]. Chain replication is a protocol for replicating data across multiple nodes to achieve strong consistency amongst all participating nodes [Van Renesse and Schneider, 2004]. Chain replication — as the name suggests — arranges all nodes in a chain topology. The head of the chain handles all write requests and the tail of the chain handles all read requests. This achieves strong consistency because all writes propagate from the head to the tail, meaning that any operation visible at the tail must have already reached all other nodes. CRAQ adds apportioned queries on top of chain replication — every node may answer read requests — while maintaining strong consistency.

To maintain consistency, the authors of CRAQ propose that each write operation store a new version for the keys it is modifying. When a write operation is committed, all previously committed versions of the key may be safely discarded. In CRAQ, a key is said to be clean if it is associated with only one version; otherwise the key is dirty, as at least one write operation to it is still in progress. When receiving a read request for a key, any node can respond immediately if the key is clean; for a dirty key, however, the node must fall back to asking the tail of the chain for the definitive version to return. Thus, apportioned queries work best for read-mostly workloads.
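
To make the clean/dirty bookkeeping concrete, here is a minimal Go sketch. The type names and layout are purely illustrative — neither the CRAQ paper nor our code is tied to them — but they capture the idea: a key carries a list of versions, and a read can be answered locally only while a single committed version remains.

```go
package main

import "fmt"

// version is one value written to a key, tagged with a version number.
type version struct {
	num   uint64
	value string
}

// entry keeps every version of a key that has not yet been garbage
// collected; a key is clean when only its committed version remains.
type entry struct{ versions []version }

func (e *entry) isClean() bool { return len(e.versions) == 1 }

// read answers immediately for a clean key; for a dirty key the node
// must instead ask the tail which version number is committed.
func (e *entry) read() (string, bool) {
	if e.isClean() {
		return e.versions[0].value, true
	}
	return "", false
}

func main() {
	e := &entry{versions: []version{{1, "a"}}}
	if v, ok := e.read(); ok {
		fmt.Println("clean, answer locally:", v)
	}
	e.versions = append(e.versions, version{2, "b"}) // a write is now in flight
	if _, ok := e.read(); !ok {
		fmt.Println("dirty, fall back to the tail")
	}
}
```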

System Design:

This project takes the idea of apportioned queries and applies it to two phase commit (2PC), creating Two Phase Commit with Apportioned Queries (2PAQ). 2PAQ has a lot in common with CRAQ. We again assume that only one node may receive write requests (the 2PC leader) — any other node simply redirects write requests to the leader — and that any node may respond to a read request. However, in 2PC there is no single tail; the system has a leader-follower structure (or, more generally, a tree structure). Therefore, the definitive source of what has been committed is the leader.

Two phase commit consists of two phases: staging and committing. When the leader receives a write request, it stages the write operation — performing all actions necessary to commit the write without committing the new value — and sends a stage message to each of its followers. Each follower then stages the write operation and sends the leader an acknowledgement, or an abort message if it was unable to stage the operation. When the leader has been acknowledged by all followers, it commits the new value and sends a commit message to all followers, who then commit the write themselves. If the leader receives an abort message, it instead sends an abort message to all followers, and the write operation is unstaged and discarded. Figure 1 shows the lifecycle of a (non-aborting) commit.

Figure 1. Stage, Acknowledge, Commit Phases
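
As a rough illustration of the leader's role, the Go sketch below walks through one write: stage at every follower, commit if all acknowledge, abort otherwise. The follower interface, write function, and memFollower stub are illustrative names rather than our actual code, and the leader's own local staging is elided.

```go
package main

import "fmt"

// follower abstracts the messages a 2PC leader sends while handling a write.
// Stage returns true for an acknowledgement and false for an abort.
type follower interface {
	Stage(key, value string, ver uint64) bool
	Commit(key string, ver uint64)
	Abort(key string, ver uint64)
}

// write runs one round of two phase commit from the leader's point of view:
// stage the operation everywhere, then commit only if every follower acked.
func write(followers []follower, key, value string, ver uint64) bool {
	for _, f := range followers {
		if !f.Stage(key, value, ver) {
			// Some follower could not stage the write: abort everywhere.
			for _, g := range followers {
				g.Abort(key, ver)
			}
			return false
		}
	}
	for _, f := range followers {
		f.Commit(key, ver) // phase two: followers replace the old value
	}
	return true
}

// memFollower is a stand-in follower that always acknowledges.
type memFollower struct{}

func (memFollower) Stage(key, value string, ver uint64) bool { return true }
func (memFollower) Commit(key string, ver uint64)            {}
func (memFollower) Abort(key string, ver uint64)             {}

func main() {
	fs := []follower{memFollower{}, memFollower{}}
	fmt.Println("committed:", write(fs, "x", "42", 1))
}
```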

Like CRAQ, 2PAQ stores not only the committed value of a key but also a version for each in-progress write operation. That is, when a node stages a write operation, it adds a new version to the list of versions for the given key; only when the node commits the write operation does it discard the older versions. When a follower node receives a read request for a dirty key, it must determine the correct version to return. To do so, it queries the leader for the most recently committed version of the key and returns the corresponding value. As the frequency of writes to the system goes down, it becomes more likely that a node can answer read queries immediately rather than contacting the leader. Thus, like CRAQ, 2PAQ is suited for read-mostly workloads. The primary difference between CRAQ and 2PAQ is simply the protocol used to maintain strong consistency across the nodes in the network.
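
A follower's read path can be sketched as follows (illustrative Go again; leaderClient and CommittedVersion stand in for whatever RPC the follower uses to ask the leader, and are not names from our implementation):

```go
package main

import "fmt"

// leaderClient stands in for the RPC a follower uses to ask the leader
// which version of a key is currently committed.
type leaderClient interface {
	CommittedVersion(key string) uint64
}

type version struct {
	num   uint64
	value string
}

type node struct {
	leader leaderClient
	store  map[string][]version // all staged and committed versions per key
}

// Read answers locally when the key is clean; for a dirty key it asks the
// leader for the definitive committed version and returns that value.
func (n *node) Read(key string) string {
	vs := n.store[key]
	if len(vs) == 1 {
		return vs[0].value // clean: only the committed version remains
	}
	committed := n.leader.CommittedVersion(key)
	for _, v := range vs {
		if v.num == committed {
			return v.value
		}
	}
	return ""
}

// Commit handles the leader's commit message: older versions of the key
// can now be safely discarded.
func (n *node) Commit(key string, num uint64) {
	for i, v := range n.store[key] {
		if v.num == num {
			n.store[key] = n.store[key][i:]
			return
		}
	}
}

// fakeLeader always reports the same committed version, for illustration.
type fakeLeader struct{ committed uint64 }

func (l fakeLeader) CommittedVersion(string) uint64 { return l.committed }

func main() {
	n := &node{
		leader: fakeLeader{committed: 1},
		store:  map[string][]version{"x": {{1, "old"}, {2, "staged"}}},
	}
	fmt.Println(n.Read("x")) // dirty: leader says version 1 is committed -> "old"
	n.Commit("x", 2)
	fmt.Println(n.Read("x")) // clean again: answered locally -> "staged"
}
```

The important point is that only reads of dirty keys pay the extra round trip to the leader; reads of clean keys are served entirely locally.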

To ensure that a key does not remain dirty forever — due to a failed node — it is important that the leader be able to detect node failures. When a node fails during an in-progress write operation, it cannot send an acknowledgement or abort message, holding up every other node from committing the write. To discover failed nodes, the leader periodically sends a heartbeat to each of its followers. If a follower has not responded by the next iteration, it is considered to have failed — it has either truly failed or is performing too slowly — and is removed from the system. The leader then presumes acknowledgements on behalf of the failed node and commits if necessary. If a follower was wrongly assumed to be dead, it will notice that it has stopped receiving heartbeats, rejoin the system, and update its values as necessary.
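
A simplified sketch of this failure detector is shown below; heartbeatOnce, followerState, and sendPing are illustrative names, and actual heartbeat delivery is stubbed out.

```go
package main

import (
	"fmt"
	"time"
)

// followerState tracks, per follower, when the leader last heard from it.
type followerState struct {
	addr      string
	lastReply time.Time
	failed    bool
}

// heartbeatOnce runs one iteration of the leader's failure detector: any
// follower that has not replied since the previous round is marked failed,
// and its acknowledgements will be presumed from then on. sendPing stands
// in for the real heartbeat RPC and reports whether a reply came back.
func heartbeatOnce(fs []*followerState, interval time.Duration, sendPing func(addr string) bool) {
	now := time.Now()
	for _, f := range fs {
		if f.failed {
			continue
		}
		if now.Sub(f.lastReply) > interval {
			// No reply before the next iteration: the node has either truly
			// failed or is too slow, so the leader stops waiting on it.
			f.failed = true
			fmt.Println("presuming acknowledgements for", f.addr)
			continue
		}
		if sendPing(f.addr) {
			f.lastReply = time.Now()
		}
	}
}

func main() {
	fs := []*followerState{
		{addr: "node-1", lastReply: time.Now()},
		{addr: "node-2", lastReply: time.Now().Add(-2 * time.Second)},
	}
	// node-2 has not answered for two seconds; with a one-second interval it
	// is declared failed and its acknowledgements are presumed by the leader.
	heartbeatOnce(fs, time.Second, func(addr string) bool { return true })
	for _, f := range fs {
		fmt.Println(f.addr, "failed:", f.failed)
	}
}
```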

Experiments:

We evaluate our implementation of 2PAQ against our implementation of 2PC, both running simple key-value stores. Ultimately, we are interested in how well 2PAQ performs in terms of read throughput versus vanilla 2PC. We also examine the effects on read throughput and latency as the percentage of writes in the workload increases. All evaluations are performed on AWS free-tier (t2.micro) EC2 instances.

To find the maximum read throughput, we ran each system with 3 nodes under a 1% write workload, varied the number of clients connected to the system, and measured the aggregate read throughput across all clients. Figure 2 shows the results of this experiment. While we were unable to find the full saturation point of either system, 2PAQ more than doubles the number of read responses even at a 1% write workload. The jumps in the 2PC line correspond to a new client instance connecting to the leader node, which had not yet reached saturation. In 2PAQ, we see a steady increase in read throughput with each new client, regardless of which node it connects to.

We also performed experiments to determine the throughput and latency of read operations as the percentage of writes in the workload changed. Figure 3 shows that, as expected, 2PAQ outperforms 2PC for read-mostly workloads (less than 1% write operations), with performance degrading slightly as the percentage of write operations increases. 2PC is run with 3 nodes and 3 clients, while 2PAQ-i runs i servers with i connected clients. Figure 4 shows the read latency for the same experiment; both systems run 3 nodes with 3 clients connected. As more nodes join the system, the latency of committing write operations and responding to read queries increases.

Figure 2. Read Throughput vs Number of Clients (for 3 nodes and 1% write workload)

Figure 3. Read Throughput vs % write of workload

Figure 4. Read Latency vs % write of workload

Conclusion:

From the above experiments, we conclude that apportioned queries greatly increase read throughput for read-mostly workloads. This is because more nodes (not just the leader) can respond to read queries directly, and the leader is free to spend more of its time coordinating the nodes in the group.

References:

[1] Jeff Terrace and Michael J. Freedman. “Object Storage on CRAQ: High Throughput Chain Replication for Read Mostly Workloads.” USENIX Annual Technical Conference. 2009.

[2] Robert Van Renesse and Fred B. Schneider. “Chain Replication for Supporting High Throughput and Availability.” Operating Systems Design and Implementation (OSDI). Vol. 4. 2004.

Note: Code and experimental data available on GitHub at: https://github.com/tm507211/2PAQ
