Re-implementing Bayou in Golang

David Gilhooley
Princeton Systems Course
May 13, 2017

By: Lance Goodridge and David Gilhooley

The purpose of this assignment was to reproduce academic work and try our hands at implementing a distributed system. We decided to re-implement Bayou in Go. Our project can be found on Github, and our results and conclusions can be found at the end of this post.

Bayou was released by Xerox PARC in 1995. Its goal is a highly available system that keeps working in the face of arbitrary network partitions. Together, these two goals force Bayou to use an eventual consistency model, and it was one of the first practical systems to do so. Bayou chose to manage the conflicts that eventual consistency produces at the application level, which was a novel decision at the time. Since Bayou, other systems have followed the eventual consistency model, notably CouchDB and Google Drive.

The Bayou System

Bayou, as a system, is important for the following three reasons:

  1. Eventual Consistency: Clients are not guaranteed to see an update immediately after it has been made. Different clients can observe different states, breaking any model of linearizability.
  2. Application-Defined Conflict Resolution: The application is responsible for providing hasConflict, merge, and undo functions. These functions are stored with each server and used to repair conflicts that arise under the eventual consistency model.
  3. High Availability: Each device should theoretically be running its own client-server pair. Servers are allowed to go off the network at any time without affecting the system’s ability to do meaningful work. A device can even make changes while offline that will eventually be reflected across the devices upon re-entering the network.

Availability and Partition Tolerance

A visualization of the CAP theorem. Bayou exists in the AP region.

The CAP theorem states that any distributed system has to choose two of the following three: Consistency, Availability, and Partition Tolerance. Bayou has chosen Availability: users can make reads and writes on their immediate machine without waiting for them to propagate through the network. Bayou has also chosen Partition Tolerance: the Bayou connectivity model allows meaningful work to be done even if only two machines are ever connected at a given time. However, as a result of these choices, Bayou cannot have a stronger consistency model than eventual consistency. When a user makes an update it is initially unconfirmed, and earlier or later updates can be accepted by the system without respect to temporal or logical order.

Bayou Implementation

In Bayou, a device typically runs both a client and a server, though this is not required. Below are the responsibilities of a single server:

  1. Replicate all information seen from the client or received from its peers.
  2. Propagate information with peers using anti-entropy sessions.
  3. Maintain a list of committed and tentative writes.
Bayou’s Device, Client, Server model. Servers connect to each other. Bayou works under arbitrary connectivity as long as each server eventually connects to the network.

Each server stores data in several places. Bayou keeps two databases: one holding the committed view and one holding the full view, i.e. all of the client data produced by the committed and tentative writes the server has received. Each server also keeps a Write Log of every write it has seen; this log is what lets a server compare its state with another’s during anti-entropy. An Undo Log allows any write in the Write Log to be rolled back. This is important because committed writes come before tentative writes in the log, so all tentative writes must be rolled back whenever a new committed write is received.

Bayou’s logging system, with committed and tentative writes.

The GoLang Implementation

In our project we successfully recreated the server-client relationship of Bayou. We included application-defined writes with per-write conflict, merge, and undo operations. For the client we built a small calendar app in which users claim rooms for given times. We also implemented the tentative and committed data structures, the undo and write logs, and the anti-entropy process.

We wrote both the client and the server in Go, using Go’s RPC package to call functions over HTTP. The Bayou server uses a SQLite database instead of Bayou’s original hand-built tuple store. Instead of passing user-defined functions over HTTP, which would not work in a compiled language like Go, we pass SQL strings to be executed on the server. We found SQL expressive enough to write the Query, Check, and Merge functions.

We use Go’s Gob package to keep the Write Log on persistent storage. Gob lets us easily serialize the in-memory Write Log data structure to disk whenever it changes.

The entire project can be found on Github. It is made to run inside Vagrant, and we have included a Vagrantfile that installs all of the necessary packages. Multiple devices are emulated inside a single Vagrant machine and communicate with each other over localhost HTTP ports via RPC. The network topology is created at the start of a test and maintained throughout it. We simulate network partitions by blocking RPCs or by arbitrarily introducing latency.

Commit/Tentative Implementation

Bayou is able to perform meaningful offline work because of its distinction between tentative and committed entries. Committed entries are guaranteed to be globally ordered and cannot be removed from the network. Tentative entries have no such guarantees, but they can be made and shared offline, and can even propagate through the network so that everyone agrees on them.

Bayou uses a single server as a “commit server”, responsible for moving entries from tentative to committed. Because it is a single server, committed entries are easily guaranteed to be globally ordered. If the commit server is offline, the other servers will still eventually come to a consensus on the ordering of tentative entries. However, tentative entries that conflict may cause some entries to be dropped during Anti-Entropy, as we will see in the next section.

Below is the data structure for a single log entry. An entry contains all of the information to check conflicts, apply the entry, undo the entry, or perform a conflict merge. This information is application specific. The VectorClock will be discussed in another section.

Log Entries are used for Both Committed and Tentative Entries

Below are the log entries as stored in a Bayou Server object:

The Bayou Server contains multiple log data structures for entries.

Next we will discuss how entries are shared between multiple servers.

Anti-Entropy Implementation

Servers send anti-entropy requests to their known peers once per second; this rate can be configured in the server.go file. A server sends requests to its peers in round-robin fashion, waiting for each update to finish before starting the next. The anti-entropy function holds a mutex, so a server will not accept another update while one is in progress. Servers store the last tentative and committed write known to have been seen by each peer, which lets a server send a peer only the log entries it might not have seen. The LogEntry data structure contains all the information needed to check for conflicts, apply the entry, and undo it if necessary. Entries are globally identified by a timestamp and a write ID, so they are not duplicated even if a server receives multiple copies.

It is worth mentioning that the original Bayou paper gave only a sparse description of its anti-entropy protocol, so we were left to design the details ourselves.

Below is the data structure for an anti-entropy request:

The data structure for an anti-entropy gossip request.

When a server receives an anti-entropy message it performs the following actions if there are new committed entries:

  1. The receiving server rolls back all of the tentative entries, using the user specified undo function.
  2. The receiving server applies all of the committed entries that have not been seen before.
  3. The receiving server applies all of its tentative entries that were recently rolled back.

This process is done to ensure that committed entries are globally ordered across all devices. Only committed entries require global ordering, so the tentative writes use the actions below:

  1. The receiving server compares its log entries with the incoming Anti-Entropy entries. One of the two logs is marked as “most recent”, based on the timestamp of its latest tentative write.
  2. The most recent log is applied first. If that is the incoming Anti-Entropy log, the receiving server rolls back its own entries, applies the incoming entries, and then re-applies its earlier entries on top.
  3. If the receiving server’s log is more recent, it applies the incoming entries on top of its own log. It then sends an Anti-Entropy message back to the sender with the updated information, and the sender updates its log using the rollback-and-apply method in #2.
  4. Clients attached to a server receive update messages if their tentative entries were kicked out of the log by the re-apply step. It is then up to the client to re-register any entries it still wants.

Vector Clock Implementation

Vector clocks help order events in a distributed system. They are implemented as an array of logical clocks, where each clock tracks the latest update received from the corresponding server.

This diagram shows how vector clocks are incremented across events in a distributed system.

In our project, the vector clock is implemented as an int array of size N, where N is the number of peers in the system. Each integer in the array is a logical clock for the corresponding peer; it starts at zero and increases monotonically as updates arrive from that peer. When a server receives a write from a client, it increments its own logical clock by one. This algorithm is illustrated in the image above.

The current vector clock timestamp is added to every LogEntry the server sends or receives, to give a consistent ordering of entries between peers. A timestamp is considered to be “before” another if and only if each of its logical clocks is less than or equal to the corresponding clock of the other timestamp, and at least one is strictly less. Two timestamps where neither is “before” the other belong to concurrent writes and must be ordered by a tie-break; for causally related Log Entries, since the logical clocks increase monotonically with every write a server receives, one timestamp is always strictly “before” the other.

Results

Reproducing the Original Paper

The original Bayou paper barely touched on the performance of the system. The only performance metrics included showed how a single server scaled with tentative writes. Below we have represented this data in a graph:

Now we have reproduced this graph for our implementation:

It is clear that both implementations take linear time for undo and redo operations, which matters when servers share entries through Anti-Entropy. Our implementation ran on a 2015 MacBook, which has a much faster processor than the SPARC/20 used in the original ’95 paper.

New Performance Testing

We decided to test the speed of tentative and committed writes in our application. Below are our results. We found that a tentative write on a non-primary server costs the same as a committed write on the primary server. Both exhibit quadratic growth, which is a consequence of our application: the dependency check for an operation is linear in the number of prior writes, because the operation must check each previous write to see whether the room is available. With a constant-time conflict function, write speed would be linear.

Network Configuration, Entropy Speeds, and Update Times

Our Anti-Entropy algorithm makes it very simple to calculate how long it takes information to propagate across the network. Because each node propagates information incrementally, the commit time for a server is linearly proportional to its minimum distance to the commit server.

Dark Blue is the commit server. Light blue are normal servers. This image shows that commit time is a function of distance from the commit server. This is assuming an Anti-Entropy setting of 1 Hz.

Increasing the Anti-Entropy rate decreases the propagation time across the network and decreases the number of entries in each message. However, as messages become smaller and more frequent, Bayou uses more network bandwidth, and per-message overhead grows relative to the data being exchanged.

This shows that increasing the Anti-Entropy rate decreases the time for a single anti-entropy update. This makes sense, because there are fewer entries to check and possibly roll back and redo.
This shows that as updates become more frequent, Anti-Entropy requests use more bandwidth. This graph was taken with an Anti-Entropy rate of 5 Hz.

Conclusion

Thanks for making it to the end of this post! Here is a summary of what we accomplished.

We implemented a Bayou server with the following features:

  1. Distinct server/client functionality
  2. Application specific updates, conflicts, and merges
  3. Anti-Entropy system that works with network partitions, keeps a global ordering of committed entries, and keeps tentative entries eventually consistent

We designed the following:

  1. A testing framework that verifies the above functionality.
  2. An Anti-Entropy system based on the sparse description in the paper.
  3. A calendar application that uses well-defined conflict, undo, and merge operations

We tested the following:

  1. Bayou’s original metrics of linear undo and redo time for Anti-Entropy.
  2. The speed of reads and writes on a single server, which grows quadratically.
  3. The propagation speed across different network topologies.
  4. The effect of Anti-Entropy update rate on traffic congestion.

Thank you for reading the post! If you have any more questions on our implementation, please be sure to check out our Github repo.
