Implementing Chubby, a Distributed Lock Service

By: Sherry Bai and Haochen Li

Princeton Systems Course
May 23, 2019

--

1. Introduction

For our COS 518 project, we decided to implement a distributed lock service similar to Chubby, a lock service used by Google and described in a paper presented at OSDI '06. Chubby is used within Google to provide coarse-grained advisory locking, with an emphasis on availability rather than performance. In this blog post, we describe the motivation for and design of Chubby as presented in the paper. We then discuss the design and implementation of our own lock service, highlighting some minor assumptions and simplifications we chose to make because the paper is sparse on implementation details. Finally, we present an evaluation of our system.

2. Chubby Overview

Chubby is a lock service that provides coarse-grained advisory locking. Locking is coarse-grained rather than fine-grained, meaning that locks are intended to be held for hours or days instead of seconds; as a result, Chubby emphasizes reliability and availability over performance. Locking is advisory, meaning that holding a lock only conflicts with other attempts to acquire the same lock. Chubby also provides limited filesystem-like operations (e.g., reading from and writing to lock files), but holding a lock is not required to perform any of these operations.

The primary motivation for Chubby was to allow developers to easily tackle coarse-grained distributed coordination problems, such as leader election, without needing to directly deal with complicated consensus libraries. Developers often do not initially plan for availability, and locking may be easier to integrate into an existing system than a consensus protocol. Furthermore, most developers are more familiar with lock-based semantics than with consensus protocol semantics.

Chubby’s system architecture consists of two components:

  1. Servers: Servers are arranged into groups of five called cells. Servers within a cell use a consensus protocol to elect a leader, and clients interact with the cell by sending RPCs to the leader.
  2. Client library: The client library implements a filesystem-like API that allows clients to interact with the Chubby service. The client library implements each API function by sending RPCs to the cell leader. The library also handles session maintenance for the client, including sending KeepAlives to the cell leader and handling the leader failover (jeopardy) event.
Figure 1: Chubby system architecture.

Each Chubby client maintains a session with the Chubby cell, and all client requests to Chubby are performed as part of the session. Each session has an associated lease, and clients periodically send KeepAlives to the server in order to extend the lease. Sessions end when either the client or server explicitly terminates the session, when the server’s local lease timeout has elapsed, or when the client’s local lease timeout and failover grace period have both elapsed. When a session ends, all locks held by the client during the session are released. This allows other clients to acquire the locks and make forward progress in the event of client failure or a network partition.

Sessions do not necessarily end when a Chubby leader fails. If the client’s local lease timeout is triggered, the client suspects that a leader failure has occurred. The client then attempts to contact the new leader during the failover grace period. If the client succeeds, the session is saved, and the client sends information about the session to the new leader. If the failover grace period elapses before the client can contact a new leader, the session terminates, and the client loses all locks. This recovery process is called jeopardy. During jeopardy, the client blocks all API calls until contact is established with the new leader.

Figure 2: Chubby’s jeopardy process. (1) The client initially exchanges KeepAlives with the green server (leader). (2) The leader fails, and the client receives no KeepAlive responses. The client’s local lease period times out. (3) Jeopardy is triggered, and the client attempts to make contact with the new leader. (4) Contact is established, and the client exchanges KeepAlives with the new leader.

The Chubby paper also describes a number of other features provided by the service, including event subscription, ephemeral files, file data and metadata caching, and file mirroring. The paper also discusses more sophisticated mechanisms for ensuring lock validity, such as lock delays and lock sequencers. Due to time constraints, we do not discuss these features and leave implementation of these features for future work.

3. Implementation

The two main components of our system are the server and the client library. We also implement leader election using the client library in order to demonstrate that our system can be used in a practical setting.

3.1 Server

Chubby servers run a consensus protocol in order to elect a cell leader and to replicate persistent state. While the original Chubby service uses a form of Paxos for consensus, we choose the Hashicorp implementation of Raft. Persistent state is stored in BoltDB, a simple key-value store written in Go, and we use the BoltDB Raft backend (raft-boltdb) to connect the Raft implementation to the BoltDB database.
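
As a rough illustration, the sketch below shows how a single Raft node can be wired to a BoltDB-backed log store with these libraries. This is a minimal sketch, not our actual server code: the stub FSM, node ID, and addresses are placeholders.

```go
package server

import (
	"errors"
	"io"
	"net"
	"os"
	"path/filepath"
	"time"

	"github.com/hashicorp/raft"
	raftboltdb "github.com/hashicorp/raft-boltdb"
)

// lockFSM is a stub state machine; the real server applies replicated
// lock-file writes to the key-value store here.
type lockFSM struct{}

func (f *lockFSM) Apply(l *raft.Log) interface{}       { return nil }
func (f *lockFSM) Snapshot() (raft.FSMSnapshot, error) { return nil, errors.New("not implemented") }
func (f *lockFSM) Restore(rc io.ReadCloser) error      { return rc.Close() }

// newRaftNode builds one Raft node whose log and stable state live in BoltDB.
func newRaftNode(nodeID, bindAddr, dataDir string) (*raft.Raft, error) {
	config := raft.DefaultConfig()
	config.LocalID = raft.ServerID(nodeID)

	// BoltDB holds both the Raft log and stable (term/vote) state.
	store, err := raftboltdb.NewBoltStore(filepath.Join(dataDir, "raft.db"))
	if err != nil {
		return nil, err
	}
	snapshots, err := raft.NewFileSnapshotStore(dataDir, 1, os.Stderr)
	if err != nil {
		return nil, err
	}
	addr, err := net.ResolveTCPAddr("tcp", bindAddr)
	if err != nil {
		return nil, err
	}
	transport, err := raft.NewTCPTransport(bindAddr, addr, 3, 10*time.Second, os.Stderr)
	if err != nil {
		return nil, err
	}
	return raft.NewRaft(config, &lockFSM{}, store, store, snapshots, transport)
}
```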

In Chubby, “files” and “locks” are equivalent: users can read content from and write content to lock files. Each lock is represented as a key-value pair in the key-value store, where the key is the filepath of the lock and the value is the content of the lock. Representing a lock as a key-value pair makes sense because Chubby only allows for whole-file reads and writes.
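
For illustration, here is a minimal sketch of what that mapping looks like with the BoltDB API. The bucket name and helper functions are hypothetical, and in the real server a write goes through the Raft log rather than directly to the local database.

```go
package store

import (
	"github.com/boltdb/bolt"
)

// lockBucket is a hypothetical bucket name holding all lock files.
var lockBucket = []byte("locks")

// writeLockFile stores the whole-file content under the lock's filepath.
// (In our server, writes like this are replicated through Raft first.)
func writeLockFile(db *bolt.DB, filePath, content string) error {
	return db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists(lockBucket)
		if err != nil {
			return err
		}
		return b.Put([]byte(filePath), []byte(content))
	})
}

// readLockFile returns the whole-file content stored for the lock.
func readLockFile(db *bolt.DB, filePath string) (string, error) {
	var content string
	err := db.View(func(tx *bolt.Tx) error {
		if b := tx.Bucket(lockBucket); b != nil {
			content = string(b.Get([]byte(filePath)))
		}
		return nil
	})
	return content, err
}
```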

Each lock can be held in either shared or exclusive mode: acquiring a lock in exclusive mode interferes with all other acquire attempts, but acquiring a lock in shared mode only interferes with attempts to acquire a lock in exclusive mode. The leader stores lock state (owners and mode of each lock) in a non-persistent in-memory structure in order to avoid the overhead of persisting lock information across nodes.
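
A sketch of the kind of in-memory table the leader might keep, with the shared/exclusive compatibility rule spelled out (all type and field names here are illustrative, not taken from our code):

```go
package server

import "sync"

// LockMode distinguishes shared from exclusive acquisition.
type LockMode int

const (
	SHARED LockMode = iota
	EXCLUSIVE
)

// lockState lives only in the leader's memory; it is not replicated.
type lockState struct {
	mode   LockMode
	owners map[string]bool // client IDs currently holding the lock
}

// lockTable maps lock filepaths to their in-memory state.
type lockTable struct {
	mu    sync.Mutex
	locks map[string]*lockState
}

func newLockTable() *lockTable {
	return &lockTable{locks: make(map[string]*lockState)}
}

// tryAcquire implements the compatibility rule described above: an
// exclusive request conflicts with any current holder, while a shared
// request conflicts only with an exclusive holder.
func (t *lockTable) tryAcquire(path, clientID string, mode LockMode) bool {
	t.mu.Lock()
	defer t.mu.Unlock()

	l, held := t.locks[path]
	if !held || len(l.owners) == 0 {
		t.locks[path] = &lockState{mode: mode, owners: map[string]bool{clientID: true}}
		return true
	}
	if mode == SHARED && l.mode == SHARED {
		l.owners[clientID] = true // multiple readers may hold the lock together
		return true
	}
	return false
}
```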

The leader also maintains session information for each client in a non-persistent in-memory structure. Only the content of the lock is persisted in the key-value store. This implies that when the current leader fails, all the information about lock states and sessions will be lost. To allow sessions to persist after leader failure, we require that each client locally maintain session and lock information; i.e., the names and modes of locks it holds. This information is sent to the new leader when the client first connects to the new leader after the jeopardy process.

For simplicity, our Chubby implementation allows each client to open at most one session with the server at a time. (It is unclear from the paper whether the original Chubby service allows clients to maintain multiple sessions.) During a session, the client sends KeepAlive messages to the server. On receiving a KeepAlive, the server blocks the request until the server's local lease timeout is almost over. The server then extends the lease by a fixed period (12 seconds) and sends the new timeout to the client in the KeepAlive response, at which point the client immediately sends another KeepAlive. As a result, KeepAlives are exchanged between client and server roughly every 12 seconds.
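
A simplified sketch of this server-side behavior, assuming an in-memory session record with a lease expiry field (the names are illustrative, and concurrency control and session teardown are omitted):

```go
package server

import "time"

// LeaseExtension is the fixed period by which each KeepAlive extends a lease.
const LeaseExtension = 12 * time.Second

// serverSession is the leader's in-memory record of one client session.
type serverSession struct {
	leaseExpiry time.Time
}

// handleKeepAlive parks the client's KeepAlive RPC until the current lease
// is nearly over, then extends the lease and returns the new expiry.
func (s *serverSession) handleKeepAlive() time.Time {
	// Block the response until just before the lease expires.
	time.Sleep(time.Until(s.leaseExpiry) - 100*time.Millisecond)

	// Extend the lease and report the new expiry in the KeepAlive response.
	s.leaseExpiry = time.Now().Add(LeaseExtension)
	return s.leaseExpiry
}
```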

If the server does not receive a KeepAlive from the client before the lease timeout elapses, the server assumes that the client has failed. The server then tears down the session and releases all locks held by the client in this session to allow other clients to make forward progress.

3.2 Client Library

The client library implements the following API (a brief usage sketch follows the list):

  • func InitSession(clientID api.ClientID) (*ClientSession, error): Establish a session between the client with ID clientID and the Chubby server.
  • func (sess *ClientSession) OpenLock(filePath api.FilePath) error: Create a lock at path filePath if it does not exist. Otherwise, do nothing.
  • func (sess *ClientSession) DeleteLock(filePath api.FilePath) error: Delete the lock at path filePath.
  • func (sess *ClientSession) TryAcquireLock(filePath api.FilePath, mode api.LockMode) (bool, error): Acquire the lock located at filePath with mode mode.
  • func (sess *ClientSession) ReleaseLock(filePath api.FilePath) error: Release the lock located at filePath.
  • func (sess *ClientSession) ReadContent(filePath api.FilePath) (string, error): Read and return the entire contents of the file located at filePath.
  • func (sess *ClientSession) WriteContent(filePath api.FilePath, content string) (bool, error): Write content content to the file located at filePath.
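
A brief usage sketch of this API, with most error handling trimmed. The import paths, the client package name, and the api.SHARED constant are assumptions about our package layout rather than verified names:

```go
package main

import (
	"fmt"
	"log"

	// NOTE: these import paths are assumptions about our repository layout.
	"github.com/sherrybai/chubby/api"
	"github.com/sherrybai/chubby/client"
)

func main() {
	// Open a session with the Chubby cell.
	sess, err := client.InitSession(api.ClientID("client-1"))
	if err != nil {
		log.Fatal(err)
	}

	path := api.FilePath("/ls/demo/config")

	// Create the lock file if it does not already exist.
	if err := sess.OpenLock(path); err != nil {
		log.Fatal(err)
	}

	// Acquire the lock in shared mode (api.SHARED is an assumed name).
	if ok, err := sess.TryAcquireLock(path, api.SHARED); err != nil || !ok {
		log.Fatal("could not acquire lock")
	}

	// Whole-file read of the lock's content.
	content, err := sess.ReadContent(path)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("content:", content)

	// Release the lock when finished.
	if err := sess.ReleaseLock(path); err != nil {
		log.Fatal(err)
	}
}
```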

We note that this is a simplified version of the API provided by the original Chubby service. The most notable difference between our API and the one described in the original paper is our use of file paths rather than file handles as API function parameters. We choose not to implement file handles for simplicity, but this does prevent us from implementing some of the more advanced filesystem features offered by the original Chubby service, such as certain forms of access control.

The client implements the API functions by making RPC calls to the server. After the client initiates a session with the server, a separate goroutine sends KeepAlive messages to the server. If the client receives a response from the server, the client extends its local lease timeout according to the KeepAlive response, then immediately sends another KeepAlive. KeepAlive requests are typically blocked at the server for the length of the server’s lease timeout period (12 seconds), so KeepAlives are typically exchanged every 12 seconds.
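
A sketch of that client-side loop, assuming a helper that performs the blocking KeepAlive RPC (the function and variable names are illustrative):

```go
package client

import "time"

// sendKeepAlive stands in for the blocking KeepAlive RPC to the leader;
// it returns the new lease expiry granted in the response.
type sendKeepAlive func() (time.Time, error)

// keepAliveLoop runs in its own goroutine for the lifetime of a session.
// Each KeepAlive blocks at the server until the lease is nearly over, so
// in steady state one round trip happens per 12-second lease period.
func keepAliveLoop(call sendKeepAlive, localExpiry time.Time, enterJeopardy func()) {
	for {
		resp := make(chan time.Time, 1)
		go func() {
			if expiry, err := call(); err == nil {
				resp <- expiry
			}
		}()

		select {
		case newExpiry := <-resp:
			// Extend the local lease and immediately send another KeepAlive.
			localExpiry = newExpiry
		case <-time.After(time.Until(localExpiry)):
			// Local lease timed out with no response: suspect leader failure.
			enterJeopardy()
			return
		}
	}
}
```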

If the client does not hear a response from the server within its lease timeout, the client assumes that the current leader has failed and enters jeopardy. While in jeopardy, the client contacts all of the servers in the cell, attempting to establish a connection with the new leader. If the client successfully connects to the new leader before the end of the failover grace period (45 seconds), the client eagerly sends the new leader its local information about the locks it holds. If the failover grace period times out and the client still has not reached the new leader, the client tears down the session and discards all information about it. All locks are assumed to be lost.
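
A sketch of that jeopardy path; the connect function and the held-lock map are stand-ins for our client's internal state:

```go
package client

import (
	"errors"
	"time"
)

// GracePeriod is the failover grace period described above.
const GracePeriod = 45 * time.Second

// connectFn stands in for an RPC that succeeds only if addr is the current
// leader; on success the client replays the names and modes of the locks it
// holds so the new leader can rebuild its in-memory session state.
type connectFn func(addr string, heldLocks map[string]string) error

// recoverSession runs when the local lease times out (jeopardy). All API
// calls block until it returns.
func recoverSession(servers []string, heldLocks map[string]string, connect connectFn) error {
	deadline := time.Now().Add(GracePeriod)
	for time.Now().Before(deadline) {
		for _, addr := range servers {
			if err := connect(addr, heldLocks); err == nil {
				return nil // session saved: the new leader now knows our locks
			}
		}
		time.Sleep(500 * time.Millisecond) // retry until the grace period ends
	}
	// Grace period elapsed: tear down the session; all locks are lost.
	return errors.New("session expired: failover grace period elapsed")
}
```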

3.3 Application: Leader Election

To demonstrate the use of our system in a practical setting, we implement leader election using the Chubby client library. During leader election, all clients call TryAcquireLock in exclusive mode on a designated lock. The client that successfully acquires the lock becomes the leader and writes its name to the lock file; the other clients read the lock file to learn the identity of the new leader. The remaining clients periodically retry TryAcquireLock so that, if the current leader fails and its session's locks are released, some other client acquires the lock and becomes the new leader.
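
A sketch of this election loop built on the client API from Section 3.2; as before, the import paths and the api.EXCLUSIVE constant are assumptions about our package layout:

```go
package main

import (
	"log"
	"time"

	// NOTE: these import paths are assumptions about our repository layout.
	"github.com/sherrybai/chubby/api"
	"github.com/sherrybai/chubby/client"
)

func main() {
	name := "candidate-1"
	sess, err := client.InitSession(api.ClientID(name))
	if err != nil {
		log.Fatal(err)
	}

	lock := api.FilePath("/ls/my-service/leader")
	if err := sess.OpenLock(lock); err != nil {
		log.Fatal(err)
	}

	for {
		// api.EXCLUSIVE is an assumed name for the exclusive lock mode.
		ok, err := sess.TryAcquireLock(lock, api.EXCLUSIVE)
		if err == nil && ok {
			// We won the election: record our identity in the lock file.
			if _, err := sess.WriteContent(lock, name); err != nil {
				log.Fatal(err)
			}
			log.Printf("%s is now the leader", name)
			return
		}
		// Someone else holds the lock: learn who the leader is and retry later.
		leader, _ := sess.ReadContent(lock)
		log.Printf("current leader: %s", leader)
		time.Sleep(5 * time.Second)
	}
}
```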

4. Evaluation

4.1 Evaluation Setup

We used Docker Compose to bring up five Docker containers, each running a Chubby server process and communicating on a bridge network. We then used the chaos testing tool Pumba to introduce realistic network delays of around 50 milliseconds for traffic on the Docker bridge network. This allowed us to emulate the effects of geographic distance between servers within a cell, like in Google’s “global” Chubby cell, as well as geographic distance between clients and servers.

4.2 Leader Failure

We created a client that repeatedly acquires and releases a single lock, while simultaneously measuring the number of such acquire and release operations completed per second. In Test 1, we use Pumba to pause the Chubby leader for short intervals (2 seconds each). In Test 2, we pause the leader for long intervals (30 seconds each). Figures 3 and 4 show plots of the number of operations per second over time in Test 1 and Test 2, respectively.

In Test 1 (Figure 3), the cell leader is paused twice, and each pause lasts only 2 seconds. Because the leader is offline for only brief periods, Raft leader election is not triggered, and jeopardy does not occur. We observe that the number of operations per second goes to 0 very briefly (~2 seconds), then recovers.

Figure 3: Results for Test 1.

In Test 2 (Figure 4), the cell leader is again paused twice, but each pause lasts 30 seconds. These long pauses trigger both jeopardy and Raft leader election. After the client's local lease expires, the client enters jeopardy and finds the new Raft leader. Although the leader is paused for 30 seconds, the system recovers from the failure in less than 30 seconds, because the recovery time is approximately the maximum of the time it takes Raft to elect a new leader and the time it takes for the client's lease to expire. Raft usually elects a leader in less than the lease timeout period (12 seconds), so recovery time in this test is approximately 12 seconds. This suggests that for non-Byzantine server failures, session recovery time is capped at roughly the 12-second lease timeout.

Figure 4: Results for Test 2.

4.3 Client Failure

For this test, we created two clients. Client 1 opens and acquires a lock in exclusive mode. Client 2 sleeps for 5 seconds, then tries to acquire the same lock in exclusive mode. We use Pumba to kill Client 1 and measure how long it takes for Client 2 to acquire the lock. We expect this time to be approximately the lease timeout period used by the server, since the server only releases Client 1's lock once its session lease expires. To test this hypothesis, we vary the server's session lease length and, in Figure 5, plot the time it takes for Client 2 to acquire the lock against the lease length. As expected, the acquisition time grows roughly linearly with the session lease length, with a slope of approximately 1.

Figure 5: Results of client failure test.

While longer lease timeouts increase the time the server needs to detect client failure, there are at least two reasons why a longer lease timeout may be desirable. First, if the lease timeout is too short, the server may tear down a session in response to an ephemeral connectivity problem with the client, making the system less resilient to minor network issues. Second, short lease timeouts result in more frequent KeepAlives from each client, placing greater load on the network and on the server. This is not necessarily a problem when the number of clients is small and lease timeouts are on the order of seconds, but it may become problematic with large numbers of clients. For example, the Chubby paper describes a “global” Chubby cell that is likely to be heavily loaded by a variety of applications across Google.

5. Conclusion

In this project, we re-implemented the core functionalities of Chubby, Google’s distributed lock service. We were able to produce a viable system and to replicate some key properties of Chubby. We were also able to demonstrate practical use of our system through an example of leader election. Finally, through our evaluation, we were able to demonstrate that our system is resilient to both server and client failures.

Through this project, we learned some valuable lessons about building a distributed system. This project was very rewarding and a really nice way to wrap up what we learned throughout the semester!

Source code for our implementation can be found here: https://github.com/sherrybai/chubby
