Distributing Lamport’s bakery with Automerge, and a touch of TLA+

Gregory Terzian
10 min read · Jul 28, 2023


Leslie Lamport discovered the Bakery Algorithm in 1974 as an alternative solution to Edsger W. Dijkstra’s mutual exclusion problem. Lamport describes the algorithm’s significance as implementing “mutual exclusion without relying on any lower-level mutual exclusion.”

In a subsequent paper, last expanded in 2022, Lamport goes back to the original algorithm and, through a sequence of transformations, shows how it leads to an algorithm for a distributed state machine (one that isn’t fault-tolerant). An intermediary step in the paper presents a distributed version of the algorithm, and today we’ll look into implementing it in Rust using Automerge. In honor of the algorithm’s discoverer, and with the added bonus of making our work easier, we will also make an effort to specify our implementation using TLA+.

Automerge and its Repo

Automerge is a Conflict-Free Replicated Data Type (CRDT): a data structure that can be shared between peers, and that allows concurrent changes to be merged automatically, without the need for a central server. Among other implementations, it comes with one in Rust.

An Automerge document can apply incoming sync messages and generate outgoing ones. But one still needs to orchestrate its use within a local application: the document needs to be read from and written to in at least one place, another place might want to watch it for changes, you will definitely need to integrate it with a network layer to exchange those sync messages, and probably with a persistence layer as well. Just share the document between those various places, perhaps using a mutex, and you’re done, right? It turns out not to be that simple.

Enter Automerge-repo: an integration layer that takes care of these mundane matters for you. It provides a “repo” abstraction, to which you can hook up network connections as well as your favorite persistence layer. You can then use the repo either to create new documents or to request existing ones from peers. In both cases you get back a “doc handle” abstraction, which you can use to work with the document and watch it for changes. Although the project is still in a kind of pre-alpha status, many edge cases have been dealt with, and using Automerge in your favorite Rust application has already become much easier.

Before implementing the distributed bakery algorithm using Automerge-repo, we first must establish what exactly that means.

Queuing for a croissant revisited

As distributed systems lore has it, Lamport’s impetus for discovering the algorithm was the queueing system of a local bakery he patronized. Instead of using a machine to get a number (which would have acted as a mutex device), customers would queue the old-school way: ask everyone what their number is, and pick one higher than everyone else’s. When you’ve got the lowest number, you’re up next. In the confusion of peak time, with customers shouting numbers left and right, customers would occasionally pick the same number: courtesy then dictated breaking the tie using names. And so the bakery operated for years; peacefully, and without customers arguing about being served at the same time.

A computer version could be expressed in the following way:

  1. Read all numbers.
  2. Set your number to any number higher than the highest number read.
  3. Wait for your number to be the lowest non-zero one, and when it is: enter the critical section.
  4. Set your number back to zero when exiting the critical section.

The above version is incorrect due to concurrency: it is possible for a process to read all numbers as, for example, zero, and then sleep. Other processes would then continue making progress with the algorithm, reaching a point where, say, the lowest number is 2. The sleeping process could then wake up and set its number to 1 (higher than all the zeros it had read before going to sleep): we would then end up with two processes in the critical section.

The above problem is remedied by adding a choosing flag for each process:

  1. Set your choosing flag to true.
  2. Read all numbers.
  3. Set your number to any number higher than the highest number read.
  4. Set your choosing flag to false.
  5. Wait for your number to be the lowest non-zero one (this means re-reading the numbers), and for the choosing flags of all processes to be false. When both conditions are met: enter the critical section.
  6. Set your number back to zero when exiting the critical section.

With this version, the scenario sketched above is no longer problematic: a process wanting to enter its critical section has to wait for the others to set their choosing flags back to false (and it will then take the lower number into account).

The algorithm still allows processes to choose the same number, hence ties are broken by process names.
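
To make this version concrete, here is a minimal shared-memory sketch in Rust, using sequentially consistent atomics; the Bakery type, the fixed N, and the spin loops are illustrative choices of mine, not part of the distributed implementation discussed later.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

const N: usize = 3; // number of processes; fixed up front in this sketch

// The shared "safe registers": each slot is written only by its owner,
// but read by everyone.
struct Bakery {
    choosing: [AtomicBool; N],
    number: [AtomicU64; N],
}

impl Bakery {
    fn new() -> Self {
        Self {
            choosing: std::array::from_fn(|_| AtomicBool::new(false)),
            number: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    // Steps 1 through 5: called by process `i` before its critical section.
    fn lock(&self, i: usize) {
        // 1. Announce that we are choosing a number.
        self.choosing[i].store(true, Ordering::SeqCst);
        // 2-3. Read all numbers and pick one higher than the highest read.
        let highest = (0..N)
            .map(|j| self.number[j].load(Ordering::SeqCst))
            .max()
            .unwrap();
        self.number[i].store(highest + 1, Ordering::SeqCst);
        // 4. Announce that we are done choosing.
        self.choosing[i].store(false, Ordering::SeqCst);
        // 5. For every other process: wait until it is not choosing, then
        // wait until our (number, id) pair is lower than its pair.
        for j in 0..N {
            if j == i {
                continue;
            }
            while self.choosing[j].load(Ordering::SeqCst) {
                std::hint::spin_loop();
            }
            loop {
                let theirs = self.number[j].load(Ordering::SeqCst);
                let ours = self.number[i].load(Ordering::SeqCst);
                // Zero means "not queueing"; ties are broken by process id.
                if theirs == 0 || (ours, i) < (theirs, j) {
                    break;
                }
                std::hint::spin_loop();
            }
        }
    }

    // Step 6: called by process `i` when exiting its critical section.
    fn unlock(&self, i: usize) {
        self.number[i].store(0, Ordering::SeqCst);
    }
}
```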

Also note that the number and choosing variables of each process are read by all others, but written only by the process that owns them: a concept known as “safe registers”. This provides a natural way to think about distributed processes communicating via message passing: each process owns its local data, but the state of that data can be communicated to (read by) other processes through message passing.

And thus, the distributed version of the algorithm does not require a choosing flag. Instead, it relies on FIFO links for message passing, used by processes to send their numbers to others, and to acknowledge receipt of the numbers received from others.

The algorithm is transformed into the following two tasks, both running at each process:

Task 1

  1. Pick a number that is higher than all the numbers you have received.
  2. Send a message containing your number to all other peers.
  3. Wait until you have received acknowledgement messages back from all peers, and until your number is lower than all other numbers you have received in the meantime. When both conditions are met: enter the critical section.
  4. Set your number back to zero when exiting the critical section, and communicate this to others.

Task 2

For each message received:

  1. If it is a positive number, update your local data with that number, and send an acknowledgement back.
  2. If it is an acknowledgment, note that the peer has acknowledged your number.
  3. If it is a zero, note that in your local data (no acknowledgement is required).

How does this work? An informal argument is that, assuming FIFO links between peers, by the time a peer receives an acknowledgement back, one of two things has happened, or will happen:

  1. The peer receiving the acknowledgement will first have received a number from the peer sending the acknowledgement, and that number will be taken into account at step 3 of Task 1.
  2. The peer sending the acknowledgement will choose a number that is higher than the one acknowledged.

Once again, ties are possible and broken by names.
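
To make Task 2 concrete, here is a minimal Rust sketch of the message handling, assuming the FIFO links themselves are handled elsewhere; BakeryMsg, LocalState, and PeerId are illustrative names of mine, not part of Lamport’s paper.

```rust
use std::collections::{HashMap, HashSet};

type PeerId = String;

// Messages exchanged over the FIFO links (names are illustrative).
enum BakeryMsg {
    // A peer announces its current number; zero means it left the queue.
    Number { from: PeerId, number: u64 },
    // A peer acknowledges a number we sent it earlier.
    Ack { from: PeerId },
}

struct LocalState {
    our_id: PeerId,
    // The latest number received from each peer.
    numbers: HashMap<PeerId, u64>,
    // The peers that have acknowledged our current number.
    acked: HashSet<PeerId>,
}

impl LocalState {
    // Task 2: handle one incoming message, returning the
    // acknowledgement to send back, if one is required.
    fn handle(&mut self, msg: BakeryMsg) -> Option<BakeryMsg> {
        match msg {
            // Step 1: record a positive number and acknowledge it.
            BakeryMsg::Number { from, number } if number > 0 => {
                self.numbers.insert(from, number);
                Some(BakeryMsg::Ack { from: self.our_id.clone() })
            }
            // Step 3: record a zero; no acknowledgement is needed.
            BakeryMsg::Number { from, number } => {
                self.numbers.insert(from, number);
                None
            }
            // Step 2: note that this peer has acknowledged our number.
            BakeryMsg::Ack { from } => {
                self.acked.insert(from);
                None
            }
        }
    }
}
```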

Here is the corresponding TLA+ spec:
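
Or rather, a sketch of its heart: the fragment below is an outline of mine (Peers and InCriticalSection are placeholder names), while the full spec also models the numbers, the messages, and the two tasks.

```tla
\* The safety property: at most one peer is ever in its critical section.
MutexInvariant ==
    \A p1, p2 \in Peers :
        p1 /= p2 => ~(InCriticalSection(p1) /\ InCriticalSection(p2))

THEOREM Spec => []MutexInvariant
```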

As you can see, the theorem asserts that the specification respects the MutexInvariant, which asserts that only one peer can be in its critical section at a time. The BakeryInvariant is my attempt at coming up with a proper inductive invariant explaining why the mutex invariant is respected (the formal version of the last paragraph above). I can’t completely vouch for it though, because even though TLC can check that it holds as an invariant, I am not sure it qualifies as inductive, or that it implies the other invariant.

By the way, as I am going through the excellent materials of this 22-year-old course on concurrency, I am becoming increasingly aware of the toy nature of my TLA+ specifications. But I still find them useful. It took me a while to understand the bakery algorithm, and it was going through several iterations of a spec, and looking at the states in which it would fail, that finally gave me what I hope is an intuitive understanding of it.

Now, the time has come to implement the bakery.

The Rust Implementation

We will now implement the distributed bakery algorithm, using the following tools: Rust as the programming language, Axum to provide an HTTP API for interaction with individual nodes, Tokio as a front-end runtime and as a means to communicate between nodes, Automerge wrapped by Automerge-repo, and, to work directly with Rust values, the excellent Autosurgeon. The full example can be found here.

The system consists of a network of peers that communicate over TCP — satisfying our need for FIFO links. Each peer also provides an HTTP API for clients, and acts as a client to the other peers by continuously requesting a new number, asserting that the number is monotonically increasing. The source of this number is shared between peers — as part of the single Automerge document used — and written to by all peers. This would normally not be safe, but it is made safe by only allowing a peer to write to this number when it is inside its critical section. The critical section is made safe through the bakery algorithm.

Note that the algorithm, while guaranteeing mutual exclusion, is not fault-tolerant: progress requires all peers to be up and running.

On start-up, each peer starts a new Spanreed repo. For the storage layer integration, we pass a struct that, while implementing the necessary trait, does nothing: the bakery needs no persistence.

https://github.com/automerge/spanreed/blob/3fcf21d7f23840e3efa829028a146117081c094a/examples/distributed_bakery.rs#L346

Each peer then starts a TCP server, and also tries to create an outgoing TCP connection to all other peers. Once a connection is established, it is passed to the repo so that the sync protocol can be run with the connected peer. Below is the code for the server, which passes incoming connections to the repo. The interesting part is the connect_tokio_io call.

https://github.com/automerge/spanreed/blob/3fcf21d7f23840e3efa829028a146117081c094a/examples/distributed_bakery.rs#L350

The first problem to solve is allowing peers to find the shared document. One peer is the designated bootstrap peer, and it creates a new document, using the new_document API. It then makes a first edit to the document for the initial state of the bakery, using the with_doc_mut API of the doc handle, as well as the reconcile API of Autosurgeon.

https://github.com/automerge/spanreed/blob/3fcf21d7f23840e3efa829028a146117081c094a/examples/distributed_bakery.rs#L350
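
Condensed, the bootstrap edit amounts to something like the following sketch, not the example’s exact code: error handling is elided, and Bakery is the document type sketched after the next list.

```rust
// The bootstrap peer creates the document and writes the initial state.
let doc_handle = repo_handle.new_document();
doc_handle.with_doc_mut(|doc| {
    // Reconcile the initial Rust value into the fresh Automerge document.
    let mut tx = doc.transaction();
    reconcile(&mut tx, &Bakery::default()).unwrap();
    tx.commit();
});
```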

The state of the bakery consists of the following:

  • A map of customers, where each customer has a number as well as a map of views of the other customers’ numbers (see the sketch below).
  • An output, which is not part of the algorithm but rather is used to produce a monotonically increasing number for clients.
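
In Rust, using Autosurgeon’s derive macros, the document shape could be described roughly as follows; the exact field and type names are mine, not necessarily the example’s.

```rust
use std::collections::HashMap;

use autosurgeon::{Hydrate, Reconcile};

// Sketch of the bakery document (names illustrative).
#[derive(Debug, Clone, Default, Reconcile, Hydrate)]
struct Customer {
    // Our bakery number; zero means we are not queueing.
    number: u64,
    // Our view of the other customers' numbers, which doubles as the
    // acknowledgement mechanism described further below.
    views: HashMap<String, u64>,
}

#[derive(Debug, Clone, Default, Reconcile, Hydrate)]
struct Bakery {
    customers: HashMap<String, Customer>,
    // Not part of the algorithm: the monotonically increasing output.
    output: u64,
}
```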

The other peers then get the id of the document via an HTTP call, and get the actual document using the request_document API of Spanreed, which will first attempt to load the document from storage, falling back to requesting it from connected peers, and resolving to a fully synced document that is ready for editing.

https://github.com/automerge/spanreed/blob/3fcf21d7f23840e3efa829028a146117081c094a/examples/distributed_bakery.rs#L411
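
On those peers, this boils down to something like the following sketch; the HTTP call fetching the id is elided, and the exact signature of request_document may differ.

```rust
// Sketch: `doc_id` was obtained from the bootstrap peer over HTTP.
let doc_handle = repo_handle
    .request_document(doc_id)
    .await
    .expect("document should be found in storage or via a connected peer");
```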

Once each peer has a local document, two background tasks are spawned:

  1. One that continuously requests new increments from the other peers, via an HTTP call to an endpoint named increment.
  2. One that continuously acknowledges changes made by other peers.
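
The first task is a plain HTTP client loop. Condensed, and with reqwest standing in for whatever HTTP client the example actually uses, it could look like the sketch below; the JSON wire format is an assumption of mine.

```rust
// Sketch of the increment-requesting task: ask each peer in turn for a
// new output, and assert that the outputs are strictly increasing.
async fn request_increments(peers: Vec<String>) {
    let mut last = 0u64;
    loop {
        for peer in &peers {
            // The endpoint name matches the `increment` endpoint above.
            let url = format!("http://{peer}/increment");
            let output: u64 = reqwest::get(&url)
                .await
                .expect("request should succeed")
                .json()
                .await
                .expect("peers should reply with a number");
            assert!(output > last, "outputs must be monotonically increasing");
            last = output;
        }
    }
}
```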

The second task is more interesting, as it uses the document and shows another API: the changed method of a doc handle.

https://github.com/automerge/spanreed/blob/3fcf21d7f23840e3efa829028a146117081c094a/examples/distributed_bakery.rs#L152

I will not copy the entire code for the task here; instead, here are its two salient features:

  1. changed resolves once the document has changed compared to the last time the given handle did anything with it (handles can be cloned, but each clone keeps track of this individually). Therefore, before entering the loop, we perform a first acknowledgement by copying all the customers’ numbers into our view of them. By the time changed resolves, the document is guaranteed to have changed compared to its state when the transaction in the with_doc_mut call was committed.
  2. Since we are communicating with peers using an Automerge document, we do not need explicit acknowledgement messages as in the TLA+ specification. When we update our view of others and communicate that update, they can treat it, once received on their end, as an acknowledgement of their own changes.
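
Put together, my condensed version of the task looks like this; a sketch reusing the Bakery type from above, with DocHandle as provided by the repo crate, and with crate paths and signatures simplified.

```rust
use std::collections::HashMap;

use autosurgeon::{hydrate, reconcile};

// Sketch of the acknowledging task (not the example's exact code): copy
// every customer's number into our own view, then wait for the next
// change to the document and repeat. Once the updated view syncs to a
// peer, that peer treats it as an acknowledgement of its number.
async fn acknowledge_changes(doc_handle: DocHandle, our_id: String) {
    loop {
        doc_handle.with_doc_mut(|doc| {
            let mut bakery: Bakery = hydrate(doc).unwrap();
            let views: HashMap<String, u64> = bakery
                .customers
                .iter()
                .map(|(id, customer)| (id.clone(), customer.number))
                .collect();
            bakery.customers.get_mut(&our_id).unwrap().views = views;
            let mut tx = doc.transaction();
            reconcile(&mut tx, &bakery).unwrap();
            tx.commit();
        });
        // Resolves once the document has changed again, locally or remotely.
        doc_handle.changed().await;
    }
}
```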

As noted above, each peer runs a task that requests new outputs from the other peers, and asserts that the outputs are monotonically increasing. The increment HTTP handler is where the bakery algorithm is run, and where, once the peer has entered its critical section, the output of the bakery is incremented and returned to the client.

https://github.com/automerge/spanreed/blob/3fcf21d7f23840e3efa829028a146117081c094a/examples/distributed_bakery.rs#L22
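
In outline, the handler corresponds to Task 1 of the algorithm. The sketch below is my condensation, not the example’s code: the helper logic is inlined, the semaphore discussed in the next section is omitted, and signatures are simplified.

```rust
// Sketch of the increment handler's core: run the bakery algorithm, and
// only touch `output` inside the critical section.
async fn increment(doc_handle: &DocHandle, our_id: &str) -> u64 {
    // Task 1, steps 1-2: pick a number higher than all numbers seen;
    // the repo syncs the change to all peers.
    doc_handle.with_doc_mut(|doc| {
        let mut bakery: Bakery = hydrate(doc).unwrap();
        let highest = bakery.customers.values().map(|c| c.number).max().unwrap_or(0);
        bakery.customers.get_mut(our_id).unwrap().number = highest + 1;
        let mut tx = doc.transaction();
        reconcile(&mut tx, &bakery).unwrap();
        tx.commit();
    });
    // Task 1, step 3: wait until every peer has acknowledged our number,
    // and our (number, id) pair is the lowest non-zero one.
    loop {
        let our_turn = doc_handle.with_doc_mut(|doc| {
            let bakery: Bakery = hydrate(doc).unwrap();
            let our_number = bakery.customers[our_id].number;
            let acked = bakery.customers.iter().all(|(id, c)| {
                id.as_str() == our_id || c.views.get(our_id) == Some(&our_number)
            });
            let lowest = bakery.customers.iter().all(|(id, c)| {
                c.number == 0 || (our_number, our_id) <= (c.number, id.as_str())
            });
            acked && lowest
        });
        if our_turn {
            break;
        }
        doc_handle.changed().await;
    }
    // Critical section: increment the shared output, then leave the
    // bakery by resetting our number to zero (Task 1, step 4).
    doc_handle.with_doc_mut(|doc| {
        let mut bakery: Bakery = hydrate(doc).unwrap();
        bakery.output += 1;
        bakery.customers.get_mut(our_id).unwrap().number = 0;
        let output = bakery.output;
        let mut tx = doc.transaction();
        reconcile(&mut tx, &bakery).unwrap();
        tx.commit();
        output
    })
}
```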

Clean shutdown is always harder than you think

Wondering what the semaphore in the above code snippet is for? It turns out that another complicated part of the example was shutting it down cleanly. The stop API of the repo closes all network sinks, and saves the documents via the storage API provided (in our case a no-op). Performing these actions safely requires guaranteeing that no task can still write to the document using the with_doc_mut API of the doc handles, which is achieved with the code below; hence the semaphore in the snippet above. (By the way, I think it probably also makes sense not to run multiple instances of the algorithm concurrently on one peer.)

https://github.com/automerge/spanreed/blob/3fcf21d7f23840e3efa829028a146117081c094a/examples/distributed_bakery.rs#L463
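
In spirit, the shutdown sequence does something like the following sketch; the permit count, the helper, and the exact return type of stop are assumptions of mine.

```rust
use std::sync::Arc;

use tokio::sync::Semaphore;

// One permit per concurrently running increment handler; with a single
// permit, the algorithm also never runs twice concurrently on one peer.
const PERMITS: u32 = 1;

// Sketch: once we hold every permit, no handler can still be inside a
// with_doc_mut call, so stopping the repo is safe.
async fn shutdown(semaphore: Arc<Semaphore>, repo_handle: RepoHandle) {
    let _all_permits = semaphore
        .acquire_many(PERMITS)
        .await
        .expect("the semaphore should not be closed");
    repo_handle.stop().expect("the repo should stop cleanly");
}
```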

In conclusion

Using an Automerge document wrapped by Spanreed is an easy way to prototype a distributed system. There is no need to define network messages; one simply works directly on a kind of semi-shared state. And even though Automerge gives no guarantee other than eventual consistency, one can build stronger guarantees on top of it, simply by:

  1. Assuming FIFO links.
  2. Thinking in terms of safe registers.
  3. Thinking in terms of reacting to changes to the document.

While CRDTs are often associated with collaborative editing, I find they are also a natural way to structure other kinds of distributed systems. After all, most distributed systems involve peers engaging in some form of “collaborative editing” of data, and I wonder what other distributed systems could find Automerge useful; distributed deep-learning training, for example.

Thank you for reading, and a special thanks goes out to the people at Ink & Switch: Alex, Orion, and Peter, as well as Issac and Han Hui at Bowtie, the Automerge sponsor that made the initial work on Spanreed possible.
