How a Fluid Framework Service Works

Kurt Berglund
Published in The Startup
5 min read · Sep 17, 2020

Fluid Framework allows you to build distributed, collaborative apps with lightning-fast speed and buttery-smooth performance. And it does so with a surprisingly simple and cheap service.

Running collaborative services at scale on the web can get expensive fast. The two predominant algorithms have their own strengths and weaknesses. Most OT implementations follow the Jupiter approach of a hub-and-spoke model. This makes convergence of the data model simpler but requires running code on the server to transform operations. To avoid this cost being O(N²), most implementations limit clients to one outstanding operation at a time, which lets the server operate in linear time. But this hurts latency and forces clients to batch operations, which complicates the transformation function. CRDTs let you eliminate a server entirely but come with their own set of scaling and usability issues, including the lack of distributed GC, the need to download large index structures, and the need to translate from your CRDT data structure to your app data structures. Many systems skip both algorithms entirely and just lock parts of the document to avoid concurrency. But even this comes with complexity and some less-than-ideal user scenarios.

When we started Fluid we asked ourselves if we could build a scalable system that ran with near zero cost. This would obviously require pushing as much code as possible to the edge. But we also wanted to make sure the data structures running on the edge nodes could scale. To solve this we built a service that provides what we call optimistic consensus. Here’s how it works.

[Figure: Fluid Service ordering messages sent from Alice and Bob]

Clients start by sending their local messages to the server. The server then creates an ordered log of these messages. As part of creating this log, the server stamps each message with a sequence number and then broadcasts the stamped message to all clients. The server may interleave messages from different clients, but it is required to preserve the relative ordering of messages from an individual client (i.e. in the above example, B will always have a larger sequence number than A).
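A minimal TypeScript sketch of that ordering step might look like this (the types and class names here are made up for illustration, not the actual Fluid Framework API):

```typescript
// Hypothetical types for illustration only; not the real Fluid Framework API.
interface ClientMessage {
  clientId: string;
  contents: unknown; // opaque payload: the service never looks inside it
}

interface SequencedMessage extends ClientMessage {
  sequenceNumber: number; // stamped by the ordering service
}

class OrderingService {
  private sequenceNumber = 0;
  private readonly listeners: Array<(msg: SequencedMessage) => void> = [];

  connect(listener: (msg: SequencedMessage) => void): void {
    this.listeners.push(listener);
  }

  // Messages are sequenced one at a time, so the relative order of messages
  // from any single client is automatically preserved.
  submit(msg: ClientMessage): void {
    const sequenced: SequencedMessage = { ...msg, sequenceNumber: ++this.sequenceNumber };
    for (const listener of this.listeners) {
      listener(sequenced); // broadcast the stamped message to every connected client
    }
  }
}
```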

With this we have a total ordering of operations. But on its own this isn’t particularly useful. There isn’t enough context to deal with concurrent edits. We need to add a bit more data to each message to make it work.

When a client creates a new operation it will have processed the totally ordered log up to some sequence number. We call this the reference sequence number. Essentially the reference sequence number indicates what the state of the world was when the operation was created. We update our clients to also include this piece of information when they send a message to the service.
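In code, that's just one more field on each message. Again, these shapes are illustrative rather than the real Fluid types:

```typescript
// Hypothetical message shapes, now carrying the reference sequence number.
interface ClientMessage {
  clientId: string;
  // The last sequence number this client had processed when it created the op;
  // in other words, the state of the world the op was made against.
  referenceSequenceNumber: number;
  contents: unknown;
}

interface SequencedMessage extends ClientMessage {
  sequenceNumber: number; // assigned by the ordering service
}
```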

Now data structures can start to get somewhere. Each sequenced operation has a position in a total ordering, which can be used to break ties. And since each sequenced operation includes its reference sequence number, you can also recreate the local state of the client that sent it and use this information to correctly update your data structures.
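As a concrete illustration, here is how a data structure could decide whether two sequenced ops were made concurrently (this is a hypothetical helper, with the SequencedMessage shape from the previous sketch inlined so the snippet stands alone):

```typescript
// Same hypothetical shape as above, inlined for convenience.
interface SequencedMessage { clientId: string; sequenceNumber: number; referenceSequenceNumber: number; contents: unknown; }

// Two ops are concurrent when neither client had seen the other's op at the time
// it created its own, i.e. each op's reference sequence number is below the other
// op's sequence number. Ops from the same client are never concurrent because the
// service preserves their relative order.
function isConcurrent(a: SequencedMessage, b: SequencedMessage): boolean {
  if (a.clientId === b.clientId) {
    return false;
  }
  return (
    a.referenceSequenceNumber < b.sequenceNumber &&
    b.referenceSequenceNumber < a.sequenceNumber
  );
}
```

Once a conflict is detected, the total order breaks the tie: every client applies sequenced ops in the same order, so they all resolve the conflict the same way and converge.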

But we need one last bit of information to fully scale. With the system as defined, clients need to handle any inbound reference sequence number. This can quickly get out of hand, as data structures might need to store a lot of context to correctly process an operation (i.e. an operation could make a change relative to the start of the document). To solve this we introduce another field: the minimum sequence number.

The minimum sequence number is defined as the minimum reference sequence number of all clients connected to the server. It is guaranteed to be monotonically increasing, and the server requires that any inbound message have a reference sequence number at or above the minimum sequence number. A server is free to kick out any client at any time in order to move the minimum sequence number forward. This sounds severe, but it's non-disruptive to the local user. Given the eventual consistency property of the data structures, the user can keep using the app like nothing happened. The framework takes care of reconnecting, processing all operations up to the minimum sequence number, and then resubmitting the user's operations (this is also how offline works).
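A hypothetical sketch of the server-side bookkeeping might look like this (names are made up for illustration):

```typescript
// Track each connected client's latest reference sequence number; the minimum
// across all of them is the minimum sequence number (MSN).
class MinimumSequenceNumberTracker {
  private readonly clientRefSeq = new Map<string, number>();
  private msn = 0;

  onClientMessage(clientId: string, referenceSequenceNumber: number): void {
    this.clientRefSeq.set(clientId, referenceSequenceNumber);
    this.recompute();
  }

  // Removing a lagging client (or one that disconnected on its own) is how the
  // server moves the minimum sequence number forward.
  onClientLeave(clientId: string): void {
    this.clientRefSeq.delete(clientId);
    this.recompute();
  }

  get minimumSequenceNumber(): number {
    return this.msn;
  }

  private recompute(): void {
    let min = Number.MAX_SAFE_INTEGER;
    for (const refSeq of this.clientRefSeq.values()) {
      min = Math.min(min, refSeq);
    }
    // The MSN only ever moves forward (monotonically increasing).
    if (this.clientRefSeq.size > 0) {
      this.msn = Math.max(this.msn, min);
    }
  }
}
```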

[Figure: Fluid Service message flow with reference sequence number and minimum sequence number]

What the minimum sequence number lets clients do is establish a point in the log before which they can 'forget' all state. Since clients are guaranteed to never see an op whose reference sequence number is below the minimum sequence number, they can use this fact to garbage collect. The minimum sequence number also provides a way to do distributed consensus: when the minimum sequence number changes, you're guaranteed that all clients have seen and processed every op up to that point. We like to call the messages between the minimum sequence number and the sequence number the collaboration window. It's the range within which connected clients may be making concurrent edits.
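Here's a rough client-side sketch of that garbage collection, again with made-up names:

```typescript
// Hypothetical client-side bookkeeping: keep only the ops inside the collaboration
// window. Anything at or below the MSN has been processed by every connected
// client, so any state kept around to resolve concurrency against those ops can
// safely be dropped.
interface SequencedMessage { clientId: string; sequenceNumber: number; referenceSequenceNumber: number; contents: unknown; }

class CollaborationWindow {
  private pending: SequencedMessage[] = [];

  onMessage(msg: SequencedMessage, minimumSequenceNumber: number): void {
    this.pending.push(msg);
    // Garbage collect: no future op can reference anything below the MSN.
    this.pending = this.pending.filter(op => op.sequenceNumber > minimumSequenceNumber);
  }

  get size(): number {
    return this.pending.length; // how many ops may still conflict with new edits
  }
}
```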

And that’s all a Fluid ordering service ever needs to do! It takes in messages, checks that each reference sequence number is at or above the minimum sequence number, stamps a sequence number on the message, updates the minimum sequence number and includes it in the message, and then broadcasts the message to all clients. The great part is that the service never needs to look at the payload. This makes it easy to define new data structures as well as encrypt the messages.
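Putting it together, the entire per-message job of the service fits in one small function. This is a hypothetical sketch following the steps above, not the real implementation:

```typescript
// Illustrative shapes and names only; not the actual Fluid service code.
interface ClientMessage { clientId: string; referenceSequenceNumber: number; contents: unknown; }
interface SequencedMessage extends ClientMessage { sequenceNumber: number; minimumSequenceNumber: number; }

class FluidLikeOrderer {
  private sequenceNumber = 0;
  private readonly clientRefSeq = new Map<string, number>();
  private readonly listeners: Array<(msg: SequencedMessage) => void> = [];

  connect(listener: (msg: SequencedMessage) => void): void {
    this.listeners.push(listener);
  }

  sequence(msg: ClientMessage): void {
    // 1. Check the reference sequence number (a real service would nack rather than throw).
    if (msg.referenceSequenceNumber < this.minimumSequenceNumber()) {
      throw new Error("reference sequence number is below the minimum sequence number");
    }

    // 2. Stamp a sequence number.
    const sequenceNumber = ++this.sequenceNumber;

    // 3. Update the minimum sequence number and include it in the message.
    this.clientRefSeq.set(msg.clientId, msg.referenceSequenceNumber);
    const minimumSequenceNumber = this.minimumSequenceNumber();

    // 4. Broadcast the stamped message to all clients. msg.contents is never inspected.
    const sequenced: SequencedMessage = { ...msg, sequenceNumber, minimumSequenceNumber };
    for (const listener of this.listeners) {
      listener(sequenced);
    }
  }

  private minimumSequenceNumber(): number {
    let min = this.sequenceNumber;
    for (const refSeq of this.clientRefSeq.values()) {
      min = Math.min(min, refSeq);
    }
    return min;
  }
}
```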

The simplicity of the protocol also makes it easy to leverage existing systems like Kafka, Event Hubs, Pulsar, Kinesis, NATS, etc. that are very good and very fast at creating ordered logs. You can let these systems handle the hard work of building the ordered log and then append the necessary Fluid service details on top. This is exactly how the reference service works.
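As a rough sketch of that shape (using kafkajs 2.x with made-up topic names, and omitting minimum sequence number handling for brevity), a single-partition topic provides the total order and a small sequencing service stamps the Fluid details on top:

```typescript
import { Kafka } from "kafkajs";

// Assumes a local broker and single-partition topics named "raw-ops" and
// "sequenced-ops"; all of these names are illustrative.
const kafka = new Kafka({ clientId: "fluid-sequencer", brokers: ["localhost:9092"] });
const producer = kafka.producer();
const consumer = kafka.consumer({ groupId: "fluid-sequencer" });

async function run(): Promise<void> {
  await producer.connect();
  await consumer.connect();
  await consumer.subscribe({ topics: ["raw-ops"], fromBeginning: true });

  let sequenceNumber = 0;
  await consumer.run({
    eachMessage: async ({ message }) => {
      if (message.value === null) {
        return;
      }
      const op = JSON.parse(message.value.toString());

      // Kafka already produced the total order; we only append the Fluid details.
      const sequenced = { ...op, sequenceNumber: ++sequenceNumber };

      await producer.send({
        topic: "sequenced-ops",
        messages: [{ value: JSON.stringify(sequenced) }],
      });
    },
  });
}

run().catch((error) => {
  console.error(error);
  process.exit(1);
});
```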

Hopefully with this message flow you can start to get an idea of how the distributed data structures achieve eventual consistency. Their algorithms range from simple (map) to complex (sequence). I'll save the internals of both for a future post.
