The Enigma of Collaborative Editing

Mehul Gala
12 min read · Sep 11, 2019


Collaboration is a necessity in today’s globalized world, and collaborative editing is a powerful feature. But what makes it possible? Let’s explore the “Operational Transformation” algorithm that is the fuel behind it.

The Old Days!

Remember the days when you collaborated with your teammates on a project without the aid of any collaborative editing or version control software? Chances are you have, and chances are you remember the pain!

It all starts with one of you creating the first draft and circulating it to the others via email for collaboration and feedback. The teammates download their own local copies and edit the document offline at their leisure. As soon as they are through with their changes, they send back their edited copies one by one.

Individually, they can all add new things, remove redundant things, modify existing items, correct typos, apply formatting, or even change the name of the document. The sender (the poor fellow) has to collate everything at his end and go through the herculean, excruciatingly painful process of merging all the changes to create the final copy of the document, which he sends, with a sigh of relief and a huge victory sign, to all of them for reference.

And then… somebody has second thoughts (we all have second thoughts, let’s face it :)).

The new ‘final 2’ version is released, followed by ‘final 3’, ‘super final’, ‘latest super final’, and after 100 other final versions, the matter is put to rest by the sender tearing out his hair, deleting his email account and throwing his laptop out the window.

Collaborative Editing!

Thus, this new technology was born to reduce the misery of collaboration. It allows multiple users to edit a document simultaneously, and provides a mechanism to bring the document to the same consistent state by incorporating all the edits.

It comes in two flavors:

Asynchronous collaborative editing:

The changes are not synced in real time. Users can concurrently edit their own copies, but to sync with the server and the other clients, they must manually commit or update the changes.

The system manages conflict resolution on its own, merging multiple edits and bringing the copy to a consistent state. However, there are rare cases where it needs the user’s help to resolve the conflicts.

Revision control systems such as Git and Subversion fall under this category.

Synchronous (real-time) collaborative editing:

The changes are synced in real time. Users can concurrently edit their own copies and also see the changes made by other clients instantaneously. Users are not required to manually commit or update the changes; the system does it on its own.

The system takes complete ownership of conflict resolution, merging multiple edits and bringing the document to a consistent state.

This concept was popularized by a product called Writely, which offered browser-based editing. It was acquired by Google in 2006 and evolved into Google Docs.

Today, this feature is hugely popular in the Google office suite. Microsoft has come up with its own version in its Office suite, called co-authoring. A few more popular players are Etherpad and Mockingbird.

Locking Mechanism

Whenever we talk about a shared resource, the first thing that comes to mind is locks.

This problem cannot be solved using traditional locks (so-called pessimistic locks) like mutexes or semaphores, because they give turn-based access and only allow one user to edit at a time. That is highly impractical when there are hundreds of users accessing the document concurrently.

To solve this problem, we need an optimistic locking mechanism. In the optimistic locking scheme, all users can have edit access to the document at the same time, and we adhere to certain rules to resolve the conflicts.

It works by mandating that every shared server resource has a version, a timestamp, or a hash associated with it. When a client reads the server data, it remembers the associated version. It then processes the data locally. When it wants to write the new data back to the server, it increments the version. If, in the meantime, another client has updated the data and the version, the first client cannot write its data back, since its local copy was not branched from the current server version. It must discard its local data, fetch the new server data, and start over.
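To make the idea concrete, here is a minimal sketch of a version-checked (compare-and-swap style) write in TypeScript. The names (VersionedDoc, DocumentStore, write) are illustrative, not any particular product’s API.

```typescript
// Minimal sketch of optimistic locking: every write names the version it
// branched from, and the store rejects writes based on a stale version.
interface VersionedDoc {
  version: number;
  content: string;
}

class DocumentStore {
  private doc: VersionedDoc = { version: 0, content: "" };

  // The client remembers doc.version along with the content it reads.
  read(): VersionedDoc {
    return { ...this.doc };
  }

  // Returns true if the write was accepted; false means the client must
  // re-fetch the newer copy, redo its work on top of it, and try again.
  write(baseVersion: number, newContent: string): boolean {
    if (baseVersion !== this.doc.version) {
      return false; // someone else wrote in the meantime
    }
    this.doc = { version: this.doc.version + 1, content: newContent };
    return true;
  }
}
```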

Sync Strategies

There are two ways to sync the local changes to the server.

  1. Event passing (operational transformation): the client sends raw events (operations) character by character to the server (e.g. character M inserted at position 3, character O deleted from position 10). This is also called operation-based sync.
  2. Differential sync: the client sends the delta (difference) between the server version and its local copy. In this case the server does not know the exact operations performed at the client. A rough sketch of both payload shapes follows below.
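As a sketch (the type names are my own, not from any specific library), the two payloads could look like this:

```typescript
// Operation-based sync: the client ships the exact edits it performed.
type Operation =
  | { kind: "insert"; position: number; char: string } // "M inserted at position 3"
  | { kind: "delete"; position: number };              // "character deleted from position 10"

// Differential sync: the client ships only a delta between the last synced
// server version and its current local copy.
interface Delta {
  baseVersion: number;
  patch: string; // an opaque diff; the server never sees the raw keystrokes
}

const opExample: Operation = { kind: "insert", position: 3, char: "M" };
const deltaExample: Delta = { baseVersion: 7, patch: "<diff of local copy against version 7>" };
```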

Operational Transformation (OT)

The brain behind the famous real-time collaborative editing applications is a cool algorithm called ‘Operational Transformation’. It is designed to resolve conflicts between users in such an environment. The ultimate intention of Operational Transformation is to maintain and synchronize a consistent state of a shared document between any number of users, even in high-latency environments such as the Internet.

Concurrent Operation Problem

In this scenario, the users start with an initial string “abc”. User-01 intends to insert the character “Y” at index 2, and User-02 intends to insert the character “Z” at index 2 (indices are 0-based). Both users execute the operation locally, resulting in the strings “abYc” and “abZc” respectively. After local execution, they both send their operations to each other and execute them straight away, subsequently ending up with two different strings: “abZYc” and “abYZc”.

This illustrates how naively exchanging operations between clients results in inconsistent states. The solution is to transform both operations with respect to each other, and then apply the transformed operations to their counterparts. A quick sketch of the divergence is shown below.
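Here is a tiny sketch of that divergence; applyInsert is a hypothetical helper working on 0-based indices:

```typescript
// Insert one character at a 0-based index.
function applyInsert(doc: string, position: number, char: string): string {
  return doc.slice(0, position) + char + doc.slice(position);
}

const base = "abc";

// Each user applies their own edit locally...
const user1Local = applyInsert(base, 2, "Y"); // "abYc"
const user2Local = applyInsert(base, 2, "Z"); // "abZc"

// ...and then naively applies the other user's raw operation on top.
const user1Final = applyInsert(user1Local, 2, "Z"); // "abZYc"
const user2Final = applyInsert(user2Local, 2, "Y"); // "abYZc"
// The two replicas have diverged: "abZYc" !== "abYZc".
```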

Transformation

Google defines this transformation process with a mathematical identity: transform(a, b) = (a’, b’), such that applying a followed by b’ to a document produces the same result as applying b followed by a’.

The math form is a little daunting, but it is best explained graphically using the OT diamond illustration.

Two clients perform two operations simultaneously (operations a and b) at their respective ends. These two operations can be sent to a transform function that generates two new operations, a’ and b’.

When b’ is applied after a on one client, and a’ is applied after b on the other client, both clients arrive at the same consistent state.

To make it clearer, let’s go through our example.

Our original string was “abc” (when everything was in sync).

User 1 performed operation a (insert Y at index 2), resulting in the state abYc.

User 2 performed operation b (insert X at index 2), resulting in the state abXc.

Note that a timestamp is maintained for both operations, in keeping with the rules of optimistic locking. Based on the timestamps, we know that operation a happened before operation b.

When we pass both these operations and their order to the transform function, it produces two new operations:

operation a’: insert Y at index 2

operation b’: insert X at index 3

User 1: apply b’ after applying a. It first inserts Y at index 2 into the original string (abc), and then inserts X at index 3 into the intermediate string (abYc), giving the resultant string abYXc.

User 2: apply a’ after applying b. It first inserts X at index 2 into the original string (abc), and then Y at index 2 into the intermediate string (abXc), giving the resultant string abYXc.

After applying the transformed operations, both clients arrive at the same synchronized state.

A similar transformation can be applied to any pair of operations (insert-delete, delete-insert, or delete-delete). The operations could also be formatting of data, annotations, etc.
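As an illustration, here is a minimal insert-insert transform that reproduces the worked example above. It is only a sketch: a real implementation also handles the insert-delete, delete-insert, and delete-delete cases, and the tie-breaking rule (who wins when both insert at the same index) must be agreed on by every participant.

```typescript
interface InsertOp {
  position: number; // 0-based index
  char: string;
}

// Transform two concurrent inserts against each other. `a` is treated as
// having happened first (the earlier timestamp), so on a tie `a` keeps its
// position and `b` is shifted one place to the right.
function transformInsertInsert(a: InsertOp, b: InsertOp): [InsertOp, InsertOp] {
  const aPrime: InsertOp =
    a.position > b.position ? { ...a, position: a.position + 1 } : { ...a };
  const bPrime: InsertOp =
    b.position >= a.position ? { ...b, position: b.position + 1 } : { ...b };
  return [aPrime, bPrime]; // aPrime applies after b; bPrime applies after a
}

// The worked example: a = insert Y at 2, b = insert X at 2, base "abc".
const a: InsertOp = { position: 2, char: "Y" };
const b: InsertOp = { position: 2, char: "X" };
const [aPrime, bPrime] = transformInsertInsert(a, b); // Y at 2, X at 3

// Reusing the applyInsert helper from the earlier sketch:
let user1 = applyInsert("abc", a.position, a.char);       // "abYc"
user1 = applyInsert(user1, bPrime.position, bPrime.char); // "abYXc"

let user2 = applyInsert("abc", b.position, b.char);       // "abXc"
user2 = applyInsert(user2, aPrime.position, aPrime.char); // "abYXc"
// Both replicas converge to "abYXc".
```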

Compound OT

The diamond problem that we just witnessed is pretty much the ONLY thing the OT algorithm can solve at the fundamental level. Given the two upper sides of the diamond, OT can transform them with respect to each other and complete the remaining bottom two sides.

But life is not so rosy when things scale up. Adding just a third concurrent operation to the diamond poses a challenging problem. Let’s explore it in a client-server architecture. The server is a central authority that maintains the consistent state of the document (the single source of truth). The clients can edit the document concurrently and send their changes to the server to sync.

Consider a case where a client performs two consecutive operations, a and b. It sends the operations one by one to the server. The server, meanwhile, receives operation c from another client working concurrently elsewhere.

So the server first has to merge a and c, because those two operations branched from the last synced server state. The server can apply the transformation to operations a and c and figure out a’. It can then apply a’ on top of the changes from c and get a consistent state. So far so good.

Ideally, at this point, the server should send c’ to the client to make the client state consistent. But just when it is about to do so, operation b arrives, and the server is in a fix. It has nothing to transform operation b against, because b was derived from a, not from any synced server state. To fix this, the server uses a technique called bridging.

Bridge

To incorporate the changes of operation b on the server side, we need to somehow find a way to transform b. But we only have one side of the diamond. This is where the server has to create a bridge to form the other side of the diamond (as shown by the dotted green line).

As it turns out, the bridge is nothing but the transformation of operation c against a, which the server knows how to perform because a and c are both derived from the same last synced server state. Once the bridge is established, the upper two sides of the diamond are complete for operation b. The server can now compute b’ (the lower side of the diamond), apply it to its local copy (the one formed after applying a’ earlier), and make everything consistent at its end. The server has now applied operations c, a and b successfully.

However, the client is not yet in sync. It needs an operation it can apply to its own state to make its end consistent. This is achieved as shown below.

The server transforms c against the composition of a and b, producing c’. It sends c’ back to the client, which applies it to its own state and becomes consistent.
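In code, the bridge boils down to transforming one operation through a whole sequence of operations it is concurrent with. Here is a sketch, reusing the transformInsertInsert function from the earlier example; the function name and the priority convention (the already-applied operation wins ties) are my own choices, not a standard API.

```typescript
// Transform a single operation against every operation in a sequence it is
// concurrent with (e.g. c against the composition of a and b). Returns both
// the fully transformed operation and the sequence rebased past it (the
// other side of the bridge).
function transformThrough(
  incoming: InsertOp,
  sequence: InsertOp[],
): { incomingPrime: InsertOp; sequencePrimes: InsertOp[] } {
  let incomingPrime = incoming;
  const sequencePrimes: InsertOp[] = [];
  for (const op of sequence) {
    // The already-applied (incoming) operation wins ties in this sketch.
    const [nextIncoming, opPrime] = transformInsertInsert(incomingPrime, op);
    incomingPrime = nextIncoming;
    sequencePrimes.push(opPrime);
  }
  return { incomingPrime, sequencePrimes };
}

// In the example above, the server would compute:
//   const { incomingPrime: cPrime } = transformThrough(c, [a, b]);
// and send cPrime back to the client, while sequencePrimes (a' and b')
// are the operations it applies to its own copy.
```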

Problems with this approach

Although the approach demonstrated above works functionally for any number of client and server operations, and makes everything consistent, it has huge scalability issues, mainly because the server is required to do all the heavy lifting.

The server performs all the transformations. The server is required to create these bridges, and for that it has to hold onto every intermediate operation. In the above example, after transforming a and c against each other and getting a’, the server cannot forget c’, because it could be needed later to create bridges (as in the case of operation b’s bridge).

This sounds like a trivial problem, but imagine hundreds of operations pumping in from a client and hundreds of un-synchronized operations pending on the server end. The server has to remember all the intermediate bottom-half OT operations for bridging. It quickly becomes impractical.

Solution to scale up the system

To solve the above problem, we need to add the following rules to the system.

  1. All operations that a client sends to the server must branch out from a known state in the server’s history. Otherwise, the server discards the operation. So operation b in the above example would be discarded, because it was not branched from a known server state.
  2. A client only sends one operation at a time, and waits for the acknowledgement from the server before sending the next one.
  3. The server broadcasts the new changes it has applied to its own state to all clients as soon as it applies them.

Of course, simply rejecting the divergence would make the whole collaborative system all but useless. So the server offloads all the heavy lifting to the clients. We don’t want the server to maintain intermediate states for thousands (or millions) of clients. The server offloads all the compound OT work to the client; the server only performs simple OT, in which there is never any need to remember an intermediate state.

The client keeps track of its own local unsynced operations and also listens for new server operations that happened after its own last synced version. The client has to keep track of just one server, since there is never going to be more than one (logical) server. This simplifies things from the server’s perspective, but the client now has to do double the work, as illustrated below.

The Most Complicated Case

To begin with, the client performs an operation a starting from the last synced server state. It starts building the bridge, which it keeps extending until the operation is synced with the server. It sends operation a to the server and waits.

The client performs another operation, b, locally, but as the rules say, it must not send it to the server until it receives an acknowledgment from the server for operation a. So it keeps operation b in a buffer.

Meanwhile, at the server end, two more operations were already pending to be synced before operation a (operations c and d). They happened before a, as validated by their timestamps. The server applies c to its local copy and creates a new state. It then broadcasts c to all clients.

Our client receives operation c, which it transforms against its two local operations (a and b). This generates c’, which it adds to the bridge.

The client performs another local operation, e. It adds this operation to the bridge and also to the buffer, behind b.

Meanwhile, the server applies operation d at its end (on top of c) and broadcasts it to all clients. Our client receives the message, transforms d against the composition of all its pending local operations (a, b and e), and generates d’. It adds d’ to the bridge.

Finally, the server picks up operation a from its queue and transforms it against operations c and d (those two operations branched out from the same parent as operation a). It generates operation a’, applies a’ to its local state to bring it in sync, and broadcasts it to every client.

Our client receives operation a’. It knows this is its own operation, because the client generates a unique id with every operation, which the server sends back after transforming and broadcasting it.

Now the client picks the next operation from the buffer (operation b), transforms it with respect to a’, since that is now the latest server state, sends it to the server for sync, and so on.
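Pulling the walkthrough together, here is a hedged sketch of the client-side bookkeeping it describes: one operation in flight, a buffer of further local edits, and every incoming server operation transformed through both. It reuses the InsertOp, applyInsert, transformInsertInsert and transformThrough sketches from earlier; the class and method names are illustrative, not any real library’s API.

```typescript
class CollabClient {
  private inFlight: InsertOp | null = null; // sent to the server, not yet acknowledged
  private buffer: InsertOp[] = [];          // local edits made since then

  constructor(
    private doc: string,
    private send: (op: InsertOp) => void,   // transport to the server
  ) {}

  // Called whenever the user edits locally.
  localEdit(op: InsertOp): void {
    this.doc = applyInsert(this.doc, op.position, op.char);
    if (this.inFlight === null) {
      this.inFlight = op;
      this.send(op);          // rule 2: only one operation in flight at a time
    } else {
      this.buffer.push(op);   // hold everything else back
    }
  }

  // Called when the server broadcasts an operation from another client.
  remoteEdit(serverOp: InsertOp): void {
    const pending =
      this.inFlight === null ? [...this.buffer] : [this.inFlight, ...this.buffer];
    // Rebase the pending local operations past the server op, and the server
    // op past them, using the bridge helper from above.
    const { incomingPrime, sequencePrimes } = transformThrough(serverOp, pending);
    if (this.inFlight === null) {
      this.buffer = sequencePrimes;
    } else {
      this.inFlight = sequencePrimes[0];
      this.buffer = sequencePrimes.slice(1);
    }
    this.doc = applyInsert(this.doc, incomingPrime.position, incomingPrime.char);
  }

  // Called when the server echoes back our own operation (recognized by its
  // unique id in a real system), i.e. the acknowledgement.
  acknowledged(): void {
    this.inFlight = this.buffer.shift() ?? null;
    if (this.inFlight !== null) {
      // The buffer has already been rebased onto the latest server state by
      // remoteEdit, so the next operation can simply be sent as-is.
      this.send(this.inFlight);
    }
  }
}
```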

Conclusion

True to its name, the algorithm has transformed the way collaborative editing is done. It is much more convenient, efficient and powerful. The same principles will likely be replicated in other industry domains, like multimedia or engineering-drawing collaboration platforms, in the not-so-distant future. As it all unfolds, and as you leverage the services of these platforms, spare a thought for this amazing algorithm that is making it all possible under the hood.
