Working code for Operational Transformation/CRDT hybrid

Raph Levien
Feb 23, 2017 · 5 min read


Last year I published an essay entitled Towards a unified theory of Operational Transformation and CRDT. In it, I proposed that a certain class of Operational Transformation algorithms (those satisfying the TP2 property) also satisfied the requirements for being Conflict-free Replicated Data Types, and had overall desirable properties. I have now released a prototype, and in this follow-up I will explain it, at least in broad strokes.

I could have just implemented GOTO (Generic Operational Transformation Optimized) with Tombstone Transformation Functions, and that would have made a pretty good demonstration of collaborative editing with a small, simple codebase. That would have had most of the advantages I’m claiming, including the ability to work in arbitrary peer-to-peer network topologies rather than requiring a centralized server. In addition, since OT techniques represent the string as a string rather than a graph or tree of character nodes (as in WOOT and similar CRDTs), they are quite RAM-efficient. I don’t consider GOTO to be harmful.

But I went further. In particular, my code has an efficient (balanced binary tree) data structure for representing sets of indices. Even so, the entire codebase, including an admittedly very sparse UI, is 400 lines of JavaScript, probably one of the smallest and simplest collaborative editing prototypes ever shown.

Efficient tombstones

The tombstone approach represents the document as a sequence of all characters ever inserted, whether they were deleted or not. The traditional tombstone approach would be to store those characters in a string, alongside a set of deleted characters. To reconstruct the actual document, you’d apply a read function that deletes the characters in that set (this is called viewToModel in the TTF paper).
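To make the model concrete, here is a minimal sketch of such a read function (the name `render` is mine, not from the codebase), operating on the full character sequence plus a set of deleted indices:

```javascript
// Sketch of the tombstone read function: the model keeps every character
// ever inserted; `deleted` holds the indices of the tombstones. Rendering
// the visible document just filters the tombstones out.
function render(allChars, deleted) {
  let out = '';
  for (let i = 0; i < allChars.length; i++) {
    if (!deleted.has(i)) out += allChars[i];
  }
  return out;
}

// "hxello" with the character at index 1 deleted renders as "hello".
console.log(render('hxello', new Set([1]))); // "hello"
```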

However, it’s not actually necessary to store the deleted characters in the string, only to record the fact that they have been deleted. The TTF paper (in section III.D) describes how to do this; you delete characters out of the string, and keep a data structure alongside that tracks deleted characters. There are two coordinate spaces, one indexing the string, and one indexing the (virtual) string that also contains the tombstones. The TTF paper suggests a straightforward linear-time algorithm for computing the transform between these two spaces.
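A rough sketch of that linear-time algorithm (function names are illustrative, not taken from the paper or the codebase): one direction counts the non-deleted positions, the other walks forward past tombstones. Here `deleted` is the set of tombstone indices in the model (full-string) coordinate space.

```javascript
// Model index -> view index: count the non-deleted positions before m.
function modelToView(m, deleted) {
  let v = 0;
  for (let k = 0; k < m; k++) {
    if (!deleted.has(k)) v++;
  }
  return v;
}

// View index -> model index: advance past tombstones until v visible
// positions have been consumed, landing on a non-deleted position.
function viewToModel(v, deleted) {
  let m = 0;
  while (deleted.has(m) || v > 0) {
    if (!deleted.has(m)) v--;
    m++;
  }
  return m;
}

// With index 1 a tombstone, view position 1 maps to model position 2.
console.log(viewToModel(1, new Set([1]))); // 2
```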

My code stores the set of deleted indices in a balanced binary tree, and can compute both the forward (xi in the code) and inverse transformation (xi_inv) in O(log n) time. Operations to delete a single character (union_one) and increment all indices past an insertion point (xi_one) are also logarithmic. The data structure is about 150 lines of JavaScript, a non-trivial fraction of the total, but the efficiency improvement is substantial.

Using a set to represent a coordinate transform has some interesting mathematical properties. Here’s a mathematical description of the forward transformation of the index i through the set S:

Ξ_S(i) = max{ j : |{k | k < j ∧ k ∉ S}| = i }
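One way to read this formula: Ξ_S(i) is simply the i-th natural number not in S. A naive linear-scan sketch (for illustration only; nothing like the logarithmic balanced-tree version in the codebase):

```javascript
// Ξ_S(i): the i-th natural number not in S, i.e. the largest j such that
// exactly i numbers below j lie outside S.
function xi(S, i) {
  for (let j = 0; ; j++) {
    if (!S.has(j)) {
      if (i === 0) return j;
      i--;
    }
  }
}

// For S = {1}, index 1 jumps over the tombstone at position 1.
console.log(xi(new Set([1]), 1)); // 2
```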

And here’s a picture of the transformation, for a single character at index 1. Note the similarity to the traditional OT transformation for insertion:

[Figure: example of the xi transform for a singleton set]

The forward transform is a strictly increasing function over the natural numbers. There is a one-to-one correspondence between sets and such functions; for example, the set of non-primes would represent the function giving the nth prime. The composition of such functions is associative, which is a valuable property for manipulating them more efficiently. (The JavaScript codebase doesn’t have a function for directly composing two such sets, but there is an implementation in xi-editor, based on a run-length rather than balanced binary tree representation of sets). I don’t know of any existing mathematical literature that deals with or generalizes these concepts, but I wouldn’t be surprised to learn of it.
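To illustrate the composition, here is a quadratic sketch (nothing like the efficient run-length implementation in xi-editor; names and bounds are mine): since the set representing a strictly increasing function is the complement of that function’s image, composing two sets amounts to composing the functions and taking the image’s complement.

```javascript
// Ξ_S(i): the i-th natural number not in S (naive linear scan).
function xi(S, i) {
  for (let j = 0; ; j++) {
    if (!S.has(j)) {
      if (i === 0) return j;
      i--;
    }
  }
}

// Recover the set representing the composite function Ξ_S ∘ Ξ_T, up to
// `bound`: a strictly increasing function corresponds to the set of
// natural numbers missing from its image. Quadratic, illustration only.
function composeSets(S, T, bound) {
  const image = new Set();
  for (let i = 0; xi(S, xi(T, i)) < bound; i++) {
    image.add(xi(S, xi(T, i)));
  }
  const U = new Set();
  for (let j = 0; j < bound; j++) {
    if (!image.has(j)) U.add(j);
  }
  return U;
}

// Transforming through T = {0} and then S = {2} is the same as
// transforming through the single set {0, 2}.
console.log([...composeSets(new Set([2]), new Set([0]), 6)]); // [0, 2]
```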

Efficient merge using coordinate transformation

The inner loop of GOTO’s merge operation strongly resembles an insertion sort; it does pairwise transposition of adjacent operations in the log until it’s ordered appropriately (which in this case means that the operations present in the incoming operation’s context all precede those not present in that context). That algorithm is, like insertion sort, O(n²), where n here is a measure of the number of concurrent operations in flight.

The traditional OT model requires consideration of one operation against only a single other operation. In that model, it’s hard to imagine doing better than quadratic. However, my code is not constrained to compute IT and ET transforms on pairs of operations. Rather, I use that same O(log n) data structure to represent the effect of jumping over a whole sequence of n operations. Thus, my merge algorithm (merge_op in the code) is O(n log n), a significant speedup over, as far as I know, any previously known OT algorithm.

If you look at the merge_op code, it’s fairly mathematically dense, even though it’s not many lines of code. For the brave and curious who want to dig into it, it will probably help to know that it computes exactly the same thing as Algorithm 2 in the GOTO paper, just more efficiently. The loop over transform_ins near the end of the function is pretty much just LIT from the GOTO paper.

An invitation

I believe the ideas in this code would make a pretty good academic paper. Among other things, the asymptotic complexity bounds are significantly improved from previous results. Further, while this implementation is obviously a toy, I think it could be the basis for a practical, efficient collaborative editing system.

Unfortunately, this work has been very much a side project, and it’s very hard for me to find the time to prepare a proper academic paper. There’s quite a bit to do, including making a convincing argument for the correctness of the whole thing. I personally believe it’s correct based on automated randomized testing of an earlier prototype, but an analytic argument would be better for an academic paper.

I think that taking this work to a publishable paper would make an excellent project, potentially at the scope of a Master’s thesis. If anyone out there is interested, I’d welcome a collaboration.

In the meantime, I did want to get this code out there, so that people building the next generation of collaborative editing systems can see the value in adopting a hybrid approach that combines the best advantages of both Operational Transformation and CRDT.



Raph Levien is a software engineer on the Fuchsia team, and creator of xi-editor. Opinions here are his own, not those of his employer.