Operational transformation

the newest and coolest tech from the 80s 

Victor Grishchenko
Revisiting *** series
3 min read · Nov 13, 2013


Operational transformation is the technology behind the real-time editing features of Google Docs, Google/Apache Wave, Etherpad/Hackpad and many other products and APIs. OT makes it possible to merge edits in real time, thus enabling collaborative real-time editing. The theory of OT dates back to the 1980s, when the first collaborative editor was implemented by Ellis and Gibbs. The novel part of the approach was to decompose an edited text into a chain of atomic insert/delete operations, often the size of a single letter, instead of comparing snapshots of the entire text. By academic standards, the OT theory had an obscure life full of troubles and misadventures. Some OT papers were later found to lack accuracy, others couldn’t fix obvious consistency issues. Worst of all, some authors started playing with the definition of success.

Due to the need to consider complicated case coverage, formal proofs are very complicated and error-prone, even for OT algorithms that only treat two characterwise primitives (insert and delete) — Li & Li, 2010
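
The two characterwise primitives the quote mentions can be sketched minimally (my own illustration in TypeScript, not any particular paper’s formalism): each operation addresses the text by an offset and inserts or deletes a single character.

    type Op =
      | { kind: "insert"; pos: number; char: string }
      | { kind: "delete"; pos: number };

    // Apply one atomic operation to a text snapshot.
    function apply(text: string, op: Op): string {
      return op.kind === "insert"
        ? text.slice(0, op.pos) + op.char + text.slice(op.pos)
        : text.slice(0, op.pos) + text.slice(op.pos + 1);
    }

    // Fixing "Helo" is a single atomic insertion, not a diff of two snapshots.
    console.log(apply("Helo", { kind: "insert", pos: 3, char: "l" })); // "Hello"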

Those notorious complexity and clarity issues of OT stem from the fact that the theory relies on volatile entity identifiers. Namely, OT addresses edits and parts of the text by their respective offsets in the text; the problem is that every edit changes those offsets. In the rest of computer science, permanent identifiers are sacred. In those terms, OT is pretty much a satanic cult of computer-science land. Every mutation changes identifiers, mutations originate asynchronously at different locations, and the resulting mess has to be reconciled somehow. Well, eventually professionals figured out how. One corporation enjoying an extreme surplus of brain mass was particularly successful.
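
Here is a hedged sketch of that reconciliation burden, limited to the textbook insert-against-insert case (a real OT system needs a transform for every pair of operation types, plus tie-breaking rules): a concurrent insert shifts the offset another site’s operation was written against, so the operation has to be rewritten before it can be applied.

    type Insert = { pos: number; char: string };

    // Rewrite `op` so it can be applied after `concurrent` has already landed.
    function transformInsert(op: Insert, concurrent: Insert): Insert {
      // The concurrent insert landed at or before our position, so our offset
      // has drifted one character to the right.
      return op.pos >= concurrent.pos ? { ...op, pos: op.pos + 1 } : op;
    }

    // Site A inserts "!" at 11 into "Hello world"; site B concurrently inserts
    // "," at 5. For the replicas to converge, A's op must be transformed
    // against B's before B applies it (and vice versa).
    const a: Insert = { pos: 11, char: "!" };
    const b: Insert = { pos: 5, char: "," };
    console.log(transformInsert(a, b)); // { pos: 12, char: "!" }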

Unfortunately, implementing OT sucks. There’s a million algorithms with different tradeoffs, mostly trapped in academic papers. The algorithms are really hard and time consuming to implement correctly. … Wave took 2 years to write and if we rewrote it today, it would take almost as long to write a second time — Joseph Gentle, 2011

Well, is there another way, one that avoids renumbering volatile ids? There is. The entire problem of changing offsets may be trivially brute-forced by assigning every letter a unique id. That sounds insane by the standards of the 80s, but... consider a web page that has 10KB of text, 100KB of HTML/CSS markup, 1MB of images and 10MB of video. Another 100KB of text metadata will hardly make things worse. Meanwhile, permanent letter ids provide a plethora of new possibilities. A text then becomes akin to a spreadsheet: every little bit can be referenced precisely.
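
A minimal sketch of the “id per letter” idea, with names of my own choosing (WOOT, Logoot and CT each encode this differently): every character carries a permanent (author, counter) id, and operations reference those ids instead of offsets.

    type CharId = { author: string; seq: number };   // permanent, never reused
    type Atom = { id: CharId; char: string; deleted: boolean };

    // "Insert after the letter with id P" stays valid no matter how many
    // concurrent edits land elsewhere; nothing ever has to be renumbered.
    type InsertAfter = { kind: "insert"; after: CharId | null; atom: Atom };
    // Deletion only marks the atom as a tombstone; its id keeps anchoring later inserts.
    type Erase = { kind: "erase"; target: CharId };

    // Example: the letter "H" inserted by alice as her first operation.
    const h: Atom = { id: { author: "alice", seq: 1 }, char: "H", deleted: false };
    const op: InsertAfter = { kind: "insert", after: null, atom: h };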

Some examples of this approach are systems by INRIA researchers: WOOT and Logoot. Your humble servant described another one in an article: Causal Trees (CT). Being a mathematician by education, I made a terrible mistake when devising the concept for that article: I decided to implement the key algorithms of my system using regular expressions. Mathematically, regexes are deterministic finite automata, a concept of ultimate computational simplicity. I thought that would illustrate the simplicity and efficiency of the approach. Unfortunately, most readers associate regexes with black magic, so the trick failed. Marketing and logic are totally different things. Also, because of one technical inaccuracy, I made the correctness proof twice as long as it should have been; some advanced trickery wasn’t actually necessary.

Later, when I implemented a production system in JavaScript, the merge/correctness-related code took next to nothing: maybe 100-200 LoC, depending on how you count. Formatting and HTML rendering took way more. Server-side issues, such as authorization, synchronization, storage, quotas and other boring stuff, took even more. Actually, more than half of the effort was sunk into “real-world issues” (browser compatibility, performance, stability, etc.).

Well, given my rather modest brain mass, I am very happy I didn’t choose OT. I experimented with it in the past and I remember those experiences well. CT gave me a bunch of whole new possibilities. The operation log was trivially mergeable, to the degree that simply mixing two edit logs produced a merged document. Any formatting overlay was trivial to add. Highlighting recent changes in a page was equally trivial. The only guarantee needed from the underlying infrastructure was a TCP-like correctness condition: edits either arrive in order with no losses, or do not arrive at all.
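
As a hedged sketch of why the log merges so easily, assume every op carries a permanent id, the id of the letter it attaches to, and a Lamport timestamp (the field names are mine, not the article’s exact Causal Trees encoding): merging is then a union keyed by op id, followed by any ordering that puts causes before their effects.

    interface LogOp {
      id: string;              // permanent op id, e.g. "alice:17"
      cause: string | null;    // id of the letter this op attaches to
      lamport: number;         // logical clock; a cause always has a smaller value
      char?: string;           // present for inserts, absent for erasures
    }

    function mergeLogs(a: LogOp[], b: LogOp[]): LogOp[] {
      const byId = new Map<string, LogOp>();
      for (const op of [...a, ...b]) byId.set(op.id, op);  // union; duplicates collapse
      // Sorting by (lamport, id) keeps every cause ahead of its effects and breaks
      // ties deterministically, so both replicas produce the same merged log.
      return [...byId.values()].sort(
        (x, y) => x.lamport - y.lamport || x.id.localeCompare(y.id));
    }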

I must say, I am rather satisfied with the outcome.

My main conclusion from the OT story is that tradeoffs change. While some kinds of resources (traffic, storage, RAM) become exponentially cheaper and more abundant, other expenses (round-trip time, code complexity) stay at the same mark for years and decades. Hence, what was an optimization in the 1980s might turn into the opposite in the 2010s.


Written by Victor Grishchenko

Researching deep hypertext, distributed systems and the general information metabolism of society.