Lessons learned from creating a rich-text editor with real-time collaboration
How we approached collaborative editing
Real-time collaboration is a feature we had wanted to introduce since the inception of CKEditor 5. The research that we did back in 2012, along with some failed attempts that we observed elsewhere, showed us that full support for collaborative editing of rich-text data cannot be added on top of existing projects. A proper architecture has to be designed and implemented from scratch, with real-time collaboration treated as a first-class citizen in the entire project.
As simple as it sounds, for us it meant leaving behind years of WYSIWYG HTML editor experience and a rock-solid code base of CKEditor 4 that we were proud of and that our customers appreciated. Leaving a code base estimated at 50+ man-years and r.e.s.t.a.r.t.i.n.g.
We were seriously afraid of repeating the infamous history of Netscape, a well-known example of failing to release a newer version of popular software after deciding to rewrite it from scratch. Fortunately, this did not happen in our case.
It took us nearly 4 years, but we succeeded. CKEditor 5 Framework was built with real-time collaboration in mind, from its very foundations. The integrity of the platform was validated by CKEditor 5 Collaborative Editing — a set of features that enables users to create and edit content together in a real-time collaborative environment.
This article describes how we approached the problem and what challenges we had to overcome in order to provide real-time collaborative editing capable of handling rich text. Check it out if you are interested in:
- Learning what problems you may face when implementing real-time collaborative editing.
- Building a rich-text editor with support for real-time collaboration.
- Understanding how we approached collaborative editing in CKEditor 5.
Real real-time collaboration
Since collaborative editing is a highly desired feature, many projects claim to support it. However, very few solutions deliver top quality and completeness. Additionally, the terms “collaborative editing” and “collaboration” are quite broad and can be understood in a variety of ways, which leads to even more confusion among potential users.
This article describes how real-time collaborative editing was implemented in CKEditor 5. The terms “collaboration” and “real-time collaborative editing” are used interchangeably throughout the document to refer to “real real-time collaboration” as implemented by CKEditor 5.
Since the beginning, our goal was to provide a solution that would bring no compromises when it comes to collaborative editing. There are many shortcuts one may try to use to enable collaboration in an application which has not been designed for it, but in the end, they all result in a poor user experience:
- Full or partial content locking. Only one user can edit the document or a given part of the document (a block element: paragraph, table, list item, etc.) at the same time.
- Collaboration features enabled in “read-only” mode. Users are able to make comments on text but only if the editor is in “read-only” mode.
- Manual conflict resolution. Edits in the same place would have to be resolved manually by one of the users.
- Only basic features enabled in collaborative editing. You can bold the text or create a heading, but forget about support for tables or nested lists.
- Lack of intention preservation. After conflicts are resolved, the user ends up with different content from what they intended to create (in other words: poor conflict resolution).
We wanted to avoid all these pitfalls. This required creating a truly real-time collaborative editing solution that enables all users to simultaneously create and edit content without any limitations or feature stripping. We always had one idea: the editor should look, feel and behave the same, no matter if collaborative editing is on or off.
It’s all about conflicts
During collaborative editing, users are constantly modifying their local editor content and synchronizing the changes between themselves. When two or more users edit the same part of the content, conflicts may, and will, appear. Conflict resolution is what makes or breaks the collaborative editing experience.
For example, when two users remove a part of the same paragraph, their editors’ states need to be synchronized. However, this is problematic: when User A receives information from User B, this information is based on User B’s content — which is different than what User A is currently working on.
This is one of the simplest scenarios, but even it, without proper mechanisms in place, would lead to a lack of eventual consistency, which is a fundamental requirement of any collaborative editing solution. Some editors introduce full or partial content locking to prevent this from happening, but this was not the kind of limitation that we would accept.
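The divergence described above can be reproduced in a few lines. The following is a minimal sketch, not CKEditor 5 code: the `Deletion` shape and the `applyDeletion` helper are assumptions made for illustration, and the document is reduced to a plain string.

```typescript
// A deletion described by an offset and a length, as seen in the sender's copy.
// (Hypothetical shape, used only for this illustration.)
interface Deletion {
  offset: number;
  length: number;
}

function applyDeletion(text: string, op: Deletion): string {
  return text.slice(0, op.offset) + text.slice(op.offset + op.length);
}

const initialText = "Hello wonderful world";

// At the same time, User A removes "Hello " and User B removes "wonderful ".
const deletionA: Deletion = { offset: 0, length: 6 };
const deletionB: Deletion = { offset: 6, length: 10 };

// Each user applies their own change first and then the remote change as received,
// without adjusting it to the content they are currently working on.
const naiveResultA = applyDeletion(applyDeletion(initialText, deletionA), deletionB);
const naiveResultB = applyDeletion(applyDeletion(initialText, deletionB), deletionA);

// naiveResultA is "wonder" while naiveResultB is "world": the editors have diverged.
```

User B's offsets were computed against content that User A no longer has, so applying them as-is removes the wrong characters on User A's side.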
Side note: One may think that in real-life use conflicts will not happen frequently and, perhaps, you do not need a sophisticated solution to them. Could we not simply reject changes if we discover a conflict? It turns out that in reality conflicts are quite frequent and rejecting one user’s changes when they happen leads to an awful user experience.
Our take on Operational Transformation
There are several approaches to implementing conflict resolution in real-time collaborative editing. Two main candidates are Operational Transformation (OT) and Conflict-Free Replicated Data Type (CRDT). We chose OT and perhaps one day we will write down our thoughts on the ongoing OT vs. CRDT battle.
Long story short, CKEditor 5 uses OT to make sure it is able to resolve conflicts. OT is based on a set of operations (objects describing changes) and algorithms that transform these operations so that all users end up with the same editor content regardless of the order in which these operations were received. As a concept, it is well described in IT literature and proven by existing implementations (although none that could serve as a stable and powerful enough base for our needs).
Therefore, in 2015 we started working on our take on OT implementation. We quickly realized that basic Operational Transformation (as usually described and implemented) is not enough to provide the top quality user experience for rich-text editing. OT in its basic form defines three operations: insert, delete, and set attribute. These operations are meant to be executed on a linear data model. They are responsible for inserting text characters, removing text characters and changing their attributes (for example to set bold). However, a powerful WYSIWYG HTML editor requires more than that.
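To make the basic form of OT more concrete, here is a minimal sketch for a linear model of plain text. It is a textbook-style simplification and not the CKEditor 5 implementation: operations work on single characters, the set attribute operation is left out, and the `site` field used for tie-breaking is an assumption of this example.

```typescript
// Single-character operations on a linear model (plain string).
type LinearOp =
  | { type: "insert"; pos: number; char: string; site: number }
  | { type: "delete"; pos: number; site: number }
  | { type: "noop" };

function applyOp(text: string, op: LinearOp): string {
  switch (op.type) {
    case "insert":
      return text.slice(0, op.pos) + op.char + text.slice(op.pos);
    case "delete":
      return text.slice(0, op.pos) + text.slice(op.pos + 1);
    case "noop":
      return text;
  }
}

// Rewrites `op` so that it can be applied after the concurrent `other`
// has already been applied to the local content.
function transformOp(op: LinearOp, other: LinearOp): LinearOp {
  if (op.type === "noop" || other.type === "noop") return op;

  if (op.type === "insert" && other.type === "insert") {
    // Same position: the lower site id wins the earlier spot on every client.
    const shift = other.pos < op.pos || (other.pos === op.pos && other.site < op.site);
    return shift ? { ...op, pos: op.pos + 1 } : op;
  }
  if (op.type === "insert" && other.type === "delete") {
    return other.pos < op.pos ? { ...op, pos: op.pos - 1 } : op;
  }
  if (op.type === "delete" && other.type === "insert") {
    return other.pos <= op.pos ? { ...op, pos: op.pos + 1 } : op;
  }
  if (op.type === "delete" && other.type === "delete") {
    if (other.pos < op.pos) return { ...op, pos: op.pos - 1 };
    // Both users deleted the same character: nothing is left to do.
    if (other.pos === op.pos) return { type: "noop" };
  }
  return op;
}

// User A inserts "X" at offset 1 while User B deletes the character at offset 2.
const startText = "abc";
const insertByA: LinearOp = { type: "insert", pos: 1, char: "X", site: 1 };
const deleteByB: LinearOp = { type: "delete", pos: 2, site: 2 };

// Each side applies its own operation first, then the transformed remote one.
const convergedA = applyOp(applyOp(startText, insertByA), transformOp(deleteByB, insertByA));
const convergedB = applyOp(applyOp(startText, deleteByB), transformOp(insertByA, deleteByB));
// Both sides end up with "aXb".
```

Even in this reduced form, every pair of operation types needs its own transformation rule, which hints at how quickly the number of cases grows once more operations are added.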
Support for complex data structures
The linear data model is a simple data model that is sufficient to represent plain text. In contrast, HTML is a tree-based language, where an element can contain multiple other elements. An HTML document is represented in the browser as the Document Object Model (or DOM), which is tree-structured. It is possible to represent simple, flat structured data in a linear model, but this model falls short when it comes to complex data structures, like tables, captioned images or lists containing block elements. In a linear model, elements simply cannot contain other elements. For example, a block quote cannot contain a list item or a heading.
Hence, we needed to go a step further and provide Operational Transformation algorithms that work for a tree data structure. Back in 2015, we could find literally one paper about OT for trees and no evidence of anyone working on OT for trees. We built on that research, but the reality turned out to be even more challenging than we could have expected. The first implementation took us over one year, with several significant reworks over the next two years. The result is, however, outstanding. We not only managed to build the engine for real-time collaboration but also implemented a complete end-user solution, which validates what would otherwise be purely theoretical work.
The diagram below shows how a simple structured content can be represented in a linear data model:
The diagram below shows how a more complex piece of rich text can be represented in a tree-structured data model:
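The difference between the two models can also be sketched as data types. The type names and the `resolvePath` helper below are assumptions made for illustration, not the actual CKEditor 5 classes. The key point is that a position in a tree is a path of indexes rather than a single offset.

```typescript
// Linear model: a flat sequence of characters, each with its own attributes.
type LinearModel = Array<{ char: string; attributes: Record<string, unknown> }>;

// Tree model: elements can contain text and other elements.
interface TextNode {
  kind: "text";
  data: string;
  attributes: Record<string, unknown>;
}
interface ElementNode {
  kind: "element";
  name: string;
  children: TreeNode[];
}
type TreeNode = TextNode | ElementNode;

// Walks from the root down the tree, one child index per path entry.
function resolvePath(root: ElementNode, path: number[]): TreeNode {
  let node: TreeNode = root;
  for (const index of path) {
    if (node.kind !== "element" || index < 0 || index >= node.children.length) {
      throw new Error("Invalid path: [" + path.join(", ") + "]");
    }
    node = node.children[index];
  }
  return node;
}

// "Hello" with the last two characters in bold fits a linear model well.
const linearExample: LinearModel = "Hello".split("").map((char, i) => ({
  char,
  attributes: i >= 3 ? { bold: true } : {},
}));

// A block quote containing a heading and a paragraph needs nesting.
const treeExample: ElementNode = {
  kind: "element",
  name: "$root",
  children: [
    {
      kind: "element",
      name: "blockQuote",
      children: [
        { kind: "element", name: "heading1", children: [{ kind: "text", data: "Title", attributes: {} }] },
        { kind: "element", name: "paragraph", children: [{ kind: "text", data: "Text", attributes: { bold: true } }] },
      ],
    },
  ],
};

// The path [0, 1] points at the paragraph inside the block quote.
const paragraphNode = resolvePath(treeExample, [0, 1]);
```

Because positions are paths, an operation executed by one user can change not only the offsets but also the nesting that another user's operation refers to, which is what makes transformations on trees harder than on linear data.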
Advanced conflict resolution
Switching to the tree data model was not enough to implement bulletproof real-time collaboration. We quickly realized that the basic set of operations (insert, delete, set attribute) is insufficient to handle real-life scenarios in a graceful way. While these three operations perhaps provide enough semantics to implement conflict resolution in a linear data model, they did not satisfy the semantics of rich-text editing.
Below are some examples of situations where users simultaneously perform an action on the same part of content:
(1) User A changes the list item type (from bulleted to numbered) while User B presses Enter to split that list item:
(2) User A and User B press Enter in the same paragraph:
(3) User A wraps a paragraph into a block quote while User B presses Enter:
(4) User A adds a link to a sentence, while User B writes inside that sentence:
(5) User A adds a link to some text, while User B removes a part of that text and then undoes the removing:
To properly handle these and many other situations, we needed to heavily enhance our Operational Transformation algorithms. The most important enhancement that we made was adding a set of new operations to the basic three (insert, delete, set attribute). The goal was to better express the semantics of any user changes. That, in turn, allowed us to implement better conflict resolution algorithms. To the basic three operations we added:
- The rename operation, to handle element’s renaming (used, for example, to change a paragraph into a heading or a list item).
- The split, merge, wrap, unwrap operations to better describe the user intention.
- The insert text operation, to differentiate between inserting text content and elements.
- Unrelated to conflict solving, we have also introduced the marker operation.
Why do we need these new operations? Rename, split, merge, wrap and unwrap “actions” can be executed by a combination of insert, move and remove operations. For example, splitting a paragraph can be represented as a pair of “insert a new paragraph” + “move a part of the old paragraph to the new paragraph”. However, the split operation is semantic-focused — it conveys the user’s intention. It means more than insert + move which just happen to be executed one after another.
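The value of a semantic split operation can be illustrated with a small sketch. Everything below is a simplified assumption made for this example (a flat list of blocks, hypothetical operation shapes and function names), not the CKEditor 5 algorithms. User A renames a paragraph into a heading while User B presses Enter inside it.

```typescript
interface Block {
  name: string;
  text: string;
}
type Rename = { type: "rename"; index: number; newName: string };
type Split = { type: "split"; index: number; offset: number };
type BlockOp = Rename | Split;

function applyBlockOp(blocks: Block[], op: BlockOp): Block[] {
  const result = blocks.map((block) => ({ ...block }));
  const target = result[op.index];
  if (!target) throw new Error("No block at index " + op.index);
  if (op.type === "rename") {
    target.name = op.newName;
  } else {
    // A split creates a clone of the original element and moves the tail into it.
    result.splice(op.index + 1, 0, { name: target.name, text: target.text.slice(op.offset) });
    target.text = target.text.slice(0, op.offset);
  }
  return result;
}

// Transforms a rename so that it can be applied after a concurrent split.
// Because the split is explicit, we know that both halves used to be one element.
function transformRenameBySplit(rename: Rename, split: Split): Rename[] {
  if (split.index < rename.index) return [{ ...rename, index: rename.index + 1 }];
  if (split.index === rename.index) return [rename, { ...rename, index: rename.index + 1 }];
  return [rename];
}

const initialBlocks: Block[] = [{ name: "paragraph", text: "Buy milk and eggs" }];
const renameByA: Rename = { type: "rename", index: 0, newName: "heading1" };
const splitByB: Split = { type: "split", index: 0, offset: 8 };

// User A: own rename first. The remote split needs no adjustment here,
// because a rename does not change any indexes and the clone copies the new name.
const blocksAtA = applyBlockOp(applyBlockOp(initialBlocks, renameByA), splitByB);

// User B: own split first, then the transformed rename (now two renames).
let blocksAtB = applyBlockOp(initialBlocks, splitByB);
for (const op of transformRenameBySplit(renameByA, splitByB)) {
  blocksAtB = applyBlockOp(blocksAtB, op);
}
// Both users see two "heading1" blocks: "Buy milk" and " and eggs".
```

If User B's change had arrived as a generic insert followed by a move, the transformation could not tell that the newly inserted element was the other half of the renamed paragraph, and User B would end up with a heading followed by a plain paragraph.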
Thanks to the new operations, we can write more contextual transformation algorithms. This way we can resolve more complex use cases like scenarios (1–4) described above.
Side note: We believe that the set of necessary operations is strongly connected to the semantics of the tree data that you are representing. A rich-text editor has a different nature than a genealogical tree and hence requires a different set of operations.
Adding the new operations still did not solve all the problems. We needed to extend our Operational Transformation implementation even further to handle the scenarios that we discovered over the years. Here are the most significant additions that we made:
- The graveyard root: A special data tree root to which removed nodes are moved. It enables better conflict resolution in scenarios where User A changes a part of the data which is at the same time removed by User B (scenario (5) and similar).
- Generalized operations: Operations work on ranges instead of singular nodes for better processing and memory efficiency.
- Operation breaking: Sometimes, when being transformed, an operation needs to be broken into two operations, for example when a part of the content was removed (scenario (5)).
- Selective undo mechanisms: The undo feature needs to be aware of collaborative editing so that, for example, a user is able to undo only their own changes.
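The graveyard root and operation breaking can be sketched together using scenario (5). The code below is a heavily simplified illustration under stated assumptions: the document is a flat list of characters with a single `linked` flag, there are only two roots, and all type and function names are hypothetical rather than taken from CKEditor 5.

```typescript
type RootName = "main" | "graveyard";
interface Char {
  value: string;
  linked: boolean;
}
type Doc = Record<RootName, Char[]>;
interface Range {
  root: RootName;
  start: number;
  end: number; // exclusive
}
interface MoveOp {
  source: Range;
  targetRoot: RootName;
  targetOffset: number;
}

// Removing content is just moving it to the graveyard root.
function applyMove(doc: Doc, op: MoveOp): void {
  const moved = doc[op.source.root].splice(op.source.start, op.source.end - op.source.start);
  doc[op.targetRoot].splice(op.targetOffset, 0, ...moved);
}

function applyLink(doc: Doc, range: Range): void {
  for (let i = range.start; i < range.end; i++) doc[range.root][i].linked = true;
}

// Transforms the range of a concurrent operation by a removal that was already applied.
// The result may be broken into two ranges: what stayed and what went to the graveyard.
// Simplification: ranges in other roots are returned unchanged.
function transformRangeByRemove(range: Range, remove: MoveOp): Range[] {
  const { start, end } = remove.source;
  if (range.root !== remove.source.root) return [range];
  const length = end - start;
  const shift = (offset: number): number =>
    offset <= start ? offset : offset >= end ? offset - length : start;
  const pieces: Range[] = [];
  const keptStart = shift(range.start);
  const keptEnd = shift(range.end);
  if (keptEnd > keptStart) pieces.push({ root: range.root, start: keptStart, end: keptEnd });
  const overlapStart = Math.max(range.start, start);
  const overlapEnd = Math.min(range.end, end);
  if (overlapEnd > overlapStart) {
    pieces.push({
      root: remove.targetRoot,
      start: remove.targetOffset + overlapStart - start,
      end: remove.targetOffset + overlapEnd - start,
    });
  }
  return pieces;
}

// User B's editor: B removes "wonderful ", then A's link on "wonderful world" arrives.
const docAtB: Doc = {
  main: "Hello wonderful world".split("").map((value) => ({ value, linked: false })),
  graveyard: [],
};
const removeByB: MoveOp = { source: { root: "main", start: 6, end: 16 }, targetRoot: "graveyard", targetOffset: 0 };
const linkRangeByA: Range = { root: "main", start: 6, end: 21 };

applyMove(docAtB, removeByB);
const brokenRanges = transformRangeByRemove(linkRangeByA, removeByB);
brokenRanges.forEach((range) => applyLink(docAtB, range));

// B undoes the removal: the nodes come back from the graveyard, link included.
applyMove(docAtB, { source: { root: "graveyard", start: 0, end: 10 }, targetRoot: "main", targetOffset: 6 });
```

Since removed nodes keep existing in the graveyard, User A's link can still be applied to them, and undoing the removal restores the text with the link that User A intended.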
If you read up to this point, congratulations! 😃 In fact, we could write much more about every single thing mentioned in this article, but that would make it painfully long. If you are interested in a detailed overview of anything specific mentioned here, let us know in comments and we may create a separate article about it.
Real-time collaborative editing in CKEditor 5
So far, we talked about implementing real-time collaborative editing in general. Those low-level topics were platform-agnostic, but there is also the second part of this big puzzle — the end-user features and the platform’s architecture that allows implementing these features.
Dedicated collaboration features
Apart from enabling the users to share and edit the same document simultaneously (you can test it live on https://ckeditor.com/collaborative-editing/), we implemented some dedicated collaboration features that make the users’ real-time collaborative editing experience as engaging as one would expect from a complete WYSIWYG editor solution:
- Comments feature — Adding comments in real time, as other users edit, to any selected part of the content (commenting in “read-only mode” is supported, too).
- Users’ selection feature — Visual highlights at exact places where other users are editing to further emphasize the collaboration aspect and help users navigate inside the edited rich-text document.
- Presence list feature — Showing photos or avatars of users who are currently editing the document.
Support for rich-text editing features
Our editing framework is built to support all rich-text editor features in collaboration mode, from simple ones like text styling, through image drag and drop and captioning, to complex ones like undo and redo, nested lists or tables.
Since the mechanisms used in real-time collaborative editing lie at the very foundation of CKEditor 5 Framework, any new feature added to the rich-text editor will also be available in collaboration mode.
Support for third-party plugins
A WYSIWYG HTML editor is usually just a component of a bigger platform or application, so we needed to design its architecture in a way to make it flexible and easily extendable. Your custom features need to be as supported in a collaborative environment as the core ones. If you need to develop your own piece of editor functionality, there is a high chance that you will not need to write even a single line of code to enable it for collaboration.
Developing features for real-time collaborative editing with CKEditor 5 Framework is easy thanks to the following advantages:
1. Data abstraction (model-view-controller architecture).
The rich-text editor content (the data) is abstracted from the view and from the DOM (the browser’s content representation). This brings an important benefit: abstract data is much easier to operate on. A content element (for example, an image widget) can be represented as one element in the data model, instead of a few (as it is in the DOM or HTML). Thanks to that, the feature code can become much simpler.
2. A single entry point for changes.
Every change performed on the editor data, internally, always results in creating one or multiple operations. Operations are atomic data objects describing the change. These are then used to synchronize data between collaborating clients. Thanks to that, every CKEditor 5 feature is supported in real-time collaborative editing “out of the box”.
3. A simple API built on a powerful foundation.
All the mechanisms responsible for the magic are hidden from the developer. Instead, we provide an API resembling what you are already used to. Changing the data tree is easy thanks to intuitive methods that perform actions which are then translated into operations behind the scenes.
4. Data conversion decoupled from data synchronization.
After the editor data model is changed, the changes are converted to the editor view (a custom, DOM-like data structure) and then rendered to the real DOM. The important thing is that only the editor data is synchronized, while the conversion is done on every client independently. This means that even a complicated feature, if represented by an easy abstraction, is still easily supported in the collaborative environment.
Markers are ranges (“selections”) on content that are trackable and automatically kept in sync while the data tree is being changed — also during collaboration. Thanks to them creating features like user selection or a comment to the text is a breeze.
Post-fixers are callbacks which are called after the editor data changes. They are not exclusive to collaboration but can be used to fix the editor model if your feature is complicated.
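The single entry point for changes, markers and post-fixers described above can be sketched with a toy model. This is an illustration only: the `ToyModel` and `ToyWriter` classes are hypothetical stand-ins written for this example, the document is reduced to a plain string, and the `sent` array stands in for the network layer.

```typescript
type ToyOperation =
  | { type: "insertText"; offset: number; text: string }
  | { type: "remove"; offset: number; length: number };

interface Marker {
  start: number;
  end: number; // exclusive
}

// Feature code only talks to the writer. Every call becomes an operation.
class ToyWriter {
  constructor(private readonly model: ToyModel) {}
  insertText(text: string, offset: number): void {
    this.model.applyOperation({ type: "insertText", offset, text });
  }
  remove(offset: number, length: number): void {
    this.model.applyOperation({ type: "remove", offset, length });
  }
}

class ToyModel {
  text = "";
  readonly markers = new Map<string, Marker>();
  readonly sent: ToyOperation[] = []; // stand-in for "send to other clients"
  private readonly postFixers: Array<(writer: ToyWriter) => boolean> = [];

  registerPostFixer(fixer: (writer: ToyWriter) => boolean): void {
    this.postFixers.push(fixer);
  }

  // The single entry point: all changes happen inside a change block.
  change(callback: (writer: ToyWriter) => void): void {
    const writer = new ToyWriter(this);
    callback(writer);
    // Run post-fixers until none of them reports a fix.
    let fixed = true;
    while (fixed) fixed = this.postFixers.some((fixer) => fixer(writer));
  }

  applyOperation(op: ToyOperation): void {
    if (op.type === "insertText") {
      const length = op.text.length;
      this.text = this.text.slice(0, op.offset) + op.text + this.text.slice(op.offset);
      this.markers.forEach((marker) => {
        if (op.offset <= marker.start) {
          marker.start += length;
          marker.end += length;
        } else if (op.offset < marker.end) {
          marker.end += length;
        }
      });
    } else {
      const removedEnd = op.offset + op.length;
      const shift = (p: number): number =>
        p <= op.offset ? p : p >= removedEnd ? p - op.length : op.offset;
      this.text = this.text.slice(0, op.offset) + this.text.slice(removedEnd);
      this.markers.forEach((marker) => {
        marker.start = shift(marker.start);
        marker.end = shift(marker.end);
      });
    }
    this.sent.push(op);
  }
}

const model = new ToyModel();
// A post-fixer that keeps the text from starting with a space.
model.registerPostFixer((writer) => {
  if (model.text.charAt(0) !== " ") return false;
  writer.remove(0, 1);
  return true;
});

model.change((writer) => writer.insertText("Hello world", 0));
model.markers.set("comment:1", { start: 6, end: 11 }); // marks "world"
model.change((writer) => writer.insertText("Oh, ", 0));
model.change((writer) => writer.remove(0, 3)); // leaves a leading space for the post-fixer
```

In this sketch the feature code never creates operations or updates markers itself. The fix made by the post-fixer is recorded as an ordinary operation too, so it can be synchronized like any other change.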
Real-time collaboration backend
Real-time collaboration requires a server (backend) to propagate changes between connected clients. Such a server also offers additional benefits:
- Your changes will not be lost if you accidentally close the document. A temporary backup in the cloud will always be available.
- Your changes will be propagated to other connected users even if you temporarily lose your internet connection.
We have implemented the backend as a SaaS solution ready for zero-effort instant integration with your application. However, if for various reasons you cannot use a cloud solution, an on-premise version of the collaboration server is also available.
We spent significant time and effort in designing and implementing a highly optimized client-server communication protocol for real-time collaboration. We plan to talk more about some optimizations we worked on recently in another article (to be published soon).
Apart from constantly adjusting and optimizing the real-time collaboration algorithms, we plan to introduce more features that will bring the ultimate collaborative editing experience to CKEditor 5 Ecosystem. We have already started prototyping and preparing the architecture for them:
- Suggestion mode (aka track changes) — Add your changes as suggestions to be reviewed later.
- Mentions feature — Configurable autocompleting helper, providing a way to quickly insert and link names or phrases.
- Versioning and diffing — Save versions of your document and compare them.
We started building our next generation rich-text editor with the assumption that real-time collaborative editing must be the core feature that lies at its very foundation — and this meant a rewrite from scratch. After a lengthy research and development phase, we created an Operational Transformation implementation, extended to support tree-based data structures (rich-text content) for advanced conflict resolution. The successful implementation of the CKEditor 5 Framework collaboration-ready architecture was validated by working solutions from the CKEditor Ecosystem: CKEditor 5, CKEditor 5 Collaboration Features and Letters.
Behind the scenes, the implementation of it all took a lot of effort (which, frankly speaking, exceeded our initial estimates by a factor of 2 😃). Here are some numbers about the project to give you more perspective:
- The number of tickets closed: 5700
- The number of tests: 12500
- Code coverage: 100%
- Development team: 25+
- Estimated effort: 42 man-years (until September 2018), including time spent on writing tools to support the project like mgit2 and Umberto (the documentation generator used to build the project documentation)
We hope you enjoyed reading the article. If you would like to read more about anything specific related to real-time collaboration or CKEditor 5, let us know in comments.
If you would like to play with the final result of our work, check https://ckeditor.com/collaborative-editing/.
Originally published at ckeditor.com.