Building a real-time collaborative text editor for the web (DraftJS ❤ ShareDB)
A few months ago, I was sitting around the meeting table with Johannes Weiss and Felix Gast on a Wednesday night. This was the weekly Jour Fixe for our startup, Conode, a productivity SaaS that helps teams to organize meetings.
Our sales leads and users wanted to edit pages collaboratively, you know, Google Docs style. We initially thought this was too big a challenge, since we lacked the budget to outsource it and the internal know-how to implement it ourselves. That night, however, we realized that it was critical for our survival, so Felix and I bravely said we would take it on.
I say brave because we had just taken over the code base from a competent agency called Thinslices. We didn’t know the tech stack that well to start with. So, it promised to be a bumpy ride… Then again, that’s how we like them.
Before jumping into code, we need to talk theory. The complexity of this distributed system is not to be underestimated and therefore a high-level overview will help to understand what's going on.
In order to collaborate, the document state must be shared among multiple peers by sending messages between them over an insecure network. A protocol is needed to manage this properly.
Wait, what exactly needs to be managed? Why can’t we just send the state object around as soon as someone edits some text?
It’s good practice to challenge yourself with simple questions along the way. It helps to wrap your head around the problem.
Well, imagine that two users type something at the same time. In such a scenario,
- both clients will end up with a different state, and
- one of the two changes will be overwritten.
That’s a bad user experience, so we definitely want to avoid that.
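To see the problem in action, here is a minimal sketch of the naive "just send the whole state" approach. The `Client` type and the `typeAt` and `receive` helpers are hypothetical names made up for this illustration, and the network is simulated by plain function calls.

```typescript
// Hypothetical model: every client holds the whole document as a string
// and broadcasts its full state after each edit (last write wins).
type Client = { name: string; doc: string };

// Apply a local edit and return the full state that would be broadcast.
function typeAt(client: Client, index: number, text: string): string {
  client.doc = client.doc.slice(0, index) + text + client.doc.slice(index);
  return client.doc;
}

// Receiving a remote state blindly overwrites the local one.
function receive(client: Client, remoteState: string): void {
  client.doc = remoteState;
}

const alice: Client = { name: "alice", doc: "Hello" };
const bob: Client = { name: "bob", doc: "Hello" };

// Both type at the same time, before either message has arrived.
const fromAlice = typeAt(alice, 5, " world");
const fromBob = typeAt(bob, 0, "Oh, ");

// The two messages cross on the wire.
receive(alice, fromBob);
receive(bob, fromAlice);

console.log(alice.doc); // "Oh, Hello"
console.log(bob.doc);   // "Hello world"
```

The two clients end up with different documents, and each of them has lost its own edit.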
The two issues described above correspond to two major technical conditions that our protocol needs to fulfill:
- Convergence: all editors must converge to the same document state after a finite amount of time.
- Concurrency: edits that occur in parallel lead to a correct end result, independent of the order in which they are executed.
This is a bit of a simplification. Research papers will talk about eventual consistency, commutative & idempotent conditions, the need for a central server, … All this academic literature has proposed a plethora of protocols and algorithms, some more legit than others (see article below). For the sake of conciseness, we will not delve deeper into that matter and simply say that they can be classified into one of the following two categories:
- Operational Transformation (OT) represents the document state as a sequence of operations. Every operation is created on top of a local snapshot. Now, imagine that the operation is sent to a peer that made an edit in the meantime. That peer will have a different snapshot, so the operation first needs to be transformed before being applied. This is the essence of how OT works.
- Conflict-free replicated data type (CRDT) is a bit more complicated than OT. It uses more memory and bandwidth, but in return guarantees eventual consistency without the need for a central server. So, you could say it is more theoretically complete.
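To make the transformation step of OT a bit more concrete, here is a minimal sketch that only supports text inserts. The `Insert` type, the `site` tie-breaker and the `apply` and `transform` functions are assumptions made for this illustration. A real OT implementation also has to deal with deletes, richer data types and a server that orders the operations.

```typescript
// A deliberately tiny operation: insert `text` at position `pos`.
// `site` identifies the author and is only used to break ties.
type Insert = { pos: number; text: string; site: number };

function apply(doc: string, op: Insert): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so that it can be applied after `other` has already been applied.
function transform(op: Insert, other: Insert): Insert {
  const otherGoesFirst =
    other.pos < op.pos || (other.pos === op.pos && other.site < op.site);
  // If the other insert lands before ours, shift our position to the right.
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}

const base = "Hello";
const a: Insert = { pos: 5, text: " world", site: 1 };
const b: Insert = { pos: 0, text: "Oh, ", site: 2 };

// Site 1 applies its own edit first, then the transformed remote edit.
const site1 = apply(apply(base, a), transform(b, a));
// Site 2 does the same in the opposite order.
const site2 = apply(apply(base, b), transform(a, b));

console.log(site1); // "Oh, Hello world"
console.log(site2); // "Oh, Hello world"
```

Both sites apply the two edits in a different order, yet they converge to the same document and neither edit is lost.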
We went with OT, and were happy in the end to discover that it was the right choice :-)
Update (2 Mar 2017): there is now working code for the ideas in this post, with additional optimizations. (medium.com)
Conode is a single-page application, which uses React+Redux. The text editor is based on the famous Draft.js framework. It doesn't offer much out of the box, but according to their own words "In Draft.js, everything is customizable."
The problem is that Draft.js isn't made for collaborative editing. This has to do with the fact that its API mostly exposes State and not Operations. The community actually seems divided on the issue. In the end, whether it's doable or not depends on your functional and performance requirements.
To connect Draft.js editors for collaboration, we need WebSockets. This technology allows us to send messages to and from the browser (bidirectionally) with little overhead, which is not possible over traditional HTTP.
So much for messaging. But what about the application layer that takes care of that fancy OT protocol? After a lot of research, mainly consisting of reading countless GitHub issues and, admittedly, investigating existing apps using the Chrome Developer Tools' Network tab, ShareDB came out as the winning option.
Time to start coding. Firstly, we created a simple prototype which combined Draft.js with ShareDB. This allowed a quick test of our architecture without yet needing to face the complexity of building it into our existing codebase.
Remember that Draft.js does not expose operations, only the EditorState. But OT works with operations… To solve this, we used json0-ot-diff, a library that compares the previous state with the new one (both serialized using convertToRaw). This gives us a JSON-type OT operation, which we then pass on to ShareDB.
Such a calculation is costly in terms of performance, but the end result worked like a charm. Feel free to get in touch if you wish to receive a copy of that prototype.
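To give an idea of what such a comparison produces, here is a simplified sketch of a state diff that emits operations in the json0 format (`oi`/`od` for object keys, `li`/`ld` for list items). It is not the json0-ot-diff library itself: the `diff` and `replace` functions are hypothetical, arrays of different length are simply replaced as a whole, and the raw state below is a trimmed-down version of what convertToRaw returns.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Path = (string | number)[];
// json0-style component: oi/od = object insert/delete, li/ld = list insert/delete.
type Op = { p: Path; oi?: Json; od?: Json; li?: Json; ld?: Json };

function isObject(v: Json): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Replace a value; the op shape depends on whether the parent is a list or an object.
function replace(path: Path, prev: Json, next: Json): Op {
  const last = path[path.length - 1];
  return typeof last === "number"
    ? { p: path, ld: prev, li: next }
    : { p: path, od: prev, oi: next };
}

function diff(prev: Json, next: Json, path: Path = []): Op[] {
  const ops: Op[] = [];
  if (isObject(prev) && isObject(next)) {
    for (const key of Object.keys(prev)) {
      if (key in next) ops.push(...diff(prev[key], next[key], [...path, key]));
      else ops.push({ p: [...path, key], od: prev[key] });
    }
    for (const key of Object.keys(next)) {
      if (!(key in prev)) ops.push({ p: [...path, key], oi: next[key] });
    }
    return ops;
  }
  if (Array.isArray(prev) && Array.isArray(next) && prev.length === next.length) {
    for (let i = 0; i < prev.length; i++) {
      ops.push(...diff(prev[i], next[i], [...path, i]));
    }
    return ops;
  }
  // Leaves (and arrays of different length): replace wholesale if they differ.
  return JSON.stringify(prev) === JSON.stringify(next) ? [] : [replace(path, prev, next)];
}

const before: Json = {
  blocks: [{ key: "a1", text: "Hello", type: "unstyled" }],
  entityMap: {},
};
const after: Json = {
  blocks: [{ key: "a1", text: "Hello world", type: "unstyled" }],
  entityMap: {},
};

const ops = diff(before, after);
console.log(JSON.stringify(ops));
// [{"p":["blocks",0,"text"],"od":"Hello","oi":"Hello world"}]
```

Note that the whole text of the block is replaced rather than diffed character by character. An array of operations like this is the kind of value that can be handed to ShareDB's `doc.submitOp`.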
The next step was to integrate this working solution into our existing codebase. This brought along challenges — more than we expected.
1. Single source of truth
To manage the document state in our frontend we use Redux. So, we needed to manage a single source of truth of the EditorState between Draft.js, Redux, and ShareDB. In the end we built a loop of functions and events, which can be seen in the image below.
2. Troubling race conditions
Our Text Editor React Component, containing Draft.js, had a few race conditions. In single user mode these were not a problem. As soon as users started making changes concurrently, occasional edits got overwritten. It was hard to detect patterns and when we fixed one, new errors were triggered.
3. Microservices backend
ShareDB stores every change as an operation in its database. As we are creating a text editor for real-time collaboration, this amounts to a large number of operations, which is detrimental to storage capacity and computing power. Therefore, we built a collaboration service on top of our REST API workflows that systematically empties itself. This keeps the number of stored operations to a minimum and extracts the complexity of collaboration into an independent microservice.
4. Block-level locking
Our editor visually separates each paragraph into blocks. To minimize performance issues due to the EditorState comparison, we opted for block-level locking on selection changes: whenever a user selects an EditorBlock, we disable it for all other collaborators. This kept the diffing at the JSON level only, without needing to compute it for the strings on top.
5. Detaching from React components
Our Editor was a pretty large React component to start out with. More than 1,000 lines… In order not to lose ourselves in an endless refactoring effort, we first thought about creating a higher-order component that would add collaboration flavor to the existing editor. In the end, it was far simpler to put our collaboration logic in the Redux action creator that handled updates from our editor.
6. Dealing with edge cases
To avoid breakdowns, many edge cases needed to be covered. For example, automatic WebSocket reconnection when your Wi-Fi drops, detecting dead WebSocket clients, properly opening/closing ShareDB subscriptions when the user goes to the dashboard and opens another page, etc.
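As an illustration of the first edge case, here is a minimal sketch of automatic reconnection with exponential backoff. The names `backoffDelay` and `createReconnector`, the delay values and the synchronous `open` callback are all assumptions made for this example. A real WebSocket opens asynchronously and would report success through its own events.

```typescript
// Delay before reconnect attempt number `attempt` (0-based):
// 500 ms, 1 s, 2 s, ... capped at 30 s. The values are arbitrary.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

type Timer = (fn: () => void, ms: number) => void;

// `open` is a hypothetical callback that tries to reopen the socket and
// returns true on success. `schedule` is injectable so it can be faked in tests.
function createReconnector(
  open: () => boolean,
  schedule: Timer = (fn, ms) => { setTimeout(fn, ms); },
) {
  let attempt = 0;
  // Call this from the socket's close handler.
  function onClose(): void {
    const delay = backoffDelay(attempt);
    attempt += 1;
    schedule(() => {
      if (open()) attempt = 0; // connected again, reset the backoff
      else onClose();          // still offline, try again later
    }, delay);
  }
  return { onClose };
}

// Demo with a fake timer that records the delay and fires immediately.
const delays: number[] = [];
let tries = 0;
const reconnector = createReconnector(
  () => ++tries >= 3, // pretend the network comes back on the third try
  (fn, ms) => { delays.push(ms); fn(); },
);
reconnector.onClose();
console.log(delays); // [500, 1000, 2000]
```

Increasing the delay between attempts avoids hammering the server while the connection is down.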
The end result was working, but had some glitches left due to the race conditions of bullet point 2. These bugs were very difficult to track down and, with a client deadline approaching, we decided not to lose any further time on them. As a temporary solution, we placed a lock on the entire page, which can be requested and passed from one user to another.
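A page-level lock of this kind can be modeled as a small state machine. The sketch below is only one possible shape under stated assumptions: the `LockState` type, the `request` and `release` functions and the waiting queue are hypothetical and do not describe our actual implementation.

```typescript
// Who currently holds the page lock, and who is waiting for it.
type LockState = { holder: string | null; queue: string[] };

// A user asks for the lock: they get it if it is free, otherwise they wait in line.
function request(state: LockState, user: string): LockState {
  if (state.holder === null) return { holder: user, queue: state.queue };
  if (state.holder === user || state.queue.indexOf(user) !== -1) return state;
  return { holder: state.holder, queue: state.queue.concat(user) };
}

// The holder gives up the lock, which passes to the next user in line (if any).
function release(state: LockState, user: string): LockState {
  if (state.holder !== user) return state; // only the holder can pass the lock on
  const next = state.queue.length > 0 ? state.queue[0] : null;
  return { holder: next, queue: state.queue.slice(1) };
}

let lock: LockState = { holder: null, queue: [] };
lock = request(lock, "alice"); // alice starts editing
lock = request(lock, "bob");   // bob has to wait
lock = release(lock, "alice"); // the lock passes to bob
console.log(lock.holder);      // "bob"
```

Only one user can edit at a time, which sidesteps the race conditions at the cost of true concurrent editing.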
Admittedly, the final solution is not perfect. However, now we know it works and what refactoring is needed in order to make it shine.
- Prototyping really pays off, as it allows you to quickly validate your architecture. Without it, we'd never have gotten this far.
- Plan more time for refactoring code.
- A bit of theory goes a long way in distributed systems. Even though ShareDB works out of the box, understanding the model behind it was a necessity.
I hope this blog post gives some insight to teams developing their first real-time collaborative text editor for the web. If it does, let me know. If it doesn't… thank you, come again.