Building A Collaborative Code Editor With React, Golang and GRPC from scratch

MrLobby
16 min readAug 9, 2020

--

For a better reading experience, view this on the original site.

I’ve been learning Golang and one of things I love so much about it is the fact that they have this.

This is especially useful when you’re halfway in a project and want to test something out but don’t want to go through the trouble of setting up another work directory for a small little test.

I’m also on that Leetcode Grind so this has been serving as my personal bug catcher. Also, I always thought it was kinda cool that companies would make use of collaborative code editing websites to conduct their interviews.

So I thought, why not try to make my own?

Problem

I started off giving myself some time to brainstorm for a solution and some of the things I came up with are as follows:

1. Send the whole document on every keypress

So obviously this wasn’t going to be my best idea but I’m not sure why I thought it would work under certain circumstances. The most obvious problem would be network delays causing each user to overwrite the other. Also, the amount of data to send over the network would begin to ramp up as the document gets bigger.

2. Send the letter along with the position to insert

This would fail since the positions change on every insert and the documents would get out of sync really quickly. This got me to the next one which I almost thought I had it.

3. Split the document by lines and handle inserts per line

I honestly thought this would work albeit being very hard to implement UI-wise since I would have to account for different line inputs and displaying the text. I also quickly realise that I shifted my problem from character positions in #2 to line positions in this one.

4. Attach a unique increasing counter for each character

Looking back, this was probably my closest idea. I thought of either linearly increasing the counter by a certain factor to account for minor edits in the middle but I wasn’t really sure how I would handle cases where the difference in count was less than 1 (e.g. inserting between counts 1 and 2). If I could do this however, it was just a matter of sorting and inserting at the right locations.

I wasn’t exactly working on this the whole time I was awake so at this point it was already a couple of days in and I decided to just Google it.

Solution

I came across two ways collaborative text editing could be achieved.

  1. Operational Transforms (OT)
  2. Conflict-free Replicated Data Type (CRDT)

I’m no expert so to give a very very very high-level overview:

OT works by changing the remote operation before applying it on the local document. This method is, in a sense, more algorithmic to determine what change must be applied to the received operation before executing it. I haven’t read up much about this because I chose to go for the latter.

CRDTs are just data structures that can be modified locally and have these updates propagated remotely without any conflict. There are many types of CRDTs and the one I chose to implement is the LSEQ (or at least my jumbled version of it). The final implementation mentioned in the paper is CRATE which is LSEQSplit, an extension of LSEQ. Their implementation can also be found here.

LSEQ

I’m not even going to pretend to fully understand the motivations and improvements of LSEQ compared to other CRDTs but I’m guessing it has something to do scalability of the size of identifiers allocated to each update. If you do want to know more, read the paper I linked to down below.

The general idea behind this is issuing a unique identifier to each character that you add in order to resolve conflicts in concurrent editing. This is done with the following elements.

Path

The path is just a number indicating where the character should be. A character with a higher path will come after a character with a lower path.

Site

The site is the unique value attached to each user, this will help to resolve conflicts when two paths are the same.

Count

Each character will also have a number indicating the number of operations the user has already performed on the document

These three form a triplet which is used to determine a character’s order in the document. The identifier for a character will contain a list of triplets. In the event that the first identifier is the same, the second identifier is compared to determine the order. To compare a triplet, the path is first compared, then the site and finally the count.

Example: <Path, Site, Count>

Identifier for "A" -> [<4, 'User1', 1>] 
Identifier for "B" -> [<5, 'User1', 2>]
The order will be "AB" since the path for "B" is higher than the path for "A"
Identifier for "A" -> [<5, 'User2', 2>]
Identifier for "B" -> [<5, 'User1', 1>]
The order will be "BA" since the site for "A" is greater (lexicographically) compare to "B"
Identifier for "A" -> [<3, 'User2', 2>]
Identifier for "B" -> [<3, 'User1', 1>, <1, 'User2', 3>]
The order will be "BA" since the first identifier for "B" is less than the first identifier for "A". The second identifier is not compared since the first is not equal.

The paper outlines an algorithm to generate these identifiers given the identifier of the character before and after it so do have a look at the paper to learn how it is done.

For a TLDR:

  1. Compare the path of the first triplets of the characters before and after the position you wish to insert.
  2. Obtain a number between the paths.
  3. Copy all triplets from the previous identifier up until the level of the newly generated triplet.
  4. Add the newly generated triplet to the list of copied triplets.
  5. Attach the character to insert with this list of triplets.

There is waaaaay more to this than what I’ve listed out. For example, the range of values for the path and the amount to increment for a new path are all controlled in the paper for best results. E.g. range of path values increment exponentially per level.

This is how I’ve implemented my identifiers and triplets

comparable.ts

// Interface to implement a generic form of Skiplist (for testing)
export default interface Comparable {
value: any;
compare: (other: any) => number;
}

triplet.ts

export default class Triplet {
path: number;
site: string;
count: number;

constructor(path: number, site: string, count: number) {
this.path = path;
this.site = site;
this.count = count;
}

compare(other: Triplet) {
if (this.path < other.path) return -1;
if (this.path > other.path) return 1;
if (this.site < other.site) return -1;
if (this.site > other.site) return 1;
if (this.count < other.count) return -1;
if (this.count > other.count) return 1;

return 0;
}
}

identifier.ts

import Comparable from "./skiplist/comparable";
import Triplet from "./triplet";
export default class Identifier implements Comparable {
value: string;
triplets: Triplet[];

constructor(value: string, triplets: Triplet[]) {
this.value = value;
this.triplets = triplets;
}

compare(other: Identifier) {
let i = 0;
while (true) {
let t1 = this.triplets[i];
let t2 = other.triplets[i];
if (t1 === undefined && t2 === undefined) return 0;
if (t1 === undefined && t2 !== undefined) return -1;
if (t1 !== undefined && t2 === undefined) return 1;
let cmp = t1.compare(t2);
if (cmp !== 0) return cmp;
else i++;
}
}
}

Data structure to store these identifiers

Now the paper definitely alludes to a tree-like structure and even the implementation of CRATE uses a tree. However, I was a little skeptical of the deletion process for a tree structure and whether or not there would be useless nodes hanging around so I decided to look up more implementations. I found a repository that implemented this in Rust and saw the words SkipList in the code somewhere.

Now having just completed my university course on Data Structures & Algorithms, I thought it was a perfect time to put into practice what I learnt (although I would definitely regret this later).

A skiplist is like a linked list but it has pointers that point to items on its left, right, top and bottom. There are multiple levels in a skiplist. The lowest level of the skiplist operates like how a normal linked list would and contains all elements of the list. The higher levels do not contain all elements. They act like express routes that allow you to skip elements. Skiplists are probabilistic data structures which give you O(n) search and insert for n elements.

The key to the skiplist is maintaining a sorted order. I will go through briefly on some of the operations and caveats of a skiplist but this will surely not be a comprehensive explanation.

Finding an element

These are the general steps to take for finding an element in a skiplist that is sorted in an ascending order.

  1. Start from the highest level of the head nodes.
  2. Compare the element on your right. If the element is smaller or equal, move to it. If it is equal, you have found the element. Otherwise, go down.
  3. If there are no more nodes below you and the element on the right is still larger. The element does not exist.

Suppose we want to find the number 5 in the picture above. These are the steps we would take. Note that this is a skiplist in ascending order.

The levels will be labelled 0,1,2,3 with 0 being the lowest level that contains all the elements.

  1. Start at level 3 of the head nodes.
  2. Compare element on the right which is 1. Since 1 is smaller than 5, we move to it and are currently at level 3 with value 1.
  3. Compare element on the right again, this time it is NIL. Thus we move down to level 2 with value 1.
  4. Compare the element on the right again which is 4. Since 4 is smaller than 5, we move to it and are currently at level 2 with value 4.
  5. Compare the element on the right which is 6. This is larger than our element so we move down to level 1 with value 4.
  6. Compare the element on the right which is still 6. This is still larger and we move down again to level 0 with value 4.
  7. Compare the element on the right which is 5. Move to it and you have found the element.

Inserting an element

  1. Find the element
  2. At the final position in step 1, insert the element between the current element you are on and the larger element on the right.
  3. Get a random number from [0,1]. If the number if 1, create a new node for the element and attach the new node to the original node’s top. Similarly, the new node’s bottom will point towards the original node. Repeat step 3 for the new node. If the number is 0, you have finished inserting the element

The interesting part is the randomization of “promoting” an element to higher levels. There is a 50% chance that an element will move on to level 1, 25% chance to move on to level 1 and 2 and so on.

Deleting an element

  1. Find the element
  2. Delete the element and all levels it has
  3. Reattach the left of the element to its right

Deleting is pretty simple where you remove the element and make sure the list is still linked by reconnecting the ends.

Minor change

In the LSEQ algorithm, we would need to insert and delete elements at various positions. To account for this, we just have to add another field to our elements that indicate how many elements you will skip. Higher levels will have a skip field with larger numbers and the lowest level will only have skip values of 1.

After inserting / deleting an element from the skiplist, I also return the position of that particular element in the list. This will be useful in correcting the position of the cursor later on.

skiplist/comparable.ts

// Interface to make skiplist and box type generic
export default interface Comparable {
value: any;
compare: (other: any) => number;
}

skiplist/box.ts

// Box type to wrap an object with pointers for skip list
export default class Box<T extends Comparable> {
item: T;
level: number;
skipped: number = 1;
top: Box<T> | null = null;
right: Box<T> | null = null;
left: Box<T> | null = null;
bottom: Box<T> | null = null;
head: boolean;

static Head(level: number);
constructor(item: T, level: number, head: boolean = false);
compare(other: Box<T>): number;
toString(): string;
}

skiplist/index.ts

// Functions supported by Skiplist
export default class SkipList<T extends Comparable> {
head: Box<T>[];
highestLevel: number;

// Initialize default values
constructor() {
this.head = [];
this.head[0] = Box.Head(0);
this.highestLevel = 0;
}

// Finds the Box at the position that we want to insert a new
// character in. This is needed for the LSEQ path generation algorithm
atPosition(position: number): Box<T>;

// Find a particular box with T.
// First return value indicates if the item exists
// Second return value gives the corresponding box or the one right before where box would have been placed
// Third return value returns the position of the box
find(box: Box<T>): [boolean, Box<T>, number];

// Inserts an item T into the skiplist and returns
// the position it is inserted
// -1 is returned if item is not found
insert(item: T): number;

// Deletes the item T in the skiplist and returns
// the position of the deleted item
// -1 is returned if item is not found
delete(item: T): number;

// Rolls a random float from [0,1] to determine if the box should be
// promoted to the level in the argument.This function calls itself
// again with a higher level if the current function succeeds in promoting
shouldPromote(prev: Box<T>, level: number, position: number);

// When an item is inserted / deleted, all higher level nodes on its
// right must be updated with a new "skipped" value
propagate(node: Box<T> | null, inc: number);

// Returns all items in a sorted array
get values(): T[];
}

gRPC with Golang

This is my first ever project with Golang and you should probably not take this code as reference for anything that you will ever do. For this project, the code was referenced off Tech School which has a really great playlist showcasing the different features of gRPC as well as its portability to different languages.

What I loved about Golang is how easy it is to learn. There are only a few unique features like goroutines, implementing interfaces and channels which can easily be picked up after looking at a few examples (which there are a ton of). The only difficulty I face so far is learning to write idiomatic Go code but I guess that would probably hold true for many languages. The standard library in Golang is also extremely useful. It comes with everything you need to get a webserver up and running without any third-party dependencies. A bunch of my favorite projects are also implemented in Golang (e.g. Docker, Kubernetes).

gRPC, incase you haven’t heard of it, is a remote procedure call framework that is run over HTTP/2. This was great for my project needs since gRPC over HTTP/2 supports streaming of data compared to a traditional REST API. I could have used websockets here, and probably should have, but I’ve already experimented with websockets in socket.io so I wanted to try and learn something new.

Strategy

The code is available so I’ll just give a quick run-through of my strategy.

  1. Users can send a request to create a room and this room will be created if it is not already taken. A boolean response is returned here to indicate a successful / failed request.
  2. Users can then send a connection request in which they will expect a stream of data back. This stream is captured by Go and added to the list of streams for the particular room.
  3. Typing on the frontend will send the identifier which is part of the LSEQ data structure and room ID to the server, which will forward it to all listening streams in the room.
  4. Running the program sends the current string in the code editor to the server, which will create a file [roomID].go and proceed to run it with go run [roomID].go. The output or error is captured and sent to all listeners of the room.

This was the core logic of the service and I ran into a TON of bugs just because of my unfamiliarity with Golang. For example, I locked the mutex before calling a nested function call which required access to the same lock which just hangs the execution.

Also, I didn’t realise that there would be additional dependencies required for my React application to communicate with the gRPC server. grpc-web was required to enable invoking gRPC services from modern browsers and even then, only some of the operations were supported like unary / server-side streaming. This does not work on ALL browsers too. grpc-web also relies on websockets to support streaming so I guess I would have been better off just implementing my server with websockets in the first place. However, I can see how this might be helpful for services which were already implemented in gRPC.

Creating the code editor with React

I didn’t want to add a whole ton of dependencies like routing so I just used a simple state change for the pages.

1. Create / Join Page

The create room button simply generates a random 6 character string and requests the server to create a room with that code. This code will be used to group users together. If a successful response is received from the server, the room is created and the application automatically sends another request to connect to the room.

The join room button takes the room code that the user can input in the text field and sends the same connect request. The connect function contains a callback that will handle messages that it receives from other users. The handling of the responses are omitted here but you can refer to the source code to find out more.

2. Text Editor

The text editor was the one that caused the most problems for me. Now, the LSEQ structure holds the string that is supposed to be represented in the code editor. Thus, the individual keyboard events have to be handled to attach an identifier to them before they can be inserted. I handled the onKeyDown to check for the type of key before adding it in the form of an identifier. However, there were a few caveats regarding how the state change works in React which I was unfamiliar with.

The biggest issue was the cursor in the textarea jumping to the end of the input on every insertion. You can read more about it in the issues here and here. My workaround to it was having a reference to the textarea and updating the position of the cursor on every render. I also did not use the value field of textarea but the defaultValue field instead. Then, I updated the value field via the reference I created.

Learning points

1. Choose the right tool for the job

I chose React solely out of familiarity and undertaking this project has just made me realise how unfamiliar I am with React. I have a lot to learn regarding some of the internals on how React handles state and variables that are not handled by state. The whole project would have been much smoother had I gone with pure javascript instead of the React framework.

I also chose gRPC just to experiment with it. While I don’t regret using it for this project, it was definitely not the right tool for the job. I chose it due to the fact that it supports streaming over HTTP/2 but unfortunately, I don’t think the integration of gRPC with browsers are that developed yet and grpc-web just uses a wrapper over websockets for the streaming. You can navigate to the demo site, open your developer console and check the network tabs. You would see something like this:

2. Write your tests

I cannot stress the importance of tests enough. Throughout the course of this project, I wrote a few tests here and there and they have been so useful in verifying that my code works as expected. Halfway through, I got lazy and stopped writing these tests. This was not a good idea and the bugs just kept piling on and on while manually testing each time took me ages.

Also, algorithms might be the easiest portion of your code to test as it’s easy to define their expected behaviors. I found a bug in my implementation of the skiplist where the skipped field was not updated correctly after a deletion. This was a major headache just to trace where the problem was even coming from and could definitely be avoided with the proper tests.

3. Start earlier

This was my first project and I really didn’t think I was going to pull it off. I read the paper on LSEQs so many times and got so confused on how I was going to implement this. I spent a lot of time thinking about various implementations but just starting on writing code helps so much. I had to refactor my implementation of the internal data structure for the LSEQ algorithm so many times. I actually started with a tree-like diagram before I implemented a skip list but going through the process of writing all this code helped me to form a better idea of what I needed to change in the next one.

I’m not saying you should dive in without formulating any sort of plan but if you’re a newbie like me, there’s only so much that you can plan for before you are just stuck on some details that you wouldn’t know due to the lack of experience. My takeaway for this point would be to not worry if your project is going to be a failure or success and just start on it.

What’s Next

For the current state of my project, I would give myself a pass for implementing the algorithm but in terms of the execution such as creating the actual application for a use case, it’s definitely a fail. The site is really slow at some points like running the file with the text editor being out of sync at times due to React’s virtual DOM.

  1. Rewrite the server with websocket
  2. Learn more about React / Just use Javascript
  3. Hot reloading with Go
  4. Test cases

With school starting in a few days, I probably won’t have enough time to consistently work on the project but if I came back to it, I would definitely have to revamp the entire thing from scratch.

End

I hope you learnt something regarding how to create your own collaborative editing tool. Thanks for reading!

References

  1. https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type ↩︎
  2. LSEQ Paper https://hal.archives-ouvertes.fr/hal-01552799/document ↩︎
  3. Skip List https://en.wikipedia.org/wiki/Skip_list ↩︎

Originally published at https://mrlobby.com.

--

--