Opportunities and pitfalls of running a serverless messenger app with Git
These days work is all about efficient collaboration. With many of us working from home or in dispersed teams, messaging is our most used means of communication. For us as developers, the challenge is to communicate around the code we write without too many disruptions. We also want to make our discussions as relevant and connected to the essence of our work as possible: the messaging has to connect to our code.
As a software engineer, you’ll find that creating an app sparks discussions about its code, and this is a good thing. But all of us coders have different preferences and practices, and we also find different tools and methods that help us make our work easier and more efficient. Despite these differences, though, there are a couple of tools that almost all of us install on our machines as soon as we start coding, regardless of our area of expertise.
One of those tools is Git. If you want to maintain your sanity while coding as part of a team, you inevitably need some version control system, and Git is by far the most widely used one these days.
To quote the header from git-scm.com, “Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.” That sounds a lot like a database to me. Indeed, it’s a database familiar to virtually every developer in the world, and one that’s already installed on their machines.
So why not use it?
CodeTale as a case study
CodeTale is a code documentation/messenger app currently being developed by my team at VirtusLab. Long story short, we feed the tool with archived pull requests from a code repository and make it possible to view and continue the discussions directly in the coders’ IDE of choice (which can currently be either IntelliJ or VS Code). Instead of a code comment like, “don’t remove this or everything will break,” you can just rely on a conversation that’s associated with that line. And because it doesn’t clutter the code, discussions here can be more detailed and in-depth, e.g., explaining the reason for the issue.
Instead of using a more traditional solution, like a client-server or P2P architecture, we build on top of Git. Our backend is written in Scala, so we decided to rely on JGit, an existing Java library for interacting with Git.
All of the data relevant to CodeTale, including but not limited to discussions and notifications, is stored as JSON files in a Git repository. All the user needs to do during setup is create a repository on GitHub, GitLab, or Bitbucket. (They can also use an existing code repository, although a dedicated branch is advisable if anybody will ever care to read the commit log.)
As a result of using Git, we don’t have to deal with the bottlenecks of peer-to-peer networking, either. Git helps with maintaining data consistency (although we have to put in some extra effort to avoid conflicts), and network security is mostly covered by the Git repository providers we support (although we still have to implement authentication with each of them separately).
Although Git’s features lend themselves to our architecture quite nicely, it obviously wasn’t designed with a messenger app in mind, so my team had to hack around a few of its limitations. Let me walk you through a couple of those.
To see anything, you have to pull
Here’s an example of how CodeTale works. A developer logs in, starts diving into the code, and sees a new comment in a source file they have to modify as part of their current ticket. They use CodeTale to reply in the same discussion, asking the comment’s author about some detail. When they do this, CodeTale performs a git push operation, and the new comment lands in the appropriate JSON file in the remote Git repository.
Everything is smooth so far, but now we’d like the other user to become aware of the reply and read it. However, Git’s main purpose is to store data; it doesn’t offer any sort of push notifications out of the box. If you want to see its newest contents, you have to initiate a git pull operation, and this is where the slope becomes a tiny bit steeper.
git pull is not a particularly light operation. Whenever you call it as you code, there’s a noticeable lag before it concludes. To understand this, let’s do a quick breakdown of what it does under the hood.
By default, the git pull command is shorthand for two smaller Git operations run in quick succession: git fetch followed by git merge FETCH_HEAD. The git fetch command downloads the newest changes from the remote, while git merge FETCH_HEAD merges (integrates) these changes into the current branch. What makes git pull expensive is its fetching component. To get the data we really need (for example, a reply in a discussion the user took part in), we have to download all the changes that took place since our last pull. In some cases, this can be a big data set.
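To make the decomposition concrete, here is a minimal Scala sketch that runs the two steps explicitly through scala.sys.process. It assumes the git command-line tool is on the PATH and that the remote is called origin. CodeTale itself talks to Git through JGit, so this only illustrates roughly what git pull does by default (with no rebase setting), not how CodeTale issues the calls.

```scala
import scala.sys.process._
import java.io.File

// Sketch: `git pull` expressed as its two default steps.
// Assumes the `git` CLI is installed; CodeTale uses JGit instead.
object PullSteps {
  // The two commands a default `git pull` boils down to.
  def pullCommands(remote: String): Seq[Seq[String]] = Seq(
    Seq("git", "fetch", remote),      // download new objects and refs (the expensive part)
    Seq("git", "merge", "FETCH_HEAD") // integrate what was fetched into the current branch
  )

  // Runs the steps in order inside repoDir and stops at the first non-zero exit code.
  def pull(repoDir: File, remote: String = "origin"): Int =
    pullCommands(remote).foldLeft(0) { (status, cmd) =>
      if (status != 0) status else Process(cmd, repoDir).!
    }
}
```

Keeping the steps separate makes it visible that all the network cost sits in the fetch, which is the part the workarounds below try to avoid.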
In other words, we can only afford to pull so often. In a situation like this, how can you make your app responsive? Well, you have to hack around a bit to know when it makes sense to pull. We still git pull the whole repository every few minutes to stay up to date, but anything beyond that requires a good reason.
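This pull policy can be captured as a small decision function. It is a sketch under stated assumptions: the five-minute interval stands in for "every few minutes", a new notification is used as the example of a "good reason" (the notification trick is described in the next paragraphs), and PullPolicy and its cases are hypothetical names.

```scala
import java.time.{Duration, Instant}

// Sketch of a pull policy: a scheduled full pull, plus an early pull
// only when there is a reason. All names and the interval are hypothetical.
object PullPolicy {
  sealed trait PullReason
  case object ScheduledInterval extends PullReason
  case object NewNotification extends PullReason

  // Assumption: "every few minutes" taken as five minutes.
  val interval: Duration = Duration.ofMinutes(5)

  // Returns why a full `git pull` should run now, or None to skip it.
  def reasonToPull(lastPull: Instant, now: Instant, hasNewNotification: Boolean): Option[PullReason] =
    if (hasNewNotification) Some(NewNotification)
    else if (!now.isBefore(lastPull.plus(interval))) Some(ScheduledInterval)
    else None
}
```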
One workaround we rely on is fetching single files. It’s expensive to git pull the whole repository, but downloading just one file is cheap. Git itself doesn’t let you pull a single file, though, so we rely on the version control provider instead. GitHub (as well as the other providers supported by CodeTale) has a REST endpoint serving just what we need: the contents of a single file.
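Here is a minimal sketch of such a single-file download against GitHub’s REST "get repository content" endpoint (GET /repos/{owner}/{repo}/contents/{path}), using the HTTP client built into Java 11 and later. The owner, repository, path, branch, and token are placeholders, and SingleFileFetch is a hypothetical name. The raw media type in the Accept header asks GitHub for the file body itself rather than a JSON wrapper with Base64-encoded content. GitLab and Bitbucket offer their own equivalents, which are not shown here.

```scala
import java.net.{URI, URLEncoder}
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Sketch: download one file via GitHub's contents endpoint instead of pulling.
// Requires Java 11+. Owner, repo, path, ref and token are placeholders.
object SingleFileFetch {
  // Percent-encode each path segment while keeping the '/' separators.
  private def encodePath(path: String): String =
    path.split("/").map(s => URLEncoder.encode(s, "UTF-8").replace("+", "%20")).mkString("/")

  def contentsUrl(owner: String, repo: String, path: String, ref: String): URI =
    URI.create(
      s"https://api.github.com/repos/$owner/$repo/contents/${encodePath(path)}?ref=${URLEncoder.encode(ref, "UTF-8")}"
    )

  // Ask for the raw file contents and authenticate with a token.
  def request(uri: URI, token: String): HttpRequest =
    HttpRequest.newBuilder(uri)
      .header("Accept", "application/vnd.github.raw+json")
      .header("Authorization", s"Bearer $token")
      .GET()
      .build()

  // Performs the request; fails loudly on anything other than 200 OK.
  def fetchFile(client: HttpClient, uri: URI, token: String): String = {
    val response = client.send(request(uri, token), HttpResponse.BodyHandlers.ofString())
    if (response.statusCode() != 200) sys.error(s"GitHub returned ${response.statusCode()}")
    response.body()
  }
}
```

A call would look like SingleFileFetch.fetchFile(HttpClient.newHttpClient(), SingleFileFetch.contentsUrl("acme", "comments", "notifications/jane.json", "main"), token), with every argument being illustrative.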
This helps us cut our costs. By checking a file containing notifications for the current user, we know when we should pull the newest conversations intended for them. Since those conversations are very likely to be relevant, the user should be able to see them right away, not in the next pull cycle.
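One way to express the notification check is a watcher that remembers which notification IDs it has already seen and triggers an immediate pull when the freshly downloaded notifications file contains new ones. NotificationWatcher, the pullNow callback, and the assumption that the file has already been parsed into a set of IDs are all hypothetical; the article doesn’t describe the file’s format.

```scala
// Sketch: use the cheaply downloaded notifications file as a pull trigger.
// Hypothetical: the class name, the callback, and IDs as plain strings.
final class NotificationWatcher(pullNow: () => Unit) {
  private var seen: Set[String] = Set.empty

  // Call after each single-file download; returns the IDs that were new.
  def onNotificationsFetched(ids: Set[String]): Set[String] = {
    val fresh = ids -- seen
    if (fresh.nonEmpty) {
      seen = seen ++ fresh
      pullNow() // something is waiting for this user: pull now, not next cycle
    }
    fresh
  }
}
```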
(One limitation of this architecture is that you have to keep polling for notifications, whereas in a client-server architecture you can simply rely on the server notifying the client. For now, though, polling is the only option we have.)
Resolving Git conflicts is nobody’s favourite pastime
We all know the main issue with collaborating in Git: conflicts. Whenever you code in your own branch and change something in a file touched by a teammate, there’s a risk one of you will run into a conflict. Git can merge automatically as long as the changes don’t overlap, for example when they touch different files or different parts of the same file. Otherwise, you have to go through the tedious process of manually finding the best way to merge the changes together. Hopefully the code is well covered with tests to keep you sane.
The real problem is that performing a git pull in the case of a conflict automatically starts merging (as mentioned above, it calls git merge FETCH_HEAD under the hood). When run from the command line, the immediate consequences look something like this:
$ git merge FETCH_HEAD
CONFLICT (content): Merge conflict in build.sbt
Automatic merge failed; fix conflicts and then commit the result.
Git uses conflict resolution markers to indicate where the conflict is so you can resolve it, which in turn means that the contents of the file at fault (build.sbt, in this case) are now an utter mess:
<<<<<<< HEAD
scalaVersion := "2.11.12"
=======
scalaVersion := "2.12.12"
>>>>>>> FETCH_HEAD
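As an extra safety net (an illustration only, not something the article says CodeTale does), an app could refuse to read a data file that contains Git’s standard conflict markers: a line starting with "<<<<<<< ", a line consisting solely of "=======", or a line starting with ">>>>>>> ". ConflictMarkers is a hypothetical helper, and it ignores the optional diff3 "|||||||" marker.

```scala
// Sketch: detect Git's standard conflict markers before trusting a file.
// Hypothetical helper; not described as part of CodeTale.
object ConflictMarkers {
  def hasConflictMarkers(contents: String): Boolean =
    contents.split("\\r?\\n").exists { line =>
      line.startsWith("<<<<<<< ") || line == "=======" || line.startsWith(">>>>>>> ")
    }
}
```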
As CodeTale treats Git as its single source of truth when it comes to message data, letting the local files enter such a state is unacceptable. We don’t want our users to have to resolve a conflict created by an untimely reply in the same source file. In other words, we have to prevent conflicts from happening at all, no matter what it takes.
The solution we use in CodeTale is crude, but functional. Here are the steps for adding a comment:
- Perform a git pull.
- Try to save the changes, add them to Git, commit, and push.
- If all went well, that’s it. However, in case of a failure, assume that a conflict happened, then revert the changes by doing a git reset --hard from the remote.
- In the failure case, increase the retry counter and go back to Step 1 if there are any retries remaining.
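The steps above can be sketched as a retry loop. GitOps is a hypothetical trait standing in for the real Git calls (JGit, in CodeTale’s case), and, following the article, any failure while refreshing or pushing is treated as a conflict and answered with a hard reset to the remote state. The loop makes at most maxAttempts tries and reports whether the comment was pushed.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical abstraction over the real Git operations (e.g. JGit).
trait GitOps {
  def pull(): Unit                            // Step 1: refresh from the remote
  def commitAndPush(change: () => Unit): Unit // Step 2: write files, add, commit, push
  def hardResetToRemote(): Unit               // Step 3 on failure: git reset --hard to the remote
}

object AddComment {
  // Returns true if the change was pushed within maxAttempts tries.
  def withRetries(git: GitOps, change: () => Unit, maxAttempts: Int): Boolean = {
    @annotation.tailrec
    def attempt(n: Int): Boolean =
      if (n > maxAttempts) false // Step 4: no retries left
      else {
        val pushed = Try { git.pull(); git.commitAndPush(change) } match {
          case Success(_) => true
          case Failure(_) =>
            git.hardResetToRemote() // assume a conflict; discard local state so the next try starts clean
            false
        }
        if (pushed) true else attempt(n + 1)
      }
    attempt(1)
  }
}
```

Keeping the Git calls behind a trait also makes the loop easy to exercise with a fake that rejects the first few pushes.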
As you can see, the key is the Git hard reset (git reset --hard). It overwrites any differences between the local repository and the origin, effectively getting rid of the source of the conflict, and leaves a clean state from which to attempt saving the changes again. This means that the operation may have to retry a couple of times before it succeeds (or it may even fail, if traffic in a source file is peaking and there are no retries left), but we still get rid of the main issue.
Of course, the user may still break their own comments repository by committing changes manually — outside of our app — but that’s beyond our control.
Clearly, relying on Git in CodeTale’s architecture comes with a few problems of its own, but we definitely saved ourselves a big chunk of the overhead usually associated with a more traditional messenger architecture. The effort needed to deal with conflicts and the high cost of pulling the newest comments rule it out as a solution for high-traffic live chat, but that has never been CodeTale’s purpose. Instead, its niche is discussing code inside a developer’s IDE, similar to the way a code review is done on a pull request, and for that, Git fits right in.
For more information on how CodeTale works on the client side of things — i.e., in your IDE — check out Rafał Mucha’s article on the matter. And to try CodeTale itself, stay tuned to VirtusLab’s future announcements.