Improving GitHub Flow with Slackbot

Mat Malkowski
Agoda Engineering & Design

--

Working with GitHub Flow, merging pull requests (PRs) back to the master branch, seems like a straightforward way of managing your repository and delivery. But it’s not as trivial as it may first appear, especially as your organization grows larger and requires Continuous Integration (CI) checks, such as tests, that may take significant time to execute. We would like to share our approach to solving some of the issues that we encountered with this workflow.

GitHub Flow, and some manual work around it

When I joined Agoda back in 2016, the front-end department had around 80 developers. We all worked on the same GitHub repository, trying to deliver as many new stories as possible. There was already a basic CI system in place. We used TeamCity to build our code and to execute unit tests, Mocha tests for client-side code, and some integration tests as part of a Selenium suite. The results of all those steps were reported back to GitHub as a status check, which indicated whether our pull request had passed and was safe to merge to the master branch. The entire build took around 40 minutes to execute. Once a PR received a successful status, the developer clicked the green merge button on GitHub to merge it to master (yes, every front-end developer could merge to master!).

Your PR is ready to merge!

This setup would work perfectly for a small team that doesn’t need to merge PRs several times per day. But in our case, with almost 100 developers working on the same code base, there had to be a merge queue. The reason? You can have two PRs that work perfectly fine in isolation and pass the build, but fail in some way once both are merged. That’s why we introduced the merge queue at Agoda: once a pull request was “green” and ready to merge, the developer would queue it up, and all required checks would be re-run after all PRs ahead of it in the queue had been merged. This way, we ensured that our code was always validated against the latest master branch.
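To make the rule concrete, here is a minimal sketch of the staleness check a merge queue relies on. It is an illustration only, not code from our tooling: the `PullRequest` class, the `validated_against` field, and the commit SHAs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    number: int
    # Hypothetical field: SHA of the master commit that the last green
    # build included. None means the PR has never been built.
    validated_against: Optional[str] = None


def needs_revalidation(pr: PullRequest, master_head: str) -> bool:
    """A green build only counts if it ran on top of the current master head."""
    return pr.validated_against != master_head


# Two PRs, both green against the same master commit.
master_head = "abc123"
pr_a = PullRequest(101, validated_against="abc123")
pr_b = PullRequest(102, validated_against="abc123")
both_current_before = (
    not needs_revalidation(pr_a, master_head)
    and not needs_revalidation(pr_b, master_head)
)

# Merging pr_a moves master, so pr_b's green build is now stale.
master_head = "def456"
pr_b_stale_after = needs_revalidation(pr_b, master_head)
```

The point of the sketch is that “green” is always relative to a specific master commit; as soon as another PR is merged, every remaining PR has to be checked again.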

Handling the queue the manual way

The implementation of the queue was quite simple: on our #fe-deployment Slack channel, there would be a list of PR numbers and Slack nicknames representing the current state of the queue. To enqueue your PR, you would copy the latest queue state, add your PR number with your Slack name at the end, and paste it back to the same channel. You are first in the queue? Merge down from origin/master to get the latest changes, wait for TeamCity to complete the build, and hit the merge button once it’s done! Oh, and don’t forget to message the next person in the queue!

As you can see, it wasn’t quite as continuous as advertised. There were manual steps in the process, and the entire thing could get quite messy and slow, especially if a PR build failed and the developer chose to re-run it instead of investigating the issue first. These manual steps, the additional communication, and the lack of proper error handling left us with an average of 5 PRs merged per day. That was really bad: the queue could sometimes grow to 30 PRs. We wanted to improve this by minimizing the amount of manual intervention required from each developer, optimizing the use of build system resources, and making the queue available for processing 24/7. So we came up with the idea of a merge queue bot.

Hello world!

Merge Queue Bot

Over a weekend, we came up with a simple .NET Core web service that implemented the processing of our queue. Apart from the simple add/remove functionality for the queue, it was able to communicate with both our GitHub repo and the TeamCity build system to fetch essential information: PR statuses, the state of the TeamCity build queue, and the running builds.
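As a rough idea of what “fetching PR statuses” involves, the sketch below interprets a payload shaped like the response of GitHub’s combined commit status endpoint, which carries an overall `state` and a `statuses` list with a `context` and `state` per check. This is an assumption about how such a bot could decide that a PR is green, not a description of our service, which was written in .NET. The helper names and the context strings in the sample are hypothetical, and the HTTP call itself is left out.

```python
from typing import Any, Dict, List


def is_green(combined_status: Dict[str, Any]) -> bool:
    """True when the overall combined state is "success"."""
    return combined_status.get("state") == "success"


def failing_contexts(combined_status: Dict[str, Any]) -> List[str]:
    """Names of the individual checks that reported a failure or an error."""
    return [
        status["context"]
        for status in combined_status.get("statuses", [])
        if status.get("state") in ("failure", "error")
    ]


# Hypothetical payload; the context names are made up for this example.
sample = {
    "state": "failure",
    "statuses": [
        {"context": "teamcity/unit-tests", "state": "success"},
        {"context": "teamcity/selenium", "state": "failure"},
    ],
}

sample_is_green = is_green(sample)
sample_failures = failing_contexts(sample)
```

Keeping the interpretation separate from the network call makes the decision logic easy to test against recorded payloads.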

We chose to go with a Slack bot interface. Why a Slack bot? Developers were used to manually manipulating the queue in the form of text commands, even before Slack was introduced at Agoda, so we wanted to keep it similar to the old way of doing things. On top of that, with only a few simple commands, developing a bot is much faster than building a website or dashboard. And you can access it on your mobile, so you can monitor and manage your story without needing to open your laptop!

Working with the bot-managed queue

The bot has support for 3 main commands:

mq join <your pull request number>
mq leave <your pull request number>
mq list
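A minimal sketch of how these three commands could be parsed from a chat message follows. The `Command` type and `parse_command` function are hypothetical names for illustration; the real bot was a .NET Core service, and its parsing rules may have differed (for example in how strict it was about extra arguments).

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    action: str               # "join", "leave" or "list"
    pr_number: Optional[int]  # None for "list"


# "mq", an action, and an optional numeric argument.
_PATTERN = re.compile(r"^mq\s+(join|leave|list)(?:\s+(\d+))?$", re.IGNORECASE)


def parse_command(text: str) -> Optional[Command]:
    """Return the parsed command, or None if the message is not a valid one."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    action = match.group(1).lower()
    number = match.group(2)
    if action == "list":
        # "list" takes no argument in this sketch.
        return Command("list", None) if number is None else None
    if number is None:
        # "join" and "leave" require a pull request number.
        return None
    return Command(action, int(number))
```

For example, `parse_command("mq join 1234")` yields a `join` command for PR 1234, while a message such as `mq join` without a number is rejected.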

When listing the current queue, you can see some details, such as the current status of running builds, whether your pull request is “green”, and time estimates for when the build will finish and your PR will be merged. Apart from that, the bot handles build failures in a smart way: it no longer tries to re-run failed builds. Instead, it sends a direct message to the PR owner detailing the failure, and moves on to the next request in the queue. This way, problematic PRs create no additional wait time while they are being fixed, and you don’t lose your place in the queue. Once the PR is fixed, the bot tries to process it again immediately.
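The failure handling described above can be sketched as follows. This is a simplified model under stated assumptions, not the bot’s actual code: `Entry`, `next_to_process`, `on_build_finished`, `on_fix_pushed`, and the `notify` callback are hypothetical names, and a successful build is treated as an immediate merge.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Entry:
    pr_number: int
    owner: str
    failed: bool = False  # set when the last build failed, cleared on a fix


def next_to_process(queue: List[Entry]) -> Optional[Entry]:
    """First entry, in queue order, that is not waiting for a fix."""
    for entry in queue:
        if not entry.failed:
            return entry
    return None


def on_build_finished(queue: List[Entry], entry: Entry, succeeded: bool,
                      notify: Callable[[str, str], None]) -> None:
    if succeeded:
        queue.remove(entry)  # merged, so it leaves the queue
    else:
        entry.failed = True  # keeps its position, but is skipped for now
        notify(entry.owner, f"Build for PR {entry.pr_number} failed.")


def on_fix_pushed(entry: Entry) -> None:
    entry.failed = False  # eligible again, at its original position


# Usage: PR 1 fails, PR 2 is processed next, then PR 1 is fixed.
messages = []
queue = [Entry(1, "ann"), Entry(2, "bob")]
on_build_finished(queue, queue[0], succeeded=False,
                  notify=lambda user, text: messages.append((user, text)))
after_failure = next_to_process(queue).pr_number
on_fix_pushed(queue[0])
after_fix = next_to_process(queue).pr_number
```

Because a failed entry stays in the list and is only skipped, fixing it puts it straight back at its original position without any extra bookkeeping.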

The bot also managed TeamCity resources better. We started to cancel builds that were no longer up to date, running builds only for the items at the head of the queue. This saved build resources and reduced the TeamCity build queue time, helping us increase the number of PRs we processed each day.
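One way to express that cancellation policy is sketched below. It is an assumption-laden illustration rather than our implementation: `builds_to_cancel`, the `head_size` parameter, and the build ids are hypothetical, and actually cancelling a build through TeamCity is left out.

```python
from typing import Dict, List, Set


def builds_to_cancel(queue: List[int], running: Dict[int, str],
                     head_size: int = 1) -> Set[str]:
    """Ids of running builds whose PR is not among the first head_size
    entries of the queue.

    queue   -- PR numbers in queue order
    running -- maps a PR number to the id of its running build
    """
    head = set(queue[:head_size])
    return {build_id for pr, build_id in running.items() if pr not in head}


# PR 10 is at the head, so only the build for PR 12 should be cancelled.
stale = builds_to_cancel([10, 11, 12], {10: "build-1", 12: "build-3"})
```

Raising `head_size` would trade build agents for lower latency, since builds for the next few PRs would already be under way when the head is merged.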

Sum of per-hour merges over a period of a single month

And let’s not forget that the bot, unlike developers, was managing the queue 24 hours a day, 7 days a week. Before the introduction of the bot, the most alarming thing was that developers would log in remotely at 11 PM to merge their changes. After the introduction of the bot, the process of building and merging was constantly running, and as you can see in the charts above, it was merging PRs almost every hour of the day and night. You can also see a reduction in the number of PRs processed overnight. There are two main reasons: the merge queue shrinks overnight as backlogged PRs are merged, and the PRs that are left usually contain merge conflicts, so we skip processing them.

Some final words

With the introduction of this simple tool, we improved our velocity by 200%, going from an average of 5 to an average of 15 PRs merged to master each day. We also made life easier for our front-end developers, as they no longer had to spend time babysitting CI: the bot was doing all the work for them.

GitHub Flow is a great workflow, but it comes with some limitations, especially if your build isn’t the quickest in the world. But the great thing about GitHub is the number of integrations available, along with the libraries that support them. With these integrations and a few development days, we were able to create our own wrapper framework for GitHub Flow that helped us deliver faster and more frequently, a big win for our front-end department.

Do you like building tools and working with continuous delivery? Join our team and help us create amazing software!

@matmalkowski Full stack developer. Lover of web frameworks, react and all things .NET. Lead Software Engineer, currently building great products @agoda