Butler Merge Queue — How Strava Merges Code

For close to a year, a large portion of our repositories on GitHub have been using a fast-forward-only master branch that is guaranteed to be green (all CI tests have passed) and that is guaranteed to have only Pull Request (reviewed) code in it. Developers fire-and-forget a Slack slash command that squashes their commits, runs all tests, merges their Pull Requests, and cleans up their Git branch. This is the story of how we got there.

State of the System

At Strava we use GitHub Pull Requests to do all of our code review work. We view code reviews as a critical part of the development process both because peer-reviewed code is less buggy and easier to read, and because code reviews allow us to share knowledge and ownership of code.

The front-end application tier for our website and API is a Ruby on Rails application. It is our most active code base: in 2016 36 developers submitted 3100 Pull Requests to this repository. Our current release process involves deploying the master branch twice daily. We would eventually like to move to a continuous release cadence.

At Strava we have a continuous integration and build service named Butler (more about why we built our own CI service in another blog post). Butler monitors GitHub activity, automatically runs tests and builds, and notifies both GitHub and Slack of successes and failures.

In mid-2016 our web team enabled two new GitHub protection features on the shared master branch of our main Rails application. The first ensured that no developer could commit directly to the master branch - they had to go through a Pull Request and code review process. The second ensured that Pull Requests had to be up-to-date with master before being merged.

Ensuring that Pull Requests are up to date with master has the benefit of guaranteeing a successful build once the code is merged, and ensures that merge conflicts are resolved and reviewed on the feature branch. Both of these protections brought us closer to making continuous deployment possible and helped enforce that our development process was driven through code reviews.

Aside from the continuous deployment goal, having an always green master is still important for us:

  • We deploy the master branch twice daily. A failing test suite on master means deployments are held up.
  • A single failing test on any shared branch can easily hide additional failing tests introduced by subsequent commits, which makes fixing the entire test suite more difficult.
  • The committer responsible for a failing test on master is under unnecessary time pressure to fix code quickly, making mistakes and sloppy solutions likely.

The problem with the new protections is that they significantly impacted our code merge process because they effectively pitted developers against each other. Getting code merged to master became a race which involved rebasing changes, waiting for tests to run, and then pushing before master had been updated by someone else. Failure required a developer to restart this entire process from scratch.

At this point you may be asking yourself, “why not use GitHub’s Squash and Merge feature?” There are a few reasons:

  • this feature had not yet been released;
  • even today it does not guarantee that the rebased and squashed commit passes CI before merging; and
  • it automatically squashes all commits (more on this later).

Engineering a Solution

The Developer Productivity Engineering (DPE) team at Strava applies engineering practices to improve and automate the processes and tools used by software developers, with the aim of increasing developer productivity.

The DPE team put together a design document which outlined what an ideal solution to the merge race problem would include:

  • an easy to use Slack and Web entry-point;
  • a guarantee that master is always passing and fast-forward only;
  • a solution that required very little manual interaction; and
  • an automatic squashing of appropriate commits.

The solution we came up with is now known as the Butler Merge Queue. “Butler” because this tool is integrated directly into our CI platform and “Merge Queue” because it is a queue that processes Git merges.

Here’s how it works. Once a developer has had her Pull Request reviewed and her tests are passing (indicated via a GitHub status check and retrievable via a Slack command), she submits her Pull Request to the merge queue via a Slack slash command:

/butler merge REPO BRANCH

For instance, to merge her feature-a branch in the android repository she runs /butler merge android feature-a. The Butler Slack bot then acknowledges her request and she can continue with her other work. If at any point the merge fails, she receives a direct Slack message indicating the type of failure:

She can then use the provided link to view test failures or rebase/merge conflicts. If all goes well, the Butler Slack bot will send her a message indicating success:

To accommodate this new merge flow, our Butler service was updated to keep track of some additional Pull Request metadata. In particular we were interested in the head branch, base branch, and the SHA of the head commit on a Pull Request. This additional information made it possible to quickly find Pull Requests by their branch names and determine if a Pull Request was ready for merge.

Each (repository, branch) combination is effectively a separate merge queue. In other words, merges to our android repository will not be impacted by merges to our web repository, and merges to the master branch of our web repository will not be impacted by merges to any other branch in that same repository.

Butler Builds and the State Machine

The main unit of work in Butler is a build. Each build can be split into any number of parallel tasks which, in combination, determine a build’s success or failure. Butler provides some strong guarantees around builds. Builds will always fail (for instance, due to command failures or timeouts) or succeed (due to command success). While this might seem obvious, it is actually an interesting problem to solve in a distributed environment. This simple guarantee enables us to build additional functionality on top of the Butler build framework with confidence.

We leveraged the build guarantees to perform the individual operations required for a merge to succeed: rebase, test, and merge. Internally, Butler uses a state machine to track the progress of a merge.

The first point to note is that the rebase process can be skipped when the head branch is already up-to-date with master and does not contain automatically squashable commits (fixup! or squash!). This is a rather minor optimization but it does improve merge time when rebasing is not necessary (which is often the case in less active repositories).

The “Build Start Timeout” state is also an interesting outlier in the diagram. Because the merge queue is completely automated, we need to be sure that any failure state is handled appropriately. We don’t want developers to need to check in on their merges — they should be able to assume that they will receive a notification on success or failure.

The issue is that once a rebase has succeeded we need to wait for two pieces of information before we can proceed:

  • What is the new HEAD SHA of the Pull Request’s branch?
  • Has a build been created for the new commit?

We don’t want to end up in an infinite wait loop, so there is a timeout while we wait for these operations to complete. This ensures that any transient failures related to retrieving GitHub events or receiving GitHub web-hooks can be retried or corrected within a reasonable amount of time.

Once a build has been created, we are again in safe territory, and we can be sure that the build will either fail or succeed.

By having an explicit timeout/failure state we can alert developers when we end up in this state. In practice the “Build Start Timeout” state has been used during GitHub downtimes and when Butler itself had a bug. When that did happen we were very glad to have this failure state because we were able to immediately identify the underlying issue and fix it. Planning for your own failure is key!

More --autosquash

Earlier I mentioned that the merge queue squash operation differs from GitHub’s “Squash and Merge” functionality. Butler will not squash all commits in a Pull Request. This is by design. At Strava we believe in well crafted Git commit messages. Descriptive and informative commit messages improve the code review experience and provide important context for future code readers. There are legitimate cases in which a Pull Request may contain more than a single commit, so we wanted to account for that in our design.

Whenever a developer wants to indicate to Butler that commits should be auto-squashed, they use fixup! (or squash!) commits. Butler will then perform an --interactive --autosquash rebase on the head branch to clean up these commits and bring the branch up-to-date with the target base branch.

Onwards

The Butler Merge Queue is always improving. Our next goal is to provide interactive Slack messages, after Pull Requests are fully reviewed and tests are passing, that allow developers to use the merge queue with the click of a button — instead of having to enter a slash command.

Also, Strava is hiring! If you’re interested in working on Strava’s engineering team, you can also reach out to me directly @panthomakos.


Originally published at labs.strava.com.