Published in Compass True North

From Gerrit to Github

Transitioning the code review tool of a monorepo

Photo by Patrick Hendry on Unsplash

This is part of a blog series on managing monorepos. If you haven’t already, check out https://medium.com/compass-true-north/repositories-one-or-many-f9da590611af. More to come in the next few weeks!

At Compass, we have over 350 software engineers collaborating on a single monolithic repository (see our post on multi-repo vs monorepo) that is home to hundreds of our backend services.

We were using Gerrit as a code review tool for this monorepo but decided to switch over to Github. This is the story of that transition.

Why did we do this?

The benefits of Github that fueled the transition include:

  1. Many engineers prefer the Github UI to Gerrit’s UI
  2. Most engineers are already familiar with Github (which decreases onboarding time)
  3. Using Github allows us to offload in-house maintenance burden (we had to manage the Gerrit instance ourselves)
  4. Github integrates with many different platforms out of the box (e.g. CircleCI, Codecov.io) and has easy-to-use APIs for extensibility and customization (e.g. Github Checks), which help us move towards our CI/CD goals

While weighing the pros and cons of making the transition, the biggest drawback was losing Gerrit’s tools that are tailored for a trunk-based development pattern. However, we determined that there were more benefits to switching to Github, the integrations were important for reaching our organization-wide CI/CD goals, and our engineers could still follow a trunk-based workflow with Github.

How did we achieve this?

It took a small, dedicated team of engineers several months to make the transition happen. The team completed some smaller tasks such as documentation, re-pointing developer environments, and automation, in addition to several large changes that needed to be made:

a. Outlining a recommended Pull Request workflow

The biggest hesitation in making the transition from Gerrit to Github was the new code review paradigm.

Gerrit required all changes for a code review to be squashed into a single commit, which results in a succinct, clean git history and inherently encourages trunk-based development because of the ease of breaking up work into separate, stacked, dependent code reviews.

Github’s pull requests, on the other hand, encourage branch-based development: each pull request is based on a branch, and the natural way to adjust an open pull request is to add new commits to that branch (squashing or rebasing requires pushing with the `--force` flag).

Our ideal CI/CD pipeline involves a trunk-based workflow with small, frequent changes to a single master branch. Gerrit aligned with this more naturally, so we investigated whether we could augment Github to make the developer workflow more like Gerrit’s. Ultimately, most ideas (automatically pointing dependent pull requests, CLI tools or scripts for git rebasing) would have required large efforts for minimal returns and would have worked against the Github tooling. So we simply encouraged trunk-based development via documentation, “Squash & Merge”, and a git alias for `git push --force-with-lease`.

b. Requiring approval from owners

Many different teams share the monorepo, so in Gerrit we had checks that prevented a code review from being merged unless it received approval from an owner of the changed code. Gerrit used OWNERS metadata files within the repo’s directories to specify the owners of each directory, who therefore had to approve changes to it.

We needed to decide if we wanted to continue using the OWNERS files we had created for use with Gerrit, or if we wanted to start using Github CODEOWNERS, which provides similar behavior but is implemented with a single CODEOWNERS file within the repo.

Ultimately, we decided to keep the OWNERS files, so that we didn’t have to recreate the ownership mapping in a single file. Also, having a single CODEOWNERS file wouldn’t work well in a monorepo because the file would’ve been over 1,000 lines long and all 300+ developers would be modifying it.

However, we needed to rewrite the logic for referencing the OWNERS files to determine who must review a pull request. We implemented the logic, leveraging our in-house Github Check framework (read more about it in this blog post) and reusing code from a similar Github Check used in the Compass frontend monorepo.
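The ownership lookup described above can be sketched roughly as follows. This is a minimal illustration and not the actual Compass check: it assumes each OWNERS file has already been parsed into a set of names, that a changed file is governed by the closest ancestor directory with an OWNERS file, and that one approving owner per file is enough. The names `nearest_owners` and `unapproved_files` are hypothetical.

```python
from pathlib import PurePosixPath


def nearest_owners(changed_file, owners_index):
    """Return owners from the closest ancestor directory that has an OWNERS entry.

    owners_index is a hypothetical in-memory view of the repo's OWNERS files:
    directory path -> set of owner names, with "." standing for the repo root.
    """
    directory = PurePosixPath(changed_file).parent
    while True:
        owners = owners_index.get(str(directory))
        if owners:
            return owners
        if directory == directory.parent:  # reached the repo root "."
            return set()
        directory = directory.parent


def unapproved_files(changed_files, approvers, owners_index):
    """List changed files that still lack approval from one of their owners.

    Assumption: a file with no owners at all is treated as unapproved.
    """
    missing = []
    for path in changed_files:
        owners = nearest_owners(path, owners_index)
        if not owners & set(approvers):
            missing.append(path)
    return missing
```

A check built on something like this would pass only when `unapproved_files` returns an empty list for the pull request’s changed files and current approvers.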

c. Transferring people and teams

While there wasn’t any work required to make every developer a member of our Github organization, we needed to make sure that we had Github Teams for every existing Gerrit Group in order to make the ownership approval check work.

To create Github Teams that reflected the Gerrit Groups, we wrote scripts to hit the Gerrit APIs to find all the team definitions, including the people in the groups, and then create the corresponding Github Teams and add the appropriate people.
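One detail worth knowing when scripting against Gerrit: its REST API prefixes JSON responses with a `)]}'` guard line, which has to be stripped before decoding. The sketch below assumes the response body of a group-members request has already been fetched as text and that each account entry may or may not include an `email` field. The helper names are hypothetical, and the HTTP calls and the Github Team creation are left out.

```python
import json

# Gerrit prefixes JSON responses with this line to guard against XSSI.
GERRIT_XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip Gerrit's guard line, if present, then decode the JSON payload."""
    if body.startswith(GERRIT_XSSI_PREFIX):
        body = body[len(GERRIT_XSSI_PREFIX):]
    return json.loads(body)


def group_member_emails(members_body):
    """Collect unique, lowercased member emails from a group-members response.

    Accounts without an email (for example service accounts) are skipped.
    """
    accounts = parse_gerrit_json(members_body)
    return sorted({a["email"].lower() for a in accounts if a.get("email")})
```

The resulting email lists, one per Gerrit Group, are the input a team-creation script would need once emails can be translated to Github usernames.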

The hard part was creating a mapping of developer emails to their Github usernames. To establish this mapping, we wrote a script to query Gerrit for all unique emails and then looked for accounts within our Github organization whose names matched those emails. One issue we ran into was that not every developer had a matching name (or any name set at all, for that matter), so we also sent out a survey asking developers to share their Github usernames and emails, and added the responses to our mapping. This gave us mappings for most developers, leaving only a few stragglers, whom we decided to add to their corresponding Github Teams manually on an ad-hoc basis once we made the switchover.
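A rough sketch of that mapping step is shown below. The matching rule is an assumption made for illustration (the local part of the email, with punctuation turned into spaces, must equal the Github profile name), and survey answers are given priority over name matches. The function and parameter names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and collapse runs of non-alphanumeric characters into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def build_username_map(gerrit_emails, github_profiles, survey_answers):
    """Map emails to Github usernames; return (mapping, stragglers).

    github_profiles: username -> profile display name (may be None or empty).
    survey_answers:  email -> username reported by the developer.
    """
    by_name = {normalize(name): user for user, name in github_profiles.items() if name}
    mapping, stragglers = {}, []
    for email in sorted(set(gerrit_emails)):
        local_part = email.split("@", 1)[0]
        # Prefer the self-reported username, then fall back to the name match.
        username = survey_answers.get(email) or by_name.get(normalize(local_part))
        if username:
            mapping[email] = username
        else:
            stragglers.append(email)  # left for manual follow-up
    return mapping, stragglers
```

Anything that lands in `stragglers` corresponds to the developers who had to be added to their Github Teams by hand.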

d. Moving CI scripts from Jenkins to CircleCI

Switching to CircleCI was a major motivation for us when making the transition from Gerrit to Github. Github integrates seamlessly with CircleCI, which offers a lot of simple and powerful CI tooling that would free us up from having to manage our own Jenkins instances.

There were minor adjustments that needed to be made to scripts and code in order to make CI run on a Docker container in CircleCI rather than a Jenkins EC2 instance. These included:

  1. Adding a non-root user in the container to run the tests (because of an embedded Postgres library being used)
  2. Setting environment variables (codified through the build tool we use, Bazel)
  3. Limiting threads manually (because containers in CircleCI don’t accurately report the number of processors available)
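As an illustration of the manual thread limiting in the last item, a test harness can prefer an explicit limit over whatever the container reports. This is a sketch under stated assumptions rather than the configuration we used: `CI_MAX_WORKERS` is a hypothetical variable name, the fallback of 2 is arbitrary, and the only CircleCI-specific detail relied on is that CircleCI sets `CIRCLECI=true` in its jobs.

```python
import os


def resolve_worker_count(env=None, detected_cpus=None, fallback=2):
    """Pick a test worker count, preferring an explicit limit over CPU detection."""
    env = os.environ if env is None else env
    explicit = env.get("CI_MAX_WORKERS")  # hypothetical variable name
    if explicit is not None:
        return max(1, int(explicit))
    if env.get("CIRCLECI"):
        # Inside the CI container the detected count may describe the host
        # machine rather than the job's allotment, so stay conservative.
        return fallback
    detected = os.cpu_count() if detected_cpus is None else detected_cpus
    return max(1, detected or fallback)
```

On a developer machine this falls through to the detected CPU count, so local runs keep their full parallelism.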

The bulk of the work for transitioning to CircleCI was paying down technical debt. The monorepo is shared by hundreds of services, making running tests a non-trivial resource issue. We needed to make the tests faster and less resource-intensive because of the cost model of CircleCI. We could no longer sweep costs under the rug by just upping the EC2 instance to meet the resource demands of the tests. Besides that, we wanted to increase developer velocity by making the tests faster.

We employed various techniques for optimizing these tests (parallelization, refactoring, code removal), but a major contributor was using Bazel’s Remote Caching. Bazel is a wonderful build tool designed with monorepos in mind. Because dependencies for compiling and testing are defined explicitly, Bazel can cache the results of build steps and reuse them between different build and test runs. If the inputs to a step have not changed, Bazel detects this and skips it, rebuilding and retesting only what is affected. Jobs that used the remote cache saw average build times drop from ~45 minutes to ~5 minutes!
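The idea behind this kind of caching can be shown with a toy example. The sketch below is not Bazel’s implementation or its key format; it only illustrates the general technique of keying a build step on a hash of its command and inputs, so that an unchanged step is served from the cache instead of being run again. All names are hypothetical.

```python
import hashlib


def action_key(command, inputs):
    """Hash the command plus every input file's path and content.

    inputs: file path -> file content as bytes.
    """
    digest = hashlib.sha256(command.encode())
    for path in sorted(inputs):
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


class ToyBuildCache:
    """Stand-in for a shared cache: maps action keys to stored outputs."""

    def __init__(self):
        self._store = {}
        self.executions = 0

    def run(self, command, inputs, execute):
        key = action_key(command, inputs)
        if key not in self._store:  # cache miss: do the work once and store it
            self.executions += 1
            self._store[key] = execute(inputs)
        return self._store[key]
```

Running the same command on identical inputs twice executes the step once; changing any input produces a new key and triggers a fresh execution. When the store is shared between CI jobs and developers, work done by one run can be reused by the next.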

Making a hard cutover

It would not have been technically feasible to make a soft transition between the code review tools because it would have involved combining two disparate git histories, which carries a high risk of bugs and would have been complex and labor-intensive to implement. As a result, we chose a Sunday night as our launch date to make a hard cutover.

Launch night involved disabling the Gerrit instance, running some scripts to repoint Gerrit URLs to Github URLs and manually adjusting some Jenkins configurations. As expected, there were some unexpected bugs with the transition, but nothing we weren’t able to deal with that night. The hard cutover was a success: the first day with the new code review workflow went smoothly, with only a few minor issues requiring intervention to fix and unblock developers.

What we learned

  • Monorepos require attention. Many tools (including CircleCI) aren’t naturally built to handle their massive resource requirements. That’s why there are tools like Bazel.
  • Bazel’s remote caching capabilities are essential for a monorepo (cut the average build times from ~45 minutes to ~5 minutes!).
  • Having CircleCI is a good forcing function for keeping us honest about the resource requirements of our CI automation.
  • The customizability of Github via Github Checks opens a world of possibilities and is an awesome feature of Github. We were able to have our ownership requirements tailored exactly to our needs.
  • Strict ownership requirements of our monorepo were more complex than anticipated upfront. We realized that more time should’ve been spent defining the behavior and planning its implementation.
  • We should have a source of truth for developer data within our organization.

Looking forward

Now that we’ve offloaded some of our maintenance burdens, we can focus more time on adding features for developers, using the Github Checks and integrations that make it easy to extend and customize the workflow. Some of the ideas we plan on implementing include:

  • More Github Checks to enforce better practices (ensure that team sizes are reasonable, ensure that pull request sizes are reasonable…the possibilities are endless!)
  • Automating deploys with CircleCI to get us closer to a fully automated CI/CD pipeline.
  • Integrating with Codecov.io to record code coverage and optionally add in blocking for pull requests that bring code coverage below a defined threshold.
  • Setting up the infrastructure so that teams can more easily modify the CircleCI configuration and add their custom workflows and tests.
  • Adopting Bazel for all languages in the monorepo to garner the benefits of the remote cache.

Conclusion

When we made the switchover from Gerrit to Github, pull request check runtimes decreased by 29%, costs to manage the CI automation decreased by about 52% by switching from Jenkins EC2 instances to CircleCI, and the number of master failures decreased by 10%. While admittedly these numbers are the result of a myriad of factors, the tooling that Github enabled us to include most certainly played a part.

We’ve been pleased with the results of moving to Github. Despite lacking some of the trunk-based development features that Gerrit had, the Github workflow has worked just fine with our CI/CD pipeline and many developers are happy with the familiar code review environment. Above all, the Github Checks and CircleCI integration have opened the possibilities for enhancements we can make to the code review workflow and made the transition worthwhile.

Compass Engineering & Product Blog — An inside glimpse at our technology and tools, brought to you by the engineers of the game-changing real estate platform, Compass. Hiring at https://www.compass.com/careers/

Nathaniel Morihara

Software Engineer @ Compass. nathanielmorihara.com. nathanielmorihara@gmail.com
