Repositories — One or Many

Alex McKinney

Published in Compass True North

7 min read · Apr 7, 2020

The Challenges We Face in Source Code Architecture

The Never Ending Debate

Every software company faces a question early in its development: “Should we write all of our code in a single repository, or distribute it across many?” Some companies, including Adobe and Google, have written papers that outline reasons for using a monorepo. Other companies, like Netflix, favor a multi-repo strategy and distribute their source code across a large number of small repositories.

The monorepo and multi-repo strategies both have tradeoffs, and one’s success is often the other’s detriment. Many of these tradeoffs are obvious, such as the time it takes to check out a repository locally, but there are a variety of subtle implications contained within other tradeoffs. This article aims to shed light on some of these less obvious tradeoffs.

A Case Study

At Compass, we’re currently debating whether or not we should break out our monorepo into a multi-repo. The Builder Tools team is leading this effort, and they have created internal tooling to bridge the gap between monorepo and multi-repo.

Compass has learned a lot during this development, and we’re now in a better position to discuss the pros and cons of our repository strategy. In the sections below, we will walk through each of Compass’s primary goals and what we have recently learned about it.

Goals

Code Standardization and Versioning
Accelerate Build and Test Times
Developer Velocity
Team Independence
Dependency Management

Code Standardization and Versioning

A monorepo makes it possible to ship atomic commits that span a wide surface area. In other words, a single commit can update code used in every one of Compass’s microservices simultaneously. This makes it easy for library teams at Compass to address backward-incompatible changes across all of Compass’s source code.

In multi-repo, library teams are forced to obey the laws of semantic versioning. Every programming language offers different strategies for avoiding a breaking change (and with it a major version bump), each of which requires language expertise, e.g. keyword parameters with defaults in Python. If Compass platform teams are not equipped to address backward compatibility concerns, it becomes far more expensive for teams to upgrade their libraries to the versions mandated by the company.
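As an illustration of the Python strategy mentioned above, here is a minimal sketch. The function `fetch_listing` and its parameters are hypothetical and not taken from any Compass library; the point is only that a new keyword-only parameter with a default leaves existing call sites untouched.

```python
def fetch_listing(listing_id, include_photos=False, *, timeout_seconds=30.0):
    """Hypothetical library function, imagined as version 1.1.0.

    Version 1.0.0 accepted only `listing_id` and `include_photos`. The new
    `timeout_seconds` option is keyword-only and has a default, so every
    existing call keeps working and a minor version bump is sufficient.
    """
    return {
        "id": listing_id,
        "photos": [] if include_photos else None,
        "timeout_seconds": timeout_seconds,
    }


# A call written against 1.0.0 still works unchanged.
old_style = fetch_listing(42, True)

# New callers opt in to the new behavior explicitly.
new_style = fetch_listing(42, timeout_seconds=5.0)
```

Making the new parameter keyword-only also means later additions cannot silently change the meaning of positional arguments.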

Suppose that an important security patch necessitates an upgrade in one of Compass’s core libraries. In multi-repo, every team must manually upgrade the library version pinned in its own repository before its service mitigates the vulnerability.

Now suppose that the same security patch necessitates a non-trivial backward-incompatible change. This type of incident quickly becomes an operational nightmare.

In monorepo, a platform team can quickly and silently address the issue without disrupting other engineering teams. A simple and effective script can migrate all of the code in a single shot, removing the need for manual intervention.
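A migration script of this kind might look roughly like the sketch below. It assumes the patch renames a hypothetical function `unsafe_load` to `safe_load` and that a plain text substitution is good enough; a real migration would more likely use a syntax-aware tool, since a regular expression cannot tell code from comments or strings.

```python
import re
from pathlib import Path

# Hypothetical rename: the patched library replaces `unsafe_load(` with `safe_load(`.
OLD_CALL = re.compile(r"\bunsafe_load\(")
NEW_CALL = "safe_load("


def migrate(repo_root):
    """Rewrite every Python file under repo_root; return the paths that changed."""
    changed = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        updated = OLD_CALL.sub(NEW_CALL, source)
        if updated != source:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Because every caller lives in the same repository, the rewritten files and the library change itself can land together in one atomic commit.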

Accelerate Build and Test Times

Compass engineers frequently face slow builds and tests in their continuous integration (CircleCI) pipelines. This results in a poor developer experience and negative sentiment toward the Compass platform.

In multi-repo, developers operate in a smaller repository, which immediately alleviates this concern. Running builds and tests becomes lightning fast, and individual contributors are encouraged to improve their service’s test coverage since they don’t need to worry about inhibiting the monorepo’s ability to run its tests quickly.

This result is also possible in a monorepo, however. Bazel, which is currently used by Compass, can be configured for local and remote caching. In other words, successive builds don’t need to rebuild and retest artifacts that haven’t changed. Bazel is intelligent enough to only run tests that relate to the files changed in a single commit, which makes CI feel as nimble as it would in the multi-repo. Remote caching was recently enabled at Compass, and test times are beginning to decrease.
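The idea of running only the tests related to a change can be sketched as a walk over the build graph. This is not how Bazel is implemented; it is a simplified illustration, and the target names in `DEPS` are hypothetical.

```python
from collections import deque

# Hypothetical build graph: each target lists the targets it depends on.
DEPS = {
    "//libs/auth": [],
    "//libs/search": [],
    "//services/listings": ["//libs/auth", "//libs/search"],
    "//services/billing": ["//libs/auth"],
    "//services/listings:test": ["//services/listings"],
    "//services/billing:test": ["//services/billing"],
    "//libs/search:test": ["//libs/search"],
}


def affected_tests(changed_targets, deps=DEPS):
    """Return the test targets that transitively depend on any changed target."""
    # Invert the graph so we can walk from a target to its dependents.
    dependents = {}
    for target, direct_deps in deps.items():
        for dep in direct_deps:
            dependents.setdefault(dep, []).append(target)

    seen = set(changed_targets)
    queue = deque(changed_targets)
    while queue:
        for dependent in dependents.get(queue.popleft(), []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(t for t in seen if t.endswith(":test"))
```

In this toy graph, a change to `//libs/search` selects the search and listings tests and leaves the billing test alone, which is why a well-structured monorepo need not pay for its full test suite on every commit.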

With that said, it’s worth mentioning that not every tool is built with monorepos in mind the way Bazel is. The Builder Tools team has had to implement and ask for workarounds in commonly used tools like CircleCI and Codecov.io because they weren’t intended to be used with monorepos. Therefore, adopting a monorepo approach often implies that a team of engineers is prepared to face these challenges.

Developer Velocity

Multi-repo makes it easy for developers to understand a service and all of its dependencies in a single view, e.g. a GitHub landing page for a single repository. New contributors can clone the project, understand the project’s layout, and begin working on a new feature within minutes. This can lead to more features shipped, an improved developer experience, and improved productivity.

In general, this development environment works very well for individual contributors. By nature of multi-repo, however, Compass’s libraries are hosted in other repositories, which results in a sparse dependency tree.

A sparse dependency tree can significantly slow down a team’s ability to ship features. If an engineer needs to make a change to one of Compass’s libraries to build a feature, for example, they are required to do the following.

1. Create a pull request on the library’s repository with the required change.
2. Review and merge the pull request.
3. Create a new library release that conforms to Semantic Versioning.
4. Upgrade their service’s repository to use the new version.
5. Create a pull request that introduces the feature they originally set out to do.
6. Review and merge the pull request.

In a monorepo, these steps simplify to the following.

1. Create a pull request that introduces the required change.
2. Review and merge the pull request.

Now imagine if an engineer’s feature requires changes to multiple libraries. Given the amount of time it may take for a library team to cut releases of their libraries, this process can quickly become very expensive.
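The release step in the multi-repo workflow above depends on choosing which version component to increment. Here is a minimal sketch of the Semantic Versioning bump rules, assuming plain MAJOR.MINOR.PATCH strings with no pre-release or build metadata; the function name `bump` is illustrative.

```python
def bump(version, change):
    """Return the next version for a change of kind 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # backward-incompatible API change
        return f"{major + 1}.0.0"
    if change == "minor":  # backward-compatible new functionality
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")
```

Each library touched by a feature needs its own bump and release, and every consuming service then needs a matching upgrade, so the cost grows with the number of libraries involved.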

Team Independence

A multi-repo lends itself to isolated, nimble development environments where teams are empowered to create tooling for their specific domain. This level of individuality and ownership is exactly what Compass needs.

However, there are negative consequences that come with a team’s ability to bootstrap their own tooling for common things like code generation and integration testing. Without the proper guard rails, teams can abuse this ability and threaten Compass’s ability to standardize and unify under a set of common coding principles.

For example, given the time required to update a service’s dependencies, as described above, teams will be tempted to fork a library so that they can apply a quick fix and address the issue at hand. In fact, this has already been observed in practice. Forks drift away from Compass’s standards and make it increasingly difficult to unify and apply critical upgrades later on.

A monorepo is built so that it’s easy to share code between different services. This encourages developers to introduce shared libraries they use to solve common problems, rather than separate teams silently creating similar abstractions throughout the company. By defining a curated set of libraries in the monorepo, individual teams can more easily discover solutions to common problems without needing to take on the technical burden of supporting a library of their own.

It’s worth considering alternative solutions for improving team independence within the constraints of a monorepo. At Compass, there already exists the notion of an OWNER metadata file, which defines the ownership of a specific collection of sub-directories in the monorepo. This concept can be easily extended into full-fledged METADATA, which could provide features like code coverage thresholds so that contributors can define their own service’s testing guidelines and practices.
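One way such per-directory METADATA could be resolved is sketched below. The field names `owner` and `min_coverage`, the team names, and the in-memory `METADATA` mapping are all assumptions made for illustration; the actual format of Compass’s OWNER files is not described here.

```python
from pathlib import PurePosixPath

# Hypothetical METADATA contents, keyed by the directory that holds each file.
# The empty key stands for the repository root.
METADATA = {
    "": {"owner": "builder-tools", "min_coverage": 60},
    "services/listings": {"owner": "listings-team", "min_coverage": 85},
    "libs": {"owner": "platform", "min_coverage": 75},
}


def metadata_for(file_path, metadata=METADATA):
    """Return the metadata of the nearest ancestor directory that defines one."""
    for parent in PurePosixPath(file_path).parents:
        key = "" if str(parent) == "." else str(parent)
        if key in metadata:
            return metadata[key]
    raise LookupError(f"no METADATA covers {file_path}")


def coverage_ok(file_path, measured_percent):
    """Check measured coverage against the owning directory's threshold."""
    return measured_percent >= metadata_for(file_path)["min_coverage"]
```

Resolving to the nearest ancestor lets a team tighten the rules for its own directories while everything else falls back to a repository-wide default.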

Dependency Management

Managing dependencies is a significant challenge in both monorepo and multi-repo. Both architectures have major benefits in this regard.

In multi-repo, teams are free to select each of their dependencies by hand. Developers are empowered to use familiar standard dependency management toolchains for each of the supported languages, e.g. npm and pip, and consume libraries from a centralized Artifactory. The multi-repo will address the concern of upgrading and including dependencies, but it will force teams to independently manage and resolve conflicts between their own dependencies.

In a mature monorepo, the dependency management problem largely disappears for individual developers. A monorepo developer rarely needs to interact with any of the dependency management toolchains because their dependencies will already be available. Additionally, this enables the Builder Tools team to become gatekeepers of what dependencies and versions are consumed in Compass’s codebase. This prevents deprecated or unsupported libraries from ever being introduced, e.g. libraries that rely on monkey patching.
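The gatekeeping role could be enforced with a check along these lines. This is a sketch under assumptions: the `APPROVED` mapping, the single-approved-version policy, and the library entries are hypothetical examples rather than Compass’s actual rules.

```python
# Hypothetical curated set: the single approved version of each third-party library.
APPROVED = {
    "requests": "2.23.0",
    "protobuf": "3.11.3",
}


def review_dependencies(requested, approved=APPROVED):
    """Return human-readable problems; an empty list means the request passes."""
    problems = []
    for name, version in sorted(requested.items()):
        if name not in approved:
            problems.append(f"{name} is not an approved library")
        elif version != approved[name]:
            problems.append(
                f"{name}=={version} conflicts with approved {approved[name]}"
            )
    return problems
```

Run as part of code review or CI, a check like this surfaces an unapproved library or a diverging version before it reaches the main branch.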

Final Thoughts

There are plenty of tradeoffs between a monorepo and multi-repo architecture. Many company-wide initiatives are clearly affected by this decision, including developer velocity, dependency management, and a team’s ability to independently operate.

It’s unclear whether monorepo or multi-repo is the right decision for Compass. In multi-repo, teams can operate more independently, but it’s harder to distribute common implementations and libraries. In monorepo, code is more easily standardized, but staffing a team of Bazel experts is nontrivial. What is clear is that Compass needs to consider the pros and cons of both architectures before proceeding with such an important decision.

The ongoing Builder Tools project seeks to address many of the multi-repo-specific problems described. Specifically, the team is evaluating whether multi-repo is a solution they are equipped to support for the long term, and whether it is better than a fully featured Bazel monorepo.

Although Compass has not yet made a decision, this post calls out some not-so-obvious questions we should ask ourselves when deciding between a monorepo and a multi-repo. How will we manage our dependencies? What strategy better promotes developer velocity?

Regardless of the decision we make, it’s important that we take the time to consider these side effects. It’s easy for all of us to lean towards the solution we have the most experience with, but it’s increasingly important that we challenge our assumptions and ask questions about the long-term implications.