Why you don’t need a mono-repo but should just build from source
In 2017, I had the chance to attend the Devoxx France conference. I heard many interesting talks dedicated to the trendy topics of the moment, such as containerization or micro-services architecture. One thing surprised me, though: several times, “mono-repos” were presented as the new good practice to adopt, a way to avoid issues such as having to maintain multiple versions of libraries. A typical example was the talk “Why your company should organize its codebase in a single repository?”
At Criteo we are far from applying this recommendation, since our codebase is made of hundreds of repositories. And yet, we stopped using versioned libraries for our internal code six years ago. So I was surprised that “mono-repos” were seen as the only way not to depend on versioned libraries.
Actually, the term “mono-repo” just describes one possible implementation of a pattern. This pattern suggests that all the code of a company should no longer be “built from versioned artifacts” but “built from source” — that is to say, built using the head of the master branches of all the repositories of the codebase. This allows applying the Trunk-Based Development model to the whole codebase by sharing a virtual single trunk. This “mono” trunk can be achieved with a single repository as well as with several repositories.
The first section of this article lists examples of improper usage of the term “mono-repo”, when it is used as a synonym of a “mono” trunk. The second section clarifies that the real need behind this buzzword is to frequently upgrade the dependencies of source code, and thus to “build it from source”. The third section presents the main advantages and disadvantages of “building from source”. The last section focuses on the specific case of external dependencies. Finally, in the next article, we will compare implementations of the “build from source” pattern based on two different organizations of the code: “mono-repo” vs “poly-repo”.
NB1: Please keep in mind that the views expressed here are influenced by the codebase we host at Criteo: a large number of web applications and internal services that allow and require frequent upgrades, organized into hundreds of repositories.
NB2: Part of the present material comes from a Meetup event — The Continuous Delivery at Criteo — that took place in June 2017.
“Mono-repo”, a misnamed concept
The ability to build a whole codebase from source is widely advertised as requiring a unique repository. For example, the famous reference website on Trunk-Based Development correctly presents the goal — to put “all applications/services/libraries/frameworks into one trunk” and to “force developers to commit together in that trunk, atomically” — but it does so under an entry called “mono-repos”. When promoting Continuous Integration, ThoughtWorks — well known for its Technology Radar — presents “maintain a single source repository” as a prerequisite. On Medium, the tag “Monorepo” became really popular, especially for talking about building front-end projects from source. In the English Wikipedia, an article for Monorepo was created in 2018, listing its advantages “over individual repositories”. Symmetrically, the term “poly-repo” is used to express the opposite concept of “not built from source” (e.g., the article Monorepo vs. polyrepo).
It would be fine if these terms were used metaphorically, only to express that the organization of the code should simulate a mono-repo. But most of the literature uses them as a metonymy, making the implementation “mono-repo” the equivalent of the concept “build from source”. Two Medium articles appeared at the end of 2017 to start promoting the idea of getting the same advantages with multiple repositories as with a mono-repo. They advertised it as a “meta-repo” implementation (cf. Mono-repo or multi-repo, why choose one, when you can have both and Monorepo, Manyrepo, Metarepo).
The true goal: frequently upgrading the dependencies
First, it is important to know what problem we are trying to solve by proposing to build the whole codebase “from source”.
Internal and external components
By way of illustration, let’s schematize a component (typically a library) named “YourComponent” that your team owns. This component has dependencies and is itself a dependency of some clients. This can be represented with the following schema:
For the sake of simplicity, the schema shows only one component of each type, but usually there can be a dozen direct dependencies, and these dependencies transitively pull in a large number of additional ones.
The external — aka third-party — dependencies and clients are owned by other companies, whereas the internal — aka in-house — ones are owned by teams other than yours, though in the same company.
In the simplest case, only “YourComponent” and some external dependencies — think of typical libraries like Log4j or Hibernate — might be present. As soon as the application becomes more complex, it starts to depend on internal dependencies developed by other teams of the company, or to be itself a dependency for internal clients. External clients are less frequent: they exist only if you expose an API to other companies. So, depending on your business, you might not have any.
The more complex the application and the organization of the company are, the more complex the combinations between components and their different versions will be.
Upgrading a dependency
After adding a dependency to your component, you usually stick to a particular version of it for several days, sometimes months.
The main reasons why you will need to upgrade a consumed dependency are the following:
- You want to use a new feature that is only available in a recent version. This is often the case for internal dependencies that are close to the business. It is very rare for external dependencies, unless they are business-oriented rather than infra-oriented.
- There is a bug in the version you use and you want to upgrade to the fixed version. This is often the case for internal dependencies, whether business- or infra-oriented. It is usually not the case for external dependencies because their behavior is more stable. Even security patches are not that frequent and don’t always concern your code.
- The team owning the dependency wants to change the API. This is usually not the case for external dependencies, because their maintainers want to keep their users and not give them a reason to switch to other implementations. It can be the case for internal libraries if your infrastructure is changing a lot. For example, a successful company that needs to scale will regularly modify its infrastructure.
Without a doubt, the most frequent updates come from changes related to the business of the company, so they are more likely to relate to internal dependencies.
The cost of infrequent upgrades
An important thing to be aware of is that upgrading a dependency will have a bigger impact than just the change that motivated the upgrade (such as a new feature or a bug fix). With version upgrades come other changes: new features, other bug fixes and, more importantly, some API changes.
This is what we encountered at Criteo at the end of 2012. We were blocked on an internal library, used by all applications, that required a patch to satisfy a change in the infrastructure. You can think of it as a security fix to apply to all applications. The other changes coming with the upgraded library were so significant that each application required several attempts to be successfully released. In the end, the patch needed more than a month to be deployed. Similar issues occurred during the following months, blocking the release of any new feature. This demonstration of the issues created by “building from versioned artifacts” pushed the management to create a dedicated team to switch to the “build from source” pattern.
Building with versioned artifacts is very comfortable in the short term: the development cycle is faster. But the more the upgrade is delayed, the higher the final cost will be. When a regression occurs during an upgrade, the analysis among numerous commits will be more complex, and other regressions can accumulate while the first one has not been fixed yet. In addition, the repetition by every team of the same work of upgrading, validating, and adapting to new APIs represents an underestimated waste of time for the company.
As soon as the codebase grows, there are benefits in improving how bug fixes on internal dependencies are integrated.
The extreme version of frequent upgrades is to “build from source”: taking the head of the master branches of all your dependencies, compiling them, and linking with these most recent versions instead of older ones.
Thus we shifted from a problem to solve (updating dependencies) to a goal of dependency management (having frequent upgrades of dependencies), and then to a pattern to apply (“building from source”, using the last merged commits). The following table presents the pattern and its anti-pattern, as well as their best-known implementations.
Advantages and disadvantages of “building from source”
We came to the “build from source” pattern in order to get frequent upgrades of dependencies, so it is no surprise that many of its advantages come from these frequent upgrades:
- There is only one version to maintain because everyone depends on the latest version of the code.
Remark: if a dependency of your component requires an urgent fix, and you think it is too risky to deploy the application with the latest version of your component, it is still possible to do a hotfix on a branched version. The branch is common to your component — the client — and to the dependency. It should then be a priority to deploy a new “trunk” version of the application.
- There is no need to manually update versions of dependencies. It is now done at each build.
- When a bug is discovered in a dependency, you don’t need to dig into a long list of commits to find the culprit one. The issues are discovered early, while they are still easy to fix.
- You are more willing to modify your dependencies, as you don’t need to go through the process of a merge request, waiting for a released artifact, and finally bumping the version of the dependency in your component.
Also, the code you depend on can be modified and tested at the same time as your own code.
- It avoids the “diamond dependency hell” problem: a component cannot depend on two different versions of the same dependency.
Some other benefits come from the presence of the source code of the dependencies:
- A debugging session with the dependency’s code is simplified: its source code is local and can be edited.
- It is easier for a dependency owner to check, before merging, whether a change breaks the clients.
- The code of your dependencies can always be rebuilt because it is continuously built and tested. You can no longer end up depending on libraries that were built a long time ago and whose compilation or validation is now broken.
All the mentioned advantages increase the development velocity and avoid a slowdown in the development pace.
The drawbacks are of a different kind. “Building from source” implies a higher cost in terms of infrastructure and support:
- CPU and machines: More builds and tests are run, as every commit is tested and not only a few versions.
- Tooling: Usual tools (such as build scripts and code review tools) are not sufficient to manage this new way to build and maintain the codebase.
It also places higher importance on the quality of deeper dependencies:
- If the build of your component tends to be broken, has flaky tests, or is not well covered by tests, it will often break the applications it belongs to.
“Build from source” can only be achieved with a reliable build and enough test coverage. As stated in Essentials of monorepo development, “tests become the blood of the whole system”.
- Components are more difficult to change, as all clients must be changed right away. But this is just a short-term impact versus a mid-term one.
Finally, “building from source” implies switching to a new development pattern: the usage of “feature flags” to progressively roll out a change of behavior, client by client. This concerns non-trivial changes for which tests are not enough.
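A minimal sketch of this pattern, in Python, might look as follows (the flag and client names are hypothetical; real systems usually store flags in a dynamic configuration service rather than in code):

```python
# Minimal feature-flag sketch: the new behavior is rolled out
# progressively, client by client, instead of for everyone at once.
# Flag and client names are hypothetical.

ROLLOUT = {
    # flag name -> set of client ids for which the new behavior is enabled
    "new_pricing": {"client_a", "client_b"},
}

def is_enabled(flag: str, client_id: str) -> bool:
    """True if the new behavior is enabled for this client."""
    return client_id in ROLLOUT.get(flag, set())

def compute_price(client_id: str, base: float) -> float:
    if is_enabled("new_pricing", client_id):
        return round(base * 1.1, 2)  # new behavior, for enabled clients only
    return base                      # old behavior, kept until rollout completes
```

Once the flag is enabled for all clients and the change has proven stable, the flag and the old code path are removed.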
The specific case of external dependencies
We said that upgrades are usually more frequently needed on internal dependencies, and “building from source” makes them instantaneous. As for external dependencies, you generally can’t afford to move their code into your own codebase: the maintenance cost would be too high and can only be amortized in the biggest companies — such as Google, where “an area of the repository is reserved for storing open source code” (Why Google Stores Billions of Lines of Code in a Single Repository). Consequently, external dependencies can’t directly benefit from “building from source”. They still require a manual process where a commit upgrading a version number is pushed. That does not mean we should not try to reach the same goal as for internal dependencies, though: to upgrade them as frequently as possible.
Benefiting from frequent upgrades
Frequent upgrades on external dependencies provide all the advantages previously listed about “building from source”.
There are just these small differences:
- Modifications of external dependencies are easier in the case of frequent upgrades. Indeed, submitting a merge request based on an old version requires checking that the bug is still present on the master branch and backporting the fix to all the versions you depend on. When applying frequent upgrades, chances are you depend only on the last released version and can simply propose the patch on the master branch.
- Issues related to “diamond dependencies” are not totally avoided (unless you succeed in using a unique version) but only mitigated (as you tend to use recent versions only).
Realizing frequent upgrades
Unifying the versions
Before doing more frequent upgrades, you need to ease them by unifying the versions.
Because most build tools do not enforce a unique version of a dependency, you usually don’t have a clear vision of which versions are used in your company. You might not even be sure which version is shipped in your product. Indeed, each build tool has its own strategy:
- Maven selects the version closest to the root in the POM graph,
- Gradle chooses the most recent,
- MSBuild will warn you that it takes the highest version from “PackageReferences”, but you won’t be notified if you still use the old “References”,
- Bazel has no strategy and forces you to declare the winning version.
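To make the difference concrete, here is a toy model (not the real resolvers) showing how the “closest wins” and “highest wins” strategies can pick different versions from the same dependency graph:

```python
# Toy illustration of two version-resolution strategies.
# Each candidate is (depth in the dependency graph, version string):
# a direct dependency (depth 1) declares 1.2, while a transitive one
# (depth 3) declares 2.0. Versions are compared naively, as strings.
candidates = [(1, "1.2"), (3, "2.0")]

def closest_wins(cands):
    """Maven-like: the declaration closest to the root of the graph wins."""
    return min(cands, key=lambda c: c[0])[1]

def highest_wins(cands):
    """Gradle-like: the highest version wins."""
    return max(cands, key=lambda c: c[1])[1]

print(closest_wins(candidates))  # "1.2"
print(highest_wins(candidates))  # "2.0"
```

The same graph thus ships a different version depending on the build tool, which is why a company-wide view of the versions actually in use is needed.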
The first step is to have access to reports that help you start working on unifying the versions. These reports would typically list the status of each external dependency: the versions used, with the number of users of each, and the available versions.
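Such a report can be produced by a simple script once the dependency declarations have been extracted from the build files. A sketch, with hypothetical project and dependency data:

```python
from collections import Counter, defaultdict

# Hypothetical input: (project, dependency, version) triples extracted
# from the build files (pom.xml, *.csproj, ...).
usages = [
    ("app-a", "log4j", "2.17.1"),
    ("app-b", "log4j", "2.17.1"),
    ("app-c", "log4j", "1.2.17"),
    ("app-a", "junit", "4.13.2"),
]

def version_report(usages):
    """For each dependency, count how many projects use each version."""
    report = defaultdict(Counter)
    for _project, dep, version in usages:
        report[dep][version] += 1
    return report

for dep, versions in sorted(version_report(usages).items()):
    print(dep, dict(versions))
```

The hard part in practice is the extraction step, which must cover every build technology used in the company.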
This unification already mitigates a typical issue with external dependencies: The dependency hell created by the diamond problem.
If you succeed in having only one version, or at least one predominantly used version, it means that this version can be used as a default. Developers can start using this default shared version. This will simplify upgrades, as you will be able to bump the version used by all products just by bumping this default version.
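One possible shape for such a default is a central mapping checked into the codebase (a sketch; the names and the idea of expressing it as a Python dict are illustrative, real setups often use a Maven BOM, a Gradle platform, or a Bazel lock file):

```python
# Hypothetical central mapping of each external dependency to the
# default version shared by the whole company. Bumping a value here
# upgrades every product that relies on the default.
DEFAULT_VERSIONS = {
    "log4j": "2.17.1",
    "junit": "4.13.2",
}

def resolve_version(dep, override=None):
    """A project uses the shared default unless it explicitly overrides it."""
    return override if override is not None else DEFAULT_VERSIONS[dep]
```

Overrides should be rare and temporary, otherwise the codebase drifts back to scattered versions.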
Versions to consider for an upgrade
The version to consider for an upgrade might not be the latest one. It could be, in the case of a security fix; but for bug fixes, you might wait a few days or weeks to be sure that the version is stable enough, and to avoid playing the role of beta-tester in place of all the other companies.
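This waiting policy can be expressed as a simple rule: pick the newest release that has been out for at least some quarantine period. A sketch with made-up release dates:

```python
from datetime import date, timedelta

# Hypothetical release history of an external dependency.
releases = [
    ("1.0", date(2019, 1, 10)),
    ("1.1", date(2019, 5, 2)),
    ("1.2", date(2019, 6, 28)),
]

def version_to_consider(releases, today, quarantine_days=30):
    """Newest release at least `quarantine_days` old; None if there is none."""
    cutoff = today - timedelta(days=quarantine_days)
    mature = [(v, d) for v, d in releases if d <= cutoff]
    return max(mature, key=lambda r: r[1])[0] if mature else None
```

For a security fix, the quarantine period would simply be set to zero so that the very latest release is taken.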
Ownership of external dependencies
In the same way that any internal piece of code should have an owner, it is also a good practice to have owners for the external dependencies.
Without some ownership, you can only rely on the goodwill of developers to upgrade everybody, and not only their application, whereas they face strong pressure to deliver new features. Even if they are willing to help, it just does not scale, as they may need to adapt dozens of projects to a new API.
It does not mean the work should be split across all teams, though. From a company point of view, it is much more efficient to have one developer upgrade dozens of applications to a slightly different API than to ask dozens of teams to understand the change and apply it. Only a dedicated team can afford to plan such migrations. A good example might be NUnit, which changed its API between NUnit 2 and NUnit 3. The upgrade mainly required understanding how to translate the API, not a comprehension of the test scenarios. So a dedicated team could script the translations and just push merge requests to the right owners.
In conclusion, a developer who needs to upgrade an external dependency should attempt to bump the default shared version, or ask the owners to take care of it. If the external component concerns only a few apps, this might be done by the developer themselves; if it is a very generic component like JUnit, this should be done by a dedicated owning team.
Once versions are unified and owners found, we can put a tool in place — similar to Dependabot — that automatically proposes upgrades of the shared versions and creates reviews that can be quickly merged, maybe just after a human check of the changelog.
As we have seen, the real need is not to use a single repo, but to get rid of version upgrades between components.
The pattern “build from source” allows it for internal components by making the codebase behave as a single repository. Regarding the external components, the anti-pattern “build from versioned artifacts” still applies and needs to be mitigated. We’ve spotted some ways to ease the upgrade process for these components.
In the next article, we’ll compare two possible implementations of the pattern “build from source”: Using a single repository versus using several repositories. We’ll see how this choice impacts the development experience and the costs of implementation, as in both cases, “build from source” does not come for free…
Want to be part of our journey?