Mono- or multi-repositories for enterprise development?

Andrew James
Credera Engineering
Dec 7, 2021

Enterprises such as Google and Microsoft have proven the use of mono-repositories at scale, and with the open sourcing of supporting technologies, they are now very much a viable option for many types of development. However, the decision is still not straightforward, and there are a number of factors that need to be considered, especially for enterprise development.

It is important to note that a single, cross-organisation mono-repository isn’t always the right answer. The decisions that you take may lead you to a spectrum of choices, including several, or many, repositories.

This article is designed to help you evaluate whether your own organisation is likely to be able to easily realise the benefits of mono-repositories or whether a more significant effort will be required. Furthermore, it is tailored to the particular concerns of large enterprises rather than smaller organisations for which some of these considerations may be less applicable.

One way to approach this assessment is to start with a baseline hypothesis based upon your organisation’s current repository structure. Next, you can evaluate the considerations that may drive a decision to either split or combine repositories.

Mono-repository considerations

Organisational and team size and structure

The way that your organisation is structured and its development teams are composed is a critical factor in your choice of approach. Start-ups and smaller enterprises are more likely to be able to achieve the benefits of mono-repositories due to their increased agility, smaller team sizes, and smaller existing code bases. They also have the ability to rapidly introduce new technologies with fewer governance or legacy integration requirements.

Larger enterprises should approach a journey towards mono-repositories in increments, usually grouping repositories by domain, team, or organisational division. If choosing to combine repositories by team, those teams should be domain-based so that later organisational changes do not impact the repository layout.

Development, CI & CD Tooling

To be successful, the introduction of mono-repositories requires teams to learn about and integrate many new tools and technologies into their CI/CD processes. These include:

  • Mono-repo oriented CI tooling such as Bazel, Lerna, and Gradle composite builds, which can produce efficient, repeatable builds for multiple languages;
  • New plugins for integration with IDEs and code editors, such as the Bazel plugin for IntelliJ. Often these integrations are less mature than those for standard tooling like Gradle or Maven;
  • Repository management tooling such as ToMono, which merges multiple Git repositories (and their histories) together;
  • Tools which optimise performance and support for large Git repositories such as Git LFS, VFS for Git, Git Sparse Checkout (newly enhanced with Sparse Indexes), and Scalar.

It should also be expected that significant customisation of and integration with existing enterprise products is required. Most still expect many smaller repositories and will not be efficient or performant with larger repositories. Typically, this work needs to be carried out by an experienced and multi-skilled team of CI engineers before development starts in earnest, and maintained through the lifecycle of the project with an appropriately allocated budget. Examples of common activities include:

  • Merging and re-structuring the contents of your existing repositories to support mono-repository CI tooling;
  • Enabling remote build and dependency caching with your chosen CI tool (for example, Bazel or GitHub Actions);
  • Optimising checkout, build, and testing times by not having to check out full repositories and perform clean builds on every commit;
  • Supporting partial or subtree repository processing for build, security, static analysis, testing, and other tools that normally expect to operate on whole repositories;
  • Ensuring that your CI process is able to produce reproducible builds for efficiency and security;
  • Integrating automated approvals of pull requests using tools such as Tide to avoid overwhelming developers with manual reviews;
  • Integrating additional tools such as Trufflehog to detect embedded credentials or secure keys before they are persisted into repositories and distributed to all developers.
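
As an illustration of the partial-build activities above, the following is a minimal sketch of how a CI step might work out which projects are affected by a commit, so that only those are rebuilt and tested. It is not how Bazel or any other tool named here works internally: the project names, directory prefixes, and dependency graph are all hypothetical, and a real build tool would normally derive this information from its own build definitions.

```python
from collections import defaultdict

# Hypothetical repository layout: project name -> directory prefix.
PROJECT_DIRS = {
    "web": "apps/web/",
    "api": "apps/api/",
    "auth": "libs/auth/",
    "ui": "libs/ui/",
}

# Hypothetical dependency graph: project -> projects it depends on.
DEPENDS_ON = {
    "web": ["ui", "auth"],
    "api": ["auth"],
    "auth": [],
    "ui": [],
}


def affected_projects(changed_files):
    """Return the projects that need rebuilding for a list of changed paths."""
    # Invert the graph so we can walk from a project to its dependents.
    dependents = defaultdict(set)
    for project, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(project)

    # Start with the projects whose own files changed.
    pending = [
        name
        for name, prefix in PROJECT_DIRS.items()
        if any(path.startswith(prefix) for path in changed_files)
    ]

    # Add everything that depends on them, transitively.
    affected = set()
    while pending:
        project = pending.pop()
        if project not in affected:
            affected.add(project)
            pending.extend(dependents[project])
    return sorted(affected)


print(affected_projects(["libs/ui/button.tsx"]))
print(affected_projects(["libs/auth/token.py", "README.md"]))
```

In this sketch a change to a shared library triggers its dependents as well, while a change outside any project directory triggers nothing; whether that last behaviour is acceptable depends on your repository conventions.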

As development within mono-repositories scales, you should continue to monitor the developer experience to identify any further constraints or pain points with tooling that may require mitigation.

In-house or third party development

Given the additional skills required to adopt mono-repositories, it may be advantageous to build these in-house using permanent engineers (or a hybrid team) rather than outsourcing all development to a third party. At a minimum, clear knowledge transfer activities are required and core members of the development and CI/CD engineering teams should be experienced employees.

Current state of development

Deciding to introduce a mono-repository is something that should typically be done prior to starting development on a new project. Whilst it is possible to migrate to mono-repositories mid-way through development, the time, cost and disruption to existing teams and code on larger projects may not be considered a worthwhile investment of effort.

Development languages and frameworks

Mono-repository tooling is still rapidly developing, particularly in the Continuous Integration space. You should be aware that your choice of tooling may constrain your choice of development languages to those integrations that are supported and mature. You may also find yourself needing to fix bugs upstream.

If you find yourself requiring the use of multiple incompatible build tools within a mono-repository, this may be a reason to consider splitting it. On the other hand, homogeneous development with standard languages and frameworks (e.g. web development) may well be a good candidate for a mono-repository.

Granular Permissions and Visibility

Distributed source code management systems such as Git will by default provide a full copy of the repository to every developer and permit every developer to modify any part of the repository. Whilst this is often acceptable in small repositories, you must pre-emptively plan for situations in mono-repositories where:

  • Developers should only be able to update the portions and branches of the repository that they own or are permitted to contribute to; and
  • Sensitive configuration or code may need to be committed to a repository (e.g. for confidential or highly-secure projects) which should not be visible to all developers.

In the first case, source code hosting products such as GitHub have introduced new features that can provide de facto sub-repository permissions for developers such as Code Owners and Protected Branches, largely helping to solve this problem.
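
To make the idea of path-based ownership concrete, here is a simplified sketch of how ownership rules can map changed files to the teams that must approve them. It borrows one behaviour from GitHub's CODEOWNERS file, where the last matching rule takes precedence, but it uses plain path prefixes rather than the real pattern syntax, and the team names and paths are hypothetical.

```python
# Hypothetical ownership rules: (path prefix, owning team).
# Later rules take precedence over earlier ones.
OWNERSHIP_RULES = [
    ("", "platform-team"),  # default owner for everything
    ("services/payments/", "payments-team"),
    ("services/payments/infra/", "sre-team"),
]


def owner_for(path):
    """Return the team that owns a path: the last matching rule wins."""
    owner = None
    for prefix, team in OWNERSHIP_RULES:
        if path.startswith(prefix):
            owner = team
    return owner


def required_reviewers(changed_files):
    """Return the set of teams that must approve a change."""
    return {owner_for(path) for path in changed_files}


print(required_reviewers(["docs/index.md", "services/payments/infra/main.tf"]))
```

Note that rules of this kind only control who must approve a change; they do not restrict who can read the files, which is why they address the first case but not the second.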

Unfortunately, there’s still no ideal solution for the second case (assuming that you’re using a distributed SCM tool such as Git, rather than client-server like Perforce). Partial mitigations and workarounds are currently possible using submodules or git-crypt, but these are far from elegant. Companies such as Google and Microsoft that have invested heavily in SCM are able to use customised solutions such as virtual file systems with inbuilt permissions, but these are unfortunately not yet widely available and may place restrictions on development machines and environments.

If you have a requirement for confidential development within a mono-repository and are not able to use one of the mitigations above, this may be a reason to split repositories.

Developer devices and DLP Risks

Unlike with open source software development, most enterprises have requirements to prevent the external loss or exposure of their source code. This has always been an issue with distributed SCM tools such as Git, where every developer has a full repository copy and history. However, the risk increases substantially with the additional data stored in mono-repositories.

The most common mitigation used by enterprises is to require that all engineers use corporate managed laptops or desktops with appropriate security policies, prohibiting BYOD and third-party devices. Whilst effective, this can also be very costly and still poses the risk of lost devices.

More recently, two additional approaches have also become more popular:

  • The use of developer desktop-as-a-service offerings to ensure that all development is kept remote; and
  • The use of products like Code Server and Coder to remotely host IDEs such as VS Code in containerised environments.

With the introduction of mono-repositories, consideration should be given to the adoption of one of these two new approaches. In addition to mitigating DLP risks, these may also help to optimise costs and improve performance by co-locating remote build and dependency caches and development machines. If using SaaS services to host source code, the possibility of transferring source code between personal and organisational accounts should also be considered.

Repository size

Repository size is clearly one of the most important factors when considering the use of mono-repositories. It’s important to consider not only the current size but also to forecast future growth over time. Smaller repositories in the tens-to-low-hundreds of MB may not require specific developer tooling, but multi-GB mono-repositories, which are becoming more common, place significant demands on CI/CD technologies and typically require custom integrations with the wider ecosystem to keep performance and efficiency acceptable.

Third party dependencies

As mono-repositories grow in scope, it often becomes necessary to implement an approach for third party dependency vendoring to avoid rapid increases in repository size. This is especially important when the dependencies are in binary form, large in size, or regularly change.

A common strategy is to use Git LFS or an artifact repository such as Artifactory or Nexus, but these may not cleanly integrate with mono-repository CI tooling or support reproducible builds. If these do not meet your requirements, a third approach is to vendor the source code of all dependencies directly into the repository. This provides the most flexibility, but it comes at the cost of significant maintenance effort and limits the use of non-open-source products.
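
Whichever storage approach is chosen, one common way to keep externally stored dependencies compatible with reproducible builds is to pin each artifact to a checksum that is committed to the repository. The sketch below shows the idea using SHA-256; the artifact name and contents are hypothetical, and the pinned digest is computed inline only to keep the example self-contained (in practice it would be recorded in a lock file when the dependency is first added).

```python
import hashlib

# Hypothetical lock data: artifact name -> expected SHA-256 digest.
# Computed inline here for illustration; normally committed to the repository.
PINNED_SHA256 = {
    "example-lib-1.2.3.tar.gz": hashlib.sha256(b"example artifact bytes").hexdigest(),
}


def verify_artifact(name, content):
    """Return True if the downloaded bytes match the pinned checksum."""
    expected = PINNED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"{name} has no pinned checksum")
    actual = hashlib.sha256(content).hexdigest()
    return actual == expected


print(verify_artifact("example-lib-1.2.3.tar.gz", b"example artifact bytes"))
print(verify_artifact("example-lib-1.2.3.tar.gz", b"tampered bytes"))
```

A build that refuses to proceed when verification fails will fetch the same bytes every time, regardless of whether they come from Git LFS, an artifact repository, or a cache.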

Future strategy

Determining the future strategy for the repository is a key decision point before the choice to migrate to a mono-repository is made. If parts of the repository are to be outsourced or made visible to third parties, this may drive separation in the future. Whilst tools exist to extract and re-factor mono-repositories, this may have a significant cost.

Development approach and tooling

Mono-repositories are best suited to certain approaches to development, primarily those that are trunk-based. This may then drive the need to invest in additional related capabilities such as feature flags, which enable the separation of behaviour and code at deployment time. Mono-repositories may also introduce counter-intuitive constraints that engineers do not expect, such as supporting only one version of a dependency.
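
The single-version constraint can be checked mechanically. The following sketch, which assumes a hypothetical mapping of projects to their declared dependency versions, reports every package that is declared at more than one version across the repository. The project paths, package names, and version numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-project declarations: project -> {package: version}.
DECLARED = {
    "apps/web": {"http-client": "2.4.0", "yaml-parser": "6.0.1"},
    "apps/api": {"http-client": "2.4.0"},
    "libs/auth": {"http-client": "2.1.3"},
}


def version_conflicts(declared):
    """Return packages declared at more than one version, with their users."""
    usage = defaultdict(lambda: defaultdict(list))
    for project, deps in declared.items():
        for package, version in deps.items():
            usage[package][version].append(project)
    return {
        package: {version: sorted(projects) for version, projects in versions.items()}
        for package, versions in usage.items()
        if len(versions) > 1
    }


print(version_conflicts(DECLARED))
```

Running a check like this in CI makes the constraint visible to engineers at the point where they introduce a conflicting version, rather than later in the build.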

The choice of different development approaches may make the use of mono-repositories considerably more challenging or reduce their benefits. Further, it is important that all teams within the mono-repository use the same approach so that integrations with CI/CD and issue management tooling such as JIRA are kept as simple as possible.

Security and compliance

As with other tools and processes, security and compliance approaches for DevSecOps will need to be integrated with the mono-repository. Solutions for identity management, traceability, permissions, and authorisation are required. Where the tools expect a single repository, such as for code scanning, customisations may be necessary.
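
As an example of the kind of customisation involved, the sketch below restricts a credential scan to one subtree of the repository instead of scanning everything. It is an illustration only and not the implementation of any particular scanner: the two patterns are deliberately minimal, the file paths are hypothetical, and file contents are passed in as a dictionary so the example runs without touching the file system.

```python
import re

# Illustrative patterns only; real scanners ship far larger, tuned rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def scan_files(files, subtree):
    """Scan only the files under `subtree`; `files` maps path -> text content."""
    findings = []
    for path, text in sorted(files.items()):
        if not path.startswith(subtree):
            continue  # outside the subtree being scanned
        for line_number, line in enumerate(text.splitlines(), start=1):
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((path, line_number, rule))
    return findings


# Hypothetical repository contents, using AWS's documented example key ID.
FILES = {
    "services/payments/config.py": 'KEY = "AKIAIOSFODNN7EXAMPLE"\n',
    "services/payments/readme.md": "Nothing sensitive here.\n",
    "services/search/notes.txt": "-----BEGIN PRIVATE KEY-----\n",
}

print(scan_files(FILES, "services/payments/"))
```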

Change Management

In enterprises where it’s necessary to integrate with dedicated change management tooling or products for deployment approvals, mono-repositories may introduce additional complexity. Implementors should carefully consider how to simplify these integrations and, where possible, empower teams to approve their own deployments within an error budget, as in Google’s SRE approach.

Environment Management

Most enterprises are now adopting infrastructure-as-code principles using tools such as Terraform to manage their environments. With the adoption of a mono-repository, you should consider whether to embed environment specifics into the repository (such as a directory per environment) or to treat the mono-repository as a template with placeholders that other repositories fill in with their own environment-specific values.
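
The template option can be sketched as follows. This is not Terraform syntax; it uses Python's standard `string.Template` purely to show a shared definition with placeholders being filled in per environment. The service name, placeholder names, and environment values are all hypothetical.

```python
from string import Template

# Hypothetical shared template kept in the mono-repository.
SERVICE_TEMPLATE = Template(
    "service: $service\n"
    "environment: $environment\n"
    "replicas: $replicas\n"
)

# Hypothetical per-environment values, held outside the shared template.
ENVIRONMENTS = {
    "dev": {"replicas": 1},
    "prod": {"replicas": 3},
}


def render(service, environment):
    """Fill the shared template with the values for one environment."""
    values = {
        "service": service,
        "environment": environment,
        **ENVIRONMENTS[environment],
    }
    # substitute() raises KeyError if any placeholder is left without a value,
    # which surfaces missing configuration before deployment.
    return SERVICE_TEMPLATE.substitute(values)


print(render("billing", "prod"))
```

The directory-per-environment alternative keeps everything in one place and is simpler to review, whereas the template approach keeps environment specifics, which may be sensitive, out of the shared repository.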

Mono-repository Assessment Approach

Enterprises considering the use of a mono-repository should adopt a structured approach to the decision to ensure that it can be justified and aligns with the organisational strategy.

1. Determine the scope and requirements of the potential mono-repository:

— Development languages and frameworks;
— Contributing teams and organisations;
— Business domains;
— Development approach;
— Existing repositories;
— Any confidential development;
— Expected size;
— Future strategy;
— Complex permissioning requirements;
— Vendored dependencies.

2. Determine the integrations that are necessary with existing tools and processes within the enterprise:

— Continuous integration & deployment;
— Issue management;
— Security and compliance;
— Documentation.

3. Determine the technologies that will enable the mono-repository and which meet the requirements:

— CI/CD products (such as Bazel);
— Source Code Hosting (such as GitHub, GitLab, BitBucket, …);
— IDE plugins and integrations;
— Utilities to combine existing repositories.

4. Develop an MVP proof of concept (PoC) that demonstrates building, documenting, and deploying an example of each development language and framework.

5. Build and execute a plan for the integration of the PoC with existing enterprise tools and processes.

6. Train engineers on the use of the mono-repository and new technologies introduced as part of it.

Summary recommendations

  • Build a common strategy for repositories at the beginning of the programme and manage them using GitOps processes;
  • Do not unnecessarily proliferate repositories within your organisation. Use a minimal number, combining repositories that:
      • Share the same development languages, frameworks and dependencies;
      • Are part of the same product, or make sense to version together;
  • Likewise, consider splitting repositories that:
      • Cross many development domains or organisational boundaries (like frontend and backend development);
      • Use different development approaches; or
      • Can’t be built together with the same tools or approaches;
  • If you choose to adopt mono-repositories:
      • Invest in an up-front PoC to assess feasibility;
      • Ensure that you have a working CI/CD approach before starting development;
      • Recognise that this is still a rapidly maturing area of development and that existing processes (like upgrading common dependencies) will need new approaches;
      • Dedicate ongoing effort and budget to maintaining and enhancing capabilities.

In conclusion

Depending on your specific use case, mono-repositories can provide significant benefits for new development, with a rapidly growing ecosystem and support from the wider open source community. However, they aren’t suitable in all cases and require significant enterprise commitment, investment, and alignment to deploy, maintain, and develop within. Careful assessment is necessary to determine whether your organisation is ready for adoption and can realise the expected benefits.
