Monorepos: bp’s journey towards adopting single code repositories

bp tech blog
8 min read · Mar 20, 2024

By Alex Hornby, senior principal software engineer at bp

In the quest for efficient software development, many organizations struggle with scattered codebases. At bp, a solution is emerging in moving towards single code repositories. Doing this enables us to build better products faster, while reducing the time to spot bugs in the code. It all comes down to having good signal.

Why is monorepo relevant to the energy industry?

“Our software and data engineers help keep workers connected on site, whether onshore or in the middle of oceans, and help consumers in their sustainable choices by optimising EV charging station placement. So, from my perspective, enabling software developers to do work safely and as quickly as possible is a priority. Monorepo helps us do that by giving engineers rich and rapid feedback that helps bp’s digital products meet the quality bar we expect. It also provides a means to raise the quality bar gradually, for example by requiring python type annotations on changed code.”

Fran Bell, senior vice president for digital technology at bp

Signal is everything. In software development, and in agile development especially, products are constantly updated with new changes, so having good signal matters a great deal. But what exactly do we mean when we say “signal”?

Signal refers to the various types of feedback we get from testing our code to make sure it works. It can be many things, including: “Does it build? Do tests pass? Do we have test coverage? Does the code meet standards? Does it cause the UX layout to change? Does user adoption increase?”

Simply put, the more signal you receive, the faster you can spot problems and the better your code will be. To improve developer productivity, we need to get more signal, much earlier in the development process, for example by writing more tests and being able to run them across our products locally.

But a key challenge across the global tech industry today is that developers, as a rule, don’t get enough signal. Products today are often split across tens or even hundreds of code repositories, which means most developers can only test out their code to see if it is good on a 10th or a 100th of a product.

With products constantly being updated, this polyrepo effect can slow teams down to a halt and result in significant code duplication. It’s the responsibility of each team to make sure they have copied the latest template to their code repository and refresh it every time the source code changes.

Often, developers don’t pick up on the change. So, they write their code, test it within their own repository, and then send the code to the quality assurance environment to test.

This is a slow manual process that can take up to a week, and when the code is tested out in the product, it is tested together with all the other teams’ code updates. If there’s a bug that breaks the product, everyone must figure out whose code caused the problem.

And if someone submits broken code that is merged into the source code, and other products then copy that broken file, it causes issues across the board and delays product releases.

Monorepos are the new default for software development

However, there is a better way. It’s possible to escape from polyrepo by implementing a single code repository for each of your products, commonly known as a “monorepo”.

By doing so, code can be tested locally to see how it impacts the entire source code. A monorepo also makes it possible to track which files are used to build each product, and to prevent broken code from being mistakenly rolled out across multiple products.

Going even further, you can compose company-wide repos. This could include putting your design system code and its UX consumers in the same repo, so that you can see the UX effects on consumers directly on a monorepo PR, rather than weeks later when they try to adopt a new library version.

Since introducing monorepos at bp, we’ve seen demonstrably improved development workflows with hundreds of developers adopting the approach across our teams over the last year. Static code analysis and other practices have reduced developer errors, while testing efficiency has increased by focusing only on relevant changes. We’ve just scratched the surface, and internal measurements so far suggest potential for a substantial increase in overall test output compared to past practices, with twice as many PRs having successful build evidence and 25% more PRs having automated test evidence.

Not everyone in the industry is keen on using single code repositories though. Some criticism has come from software engineers who implemented monorepos without adopting the monorepo-aware tooling that goes with them, and who then saw their code break due to the work of an unrelated team.

By using the right build system, you can make sure you only get relevant signal. If you only build relevant code for your product, you get more relevant build and test results.
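As a minimal sketch of what "only build relevant code" means, the snippet below walks a dependency graph in reverse to find the targets affected by a change. The target names, the `DEPS` graph and the `OWNERS` mapping are all hypothetical and hand-written for illustration; a real build system would derive this information from its own build files rather than from dictionaries like these.

```python
from collections import defaultdict, deque

# Hypothetical dependency graph: target -> targets it depends on.
DEPS = {
    "//common/logging": [],
    "//design_system": ["//common/logging"],
    "//product_a/ui": ["//design_system"],
    "//product_a/api": ["//common/logging"],
    "//product_b/pipeline": [],
}

# Hypothetical mapping from source directories to the target that owns them.
OWNERS = {
    "common/logging/": "//common/logging",
    "design_system/": "//design_system",
    "product_a/ui/": "//product_a/ui",
    "product_a/api/": "//product_a/api",
    "product_b/pipeline/": "//product_b/pipeline",
}


def affected_targets(changed_files, deps=DEPS, owners=OWNERS):
    """Return targets owning a changed file, plus everything depending on them."""
    # Invert the graph so we can walk from a target to its dependants.
    reverse = defaultdict(set)
    for target, target_deps in deps.items():
        for dep in target_deps:
            reverse[dep].add(target)

    # Targets that directly own one of the changed files.
    start = {
        target
        for path in changed_files
        for prefix, target in owners.items()
        if path.startswith(prefix)
    }

    # Breadth-first walk over reverse dependencies.
    affected = set(start)
    queue = deque(start)
    while queue:
        current = queue.popleft()
        for dependant in reverse[current]:
            if dependant not in affected:
                affected.add(dependant)
                queue.append(dependant)
    return sorted(affected)


print(affected_targets(["design_system/button.py"]))
# ['//design_system', '//product_a/ui']
```

In this sketch, a change to the design system triggers builds and tests for the design system and the UI that consumes it, while the unrelated pipeline in `product_b` is left alone.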

Adopting such a build system isn’t as difficult as you might think. The two most frequently used build systems for monorepos right now are Google’s Bazel and Meta’s Buck2. They’re both open source, and while setting one up will cost you time (between a week and a month to switch over), it’s less expensive and more effective to have one or more build teams set up a monorepo than to have each team set up its own build environment and tests.

One obvious question to ask at this point is, how do you decide what code to bring into a monorepo? A good strategy here is to combine all the interesting and relevant code into one repository. For instance, it may make sense to combine the UX, the API the product calls, and the service that monitors that API.

A common misconception is that a monorepo means monolithic services. It doesn’t: service boundaries remain, so product code can depend only on third-party or common code, not on other products’ code. With those boundaries in place, you’ll want to write far more tests, and ideally automate them, so developers don’t have to worry about each change they make to the code and can instead focus on creativity and improving the products.
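The boundary rule can be sketched as a small check over declared dependencies. This is an illustration only: the `//third_party/` and `//common/` prefixes, the target names, and the convention that the top-level directory identifies the product are all assumptions, not bp’s actual layout. Build systems such as Bazel offer their own mechanisms for this (for example, target visibility), which would normally be used instead of a hand-rolled script.

```python
# Assumed prefixes for code that any product may depend on.
ALLOWED_SHARED_PREFIXES = ("//third_party/", "//common/")


def product_of(target):
    """Return the top-level directory of a label, e.g. '//product_a/ui' -> 'product_a'."""
    return target.lstrip("/").split("/", 1)[0]


def boundary_violations(deps):
    """Return (target, dependency) pairs that cross a product boundary."""
    violations = []
    for target, target_deps in sorted(deps.items()):
        for dep in target_deps:
            same_product = product_of(dep) == product_of(target)
            shared = dep.startswith(ALLOWED_SHARED_PREFIXES)
            if not (same_product or shared):
                violations.append((target, dep))
    return violations


# Hypothetical dependency declarations.
example_deps = {
    "//product_a/ui": ["//common/design_system", "//product_a/api"],
    "//product_b/service": ["//third_party/requests", "//product_a/api"],
}

print(boundary_violations(example_deps))
# [('//product_b/service', '//product_a/api')]
```

Here `product_b` reaching directly into `product_a` is flagged, while dependencies on common and third-party code pass.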

Because the right build system only builds the code relevant to your product, you won’t have to fear an intern’s project breaking everyone else’s PRs.

Monorepos also remove silos, allowing visibility of code and documentation, which enables simpler code reuse, and unblocks issues by providing the ability to fix another team’s code.

How do you get started?

Getting started means choosing a build system that supports monorepo-style work. At bp we use Bazel due to its maturity, but if your organization works mainly in Python, C++, Rust and a few other backend languages, or is willing to write its own rulesets, then the more recent Buck2 is also worth evaluating. There are also specialized build tools, such as Nx, if you are a frontend/Node.js-only shop. At bp, however, we have a wide range of technologies and wanted a build tool able to provide signal across all of them.

As you begin this transition, first find the areas where people are suffering from a lack of signal, and use them to show how a build system, with tests and linters driven from it, can give developers better signal earlier. Then start bringing together related repos. At bp, we set up Python type linting and improved test frameworks for the most active data pipeline repo. Based on that, we made the case to adopt and then merge in other data pipeline repos.
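To give a flavour of the kind of signal a type lint provides, here is a minimal sketch that reports functions lacking type annotations, using only Python’s standard `ast` module. It is a simplified stand-in, not the linter bp uses (the article does not name one); a real setup would typically run a dedicated type checker from the build system. To raise the bar gradually, a check like this could be run only on the files changed in a PR.

```python
import ast


def unannotated_functions(source):
    """Return names of functions missing a parameter or return annotation."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Simplification: *args and **kwargs are not checked here.
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            params_ok = all(
                p.annotation is not None
                for p in params
                if p.arg not in ("self", "cls")
            )
            if not params_ok or node.returns is None:
                missing.append(node.name)
    return missing


# Hypothetical contents of a changed file in a data pipeline repo.
changed_source = '''
def load(path: str) -> list:
    return []

def transform(rows):
    return rows
'''

print(unannotated_functions(changed_source))
# ['transform']
```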

Once you have shown value in your first use case, you can start to draw metrics from it (e.g. polyrepos with test coverage) to encourage others to adopt the monorepo and to make a business case. A lot can be done with as few as two people to kick things off and gather first feedback. From there you can make a case for a squad to keep your monorepo healthy and help onboard new teams.

As you expand, you don’t have to stay with one programming language. At bp, we went with TypeScript/Node as the next environment, to help give signal on a platform using a micro-frontends approach that had been suffering from cross-repo dependencies. It was this that informed our decision to use Bazel to enable front-to-back testing.

The case for the monorepo build team rests on the value developers and products get from the additional signal. Let’s say a repo is only one of 10 or 100 for a product: you can wait days or weeks for it to be integrated with other code before you get signal on whether it all works together. This is a slow and expensive way to uncover a problem. You will need to find an appropriate monetary or effort value for that delay in your own organization.

At bp, our aim is to be pragmatic — the aim isn’t to have only one code repository, it’s to not have any more than are necessary. Monorepos mean we can provide the signal locally on the developer’s laptop upfront. This takes error discovery from days and weeks, down to minutes and seconds, meaning we can deliver more value, much faster than before.

If you’re interested in reading further about monorepos, you can check out this website.

To read more about how bp uses technologies to create change, please subscribe to our blog.

Alex Hornby, senior principal software engineer at bp

Alex Hornby is a senior principal software engineer leading bp’s Developer Infrastructure organization, focused on increasing bp’s developer productivity and product quality. He also advises senior leadership in bp on software development best practice and has a deep background in developer tooling and distributed systems, including prior experience in big tech at Meta, and in finance at Morgan Stanley.

Alex sees recruitment and mentoring as core to the senior engineering role. He interviews a lot of engineers!

Appendix A: Developer Lifecycle and where initiatives increase velocity

The inner loop is the single developer workflow. A single developer should be able to set up and use the tools for the inner loop to edit code, build and test their changes, and debug them quickly. The inner loop includes using source control tooling, like git, to manage their changes locally, and then push to a shared repository once they are ready for others to see it.

The outer loop is what happens when we go from the developer to the developer’s team.

For developers, activities in the inner loop should be fast and frequent. The outer loop is slower: it helps verify that their changes integrate well with others’ and gives a chance for code review and feedback, but each outer-loop iteration costs more.

Having to push to CI* to get outer-loop signal should be rare; for example, you should be able to reproduce a test failure locally.

*Continuous integration (CI) is a DevOps software development practice where developers regularly publish their code changes into a central repository, after which automated builds and tests are run.

Welcome to bp’s tech blog, where our tech experts will share some of their most significant technical contributions to help solve the energy trilemma.