Monorepo vs Polyrepo

My experience with these two ways of organizing a codebase

Published in Avenue Tech, Oct 21, 2021

Since I started as a software engineer, polyrepos have been my standard way to write code. If you need to create a new service, you just create a new repository, configure your CI/CD pipelines, and you are done. It's simple to deal with, simple to manage, and simple to release. But when I joined Avenue, I was introduced to a different way to organize a codebase: monorepos.

Polyrepo

For me, it's the most common way to write a new piece of code: creating a new repository whenever you need to write code is almost automatic. If you need to update a library, it's easier to test and easier to release, and you feel comfortable taking the risk of rewriting a piece of code. It's also simple to run locally: you write a Docker Compose file, and your software's dependencies run just as you need them. The keyword here is atomicity: anything you do can only affect the service you're working on.

As your platform grows, more repositories are created for new microservices, internal libraries, and more. Your microservices will have workflows connecting them, and for me, this is where the problems begin. Cloning, building, and running all the microservices locally can be very tedious, and it's harder for new team members to understand what they need to do to get their local environment up and running. Another common problem is keeping every service on the latest version of your libraries. Refactoring those libraries is expensive, because you need to open each repository and apply the necessary updates.
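
To make the version-drift problem concrete, here is a minimal Python sketch, assuming you have already collected which version of a shared library each repository pins. The service names, the "contracts" library, the version numbers, and the helpers `parse_version` and `outdated_services` are all hypothetical, and the parser only handles plain numeric versions such as 1.4.0.

```python
from __future__ import annotations

# Hypothetical inventory: the version of a shared "contracts" library that each
# polyrepo service pins. In practice you would collect this from every
# repository's dependency manifest.
PINNED_VERSIONS = {
    "payments-service": "1.4.0",
    "orders-service": "1.2.3",
    "accounts-service": "1.4.0",
    "notifications-service": "0.9.8",
}

LATEST_VERSION = "1.4.0"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain numeric version like '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def outdated_services(pinned: dict[str, str], latest: str) -> list[str]:
    """Return, sorted by name, every service pinned to a version older than `latest`."""
    latest_key = parse_version(latest)
    return sorted(
        service
        for service, version in pinned.items()
        if parse_version(version) < latest_key
    )


if __name__ == "__main__":
    for service in outdated_services(PINNED_VERSIONS, LATEST_VERSION):
        # Each line printed is a separate repository that someone has to open and update.
        print(f"{service}: pins {PINNED_VERSIONS[service]}, latest is {LATEST_VERSION}")
```

Every service the script reports is a separate repository that someone has to open, update, test, and release, which is exactly the cost described above.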

Imagine you have a library with the contracts your services use to communicate with each other (common when you use Protobuf), and you are assigned a task to change one of those contracts. You need to be aware that the change may cause side effects in every related service. How do you test it? How do you find out which services may be affected? How do you measure the impact? Even if you can answer all of that, you'll still need to coordinate deployments.
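
One way to answer "which services may be affected?" is to walk the dependency graph backwards from the contracts library. The minimal Python sketch below assumes you have already assembled that graph, which in a polyrepo means gathering it from every repository's dependency manifest; the library and service names and the helpers `reverse_edges` and `affected_by` are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Hypothetical dependency graph: each entry lists what it depends on.
# "contracts" stands in for the shared Protobuf contracts library.
DEPENDS_ON = {
    "contracts": [],
    "orders-client": ["contracts"],
    "orders-service": ["contracts"],
    "payments-service": ["orders-client"],
    "reporting-service": ["orders-client"],
    "search-service": [],
}


def reverse_edges(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert 'X depends on Y' into 'Y is used by X'."""
    dependents: dict[str, list[str]] = {name: [] for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)
    return dependents


def affected_by(changed: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first search over reverse edges: everything that directly or
    transitively depends on `changed`, sorted by name."""
    dependents = reverse_edges(graph)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    print(affected_by("contracts", DEPENDS_ON))
```

In this example, a change to "contracts" reaches payments-service and reporting-service only through orders-client, which is the kind of indirect impact that is easy to miss when you look at each repository on its own.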

TL;DR

It's easier to manage an individual repository, but when you have dozens of repositories, it gets harder to coordinate changes between them, and the complexity of running your platform locally grows.

Monorepo

The first time I saw a monorepo, I thought: what is the difference between a monorepo and a monolithic system? As it turns out, they are completely different. The idea behind a monorepo is to keep your entire codebase in one place, but that doesn't mean you can't have multiple services in it. It's all about how you organize your code and how you build it. It should feel like the default directory where you clone all your repositories: everything is in the same directory, but the boundary between repositories is still clear and you cannot share code across it. In my view, you should be able to split your monorepo into many polyrepos without refactoring thousands of lines of code (of course, in real life it isn't that easy, but it's a good way to think about it).

Refactoring code in a monorepo is really, really simple. If you break something, you immediately see what no longer builds, and you can refactor everything that needs it. If you add a new behavior to your logging library, all your services pick it up automatically. The codebase as a whole is easier to manage.
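
The "clear boundary" idea can be checked mechanically. The Python sketch below assumes a hypothetical layout in which deployable services live under services/<name>/ and shared libraries under libs/<name>/, and that you already have a list of (importing file, imported file) pairs from some import scanner. Under those assumptions, a file may import from its own project or from any library, and anything that reaches into a service's directory from outside that service is reported.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical layout assumed by this sketch:
#   services/<service-name>/...  deployable services
#   libs/<lib-name>/...          shared libraries any service may use


def owner(path: str) -> tuple[str, str] | None:
    """Return ('services' or 'libs', project name) for a repo-relative path, or None."""
    parts = PurePosixPath(path).parts
    if len(parts) >= 2 and parts[0] in ("services", "libs"):
        return parts[0], parts[1]
    return None


def boundary_violations(imports: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Report imports that reach into a service from outside that service.

    Imports within the same project and imports of anything under libs/ are allowed.
    """
    violations = []
    for source, target in imports:
        src, dst = owner(source), owner(target)
        if dst is not None and dst[0] == "services" and src != dst:
            violations.append((source, target))
    return violations


if __name__ == "__main__":
    # Hypothetical (importing file, imported file) pairs produced by an import scanner.
    example = [
        ("services/orders/api.py", "libs/logging/logger.py"),
        ("services/orders/api.py", "services/orders/models.py"),
        ("services/payments/charge.py", "services/orders/models.py"),
        ("libs/logging/logger.py", "services/orders/api.py"),
    ]
    for source, target in boundary_violations(example):
        print(f"{source} must not import {target}")
```

Running a check like this in CI is one way to keep the monorepo splittable into polyrepos later, since no service ends up depending on another service's internals.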

Okay, what about the drawbacks? When your whole codebase lives in one repository, testing and building it becomes a huge amount of work. You'll see the build time of your CI pipeline keep growing along with the codebase. A small change to a little piece of code can take minutes (maybe hours) to be tested, built, and deployed. Moreover, many engineers will be working in the same repository. The team needs a good workflow (GitFlow, for example) to keep poorly tested code from blocking potential releases and to avoid merge/rebase problems. A good pull request policy helps with merging, but for bugs there is no other answer: you need to test.

Does it have to be this way? Nope! Companies like Google and Uber adopted monorepos in their engineering teams, and as you can imagine, their codebases are much bigger than most companies'. They use build tools like Bazel and Buck that make it possible to test and build only what a change affects, which brings agility back to your CI pipeline. You don't need to spend time running everything every time; you test only what changed. But it isn't free: these tools are quite complex to manage, and you usually need to configure them by hand (there are tools that generate the boilerplate, but you sometimes need to customize what they produce). Your team also needs to learn these tools, because they become part of the daily routine for testing and building.
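
Bazel and Buck are far more sophisticated than this, but the core idea of skipping unaffected work can be sketched in a few lines of Python. This is a toy illustration, not how either tool works internally. It assumes each target is described by its source files and its dependencies (hypothetical TARGETS data held inline instead of BUILD files), fingerprints each target by hashing its sources together with its dependencies' fingerprints, and rebuilds only the targets whose fingerprint differs from the one cached after the previous build. The helper names are also hypothetical.

```python
from __future__ import annotations

import copy
import hashlib

# Hypothetical targets: each has its source files (name -> contents) and the
# targets it depends on. A real tool would read this from BUILD files.
# Assumes the dependencies form a DAG (no cycles).
TARGETS = {
    "libs/logging": {"srcs": {"logger.py": "def log(msg): print(msg)"}, "deps": []},
    "services/orders": {"srcs": {"api.py": "from logging_lib import log"}, "deps": ["libs/logging"]},
    "services/search": {"srcs": {"index.py": "def search(q): return []"}, "deps": []},
}


def fingerprint(name: str, targets: dict, memo: dict[str, str]) -> str:
    """Hash a target's own sources together with the fingerprints of its dependencies."""
    if name not in memo:
        digest = hashlib.sha256()
        target = targets[name]
        for filename in sorted(target["srcs"]):
            # NUL separators keep ("ab", "c") and ("a", "bc") from hashing the same.
            digest.update(filename.encode() + b"\0" + target["srcs"][filename].encode() + b"\0")
        for dep in sorted(target["deps"]):
            digest.update(fingerprint(dep, targets, memo).encode())
        memo[name] = digest.hexdigest()
    return memo[name]


def all_fingerprints(targets: dict) -> dict[str, str]:
    """Fingerprint every target; store the result as the cache after a successful build."""
    memo: dict[str, str] = {}
    return {name: fingerprint(name, targets, memo) for name in targets}


def targets_to_rebuild(targets: dict, cache: dict[str, str]) -> list[str]:
    """Targets whose fingerprint differs from the cached one (or that were never built)."""
    current = all_fingerprints(targets)
    return sorted(name for name, fp in current.items() if cache.get(name) != fp)


if __name__ == "__main__":
    cache = all_fingerprints(TARGETS)  # pretend this was saved by the last CI run
    edited = copy.deepcopy(TARGETS)
    edited["libs/logging"]["srcs"]["logger.py"] = "def log(msg): print('[app]', msg)"
    print(targets_to_rebuild(edited, cache))
```

Editing the logging library changes its fingerprint and, through the dependency, the fingerprint of services/orders, while services/search is skipped. Folding each dependency's fingerprint into its dependents' fingerprints is what lets a library change propagate to everything that uses it.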

TL;DR

It makes it easier to share code between projects, refactor code, and run your platform locally. But it increases the complexity of your CI pipeline, because everything lives in the same repository.

Hybrid

You can build a hybrid style that takes the best of both options. You can create a repository per team or per platform and group the services that share the same context. People can still share common code between those services without worrying about breaking someone else's pipeline. If your team makes an effort to keep it from turning into a monorepo, you may avoid the CI drawbacks and may not need extra tools to improve your build time. Will you have longer build times than with a polyrepo? Probably, but the cost of adopting a build tool may not be worth the build time it saves.

Conclusion

There isn't a single right choice; the team needs to figure out what fits best. Both approaches have benefits and drawbacks, and which one you pick may just be a matter of timing. While the company is growing fast, a monorepo strategy may serve the team better. When the team gets bigger, you can move to a hybrid or polyrepo style. I think it's important to understand clearly why you're choosing each strategy, because the choice affects team productivity, DevOps, the onboarding of new people, and more.
