The pros and cons of poly and mono repositories
As we continue to merge two products (which you can read about here), we are starting to come across natural product conflicts. One of these is that one product uses a mono repository and the other uses poly repositories. This post is about weighing up which will be best for us going forward.
The decision between poly and mono repositories isn’t black and white, however much I’d like it to be. What is best for us now will not necessarily be best for us in the future. I’m no developer, so I can only go off my own experience with the two strategies.
Here is a list of considerations I feel would affect the efficiency of engineers. I will address each of these points and my views below, and keep the points based on what is natively available for both poly and mono.
Security
Imagine a scenario where we have 150 developers. One of them becomes disgruntled and decides to steal parts of the codebase. In a mono repo environment, they would only have to export a single repository to walk away with everything.
As we are a SaaS company, our application is our ‘secret sauce’. If it were to be leaked, it could spell more competition. If our entire IaC were leaked, attackers would be able to see the majority of our infrastructure and its configuration.
Fragmenting our application into different repositories, each with tighter access control, would help solve this problem. Natively. That’s a win for poly.
Fine-grained access control and boundaries
Boundaries will inevitably have to be introduced the larger we get, e.g. analytics team members should be the only ones with access to analytics code. Or, as the engineering team gets larger, it would make sense to split development teams’ responsibilities up at a service level, e.g. one team would be responsible for Split Screen, training pages, etc., while another would own Admin, Tooltips, etc.
One team could have a completely different set of standards from another (as is currently the case with analytics), and we wouldn’t want other teams interacting with ‘their’ codebase. Another driver is the need to define stricter responsibilities, such as in the examples above.
Defining stricter responsibilities would be counter-productive if we were going to stay our current size, but it will be necessary as we get larger.
If we go with a mono repository strategy, we would have to rely on social boundaries. There is no way to natively enforce hard access rules on directories within a mono repository; you would have to rely on social or process-driven rules, which is suboptimal. Poly repositories allow you to create, and even automate, fine-grained access control to your service source code, natively.
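To illustrate the gap: on GitHub, a CODEOWNERS file is roughly the closest a mono repo gets natively, and it only gates who must review changes to a path, not who can read it. A sketch, assuming a GitHub-hosted mono repo with hypothetical team and directory names:

```
# .github/CODEOWNERS — review ownership per directory (team names are hypothetical)
# Required reviews can be enforced per path via branch protection,
# but anyone with access to the repository can still read — and
# export — every directory below, which is the security concern above.
/services/analytics/   @our-org/analytics-team
/services/admin/       @our-org/admin-team
/infrastructure/       @our-org/devops-team
```

With poly repos, the equivalent boundary is simply the repository’s own permission list, which covers read access as well as reviews.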
If you have many independently run or deployed micro-services, each with its own pipeline and infrastructure, tying them to one another within a mono repository will inevitably reduce the flexibility of your working practice. Take this example: I’d like to create a new POC.
In a poly repo world, I would create a repository with access control already configured. My CI/CD would be directly attached to the project, which would be easy to configure because I have created run-books and templates for running the most common CI/CD deployments, and I’d be away.
In a mono repo world, I would first have to confirm with DevOps/Ops that the CI/CD can handle my extra project. They would likely say no and put it on a to-do list. I would have to wait until the CI/CD is configured before I could check out a branch and start work. That’s if the CI/CD works. If it doesn’t, the tennis match begins.
The example above is about CI/CD, but my point is this: if you tie truly independent services together with any shared component, enabling change will require more effort than it would if you embraced their individuality. That’s another win for poly.
Mirroring micro-service/deployment architecture
Above I have drawn two diagrams of how infrastructure, repositories and pipelines would be logically coupled in a mono repo world and a poly repo world: Diagram 1 being mono, Diagram 2 being poly (obviously).
We all know the benefits of a microservice architecture. It makes it easier to make changes, version independently, blah blah blah. It therefore makes sense for each service to have its own pipeline, so you aren’t affected by the needs of another service’s deployment. So surely the same logic applies to each micro-service having its own repository. One more point for poly!
Amount of PRs
When a new feature spans several services, in a poly repo world that means changes to several repositories. Having all of your code next to one another in a single repository makes it easier to create one ‘atomic’ PR. As an engineer, I’d rather create one PR than four; with poly repos, you have to create those multiple PRs.
Whether you create one PR or multiple, you would still have to do multiple deployments if there is a pipeline per service. I’d question whether it’s logical to raise a separate PR per service anyway, but this round goes to mono.
Spread-out code vs code in one place
Take this example: you are bug bashing and realise the bug may be within a different service. Would you rather open another directory within your IDE, or clone another repository to make the change?
It’s faster, simpler and more efficient to run a find across one repository than to open each repository in the UI and search project by project. Mono takes this one.
Quicker to onboard
When all of your code is in one place, all you have to do to onboard a new developer is give them one lump of code. As that’s the only place they will be working, there will be no hidden repositories doing secret things that the company has long forgotten about. It’s all there, plain to see. Mono is on the march.
Editing multiple services simultaneously
This relates to the ‘Amount of PRs’ section. In short, if you need to do a find and replace across the entire codebase, it’s a lot easier when everything is in one place. The score is now 4–4.
CI/CD
Natively, with our current repository store, you can define a pipeline per service/repository via a YAML file. Within this you can create constructs of repeatable CI/CD steps that are common across your services, e.g. deploy Terraform, deploy application code, run smoke tests, etc. This template-able nature allows engineers to create a contained CI/CD environment with ease.
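A minimal sketch of what one service’s pipeline could look like, assuming an Azure-DevOps-style YAML syntax; the template names and paths are hypothetical stand-ins for the centrally maintained constructs described above:

```yaml
# azure-pipelines.yml — one pipeline per service/repository (sketch)
trigger:
  branches:
    include: [main]

steps:
  # Repeatable constructs are maintained once as templates,
  # then consumed by each service's own pipeline.
  - template: templates/deploy-terraform.yml
  - template: templates/deploy-application.yml
  - template: templates/run-smoke-tests.yml
```

The point is that the service team owns this file and its triggers, while the shared templates keep the common steps consistent, so no central team has to hand-configure each new project.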
If we were to go with a mono repository set-up, we would most likely have to stay with the in-network CI/CD runner because of its flexibility. That CI/CD would have to be managed by someone, or a team, rather than that team creating constructs that enable other engineers to build their own CI/CD.
If we stayed with this tool, we would also carry its operational overhead, which I don’t need to go into. Poly’s back in the lead.
I have mentioned the word ‘natively’ a lot during this article, and for good reason.
Every problem I have highlighted can be fixed with additional tooling. This is particularly true for mono repositories: any problem you come across, we can just bolt on another tool to fix it. That’s all well and good at the size we are now, but someone needs to look after that tooling and fix it when it breaks. Which it will. And if there isn’t an open-source tool out there to solve one of our problems, we would have to build our own.
As we grow, we can either take the direction of solving large problems with services, or solving them with open-source or internal tooling. Open-source or internal tooling looks like the cheaper option from the outset, but once you add growth into the mix it ends up becoming more expensive, because the people-hours spent maintaining the tooling add up.
We don’t have the time or the resources to create, maintain, and document an internal tool to the standard that service providers do. And if we did, that responsibility would have to reside with a team, most likely DevOps.
The more tooling we implement, the higher the risk that a release or a piece of work gets blocked. The higher the risk, the more time we have to spend on maintenance to make sure it doesn’t create a blocker. The more time we spend, the more engineers we need. The more engineers we have, the more it’s going to cost.
By now you have probably worked out I am a strong advocate for poly repos for our use case.
If I thought the engineering team was going to stay the size it currently is, I would be an advocate for a mono approach. Many large organisations, such as Facebook and Google, do use mono repos, but they have one or more teams dedicated to building internal tooling to solve internal problems.
We don’t have that luxury. So I believe we should adopt a ‘minimal internal tooling’ mentality, to avoid tooling hell.
Although I am an advocate for poly repos, for me this is not a debate of poly vs mono. It is a debate about the most efficient set-up for our development and release processes. After all, it is these two processes that will heavily influence our TTL for new features and bug fixes.
The perfect world can exist with both poly and mono, but we will get there quicker, cheaper and with fewer blockers with poly.