Monorepo: please do!

You should choose a monorepo because the default behavior it encourages in your teams is visibility and shared responsibility, especially as teams scale. You will have to invest in tooling no matter what, but it’s always better when the default behavior is the behavior you want to see in your teams.

Why are we talking about this?

Matt Klein wrote “Monorepos: Please don’t!” I like Matt; he’s very smart, and you should go read his point of view. He originally posted a poll on Twitter.

My response was “I’m literally both of those people.” Rather than talk about how dope Rust is, let’s dig into why I think he’s wrong about monorepos. For context, I’m the CTO at Chef Software. We have ~100 engineers, a code base that goes back about 11–12 years, and 4 major product segments. Some of that code lives in polyrepos (my starting position), and some of it in monorepos (my current position).

Before I start: every single argument I make here can be applied to any repository layout of any kind. In my opinion, there is no technical reason you must choose one or the other. You can, and will, make either layout work. I’m stoked to talk with others about this, but I’m not interested in faux-technical reasons one is superior to the other. From a technical perspective, I think it’s a wash.

Matt and I agree on the first part of this point:

Because, at scale, a monorepo must solve every problem that a polyrepo must solve, with the downside of encouraging tight coupling, and the additional herculean effort of tackling VCS scalability

You are going to solve the same problems whether you choose a monorepo or a polyrepo. How do you release? What’s your approach to upgrades? Backward compatibility? Cross-project dependencies? What architectural styles are acceptable? How do you manage your build and test infrastructure? The list is endless, and you will solve for them all as you grow. There is no free lunch.

I think Matt’s argument is similar to the views shared by lots of engineers (and managers) I respect. I think of it as coming from the perspective of an engineer working on a component, or a team working on a component. You hear things like:

  • The code base is unwieldy — I don’t need all this other junk.
  • It’s harder to test, because I have to test all this other junk I don’t need.
  • It’s harder to work with outside dependencies.
  • I need custom virtual source control systems.

Certainly, these are all valid points. They happen in both cases, though: in a polyrepo, I have my junk, except that in order to build… I might need all the other junk. So I “just” build tooling that checks out the whole project. Or I build a fake monorepo with submodules. We could go around all day. But I think Matt’s argument misses the #1 reason I’ve flipped quite hard to a monorepo perspective as my own level in the organization has gotten higher:

It forces the conversation, and makes trade-offs visible

When we split the repositories up, we are de facto creating a coordination and visibility problem. It tends to map cleanly to the way we think of teams (especially the way individual contributors think of them): we have responsibility over this component. We work in relative isolation. The boundaries are drawn around my team and the component(s) we work on.

As the architecture gets bigger, no single team can manage it anymore. Very few engineers hold the whole system in their head. Let’s say you manage a shared component, A, which is used by teams B, C, and D. Team A is refactoring: adding a better API, and also changing the way the internals work. The result is a change that is not backward compatible. What advice do you give?

  • Find all the places where the old API is being used.
  • Are there cases where the new API cannot be used?
  • Can you patch and test the other components to know you won’t break them?
  • Can those teams validate your change right now?
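In a monorepo, the first question on that list can at least be answered from a single checkout. The snippet below is a rough sketch in Python (the original uses no particular language); the function name `find_call_sites`, the `old_api` symbol, and the per-team directory layout are all hypothetical. It is a naive regex scan, not a real code-intelligence tool, so it will miss aliased or dynamic calls and can match text inside comments or strings.

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_call_sites(root: str, symbol: str, suffix: str = ".py") -> List[Tuple[str, int]]:
    """Return (relative_path, line_number) for each line that appears to call `symbol`.

    Hypothetical helper: a plain regex over source text, so it ignores
    aliasing and dynamic dispatch, and may match comments or strings.
    """
    # Match the bare identifier followed by an opening parenthesis, e.g. "old_api(".
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\s*\(")
    hits: List[Tuple[str, int]] = []
    # One checkout holds every team's code, so one recursive walk covers B, C, and D.
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path.relative_to(root)), lineno))
    return hits
```

In a polyrepo the same scan only works after you have found and cloned every consuming repository, which is exactly the step that tends to get skipped.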

Note that these questions are the same regardless of layout. You’re going to need to hunt down B, C, and D. You’re going to need to talk to them, figure out the timing, understand their priorities. At least, we hope you do.

In reality, nobody wants to do any of that shit. It’s way less fun than just fixing the damn API. It’s all human and messy. In a polyrepo, you can just make your change, get it reviewed by others who are focused on that component (likely not B, C, or D), and move on. Teams B, C, and D can all just safely stay on their current version for now. They’ll move when they realize your brilliance!

In a monorepo, the onus shifts by default. Team A changes their component, and immediately breaks B, C, and D if they aren’t careful. This causes B, C, and D to show up at A’s door, wondering why team A broke the build. This teaches A that they cannot skip my list above. They have to talk about what they’re going to do. Can B, C, and D move? What if B and C can, but D was tightly coupled to a side effect of the behavior of the old algorithm?

We then have to talk about how we get out:

  1. Support multiple internal APIs, with the old algorithm deprecated, until D can get off it.
  2. Support multiple released versions, one with the old interface, one with the new.
  3. Delay releasing the improvement in A until B, C, and D can all accept it at the same time.

Let’s say we choose 1, multiple APIs. In this case, we have two live code paths: the old path and the new one. Pretty manageable in some situations. We put the old behavior back in, deprecate it, and agree on a schedule for its removal with team D. This plays out essentially the same way in a polyrepo or a monorepo.
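A minimal sketch of option 1, assuming component A is a Python module whose API is a hypothetical `summarize` function. The new path returns sorted unique values; the legacy path is kept alive for team D, which is assumed to depend on a side effect of the old algorithm (here, first-seen ordering). The names and behavior are illustrative only, not taken from any real codebase.

```python
import warnings
from typing import List


def summarize(values: List[int]) -> List[int]:
    """New API (hypothetical): unique values in ascending order."""
    return sorted(set(values))


def summarize_legacy(values: List[int]) -> List[int]:
    """Old API, kept only until team D migrates.

    Assumed side effect team D depends on: unique values in first-seen
    order rather than sorted order.
    """
    warnings.warn(
        "summarize_legacy() is deprecated and will be removed on the schedule "
        "agreed with team D; use summarize() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this module
    )
    # dicts preserve insertion order (Python 3.7+), so this keeps first-seen order.
    return list(dict.fromkeys(values))
```

For example, `summarize([3, 1, 3, 2])` returns `[1, 2, 3]`, while `summarize_legacy([3, 1, 3, 2])` returns `[3, 1, 2]` and emits a `DeprecationWarning`. pytest, for instance, lists such warnings in its summary by default, so the remaining cost stays visible to team D until the legacy path is deleted.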

For multiple released versions, we need a fork. We now have two components, A1 and A2. Teams B and C are on A2, and D is on A1. Each component needs to be a releasable head, because security updates and other bug fixes may be required before D can move. In a polyrepo, we can hide this away in a long-lived branch, which feels good. In a monorepo, we force the code into a new module altogether. Team D still has to take a change, to the “old” component. Everyone can see the cost we pay here: we have twice as much code now, and any bug fix that applies to both A1 and A2 must be applied in both places. With the branch approach in a polyrepo, this is hidden away behind cherry-picking. We intellectualize the cost as being lower, because the code is not literally duplicated. In practical terms, the cost is the same: you will build, release, and maintain two mostly identical code bases until you can retire one. The difference is that with a monorepo, this pain is direct and up front. It sucks more, and that’s a good thing.

Finally, we get to 3: delay the release. It may be that the changes A made are quality of life for team A. Important, but not urgent. Can we just delay? In a polyrepo, we push this to artifact pinning. Sure, team D, we say. Just stay on the old version until you catch up! This sets up a game of chicken. Team A continues to work on their component, ignoring that team D is getting ever more stale (that’s team D’s problem, those stupid heads). Meanwhile, team D just talks shit about team A’s cavalier attitude to code stability, if they talk about it at all. Months pass. Finally team D decides to take a look at moving, but the changes to A have only gotten bigger. Team A barely remembers when or how they broke D. The upgrade is more painful, and will take longer, which pushes it further down the priority stack. Then one day we have a security issue in A, and we are forced to fork. Team A has to go backward in time, find the moment where D was stable, fix the issue there, and make it releasable forward in time. This is the de facto choice people make, and it is by far the worst one. It feels good at the time, for both teams A and D, because we get to ignore each other.

In a monorepo, 3 really isn’t an option. You’re forced to deal with the situation in one of the two sustainable ways. You’re forced to see the upfront cost of having two releasable heads. You teach yourself to be defensive about upgrades that break backward compatibility. But most importantly: you cannot avoid having the hard conversation.

In my experience, once teams get big enough that no one can hold the whole system in their head, this is the most important thing in the world. You must raise the visibility of friction in the system. You have to work actively to force teams to look up from their component, and see the perspectives of other teams and consumers.

Yes, you can make tooling that tries to force the issue in polyrepos. But my experience with teaching continuous delivery and automation to huge enterprises tells me this: you want the default behavior, without tooling, to be the behaviors you want to see in the world. The default behavior of a polyrepo is isolation — that’s the whole point. The default behavior of a monorepo is shared responsibility and visibility — that’s the whole point. In both cases, I’m going to build tooling to sand off the rough edges. As a leader, I’ll pick the monorepo every time: because tools must reinforce the culture I want, and culture comes from the tiny decisions and behaviors of a team every day.