The Case for Monorepos
I built my earlier projects around GitHub organizations. Antwar, my Webpack and React based static site generator, is a good example. The problem is that juggling multiple repositories within a single organization becomes tedious after a while, especially when you make bigger updates.
Since writing this post initially, I have moved Antwar to a monorepo as well.
I have been experimenting with an alternative approach known as a monorepo lately. Instead of having a git repository for each package, you’ll maintain them within a single one.
This is what I’ve done with Reactabular, a table component of mine for React. I originally maintained it as a single npm package, but as the package grew, I realized I should split it up. In addition to improving the architecture, this move made it easier to consume individual parts of it. In theory, someone could build a “distribution” on top of the smaller parts, port it to some other environment (Angular, etc.), and so on.
To give you a better idea of how a monorepo worked out in my case, consider the structure of my repository below:
- ./dist — Standalone builds of the project core package. Since not everyone uses npm, for one reason or another, they can consume the dist versions instead.
- ./docs — Documentation of the project generated using Catalog. Even though the tool works reasonably well, I’ll likely port over to Antwar once it gains more functionality.
- ./images — Images related to the project. Just a couple of logos now.
- ./packages — npm packages of the project. This is where most of the code lies.
- ./packages/reactabular — The core package. It encapsulates the most important functionality so that Reactabular can be consumed through a single package.
- ./packages/reactabular-<package> — Individual package. There are a lot of these and often they don’t depend on each other. I have a separate package for utility functions that are useful across packages, though, and version it separately.
On a package level I have the following structure:
- ./packages/reactabular-<package>/CHANGELOG.md — Important changes made to a package listed per version.
- ./packages/reactabular-<package>/LICENSE — Package license. I tend to default to MIT these days.
- ./packages/reactabular-<package>/README.md — Package README showing the basic API and usage instructions. I refer to these files from the documentation and render them through it.
- ./packages/reactabular-<package>/package.json — npm package definition for a package. There’s some duplication across packages. I have set it up so that it generates an ES5 compatible version for distributing over npm.
- ./packages/reactabular-<package>/src — The source code (ES6) of a package.
- ./packages/reactabular-<package>/tests — The unit tests of a package.
There’s nothing special about the package-level structure. If I had a repository for each package, this is the structure I would end up with, apart from the webpack configuration, which I would need to maintain per project. In a monorepo I need to set up webpack only once, so that’s a win.
I have set up Babel to compile each package using npm’s prepublish hook. I feel this could be pushed to a higher level (the project root), but I’m still experimenting.
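As a rough sketch of what such a per-package setup might look like, consider the package.json fragment below. The package name, the `dist` script name, and the `src` and `dist` paths are assumptions made for illustration, not the actual Reactabular configuration. It also assumes the Babel CLI is available to the package.

```json
{
  "name": "reactabular-example",
  "version": "1.0.0",
  "main": "dist/index.js",
  "files": ["dist"],
  "scripts": {
    "dist": "babel ./src --out-dir ./dist",
    "prepublish": "npm run dist"
  }
}
```

Note that newer npm versions also offer the `prepublishOnly` and `prepare` hooks, which may be a better fit depending on when you want the compilation to run.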
Managing with Lerna
Just having this structure doesn’t do a lot of good. Sure, you can manage the project a little more easily now, and you can write integration-level tests without any problems, but what about cutting new releases? Multiple packages might need an update at once, and you would need to keep the dependencies between them in check.
This is where tools such as Lerna come in. Well-known projects such as Babel and Jest rely on it, so it has been tested and proven at scale. It makes your life easier by solving the management problem and allowing you to publish your packages easily (lerna publish). It can also bootstrap your project (lerna bootstrap) so that the packages have their dependencies installed when you start developing on a monorepo project.
I opted for the default versioning scheme so that my packages run in sync with each other (1.x.x, 2.x.x). It’s possible to version each package separately so they can be on different series, but I find that confusing myself.
The problem with a synced scheme is that it might lead to version inflation if you follow SemVer and like to make a lot of breaking changes. So before you hit that magical 1.0, make sure your API is stable.
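One way to wire these commands into a project is through npm scripts at the repository root. The fragment below is only a sketch: it assumes Lerna is installed as a development dependency, the script names `bootstrap` and `release` are my own picks, and marking the root as private is an assumption to keep it from being published by accident.

```json
{
  "private": true,
  "scripts": {
    "bootstrap": "lerna bootstrap",
    "release": "lerna publish"
  }
}
```

With this in place, `npm run bootstrap` would prepare the packages for development and `npm run release` would start the publishing flow. Newer Lerna releases lean on the package manager’s workspaces feature instead of lerna bootstrap, so check which version you have installed.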
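The choice between the two schemes is made in lerna.json at the repository root. A minimal sketch of the synced setup could look like this, assuming the packages live below ./packages as in the structure above. The version number is illustrative.

```json
{
  "version": "1.0.0",
  "packages": ["packages/*"]
}
```

To let each package follow its own series, you would set `"version"` to `"independent"` instead of a concrete version number.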
A good compromise could be to develop experimental packages outside of the monorepo (0.x series) and then pull them to the monorepo when they are stable enough. This way you have room to maneuver and you can experiment in peace without having to bump the series version as you break things.
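To see why a synced scheme can inflate versions, consider the sketch below. It is a simplified model of the idea, not Lerna’s actual implementation: `nextSyncedVersion` is a hypothetical helper, and the package names in the usage example are illustrative. The most severe change in any single package decides the bump for all of them.

```javascript
// Hypothetical helper, not part of Lerna: computes the next shared version
// for a synced scheme from the change type of each package.
const RANK = { patch: 0, minor: 1, major: 2 };

function nextSyncedVersion(current, changes) {
  const [major, minor, patch] = current.split('.').map(Number);

  // The most severe change across all packages decides the bump.
  const worst = Object.values(changes).reduce(
    (acc, type) => (RANK[type] > RANK[acc] ? type : acc),
    'patch'
  );

  if (worst === 'major') return `${major + 1}.0.0`;
  if (worst === 'minor') return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// One breaking change in a single package moves the whole series forward.
console.log(nextSyncedVersion('1.4.2', {
  'reactabular-table': 'patch', // illustrative package name
  'reactabular-sort': 'major',  // illustrative package name
}));
// prints "2.0.0"
```

In this model, a package that only received a patch still lands on 2.0.0, which is the inflation the synced scheme trades for simplicity.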
Pros and Cons
Managing a big project like Reactabular using multiple repositories (one for each package) would be close to nightmarish. Coordinating releases would be difficult and painstaking. The monorepo approach takes away some of that pain, especially when there’s tooling to support it. It also allows you to write integration tests across multiple packages to support the project.
That said, monorepos aren’t without their problems. In addition to the version inflation issue I highlighted, collaborating in a commercial environment could become tricky. What if you want to keep your source closed, but share a specific package with a contractor? I don’t have a good answer for this.
I can also imagine version history can become somewhat interesting as time goes by. The repository could grow big especially if you make the mistake of storing a lot of binary data there.
The approach also adds complexity to your project, as you now have a new tool to learn.
Jonathan Werner did a take on the topic. We considered using a monorepo at webpack-validator, but ended up skipping it there. Jonathan discusses why in his post.
Mrm, a Possible Compromise
It is possible to have your cake and eat it too. Mrm is a tool that allows you to get most of the benefits of monorepos while having a separate repository for each of your projects.
Mrm allows you to maintain project configuration in one place while propagating it to the projects as a dependency. When you upgrade the centralized project configuration, you also describe the change as a migration. Then, when you upgrade the dependency in the dependent projects, Mrm is able to patch the project configuration to this new state.
Although this might sound complex, Mrm solves a fundamental issue. After moving to Mrm, you will find that you have one less worry, as you can manage project metadata and setup (dotfiles etc.) in one place while specializing it based on need.
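The idea of describing configuration changes as migrations can be sketched in plain JavaScript. To be clear, this is not Mrm’s API. The `migrations` list, the `migrate` function, and the specific changes (a lint script, a license field) are all assumptions made to illustrate the concept of patching a project from one version of the shared setup to the next.

```javascript
// Hypothetical sketch, not Mrm's API: each migration patches a
// package.json-like config from one shared-setup version to the next.
const migrations = [
  {
    version: 2,
    // Assumed change: the shared setup starts requiring a lint script.
    apply: (config) => ({
      ...config,
      scripts: { ...config.scripts, lint: 'eslint .' },
    }),
  },
  {
    version: 3,
    // Assumed change: the shared setup standardizes on the MIT license.
    apply: (config) => ({ ...config, license: 'MIT' }),
  },
];

// Apply, in order, every migration newer than fromVersion up to toVersion.
function migrate(config, fromVersion, toVersion) {
  return migrations
    .filter((m) => m.version > fromVersion && m.version <= toVersion)
    .sort((a, b) => a.version - b.version)
    .reduce((acc, m) => m.apply(acc), config);
}

const project = { name: 'my-project', scripts: { test: 'jest' } };
console.log(migrate(project, 1, 3));
```

Because each migration returns a new object, a project that is several versions behind can catch up in one step without its existing settings, such as the test script here, being overwritten.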
So far I have been pretty happy with Lerna apart from the aforementioned issue. It pushes your project towards a certain structure, but if you don’t mind, it can work.
If you want a monorepo, Lerna isn’t the only option. André Staltz ended up writing bash scripts of his own for example.
In case you feel your project could start to grow and branch into multiple packages, I can see definite advantages in the monorepo approach. Given it’s still quite new, there are rough edges in tooling as we are still figuring these things out. Still, if you are feeling adventurous, it’s definitely worth a go.