Moving from multiple repositories to a lerna-js mono-repo

Rohan Prabhu
HackerNoon.com
9 min read · Jan 22, 2019

At mitter.io, we have a couple of public-facing npm packages that we need to publish, and we recently moved from a separate repository for each of them to a monorepo managed by Lerna. Today, I’d like to share our experience of this migration and our setup with the new monorepo structure. All of our packages are either SDK targets or dependencies of our SDK targets:

  • @mitter-io/core - The core functionality of the mitter.io SDKs
  • @mitter-io/models - The TypeScript models (classes, type aliases, interfaces etc.) for the SDKs
  • @mitter-io/web - The web SDK
  • @mitter-io/react-native - The React Native SDK
  • @mitter-io/node - The node.js SDK (used for node.js backends)
  • @mitter-io/react-scl - The standard component library for ReactJS applications

All of our packages are written in TypeScript, and typings are bundled with the packages themselves; we do not distribute separate typing packages (the ones you usually see starting with @types/). We use rollup to bundle these packages in both the UMD and ES module formats. In addition to this, we use TypeDoc to generate our documentation, which is then published to a public bucket in AWS S3.

Before using Lerna, we had a separate repository for each of our packages and it worked fine while we only had the web SDK. As we progressed and had more developers working on the SDK, we started facing a few issues with our setup:

  1. Given that most of the SDK logic resides in @mitter-io/core, almost every change occurred in the core package, and all other packages then had to be updated to point to the new version. So, even if a bug was to be fixed for, say, React Native, the change would go in core, but the update then needed to be reflected in all other targets, i.e., web, node and react-native. It was quite common for a developer to miss a target.
  2. Almost every change in the SDK would result in changes across at least 3 of the 5 packages.
  3. We saw a huge benefit in keeping the same version across packages (makes it easier for developers to guess what the latest version of a target would be), but manually tracking this was becoming cumbersome.
  4. npm link (or yarn link if you’d prefer) had its own set of issues with making sure all the dependencies were linked, then unlinked to use the correct one from npm and back to the local link for development.
  5. It was quite common to run scripts across packages (e.g., to publish the TypeScript docs), and we were using a fragile scaffold of symlinks and bash scripts to manage the same.

Around that time, we came across Lerna and it seemed to be the perfect fit for our requirements.

We decided to follow the simplest path there is, trying to use defaults as much as possible. From what we experienced, migrating to Lerna was a breeze. Start off by creating a new Lerna repo with lerna init.

Answer a couple of simple questions (where we always resorted to the default) and you’re all set. Moving our old packages from their repos to the new one (which we were dreading as we thought it would be a massive pain) was way easier than expected: lerna import pulls an existing repository into the monorepo, and we ran it with the --flatten flag.

NOTE The --flatten may or may not be required, but we faced issues without it.
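The two steps above can be sketched as a small script. This is only a sketch, assuming Lerna 3.x run through npx and the old repositories checked out next to the new one. The directory names (js-sdk, ../core and so on) and the run helper are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical locations of the old single-package repositories.
OLD_REPOS="../core ../models ../web"

migrate() {
  # Create the monorepo and let Lerna generate lerna.json and package.json.
  run git init js-sdk
  run cd js-sdk
  run npx lerna init
  # lerna import expects a clean working tree, so commit the skeleton first.
  run git add .
  run git commit -m "Initial Lerna setup"
  # Import each old repository, with its commit history, under packages/.
  for repo in $OLD_REPOS; do
    run npx lerna import "$repo" --flatten --yes
  done
}

migrate
```

The --yes flag skips the confirmation prompt that lerna import shows before replaying the commits, which is convenient when importing several repositories in a loop.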

What’s amazing about Lerna is that it brings in all of the git commits along with it (you might lose some history with --flatten), such that for the new repo, the history looks like development has always been happening in this monorepo. This is absolutely essential because you are going to need to git blame someone for a bug you discovered after moving to the monorepo.

With Lerna, we now manage a single repository for all of our packages, with a directory structure that looks like this:
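As an illustration, such a layout can be reproduced with a few shell commands. This is a sketch, assuming Lerna's default packages/ directory and package directories named after the unscoped package names. The scratch directory and the placeholder versions are hypothetical, and a real repository holds many more files.

```shell
#!/usr/bin/env bash
# Build an empty skeleton of the monorepo in a scratch directory.
ROOT="$(mktemp -d)/js-sdk-layout"
mkdir -p "$ROOT/packages"

# Root-level files: Lerna's config and a private root package.json.
printf '{\n  "packages": ["packages/*"],\n  "version": "0.0.0"\n}\n' > "$ROOT/lerna.json"
printf '{\n  "name": "root",\n  "private": true\n}\n' > "$ROOT/package.json"

# One directory per package listed at the top of the article.
for pkg in core models web react-native node react-scl; do
  mkdir -p "$ROOT/packages/$pkg/src"
  printf '{\n  "name": "@mitter-io/%s",\n  "version": "0.0.0"\n}\n' "$pkg" \
    > "$ROOT/packages/$pkg/package.json"
done

# Show the resulting tree, relative to the repo root.
(cd "$ROOT" && find . | sort)
```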

To publish the changed packages, we now simply have to run lerna bootstrap followed by lerna publish.

You don’t have to run lerna bootstrap every time; only if this is the first time you are checking out the repo. What it does is simply install all the dependencies of each of the packages under this repo and symlink the packages that depend on each other.

At the same time, we also decided to streamline our process a bit and added all the packaging tasks within the npm lifecycle itself. Do note that this doesn’t have anything to do with Lerna; this is something that should ideally be there in any npm package regardless of the repo structure. For each of the packages, the lifecycle scripts are declared in the individual package.json.

The build script builds the package with the TypeScript compiler, bundles it with rollup and generates docs with typedoc.
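A minimal sketch of what such a scripts block could look like is written out below into a scratch directory. The prepare and prepublishOnly names are standard npm lifecycle hooks, but the exact commands, their flags and the ci-scripts/publish-tsdocs.sh path are assumptions for illustration, not our actual configuration.

```shell
#!/usr/bin/env bash
# Write a hypothetical per-package package.json into a scratch directory.
PKG_DIR="$(mktemp -d)"
cat > "$PKG_DIR/package.json" <<'EOF'
{
  "name": "@mitter-io/core",
  "version": "0.0.0",
  "scripts": {
    "build": "tsc && rollup -c && typedoc --out docs src/index.ts",
    "prepare": "npm run build",
    "prepublishOnly": "../../ci-scripts/publish-tsdocs.sh"
  }
}
EOF

# npm runs "prepare" before the package is packed and published, and
# "prepublishOnly" only on `npm publish`, so a publish triggers both the
# build and the docs upload without any extra manual step.
grep -E '"(build|prepare|prepublishOnly)"' "$PKG_DIR/package.json"
```

Hooking the build into prepare means nobody can publish a package without building it first, whichever tool ends up invoking npm publish.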

Having a single repo structure also allows you to keep common scripts in a single place so that changes apply across all packages (we should also move the build script to a separate script, given that it has now become quite a complex bash command).

The developer flow apart from releases is unchanged. A developer creates an issue on GitLab (or is assigned one), creates a new branch for the issue, and then merges the changes to master after a code review. The release lifecycle now follows an extremely structured process:

  1. When a milestone is completed and we are planning to make a new release, one of the developers (in charge of that particular release) creates a new version by running lerna version.
  2. Lerna provides an extremely helpful and easy-to-use prompt for figuring out the next version.

Once a new version is selected, Lerna changes the versions of the packages, creates a tag in the remote repo, and pushes the changes to our GitLab instance. Beyond this, developers are not required to do anything else. Our CI is set up to build all tags whose names look like a semantic version number.

NOTE We run lerna version with --force-publish because we want all packages to have the exact same lineage of versions. So sometimes we’ll have packages that don’t differ between different versions. Depending on your preference, you might choose to not do it.

We use GitLab’s integrated CI for building, testing and publishing across all of our projects (JS and Java). For the new JS monorepo, we have two stages: build and publish.

The build phase is extremely simple and runs just two scripts.

This phase runs on every single commit to essentially validate the sanity of the package. The publish phase, on the other hand, checks out master, resets the working tree, and then runs lerna publish from-package.

We figured out we had to do a git checkout master and a git reset --hard because GitLab clones (or fetches, depending on the configuration) the repo, and then checks out the commit that is to be built. This leaves the working directory in ‘detached HEAD’ mode, i.e., HEAD points directly at a commit rather than at a branch. Lerna uses HEAD to figure out the current version of the package and errors out in the detached HEAD state.
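The detached HEAD situation is easy to reproduce locally. The sketch below uses a throwaway repository with a single empty commit. The ci identity and the commit message are placeholders, and the checkout of a bare commit stands in for what the CI runner does.

```shell
#!/usr/bin/env bash
# Reproduce a CI-style checkout in a throwaway repository.
REPO="$(mktemp -d)"
cd "$REPO" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git -c user.name=ci -c user.email=ci@example.com \
  commit -q --allow-empty -m "release"

# CI checks out the commit itself, not the branch, so HEAD becomes detached.
git checkout -q --detach HEAD
if git symbolic-ref -q HEAD >/dev/null; then
  DETACHED_STATE="on a branch"
else
  DETACHED_STATE="detached"
fi
echo "after CI-style checkout: $DETACHED_STATE"

# Checking out master re-attaches HEAD to a branch again.
git checkout -q master
BRANCH="$(git symbolic-ref --short HEAD)"
echo "after git checkout master: on $BRANCH"
```

git symbolic-ref -q HEAD exits with a non-zero status when HEAD is not a symbolic ref, which makes it a convenient check to add at the top of a publish job.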

We also need to run lerna publish from-package as opposed to lerna publish. A plain lerna publish would have Lerna complaining that the current version is already published, as the metadata was updated when the developer ran lerna version locally. The from-package argument tells Lerna to publish every package whose current version is not yet present in npm. This also helps if a publish failed for some reason and you’re retrying the pipeline.
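Put together, the two CI stages could be sketched as the shell functions below. The publish commands follow the description above; the two build commands (lerna bootstrap and lerna run build) are an assumption about what such a stage typically runs, and the run helper is hypothetical. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build stage: runs on every commit.
build_stage() {
  run npx lerna bootstrap     # install dependencies for every package
  run npx lerna run build     # run each package's "build" script
}

# Publish stage: runs only for release tags.
publish_stage() {
  run git checkout master     # leave the detached HEAD state
  run git reset --hard        # discard anything left in the working tree
  run npx lerna publish from-package --yes
}

build_stage
publish_stage
```

In a real pipeline each function body would sit in the script section of its own job in .gitlab-ci.yml, with the npm credentials supplied through CI variables.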

The publish phase is configured to run only on tags that match a semantic-version regex. Ours is a bit fancy; for most teams and for most purposes, simply ^v.*$ should work. :)
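For a sense of what a tag filter of this kind does, here is a simplified, hypothetical pattern checked with grep -E. It is not the regex we use in our pipeline: it only accepts an optional v, three numeric components and an optional pre-release suffix.

```shell
#!/usr/bin/env bash
# Hypothetical pattern: optional "v", MAJOR.MINOR.PATCH, and an optional
# pre-release suffix such as "-beta.1".
SEMVER_TAG_RE='^v?[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?$'

# Succeeds when the given tag name looks like a release tag.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq "$SEMVER_TAG_RE"
}

for tag in v1.2.3 0.4.0 v2.0.0-beta.1 nightly v1.2; do
  if is_release_tag "$tag"; then
    echo "$tag: publish"
  else
    echo "$tag: skip"
  fi
done
```

Anchoring the pattern with ^ and $ matters here: without the anchors, a tag such as old-v1.2.3-backup would also trigger a publish.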

NOTE Although we haven’t done it yet, since we are a small team, one could also mark any tags following the above regex as protected in GitLab to restrict who can publish packages to npm.

You can check out our monorepo at https://github.com/mitterio/js-sdk (This is mirrored from our internal GitLab repo).

When running common scripts (like we do for publishing TypeScript docs), it is quite useful to know the particulars of the package running the script. This applies to scripts in the npm lifecycle, as well as scripts one might run using lerna run or lerna exec. For a given package, npm makes the contents of its package.json available to a script through environment variables prefixed with npm_package_. So, for a package whose package.json declares a name and a version, the variables npm_package_name and npm_package_version will be available while running any lifecycle script.
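As a sketch of how a shared script can use these variables, the function below derives a docs upload path from npm_package_name and npm_package_version. The bucket name, the path layout and the sample fallback values are hypothetical, and the exact set of package.json fields exposed this way depends on the npm version.

```shell
#!/usr/bin/env bash
# Build the docs destination for the package whose lifecycle script is running.
# npm sets npm_package_name and npm_package_version for lifecycle scripts;
# the bucket name below is a placeholder.
docs_destination() {
  bucket="${DOCS_BUCKET:-example-docs-bucket}"
  # Strip the npm scope: "@mitter-io/core" becomes "core".
  short_name="${npm_package_name##*/}"
  echo "s3://$bucket/$short_name/$npm_package_version/"
}

# Outside npm the variables are unset, so fall back to sample values here.
: "${npm_package_name:=@mitter-io/core}"
: "${npm_package_version:=1.0.0}"
docs_destination
```

Because the script reads everything it needs from the environment, the same file can be referenced from every package's package.json without per-package arguments.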

Quirks/Issues

A couple of things we are still working on with the new setup (some of them are issues, while some we probably just don’t know better):

  • Not sure if it is possible, but we would like to be able to have common lifecycle scripts for all of our packages. Declaring these in the root package.json does not work.
  • It is extremely difficult to test your Lerna setup completely without actually publishing something to npm. Not sure if there is a --dry-run somewhere.
  • Lerna has a way of keeping a common config-block for devDependencies so that all of the devDependencies are of the same version for each of the subpackages. This is quite a cool feature but would take us some time to weed out all the common ones.
  • The same could apply for other dependencies as well, so while we won’t want a common dependencies config block, having a way to express variables available across the projects would be nice. For example, in our Java/Kotlin monorepo, we use gradle.properties to contain variables like springBootVersion, springCoreVersion, etc., which are then used by the individual gradle scripts.

Our thoughts on monorepos

Monorepos have been the subject of quite a heated debate recently, including whether we are seeing a huge number of teams jumping on the bandwagon again, quite reminiscent of the time when microservices were all the rage.

The structure we follow here is having multiple monorepos, and this is not our first time managing monorepos. Our entire platform and backend is a monorepo that contains private, deployable code and multiple public-facing packages that are published to bintray. We also have our main website running with a Spring backend, with the frontend bundled with webpack supporting hot reloading (webpack watch), etc. We never decided to go with a single mono-repo across the organisation because the tooling simply wasn’t there.

Having most of our Java code in a single repo works great because gradle provides all the tooling needed for the Java monorepo, while Lerna and the npm lifecycle provide the tooling for the JS SDKs’ monorepo. So, simply put, monorepos are great once you identify the coverage of changes that go in your repo. For our Java backend, we saw multiple MRs across projects for a single feature, which inclined us to move to a monorepo only for this particular project, with all of our other code still in separate repos. And once we saw a similar pattern emerge for our JS SDKs as well, we moved to Lerna.

Do note that we are a small team of about 9 engineers; so what works for us might not work for teams of different sizes. What we would mostly like to point out is that the adoption of any solution does not have to be binary, wherein either we do it as prescribed or not do it at all.

Some of the motivations we saw for a monorepo definitely applied to us and a lot of them did not. For instance, we simply cannot spare the time to build the tooling that would be needed if our entire codebase were moved to a single repo, regardless of the benefit we may or may not experience. So the debate really isn’t about having a “single repo”; by itself, it is nothing more than a new directory structure. Monorepos are prescribed to alleviate certain issues and, as with every “silver bullet”, there are caveats.

The debate is about common issues faced in the software industry and what solutions have commonly been taken; “common” being the keyword. The area where you deviate from the “common” application is where you get to innovate, make changes and build a little.

Originally published at medium.com on January 22, 2019.

Rohan Prabhu: Technical co-founder at mitter.io. Works mostly on JVM/backend, loves to dabble with ReactJS. A bit of a fanatic about static typing.