Migrating from Lerna with Yarn Workspaces to Bazel

Published in Cognite, Jan 13, 2021

At Cognite we have recently started to explore a mono repository setup across relevant products. Our application team is no exception. Such a setup provides us with a number of benefits:

It keeps the code in one place, simplifying security updates and migrations.
It reduces technical debt by enforcing a single version of each dependency.
It eliminates the overhead of publishing shared code to an external registry.
It simplifies managing CI/CD pipelines, including but not limited to building, testing, and deploying.

Of course, these benefits come with costs of their own, including maintenance and build system management, so finding the right tool for managing dependencies and builds is essential. In this article, we’ll talk about our migration path from Lerna with Yarn Workspaces to Bazel, which helped us cut the average CI/CD pipeline run from 35 minutes to 3.

Initial setup

We started with the following setup:

15 Node.js microservices.
5 packages used internally and externally.
7 teams managing microservices.

We were using Lerna for versioning and publishing packages and Yarn Workspaces for simultaneous dependency installation, cross-referencing, and dependency hoisting.

Every service was packaged into a Docker image in order to be deployed to a Kubernetes cluster. Services used the packages internally, and some of the packages were also used outside the repository.

We planned to double the number of services and shared packages over the course of a year.

Limitations

Lerna with Yarn Workspaces is generally a great choice for a project containing a dozen packages built by a single team. But in a repository with many services managed by several teams, we quickly ran into issues with pipeline build time and project ownership.

Our CI/CD pipeline, which tested, built, and deployed the services and published the packages, took 35 minutes to run. In order to speed up the pipeline, we tried to run builds in parallel with Jenkins, but we quickly hit the limits of horizontal scaling. We also tried to use Docker layer caching in order to rebuild only the services affected by a change. Unfortunately, this technique was unreliable because Yarn Workspaces enforces a single yarn.lock file. Any time the yarn.lock file was updated (for example by Dependabot), all the services were rebuilt, since all of them depended on this file.
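As a rough illustration of why the shared lockfile defeated layer caching, the sketch below models a layer cache in Python. It is a toy model and not Docker's actual implementation: the step names, the `service_steps` recipe, and the service names are hypothetical. The only property it relies on is that each layer's cache key depends on the layers before it.

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per build step.

    Each key hashes the step's own inputs together with the previous
    key, so a change in an early step invalidates every later step.
    """
    keys = []
    previous = ""
    for name, content in steps:
        digest = hashlib.sha256(f"{previous}|{name}|{content}".encode()).hexdigest()
        keys.append(digest)
        previous = digest
    return keys


def service_steps(lockfile, source):
    # Hypothetical image recipe: copy the shared lockfile, install, then build.
    return [
        ("copy yarn.lock", lockfile),
        ("yarn install", ""),
        ("build service", source),
    ]


def rebuilt_services(before, after):
    """Names of services whose final layer key differs between two states."""
    return sorted(
        name
        for name in before
        if layer_keys(before[name])[-1] != layer_keys(after[name])[-1]
    )


sources = {"service-a": "a-v1", "service-b": "b-v1"}
old = {name: service_steps("lock-v1", src) for name, src in sources.items()}

# Only service-a's source changes: only service-a is rebuilt.
source_change = {**old, "service-a": service_steps("lock-v1", "a-v2")}
# The shared lockfile changes: every service is rebuilt.
lock_change = {name: service_steps("lock-v2", src) for name, src in sources.items()}

print(rebuilt_services(old, source_change))  # ['service-a']
print(rebuilt_services(old, lock_change))    # ['service-a', 'service-b']
```

Because the lockfile sits at the start of every service's chain, a single dependency bump changes every final key, which matches the behavior we saw in the pipeline.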

A single yarn.lock file also makes it impossible to grant complete ownership of a service to a specific team with features like GitHub CODEOWNERS.

Welcome, Bazel

At Cognite the main build tool for mono repositories is Bazel, so we decided to give it a try in the applications team as well.

Bazel only rebuilds what is necessary. With advanced local and distributed caching, optimized dependency analysis, and parallel execution, we got fast, incremental builds. Bazel's cache is also more granular than Docker layer caching, which works only per build step. This helped us cut the build time of most individual services from 3 minutes to less than a minute.
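The idea of a per-action cache can be sketched in a few lines of Python. This is a simplified model under our own assumptions and not Bazel's real cache: the `ActionCache` class and the `compile_ts` stand-in are hypothetical names. Each action gets a key derived from its name and the content of its declared inputs, and the action runs only when that key has not been seen before.

```python
import hashlib


class ActionCache:
    """Toy content-addressed cache with one entry per build action."""

    def __init__(self):
        self._store = {}
        self.executions = 0

    def run(self, action_name, inputs, build):
        # The key covers the action name and the content of every declared input.
        hasher = hashlib.sha256(action_name.encode())
        for path in sorted(inputs):
            hasher.update(b"\0" + path.encode())
            hasher.update(b"\0" + inputs[path].encode())
        key = hasher.hexdigest()
        if key not in self._store:
            self.executions += 1
            self._store[key] = build(inputs)
        return self._store[key]


def compile_ts(inputs):
    # Stand-in for a real compiler invocation.
    return {path.replace(".ts", ".js"): text.upper() for path, text in inputs.items()}


cache = ActionCache()
service_a = {"a/index.ts": "export const a = 1"}
service_b = {"b/index.ts": "export const b = 2"}

cache.run("compile a", service_a, compile_ts)
cache.run("compile b", service_b, compile_ts)

# service_b is edited; service_a is untouched and is served from the cache.
service_b = {"b/index.ts": "export const b = 3"}
cache.run("compile a", service_a, compile_ts)
cache.run("compile b", service_b, compile_ts)

print(cache.executions)  # 3
```

Four requests result in three executions: the second request for the unchanged service is a cache hit.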

Bazel is aware of all the dependencies in the mono repository. Any change triggers builds, tests, and deployments only for the services it affects, so there are no more wasteful executions for unchanged parts.
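To make "affected by a change" concrete, here is a minimal Python sketch that walks a dependency graph in reverse. The graph and the target labels are hypothetical and only mimic the shape of Bazel labels. Bazel derives this information from BUILD files rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical dependency graph: target -> targets it depends on.
DEPS = {
    "//services/billing": ["//packages/logger", "//packages/auth"],
    "//services/search": ["//packages/logger"],
    "//packages/auth": ["//packages/logger"],
    "//packages/logger": [],
    "//packages/ui": [],
}


def affected_targets(deps, changed):
    """Return the changed targets plus everything that depends on them."""
    # Invert the graph so we can walk from a target to its dependents.
    reverse = {target: [] for target in deps}
    for target, dependencies in deps.items():
        for dependency in dependencies:
            reverse[dependency].append(target)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)


print(affected_targets(DEPS, ["//packages/auth"]))
# ['//packages/auth', '//services/billing']
print(affected_targets(DEPS, ["//packages/ui"]))
# ['//packages/ui']
```

A change to the auth package touches only the billing service, while a change to a package nobody depends on touches nothing else.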

Bazel does not require any specific directory structure or changes to source files. It also makes it possible to introduce Bazel only to a part of the mono repository, ensuring a gradual migration path.

Bazel goes beyond JavaScript and includes multilanguage support. In such a setup, if an individual team decides to go for a language other than JavaScript, there is always room for it.

Bazel setup

Building a TypeScript service

The work starts with defining a workspace for the repository in a file called WORKSPACE. We need to define a workspace name and map the NPM Bazel workspace to the node_modules directory.

Since our services and packages are written in JavaScript, we need to load an appropriate rules_nodejs set of rules. A rule defines a series of actions that Bazel performs on inputs to produce a set of outputs. This can be done with http_archive, which downloads a Bazel repository as a compressed archive file, decompresses it, and makes its targets available for binding.

After the rules target is loaded, we can load and use the yarn_install target, which takes the package.json and yarn.lock references as input parameters and runs yarn install during workspace setup.

Now that we have configured the workspace, we can gradually introduce Bazel setup per service by creating a BUILD.bazel file in the appropriate service path.

In order to build a Node.js service written in TypeScript with Bazel, we need to use ts_project, a Bazel wrapper around the TypeScript compiler. There we need to define our target name and provide a path to source files, root and output directories, TypeScript binary, tsconfig, and an optional dependency list. Bazel will trigger the build only when a dependency or a source file has changed. Otherwise the build output will be taken from the Bazel cache.

To finally build our service with Bazel, we need to execute the bazel build command, which builds the target and places its outputs in the bazel-out directory.

We wrapped the bazel build command in NPM scripts to hide build implementation details from developers. This way, developers can just run a familiar command.

Building a Docker image

Now that we have our service built, we may want to pack its contents into a Docker image. For that we need to load rules_docker rules into our WORKSPACE file.

Now let’s add a Bazel rule to the existing BUILD.bazel file, provide it with the compiled service entry point, and point it to compile_ts, the target built in the previous step.

In order to build a Docker image with Bazel, we need to execute the bazel build command, which builds the image target and places its outputs in the bazel-out directory.

To run a Docker container with our created image, we can simply run the image target with the bazel run command.

We can use the container_push rule to push an image into a Docker registry. The rule needs a target name, format, reference to our recently built image target, and remote Docker registry credentials.

Running the push target with the bazel run command will publish the image.

Note: We normally want to execute this command from our CI/CD pipeline.

Publishing NPM packages

While services are distributed through the Docker registry, externally used packages are distributed through NPM. In order to publish a package, we can use the pkg_npm rule, which needs the target and package names, source files to copy into the package, and a reference to the build.

We can publish a package by running the publish target that pkg_npm generates with the bazel run command.

Note: We normally want to execute this command from our CI/CD pipeline.

Writing custom Bazel rules

Unfortunately not all the rules needed for a successful pipeline execution are provided by the Bazel Node.js community. Rules for tooling such as eslint, prettier, jest, and kubeval need additional treatment. Bazel provides us with a way of defining custom rules with the rule function.

Every rule requires an implementation function that contains the actual rule logic. The function itself is not able to read or write files. Its main job is to emit actions. Actions, in turn, take a set of input files (deps here) and generate a set of output files.

In our implementation, we take all the dependencies and their declarations and output them through Bazel Providers. Notice the transitive declarations: Bazel is smart enough to list all the dependencies and all the dependencies of those dependencies. So if a service depends on packageA, which in turn depends on packageB, we only need to specify packageA in the service dependencies, and Bazel will resolve packageB automatically.
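The transitive resolution described above can be modelled in plain Python. This is an illustrative sketch and not our Starlark rule: the `TARGETS` table and the file names are hypothetical, and in Bazel itself this bookkeeping is done with depsets carried by providers.

```python
# Hypothetical targets: name -> (own declaration files, direct dependencies).
TARGETS = {
    "service": (["service/index.d.ts"], ["packageA"]),
    "packageA": (["packageA/index.d.ts"], ["packageB"]),
    "packageB": (["packageB/index.d.ts"], []),
}


def transitive_declarations(name, targets, _memo=None):
    """Collect a target's declarations plus those of all its dependencies.

    Dependencies come first, and each file is listed once even if it is
    reachable through several paths.
    """
    memo = {} if _memo is None else _memo
    if name not in memo:
        own, deps = targets[name]
        collected = []
        for dep in deps:
            for path in transitive_declarations(dep, targets, memo):
                if path not in collected:
                    collected.append(path)
        for path in own:
            if path not in collected:
                collected.append(path)
        memo[name] = collected
    return memo[name]


print(transitive_declarations("service", TARGETS))
# ['packageB/index.d.ts', 'packageA/index.d.ts', 'service/index.d.ts']
```

The service lists only packageA as a direct dependency, yet the declarations of packageB are included in the result.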

Let’s define a rule that we can use in our services. We load the jest-cli executable into _jest_cli_test and list the dependencies generated by the _jest_deps rule together with the jest.config.js and jest arguments.

Now we can define the rule as any other external rule in our BUILD.bazel file and provide the required parameters that we defined earlier.

We took a gradual approach to migrating the existing mono repository and introduced Bazel piece by piece, slowly but surely improving the overall build time. The entire migration took one developer about six weeks from beginning to end. This also included improving the deployment infrastructure, which was outside of the Bazel scope. Overall we spent the most time preparing custom Bazel rules and ensuring that at least a single service could be migrated. Subsequent migrations were relatively fast.

While Lerna with Yarn Workspaces comes with almost zero configuration, Bazel requires plenty of configuration and management work. This is something each team should bear in mind when deciding whether or not to go for Bazel.

It worked out well for us. With Bazel we were eventually able to cut our average CI/CD pipeline run from 35 minutes to 3. That is already a more than tenfold improvement, and the real gain is even greater. With the previous setup, onboarding twice as many services would have doubled the time spent in the pipeline. Now it adds only a negligible overhead. With Bazel, we can easily scale our mono repository setup with new services and packages.