Migrating to Bazel as a build tool.

Brijesh Dhakar
Harness Engineering
Aug 26, 2021

At Harness, our entire codebase lives in a single GitHub repository. It comprises around 5 million lines of code across 100+ interdependent modules. Build times were increasing every week, and our existing build tool (Maven) could not scale with our growing needs.

Challenges with our existing build tool: Maven

  • Lack of incremental build support: When we made a small change in one of the 100+ modules, Maven's limited support for incremental builds forced clean builds far too often.
    As a fast-growing organization, we add new modules at a good pace, so build time was increasing rapidly as the codebase grew.
  • Local development issues: For local development, switching branches and then rebuilding the whole project was a pain point. Developers had to wait around 20–25 minutes for the project to sync.
  • Time taken by the unit-test jobs: Most of the time in continuous integration is spent in the unit-test jobs. When a developer makes a small change, ideally only the tests that depend on that change should run. But Maven executes all the unit tests regardless of what changed.
    Example: If a developer updates the README file, there is no need to run any unit tests, yet developers had to wait for all the tests to pass.
    We were running our unit tests in three batches: unit-test-0, unit-test-1, and unit-test-2.
    As the code and the number of unit tests grew, the time taken by the unit-test jobs kept increasing.

At this juncture, we realized that we needed to look for an alternative that could serve our future needs as well.

Why Bazel?

Bazel is an open-source tool for building and testing software, developed by Google and released publicly in 2015. We chose Bazel as our build system for the following properties.

  • Fast: By analysing the dependency graph, Bazel knows exactly what needs to be rebuilt. It caches previously done work and rebuilds only what is needed, and it can build independent parts of the project in parallel.
    For example, we have three modules A, B, C such that A depends on B and B depends on C.
    A → B → C
    If you change module B, Bazel rebuilds only modules B and A (which depends on B), and not C.
  • Correct / reproducible builds: Building the same code with the same arguments always produces the same output.
  • Fewer intermittent test failures: Bazel runs tests in a sandbox, so tests are much less likely to collide with each other, which reduces the chance of intermittent failures.
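The A → B → C example above can be sketched as three Bazel targets. This is only an illustration under assumptions: the package directories A, B, and C and their source layout are hypothetical, and the BUILD.bazel format itself is explained in the Migration section.

```starlark
# C/BUILD.bazel (hypothetical package): C has no internal dependencies.
java_library(
    name = "C",
    srcs = glob(["src/main/java/**/*.java"]),
    visibility = ["//visibility:public"],  # lets other packages depend on it
)

# B/BUILD.bazel (hypothetical package): B depends on C.
java_library(
    name = "B",
    srcs = glob(["src/main/java/**/*.java"]),
    visibility = ["//visibility:public"],
    deps = ["//C"],  # shorthand for //C:C
)

# A/BUILD.bazel (hypothetical package): A depends on B.
java_library(
    name = "A",
    srcs = glob(["src/main/java/**/*.java"]),
    deps = ["//B"],  # shorthand for //B:B
)
```

With targets declared this way, a change to a source file in B invalidates //B and everything that depends on it (//A), while //C is reused from the cache. Running bazel query "rdeps(//..., //B)" lists the targets affected by a change to B, including B itself.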

Proof of Concept (POC)

We took two modules, A and B, such that B depends on A (B → A), and tried out the Bazel migration on them.

  • We moved the two modules from Maven to Bazel.
  • We created a bucket on GCP to serve as the cache.

We then compared four scenarios:

  1. Running both modules for the first time: Both modules were built and all tests ran without any cache.
  2. Making changes in module A: In this case too, both modules were built and all tests ran, because B depends on A.
  3. Making changes in module B: Module A was cached and only module B was built. The tests of module B ran, while the tests of module A were cached. The whole run took less time than in the two cases above.
  4. No changes in any module: No module was built. Both modules and their tests were served from the cache. This run took the least time of all four cases.

To test the cache, we also added a sleep statement to one of the tests of module B. Without the cache, the test took close to 30 seconds; with the cache, it took less than a second.
We knew that some of our modules take a very long time to build and to test. Having them cached would save a lot of time, so we decided to move from Maven to Bazel.

Migration

One challenge we faced during the migration was that we had more than 100 modules to migrate, and migrating all of them in one go was not practical. So we decided to adopt a hybrid approach, using Maven and Bazel together: modules already converted to Bazel are built with Bazel, while the rest are still built with Maven. We will cover this in a later part of this series.

Here, we discuss how to migrate a simple module from Maven to Bazel.

  1. WORKSPACE
    This file can be compared to the root pom.xml file in Maven. In Bazel, the WORKSPACE file sits at the root of the project, and it is where we define our external dependencies, similar to how we do in Maven's pom.xml.
Project:
  Module1
    |_ src/main/java
    |_ src/test/java
    |_ BUILD.bazel
  Module2
    |_ src/main/java
    |_ src/test/java
    |_ BUILD.bazel
  Module3
    |_ src/main/java
    |_ src/test/java
    |_ BUILD.bazel
  .
  .
  file1
  file2
  WORKSPACE

We use rules_jvm_external to manage external Maven dependencies. It is an external ruleset that fetches Maven dependencies transitively.
Each dependency is declared using the pattern:

groupId:artifactId:version

For example, in the WORKSPACE file we first load rules_jvm_external and then use its maven_install rule to fetch the external Maven dependencies.
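A minimal sketch of what such a WORKSPACE file might look like is shown here. It follows the setup pattern documented by rules_jvm_external, but the release tag and the artifact versions are illustrative assumptions, not the values used at Harness.

```starlark
# WORKSPACE (sketch; all versions below are assumptions for illustration)
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Assumed release tag; pin whichever rules_jvm_external release you actually use.
RULES_JVM_EXTERNAL_TAG = "4.1"

http_archive(
    name = "rules_jvm_external",
    # In a real setup, also set sha256 = "..." for the chosen release
    # so that the downloaded archive is verified.
    strip_prefix = "rules_jvm_external-%s" % RULES_JVM_EXTERNAL_TAG,
    url = "https://github.com/bazelbuild/rules_jvm_external/archive/%s.zip" % RULES_JVM_EXTERNAL_TAG,
)

load("@rules_jvm_external//:defs.bzl", "maven_install")

# Creates an external repository named "maven" (the default name).
maven_install(
    # Each entry follows groupId:artifactId:version.
    artifacts = [
        "com.google.inject:guice:4.2.3",
        "org.slf4j:slf4j-api:1.7.30",
    ],
    repositories = [
        "https://repo1.maven.org/maven2",
    ],
)
```

The two artifacts were chosen to match the @maven labels used in the BUILD.bazel example that follows; transitive dependencies of these artifacts are resolved automatically.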

  2. BUILD.bazel
    This file can be created at the module level as well as at a finer-grained (file) level. Since our codebase contains a lot of files, we decided on a module-level migration, i.e. we created one BUILD.bazel file per module.

java_library(
    name = "module",
    srcs = glob(["src/main/**/*.java"]),
    deps = [
        "@maven//:com_google_inject_guice",
        "@maven//:org_slf4j_slf4j_api",
    ],
)

The parts of this BUILD.bazel file are as follows:

  • java_library: A Bazel Java rule that compiles a set of Java source files and creates a jar.
  • name: A unique name for this target, used to refer to the target when building.
  • @maven: Here maven is the name of the external repository that maven_install creates from WORKSPACE (its default name). A dependency is referenced by joining the groupId and the artifactId with _, after replacing each . and - in both with _. The version is not part of the label.
  • srcs: The set of Java source files that we want to include in this target and build together.
  • deps: All the dependencies of srcs. These can be external dependencies as well as other targets from the project.

After creating the BUILD.bazel file we can run the below command to build this target.

bazel build path_to_directory:module

Here, path_to_directory is the path from the project root to the directory containing the BUILD.bazel file.
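Since much of the motivation above is about unit tests, here is a hedged sketch of how a test for this module could be declared alongside the java_library target. The test class name and path are hypothetical, and the sketch assumes junit:junit has also been added to the maven_install artifacts in WORKSPACE.

```starlark
# Added to the same BUILD.bazel as the java_library target (sketch).
java_test(
    name = "ExampleTest",
    # Hypothetical test source and class; replace with a real test in the module.
    srcs = ["src/test/java/com/example/ExampleTest.java"],
    test_class = "com.example.ExampleTest",
    deps = [
        ":module",  # the java_library target defined in this file
        "@maven//:junit_junit",  # assumes junit:junit is listed in maven_install
    ],
)
```

The test can then be run with bazel test path_to_directory:ExampleTest. If neither the test nor anything it depends on has changed since the last passing run, Bazel reports the cached result instead of running the test again, which is the behaviour observed in the POC.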

In this article, we discussed our motivation for using Bazel and how we began the migration with a simple, practical approach, starting with two modules. This initial success was critical for expanding to other modules.

We will discuss subsequent learnings and challenges in part 2 of this post.
