Running tools under Bazel

This article is intended for frontend toolchain developers who need to host some tooling under Bazel. If you’re an end-user of a Bazel toolchain, this might be interesting, but your time might be better spent reading something else :)

I’ve been pairing with the NativeScript team this week on a proof-of-concept that builds an Angular app as a native Android app under Bazel, so we got a chance to run various Java and NodeJS-based tools alongside the existing Angular and Android build rules in Bazel. We encountered the same issues I’ve had in the past, so I’m writing up some quick notes here.

For the most part, you can think of Bazel as just taking whatever commands you would have run manually, like

tsc -p tsconfig.json
sassc my_styles.scss
ngc -p my_angular_app.tsconfig.json
rollup -c rollup_config.js

and running whichever of those commands are needed to bring the output tree up-to-date with respect to the input tree. By running bazel with the --subcommands flag you can see that Bazel is just shelling out to the tools in this way. That means that any sequence of tools can be adapted into a Bazel toolchain in theory.

However, when you look more closely, there are some requirements for a tool to be well-behaved under Bazel. These requirements come from the fact that we want our builds to scale with massively parallel remote execution farms, and that means Bazel needs to be able to exactly control the environment where the tool runs. This mostly comes down to correct path handling. Also, Bazel doesn’t mix build outputs into your source tree: there is a distinct output folder in a temp dir that Bazel creates. Here are the ways a tool can trip over this:

  • The tool may require inputs to be provided via stdin or outputs printed to stdout — this isn’t supported under Bazel where inputs and outputs must be regular files.
  • The tool may assume the location of inputs and outputs relative to the working directory. It’s better for the tool to accept arguments like inputs.manifest or outputDir.
  • The tool may assume it can look around the disk at everything in the input tree (or across all your dependencies). Bazel requires that you explicitly list the inputs so that the tool can be run remotely. However, you don’t want to pass thousands of inputs to every process that Bazel runs, because that is too slow.
  • Bazel will run the tool in a working directory which is at the root of the workspace. In some cases you may need to chdir explicitly before the tool runs.
  • Because Bazel has a distinct output folder, if your tool assumes it can take the path of some .js file and find the .json file next to it, but the .js file is an output from a .ts input, then this will break.
  • Bazel uses symlinks on Mac and Linux to set up a “sandbox” that mimics some aspects of remote execution, and nodejs by default will resolve the symlinks and hop outside the sandbox. This lets you read inputs you shouldn’t, and can affect path logic. The flag --preserve-symlinks ought to always be set, but if your tool invokes node itself you may need to pass the flag there too.
  • A NodeJS program that uses __dirname will often break because the .js files are in a different location. process.cwd() can be a better way to get an absolute path when needed. (Example in NativeScript tooling: https://github.com/bazelbuild/examples/compare/bazelbuild:e0ace80...alexeagle:7618247#diff-a9121f7c044932ee1523bf4261a3d035R8 )
  • Once you’ve made your program run under Bazel, it may need lots of extra command line arguments to be explicit about its execution environment. This can bump into command line length limits. Putting the arguments into a “params file” is a great solution, since it also lets you inspect that params file when debugging. tsconfig.json is a great example of a params file, and we actually extend it to stuff extra information so that we never need more command-line arguments.
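Several of the points above (explicit inputs, an explicit output directory, paths resolved from the working directory, and a params file) can be sketched together in a few lines. This is only an illustration, not how any real tool does it: the Params shape, the file name tool.params.json, and the functions loadParams and run are all made up for this example.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical params-file shape; the field names are assumptions for this sketch.
interface Params {
  inputs: string[];  // every input is listed explicitly, relative to the working directory
  outputDir: string; // outputs go here, never "next to" the inputs
}

// Read the params file and resolve every path against the working directory
// (process.cwd() by default) instead of __dirname.
function loadParams(paramsFile: string, cwd: string = process.cwd()): Params {
  const text = fs.readFileSync(path.resolve(cwd, paramsFile), "utf8");
  const raw = JSON.parse(text) as Params;
  return {
    inputs: raw.inputs.map((p) => path.resolve(cwd, p)),
    outputDir: path.resolve(cwd, raw.outputDir),
  };
}

// Toy "tool": writes an upper-cased copy of each declared input into outputDir
// and returns the paths it wrote. It never scans the disk for extra files.
function run(params: Params): string[] {
  fs.mkdirSync(params.outputDir, { recursive: true });
  return params.inputs.map((input) => {
    const out = path.join(params.outputDir, path.basename(input) + ".out");
    fs.writeFileSync(out, fs.readFileSync(input, "utf8").toUpperCase());
    return out;
  });
}
```

With a params file containing {"inputs": ["src/a.txt"], "outputDir": "out"}, calling run(loadParams("tool.params.json")) would write out/a.txt.out under the working directory. The command line stays one argument long no matter how many inputs there are, and you can open the params file when debugging.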

I’ve taken a couple different approaches to making tools work under Bazel. First, if you control the sources of the tool, you can just change the program to behave well under Bazel: add parameters that let you specify paths to the inputs and outputs and so on. The second approach is needed if you don’t control the tool: you can wrap it with a sh_binary and adapt the Bazel environment to look like the environment expected by the tool. An example for NativeScript: https://github.com/bazelbuild/examples/compare/bazelbuild:e0ace80...alexeagle:7618247#diff-15ccfce54a5624d53f8b91b042e1b253 
However beware: for this to work under Windows, you’ll either have to write this again as a .cmd script, or require users who depend (even transitively) on your tooling to install Bash.
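If the tool you are wrapping is itself a NodeJS program, one possible way around the Bash requirement is to write the wrapper in NodeJS too. The sketch below is not what the linked example does (that one is a shell script); it just shows the same idea of adapting the environment: pass --preserve-symlinks to the child node process and run it in the directory the tool expects. The names buildNodeArgs and runWrapped are hypothetical.

```typescript
import { spawnSync } from "child_process";
import * as path from "path";

// Arguments for the child `node` process: the flag goes before the script path
// so that node itself (not the tool) consumes it.
function buildNodeArgs(toolEntry: string, toolArgs: string[]): string[] {
  return ["--preserve-symlinks", toolEntry, ...toolArgs];
}

// Run the tool's entry script in `workDir` and return its exit code.
function runWrapped(toolEntry: string, toolArgs: string[], workDir: string): number {
  // Resolve the entry point first, since the child runs in a different directory.
  const entry = path.resolve(toolEntry);
  const result = spawnSync(process.execPath, buildNodeArgs(entry, toolArgs), {
    cwd: workDir,     // the "chdir before the tool runs" step
    stdio: "inherit", // let the tool's own errors reach the Bazel log
  });
  // status is null when the child was killed by a signal; treat that as failure.
  return result.status === null ? 1 : result.status;
}
```

A wrapper's main function would then do something like process.exit(runWrapped(toolPath, process.argv.slice(2), someDir)), where toolPath and someDir depend on how your rule lays out its files.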

There are also some “best practices” for a tool to be well-behaved:

  • When the tool runs successfully, it ought not to print output. Idiomatic Bazel logs are not spammy.
  • The tool should produce identical output for given inputs (deterministic) — otherwise it will cause cache misses for later build steps since their input hash will be tainted.
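A common source of non-determinism in NodeJS tools is writing out JSON whose key order depends on how the object happened to be built. Here is a minimal sketch of one way to avoid that, assuming the output is plain JSON-compatible data; stableStringify and digest are hypothetical helper names, and the SHA-256 digest only stands in for whatever hash a later build step would compute over the file.

```typescript
import { createHash } from "crypto";

// Serialize with object keys sorted recursively, so the bytes do not depend on
// the order in which properties were inserted. Assumes JSON-compatible values.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved as-is.
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const members = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + stableStringify(obj[key]));
    return "{" + members.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hex digest of the serialized output.
function digest(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}
```

Two objects with the same contents but different insertion order now serialize to the same bytes, so their digests match. The same care applies to anything else that varies between runs, such as timestamps or absolute paths embedded in the output.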

I might come back to update these notes in the future as I remember more tips. Happy coding!