How to Create a Persistent Worker for Bazel

Mike Morearty
Dec 12, 2017


This is a detailed description of how to write a “persistent worker” for use with the Bazel build tool. I created a TypeScript persistent worker for use in our builds at Asana, thus getting a roughly 3x speedup of our TypeScript compilation. The techniques described here can be used to create a persistent worker for any other tool; they are certainly not TypeScript-specific. I’m just using my TypeScript worker, bazeltsc, as an example.

The information here is believed to be accurate as of the current version of Bazel, version 0.8.1. Getting a persistent worker up and running is not a trivial task, so unfortunately there is no “tl;dr” section in this post.

A little Q&A

What is a persistent worker?

It’s Bazel’s way to keep a compiler (or other build-related tool) “warm,” launching it once and then feeding it consecutive build tasks. This always saves on compiler launch time, and in some cases it can save more than that — depending on how the compiler in question works, it may be possible to avoid, for example, re-parsing input files that are shared across multiple compilations.

Like most build tools, Bazel will often end up invoking a compiler multiple times. Anyone who has ever built a C/C++ application that uses a Makefile has seen output like this:

cc -c monster.c
cc -c frail.c
cc -c weird.c
cc -c args.c
cc -c tpyos.c
...

In this case, cc is being invoked multiple times.

In Bazel, if the compiler has sufficient support for the concept of staying “warm,” then you can tell Bazel how to use it in that way.

Will this work for any compiler?

No. It only works for compilers that are internally architected in such a way as to make this possible.

For example, the TypeScript compiler has a very clean internal architecture which makes it pretty easy to do this: It has an API to let you specify a set of source files and command line arguments to use, and it will reuse as much information as possible from earlier compiles, including ASTs from previous parser passes.

At the opposite extreme, I’ve never looked at the gcc source code, but I would be surprised if you could get it to work as a persistent worker.

At a minimum, a compiler would need to have a way to reset all of its state. Any compiler that has a lot of global variables is going to be difficult to turn into a worker.

Isn’t this risky? Won’t there be subtle incremental compilation bugs?

There is certainly some risk. Incremental compilation is an inherently harder problem than “clean” compilation. Whether it is safe really depends on how the compiler was architected.

Let’s get started

The three steps needed to create and use a persistent worker

  1. Make a compiler (or more likely, a wrapper around someone else’s compiler) that recognizes the --persistent_worker command line argument and knows what to do with it.
  2. Write Skylark rules (rules in your .bzl files) to invoke that compiler, following a few guidelines such as including execution_requirements = { "supports-workers": "1" } in the action.
  3. Modify your Bazel command line (or .bazelrc file) to tell it to launch that compiler as a persistent worker, by specifying
    --strategy=MyCompiler=worker.

Below are lengthy sections describing each of those three steps.

1. How the compiler needs to work

The compiler must work as follows:

  • It must recognize the --persistent_worker command line argument.
  • When it receives that, it must have a loop that reads a protobuf-formatted WorkRequest from stdin; does a compile; and writes the protobuf-formatted WorkResponse to stdout. (When it sees EOF on stdin, it can terminate.)
  • Your compiler should probably also be able to work as a regular one-shot tool when it is invoked without the --persistent_worker argument, because depending on its command line arguments (see step 3 above), Bazel may not use workers. If your tool will only be used within your own team and you have control over how it is invoked, this is probably not necessary; but if you will be sharing it with others, you’ll want to support one-shot mode, so that the tool works regardless of command line arguments.

One implementation subtlety to be aware of is that since the protobuf messages are coming in via a stream, each protobuf message is length-preceded (Bazel uses writeDelimitedTo instead of writeTo when writing the protobuf messages). The length is encoded as a protobuf varint.
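To make that framing concrete, here is a minimal sketch in Python (my worker is TypeScript, but the wire framing is the same in any language). The varint encoding is the real protobuf wire format; `compile_fn` is a hypothetical callback that would parse the raw WorkRequest bytes with your generated protobuf classes, run a compile, and return serialized WorkResponse bytes:

```python
import sys

def read_varint(stream):
    """Decode one protobuf varint (7 bits per byte, little-endian, high bit = continuation)."""
    result = 0
    shift = 0
    while True:
        data = stream.read(1)
        if not data:            # EOF: Bazel closed stdin; the worker should exit
            return None
        byte = data[0]
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):   # high bit clear => last byte of this varint
            return result
        shift += 7

def write_varint(stream, value):
    """Encode one protobuf varint."""
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([bits | 0x80]))
        else:
            stream.write(bytes([bits]))
            return

def worker_loop(compile_fn, stdin=None, stdout=None):
    """Read length-delimited WorkRequests; write length-delimited WorkResponses.

    compile_fn is a hypothetical callback: it receives the raw WorkRequest
    bytes (parse them with your generated protobuf classes) and returns
    the serialized WorkResponse bytes.
    """
    stdin = stdin if stdin is not None else sys.stdin.buffer
    stdout = stdout if stdout is not None else sys.stdout.buffer
    while True:
        size = read_varint(stdin)
        if size is None:
            break               # EOF on stdin: terminate cleanly
        request_bytes = stdin.read(size)
        response_bytes = compile_fn(request_bytes)
        write_varint(stdout, len(response_bytes))
        stdout.write(response_bytes)
        stdout.flush()          # Bazel blocks waiting for each response
```

Note the explicit flush after every response: the worker and Bazel are in a strict request/response lockstep, so a buffered, unflushed WorkResponse will hang the build.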

2. How the Bazel actions need to be written

When a Bazel “action” (written in Skylark, Bazel’s Python variant) invokes a compiler as a persistent worker, it must follow a couple of guidelines. First let me give a very simple example of what an action looks like for a regular tool that is not a persistent worker:

ctx.actions.run(
    inputs = [input_file, ...],
    outputs = [output_file],
    mnemonic = "MyCompiler",
    executable = my_compiler,
    arguments = [
        "--option1", "--option2", input_file.path, "-o", output_file.path
    ],
)

If you want Bazel to invoke the compiler as a persistent worker, the action’s definition must work as follows:

  • The call to ctx.actions.run or ctx.actions.run_shell (or the older ctx.action) must include execution_requirements = { "supports-workers": "1" }.
  • Its arguments must also include an @filename argument. That file will contain all the arguments that are specific to a single “work request” (a single compile).

A simple example:

args_file = ctx.actions.declare_file(ctx.label.name + ".args")
ctx.actions.write(
    output = args_file,
    content = "\n".join([  # the contents of the args file, one argument per line
        "--option1", "--option2", input_file.path, "-o", output_file.path
    ]),
)
ctx.actions.run(
    inputs = [input_file, args_file],  # the args file must be among the inputs
    outputs = [output_file],
    mnemonic = "MyCompiler",
    executable = my_compiler,
    arguments = ["@" + args_file.path],
    execution_requirements = { "supports-workers": "1" },
)

So, in this example, how will my_compiler actually be invoked? Exactly like this:

my_compiler --persistent_worker

That’s it. Then, the compiler will start listening on stdin. In the above example, the protobuf WorkRequest that it receives will include:

  • arguments: an array of strings: ["--option1", "--option2", "input_file", "-o", "output_file"]
  • inputs: A list of the inputs that are available to it. Most persistent workers can ignore this.

The next time Bazel needs to run that same action (presumably with a different set of inputs and a different set of outputs), it will notice that it still has a persistent worker available; so it will not start a new one. Instead, it will just send a new WorkRequest to the existing worker.

There’s an additional point to make regarding the action’s arguments. What if your persistent worker needs some arguments to be passed to it once, when it is launched, and then each compile has a (per-compile) set of arguments, as shown above? To accommodate that, Bazel requires that @argfile always be the last argument in the arguments array. For example:

ctx.actions.run(
    inputs = ...,
    outputs = ...,
    mnemonic = "MyCompiler",
    executable = my_compiler,
    arguments = ["--maxmem=4G", "@" + args_file.path],
    execution_requirements = { "supports-workers": "1" },
)

In that case, the compiler will be invoked with this command line:

my_compiler --maxmem=4G --persistent_worker

But the WorkRequest will be the same as before: just the contents of the args file.
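In other words, the worker only ever sees the startup arguments on its actual command line; the per-compile arguments arrive later inside each WorkRequest. A small sketch in Python (a hypothetical helper, not part of any Bazel API) of how a worker might separate the two:

```python
def parse_worker_argv(argv):
    """Split a worker's command line into (is_persistent, startup_args).

    For the invocation above, argv would be ["--maxmem=4G", "--persistent_worker"];
    the per-compile arguments never appear here -- they arrive inside
    each WorkRequest instead.
    """
    is_persistent = "--persistent_worker" in argv
    startup_args = [a for a in argv if a != "--persistent_worker"]
    return is_persistent, startup_args
```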

3. Bazel command line arguments you must use

You may have noticed mnemonic = "MyCompiler" in the above action invocations. The mnemonic is a label for that specific action, thus allowing you to refer to it from other places.

What other places? Well, mainly the Bazel command line. Bazel will only launch your compiler as a persistent worker if you tell it that that’s how you want it to be launched. The main way to do that is with the --strategy command line option:

bazel build --strategy=MyCompiler=worker //my:target

Notice that there are two = signs. The general form of this argument is --strategy=<mnemonic>=<strategy>, where <mnemonic> is, of course, the mnemonic of a build tool, and <strategy> is one of:

  • standalone or local — a regular local build
  • sandboxed — a sandboxed local build
  • remote — a remote build
  • worker — a persistent worker!

A full discussion of strategies is well beyond the scope of this post; there is actually quite a lot of nuance in the selection of build strategies. We’ll stick to the worker-related issues here.

There is also an argument to set the default strategy for all actions:
--spawn_strategy=<strategy>. Naturally, you can specify that, and then override it for specific mnemonics.
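For example, a .bazelrc along these lines (assuming the MyCompiler mnemonic from the earlier examples) would keep sandboxing as the default while enabling the worker for just that one kind of action:

```
# .bazelrc -- hypothetical; "MyCompiler" is the mnemonic from the examples above
build --spawn_strategy=sandboxed
build --strategy=MyCompiler=worker
```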

That raises the question: should you just use --spawn_strategy=worker and be done with it? Probably not. As you learn more about strategies, you will realize that sandboxed builds are one of Bazel’s best features; but when you specify worker and an action can’t run with the worker strategy, it falls back to standalone; there is currently no way to make it fall back to sandboxed.

Give me source code

Words are nice, but often it’s very helpful to see source code. Here are some examples you can look at.

  • rules_scala (written in Java). Specifically, GenericWorker.java is a very helpful file that could be used as a skeleton for your own Java-based worker.
  • bazeltsc (written in TypeScript; could also be used as a model for JavaScript). This is the tool that inspired this post. Persistent workers are sufficiently complex that it is hard to create a “hello world” example. But this example is pretty small. Take a look at src/bazeltsc.ts. Start at the end of the file, where you will see main() which differentiates between a --persistent_worker build; a --debug build (which is my own thing: a persistent worker that uses plain text instead of protobuf on stdin/stdout, so I can experiment with it interactively); and a “regular” build. Also look at persistentWorker(), which reads a protobuf message, invokes the compiler, and then sends the response back.

Java and Scala persistent workers

Bazel has a built-in persistent worker for Java. Since the Java rules are built directly into Bazel, there is no need to specify a command line argument telling it to use the worker strategy; it is used automatically whenever you compile Java using java_binary, java_library, etc.

The standard Scala rules also support persistent workers. However, since those rules are external rather than being built into Bazel, you must specify
--strategy=Scalac=worker.

Other relevant Bazel command line arguments

Bazel has some other command line arguments that can be helpful:

  • --worker_verbose: great for debugging. Will print a message every time a worker is launched or terminated. Will also display the path to a file where that worker’s stderr can be found. In fact, if you have any Java code in your current Bazel project, try adding --worker_verbose to your command line now, to see the output. You should see several lines that look like this:
INFO: Created new non-sandboxed Javac worker (id 6), logging to /private/var/tmp/_bazel_mikemorearty/f2cd96704fd64e248571bccff6f9ab48/bazel-workers/worker-6-Javac.log
  • --worker_quit_after_build: Also helpful for debugging. Normally, workers are kept alive even after Bazel exits. (Their lifetime is controlled by the background Bazel daemon.) This tells Bazel to make them exit after each Bazel invocation. Not advised in production, because it will slow things down.
  • --worker_sandboxing: If enabled, workers will be executed in a sandboxed environment. This is great! Use it. (You can read up on sandboxing elsewhere.)
  • --worker_max_instances: Specify how many worker instances to create. The default is 4.
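Putting a few of these together, a typical debugging invocation (again using the hypothetical MyCompiler mnemonic from earlier) might look like:

```
bazel build //my:target \
    --strategy=MyCompiler=worker \
    --worker_verbose \
    --worker_quit_after_build
```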

Conclusion

As you can see, creating a persistent worker for Bazel is currently fairly complex. I hope this overview helped a little.

I also recently gave a talk on this topic at Bazel Conference 2017, which was more of a high-level overview, whereas this post goes into detail. If you’re interested, the talk is here. (I’m the second speaker in that talk, which was a joint presentation by Alex Eagle and myself.)

If you’d like to learn more about what we work on at Asana or are interested in joining our team, we’d love to hear from you. Visit our jobs page to see our open positions.
