Leveraging Bazel Remote Caching and Execution For Cross-Platform Builds (Mac/Linux)

Sridhar Mocherla
May 26, 2024


If you’ve worked with Bazel directly, you’ll have noticed that it’s non-trivial to get the benefits of remote caching when using Bazel on multiple host platforms (macOS and Linux), even for platform-independent artifacts such as JVM bytecode. Part of this comes from assumptions built into Bazel’s action graph, such as encoding host-specific paths (bazel-out/<cpu>-<compilation-mode>/) into action inputs; part comes from toolchain inputs (like the JDK) being platform-specific. Bazel encodes, among other things, the following information into each action:

  • Paths to input/output files in bazel-out
  • Command line arguments
  • Input/source files for the action and any transitive inputs/dependencies, including toolchain dependencies.
  • Environment variables

A cache hit here means an Action Cache (AC) hit: the remote cache already holds an entry for this action’s key from a prior execution of the same action. If any of the above differs, the action key differs and the lookup is a cache miss.
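To make the mechanism concrete, here is a toy model. For remote caching, the key is the SHA-256 digest of a serialized Remote Execution API Action message, which references a Command (arguments, environment variables, output paths) and a digest of the input tree. The shell functions below are a simplified stand-in, not Bazel’s actual algorithm: they hash an output path, an environment string and the arguments, to show that changing only the bazel-out/<cpu>-<compilation-mode> fragment is enough to change the key. The names toy_action_key and sha256_stdin are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model only (NOT Bazel's real key computation): hash the same kinds of
# fields Bazel folds into an action. Any differing byte yields a new key.

# Print the hex SHA-256 of stdin (sha256sum on Linux, shasum on macOS).
sha256_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# toy_action_key OUTPUT_PATH ENV ARG...
# Serializes the fields in a fixed order, one per line, then hashes them.
toy_action_key() {
  local output_path="$1" env="$2"
  shift 2
  {
    printf 'output=%s\n' "$output_path"
    printf 'env=%s\n' "$env"
    printf 'arg=%s\n' "$@"   # printf repeats the format for every argument
  } | sha256_stdin
}
```

For example, toy_action_key bazel-out/darwin_arm64-fastbuild/bin/lib.jar PATH=/bin -source 17 Foo.java and the same call with k8-fastbuild in the path print different digests, even though every other field is identical.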

This can be a massive bottleneck in a monorepo when your CI system runs on one platform (Linux x86_64) and developers work on another (macOS being very common). Ideally, CI builds populate the Bazel remote cache (through either local execution or remote execution on Linux workers) and developers reuse those results directly for fast builds. However, the limitations mentioned above mean developers on macOS can’t use the same remote cache.

A Few Choices

In this context, there are typically a few solutions:

  • Build remote caching infrastructure for builds from macOS: this is possible but can be prohibitively expensive, as you’ll need macOS CI/remote execution workers alongside your Linux workers so that the Bazel cache is “warm” when developers use it.
  • Move development to the target platform (Linux) with Cloud Development Environments (CDEs), where developers connect to a Linux host that already has their development environment set up and run Bazel there. This works too, but it changes the developer’s inner loop and requires supporting their Integrated Development Environment (IDE) experience in a remote environment. The CDE ecosystem is still fairly nascent, especially with regard to IDEs, which in some cases are not at parity with the local experience.
  • Update your Bazel builds on the Mac to use remote execution on Linux workers, sharing the same remote cache used in CI: this requires a number of changes to the build toolchain so that the cache can be safely shared with CI builds, it has a few caveats, and it is less well documented, but it works. It is highly powerful even with its limitations, since you avoid both maintaining macOS build infrastructure and building a remote development platform.

Using Remote Execution on macOS targeting Linux workers

With the above context, we’ll discuss the last option in more detail in this article and how you can enable it.

Assumptions:

  • You’re using Bazel in a monorepo with a JVM toolchain (Kotlin/Java/Scala), perhaps alongside other toolchains (such as Protobuf). You likely have other languages too, given Bazel’s polyglot support, but we keep it simple for this article.
  • You’re on Bazel 5.0 or higher.
  • Your JVM code doesn’t have native dependencies, especially ones that aren’t compatible with remote execution. A common example is Python pip packages with C extensions installed through rules_python. These packages are installed by Bazel repository rules on the host and don’t respect the toolchain resolution mechanism that is needed to choose the right platform for build actions.
  • You use either local execution with a Bazel remote cache (a gRPC implementation such as bazel-remote) or remote execution with Linux workers in CI. Remote execution at scale in CI can be quite expensive, both in cost and in operational burden for a DevOps team, so you may want to keep local execution for CI builds while using remote execution for developer builds. Note that the container environment for CI actions and for remote execution actions must be the same; otherwise caching breaks.

[Figure: Mixing Bazel local execution in CI and remote execution on Macs for sharing platform-independent artifacts]

Choosing the right C++ toolchain

You might not use C++ directly in your codebase, but if you use Protobuf you very likely use the C++ toolchain indirectly, since protoc is written in C++. Several internal Bazel tools, such as ijar, are also written in C++. Bazel’s autoconfigured C++ toolchain uses the host platform @local_config_platform//:host, which defaults to the Clang that ships with Xcode on a Mac and to gcc on Ubuntu-based distros.

We need to force builds triggered on a Mac to use a C++ toolchain for Linux. We can use the bazel-toolchains repository to generate the toolchain configuration. If you already use remote execution in CI, you can skip this step, since the configuration is already generated or maintained. Otherwise, run this command:

./rbe_configs_gen \
  --bazel_version=6.2.0 \
  --toolchain_container=<my-ubuntu-docker-image> \
  --output_src_root=<path/to/my/repo> \
  --output_config_path=tools/bazel/rbe_configs \
  --exec_os=linux \
  --target_os=linux \
  --cpp_env_json=ubuntu2004.json \
  --generate_java_configs=false

where ubuntu2004.json can be based on the example in Bazel’s continuous-integration repository. Once run, this generates BUILD files under tools/bazel/rbe_configs in your repository.

Update the platform definition in tools/bazel/rbe_configs/config/BUILD.bazel:

platform(
    name = "platform",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

and remove the parents attribute, since it points to @local_config_platform//:host. This is required because the host platform’s constraints differ between Mac and Linux, and, as you’ll see, we override --host_platform for builds in this configuration anyway.
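Rerunning rbe_configs_gen regenerates this file, so the parents attribute can quietly come back. A minimal guard could run in CI, assuming the attribute sits on its own line as in the generated file; check_no_parents is a hypothetical helper name, and its grep is a heuristic, not a Starlark parser.

```shell
#!/usr/bin/env bash
# check_no_parents FILE
# Heuristic guard: fails if FILE contains a line assigning `parents = ...`,
# which would make the platform inherit the host's (Mac) constraints again.
# Exit codes: 0 = clean, 1 = parents found, 2 = file missing.
check_no_parents() {
  local build_file="$1"
  if [ ! -f "$build_file" ]; then
    echo "missing: $build_file" >&2
    return 2
  fi
  if grep -Eq '^[[:space:]]*parents[[:space:]]*=' "$build_file"; then
    echo "error: $build_file still sets 'parents'" >&2
    return 1
  fi
  return 0
}
```

For example, check_no_parents tools/bazel/rbe_configs/config/BUILD.bazel after regenerating the configs.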

Another thing to update here is making sure the Linux C++ compiler is used for these builds. Make sure the cc_toolchain_suite in tools/bazel/rbe_configs/cc/BUILD.bazel is set up like this:

# This is the entry point for --crosstool_top.  Toolchains are found
# by lopping off the name of --crosstool_top and searching for
# the "${CPU}" entry in the toolchains attribute.
cc_toolchain_suite(
    name = "toolchain",
    toolchains = {
        "k8|gcc": ":cc-compiler-k8",
        "k8": ":cc-compiler-k8",
        "armeabi-v7a|compiler": ":cc-compiler-armeabi-v7a",
        "armeabi-v7a": ":cc-compiler-armeabi-v7a",
        "darwin_arm64": ":cc-compiler-k8",
        "darwin_x86_64": ":cc-compiler-k8",
    },
)

Notice that darwin_arm64 and darwin_x86_64 are mapped to the C++ compiler for Linux. These keys correspond to the host CPU value determined by Bazel’s autodetection mechanism. Alternatively, you can override this value with Bazel’s --host_cpu flag.
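To see which key the suite will be asked for on a given machine, the sketch below approximates the CPU value Bazel autodetects from uname output. The mapping is an assumption based on the keys shown above (older Bazel releases may report Intel Macs as plain darwin), and bazel_cpu is a hypothetical helper, not part of Bazel.

```shell
#!/usr/bin/env bash
# bazel_cpu [OS] [ARCH]
# Approximates Bazel's autodetected CPU value (the cc_toolchain_suite key).
# Defaults to the current machine's `uname -s` / `uname -m`.
bazel_cpu() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Linux/x86_64)  echo "k8" ;;
    Linux/aarch64) echo "aarch64" ;;
    Darwin/arm64)  echo "darwin_arm64" ;;
    Darwin/x86_64) echo "darwin_x86_64" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}
```

On an Apple Silicon Mac, bazel_cpu prints darwin_arm64, which the suite above maps to :cc-compiler-k8.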

Note: The instructions above are based on Bazel’s legacy crosstool mechanism for C++ builds, which doesn’t use Bazel’s regular toolchain resolution. If you have toolchain resolution enabled for C++ (it isn’t enabled by default before Bazel 7), you likely have

build --incompatible_enable_cc_toolchain_resolution

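and may be using a hermetic C++ toolchain from toolchains_llvm. In that case you can skip the crosstool configuration entirely: register the Linux C++ toolchains as usual (for example via register_toolchains or --extra_toolchains) and pass the Linux platform via --extra_execution_platforms so that toolchain resolution selects them.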

Command Line Configuration (.bazelrc)

We need to turn on various flags and define a configuration that can be easily used for these builds.

# Apply build:linux / build:macos sections automatically based on the host OS
build --enable_platform_specific_config
# Removes host-specific fragments in bazel-out directories
build --experimental_platform_in_output_dir
build --remote_instance_name=main

build:remote --remote_executor=grpc://<remote-executor-endpoint>
build:remote --host_platform=//tools/bazel/rbe_configs/config:platform
build:remote --extra_execution_platforms=//tools/bazel/rbe_configs/config:platform
# Use our custom C++ toolchain targeting Linux
build:remote --crosstool_top=//tools/bazel/rbe_configs/cc:toolchain
build:remote --extra_toolchains=//tools/bazel/rbe_configs/config:cc-toolchain
build:remote --spawn_strategy=remote
# Required for using ijar/singlejar to target the right platform
build:remote --define=EXECUTOR=remote
# More jobs, since the remote executor runs actions on remote containers/hosts
build:remote --jobs=32
# Disable writes to the remote cache for locally executed actions
build:remote --remote_upload_local_results=false

# Force Linux builds to use the same host platform as the exec platform for better remote caching.
# This is required if you use local execution in CI on Linux hosts.
build:linux --host_platform=//tools/bazel/rbe_configs/config:platform
build:linux --extra_execution_platforms=//tools/bazel/rbe_configs/config:platform

# Use a hermetic remote JDK
build --java_language_version=17
build --java_runtime_version=remotejdk_17
build --tool_java_language_version=17

Some important flags above that are worth noting:

  • experimental_platform_in_output_dir makes the bazel-out paths of an action’s inputs and outputs bazel-out/platform-<compilation_mode>, derived from the platform rather than the host CPU, instead of bazel-out/<cpu>-<compilation_mode>.
  • host_platform is overridden to force Linux and disable autodetection.
  • crosstool_top overrides the C++ toolchain for these builds, avoiding the autoconfigured toolchain from the host.
  • extra_execution_platforms is set to the same as host_platform to ensure host=exec behavior is maintained.
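If you want developers to pick up the right configuration without remembering the flag, a small wrapper can add --config=remote only on macOS hosts. This is a hypothetical sketch (bazel_config_flags and bazel_remote_aware are illustrative names); on Linux it adds nothing, because the build:linux section is already applied through --enable_platform_specific_config.

```shell
#!/usr/bin/env bash
# bazel_config_flags [OS]
# Prints the extra flags for this host: macOS goes through the Linux remote
# executors; Linux relies on build:linux from the .bazelrc.
bazel_config_flags() {
  local os="${1:-$(uname -s)}"
  case "$os" in
    Darwin) echo "--config=remote" ;;
    Linux)  echo "" ;;
    *)      echo "unsupported host OS: $os" >&2; return 1 ;;
  esac
}

# bazel_remote_aware COMMAND [ARGS...]
# Runs `bazel COMMAND <host flags> ARGS...`; flags must follow the command.
bazel_remote_aware() {
  local cmd="$1" flags
  shift
  flags="$(bazel_config_flags)" || return 1
  # Intentionally unquoted so an empty value expands to no argument.
  # shellcheck disable=SC2086
  bazel "$cmd" $flags "$@"
}
```

For example, bazel_remote_aware build //path/to:target on a Mac runs bazel build --config=remote //path/to:target. bazel run still needs the JAVABIN workaround described in the caveats.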

Invoke this configuration with bazel build --config=remote ... on your Mac. The build then uses the same configuration and inputs as bazel build ... running on Linux (with either local or remote execution), so the action keys match and cache entries can be shared. You can confirm this by running

bazel aquery //path/to/my:target --config=remote | grep ActionKey

on your Mac and then the same command on Linux without --config=remote and you should see the same exact action keys.
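Eyeballing two grep outputs gets tedious for large targets. A minimal sketch, assuming you saved each side’s bazel aquery text output to a file, extracts the ActionKey values and reports keys that appear on only one side; action_keys and compare_action_keys are hypothetical helpers.

```shell
#!/usr/bin/env bash
# action_keys FILE: sorted, de-duplicated ActionKey values from an aquery dump.
action_keys() {
  sed -n 's/^[[:space:]]*ActionKey:[[:space:]]*//p' "$1" | LC_ALL=C sort -u
}

# compare_action_keys MAC_DUMP LINUX_DUMP
# Prints "identical" and returns 0 if both dumps have the same key set;
# otherwise lists the keys unique to each side and returns 1.
compare_action_keys() {
  local a b rc=0
  a="$(mktemp)"; b="$(mktemp)"
  action_keys "$1" > "$a"
  action_keys "$2" > "$b"
  if cmp -s "$a" "$b"; then
    echo "identical"
  else
    echo "only in $1:"; LC_ALL=C comm -23 "$a" "$b"
    echo "only in $2:"; LC_ALL=C comm -13 "$a" "$b"
    rc=1
  fi
  rm -f "$a" "$b"
  return "$rc"
}
```

For example, save bazel aquery //path/to/my:target --config=remote > mac.txt on the Mac and bazel aquery //path/to/my:target > linux.txt on Linux, then run compare_action_keys mac.txt linux.txt on either machine.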

Linux Remote Execution Workers

If you already use remote execution in CI targeting Linux, you can use --config=remote for both CI and Mac builds and skip this step. However, if you use local execution in CI with a remote cache (--remote_cache), you may need to set up Linux RBE workers specifically for developer builds from Macs and use them in conjunction with the remote cache you already use in CI. There are various implementations of the Remote Execution API that can be used in this mode. I’ve had good success with NativeLink, which is easy to bring up and can proxy the remote cache to an existing gRPC server acting as the Action Cache (AC) and Content Addressable Storage (CAS). NativeLink itself can be configured as a scheduler/worker and as a proxy to an external AC/CAS store.

Ready to go!

Now you have all the pieces in place and can start building your JVM code on a Mac with cache hits from CI builds targeting Linux. An example looks like this, and nearly all (or all) actions should be remote cache hits:

bazel build //path/to:target --config=remote
..
..
INFO: 1699 processes: 1670 remote cache hit, 29 internal.
INFO: Build completed successfully, 1699 total actions
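To track this over time, for example in a developer dashboard, you can compute a hit rate from the summary line. The sketch below is a heuristic parser for the INFO: N processes: ... format shown above (the exact wording varies between Bazel versions); it excludes internal processes, which are not remote-cacheable, from the denominator. cache_hit_rate is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cache_hit_rate LINE
# Parses "INFO: N processes: H remote cache hit, I internal, ..." and prints
# H * 100 / (N - I) as an integer percentage, or "n/a" if nothing is cacheable.
cache_hit_rate() {
  printf '%s\n' "$1" | awk '
    {
      total = 0; hits = 0; internal = 0
      for (i = 1; i <= NF; i++) {
        if ($(i+1) ~ /^process(es)?:?$/) total = $i
        if ($(i+1) == "remote" && $(i+2) ~ /^cache/ && $(i+3) ~ /^hit/) hits = $i
        if ($(i+1) ~ /^internal/) internal = $i
      }
      cacheable = total - internal
      if (cacheable <= 0) { print "n/a"; exit }
      printf "%d%%\n", (hits * 100) / cacheable
    }'
}
```

For the output above, cache_hit_rate 'INFO: 1699 processes: 1670 remote cache hit, 29 internal.' prints 100%, since all 1670 non-internal processes were remote cache hits.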

Caveats

As mentioned earlier, there are a few caveats with this approach:

  • Works for JVM targets without any native dependencies: Make sure your JVM code has no native dependencies. If it does, make sure the rules those targets use are compatible with remote execution. Otherwise the actions will fail. An example is using pip packages with C extensions (like numpy) as data dependencies in the transitive closure of your JVM targets.
  • bazel build and bazel test should work for all languages that are compatible with remote execution. bazel run, however, doesn’t work out of the box, because the binary actually runs on the host (the Mac in this case). This still affects JVM binaries, because the Linux JDK will be used on the Mac when running bazel run <java-binary> --config=remote. There is a workaround that forces it to use the macOS JDK:
host_java_home=$(bazel cquery @bazel_tools//tools/jdk:current_java_runtime --output starlark --starlark:expr="providers(target)['ToolchainInfo'].java_runtime.java_home" | uniq)
export JAVABIN=$(bazel info output_base)/$host_java_home/bin/java

bazel run <java-binary> --config=remote

JAVABIN is used by the generated executable and defaults to the target JDK (Linux); setting it forces the executable to use the macOS JDK. Since JVM bytecode is platform-independent, the binary should continue to work in this context.

  • If you use a different --compilation_mode in some builds, or user-defined transitions on any of the flags that affect output paths, actions in those builds will likely not be shared, even for JVM artifacts. This is a much broader and more challenging problem in Bazel, and it is being addressed upstream.
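When keys still don’t match, a quick first check is whether both sides even use the same output directories. A minimal sketch, assuming saved bazel aquery text dumps: it lists the distinct bazel-out/<dir> prefixes an action graph references, so a different --compilation_mode (for example platform-opt vs platform-fastbuild) or a transition (Starlark transitions typically add an -ST-<hash> suffix) stands out. output_dirs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# output_dirs FILE
# Lists the distinct first-level bazel-out/<dir> prefixes referenced anywhere
# in an aquery text dump, sorted byte-wise so two hosts produce comparable output.
output_dirs() {
  grep -o 'bazel-out/[^/[:space:]]*' "$1" | LC_ALL=C sort -u
}
```

Running output_dirs on the Mac and Linux dumps and diffing the results shows immediately whether the two builds write to different bazel-out directories.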
