Leveraging Bazel Remote Caching and Execution For Cross-Platform Builds (Mac/Linux)
If you’ve worked with Bazel directly, you’ll have noticed that it’s non-trivial to get the benefits of remote caching when using Bazel on multiple host platforms (MacOS/Linux), even for platform-independent artifacts like JVM bytecode. Some of this is down to internal assumptions built into Bazel’s action graph, like encoding host-specific paths (bazel-out/<cpu>-<compilation-mode>/) into action inputs; the rest comes from toolchain inputs (such as the JDK) being platform specific. Bazel encodes the following information, among other things, into an action’s execution:
- Paths to input/output files in bazel-out
- Command line arguments
- Input/source files for the action and any transitive inputs/dependencies, including toolchain dependencies.
- Environment variables
A cache hit here means an “Action Cache” hit: the remote cache already holds an entry for the action key from a prior execution of the action. If any of the above differs, the action key differs and the result is a cache miss.
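To make that sensitivity concrete, here is a toy model in Python. It is not Bazel’s actual key derivation; the function names (toy_action_key, javac_action), the file paths, and the input digest are all made up for illustration. It only shows that hashing the command line, inputs, environment, and output paths together yields a different key as soon as one host-specific path segment changes.

```python
import hashlib
import json


def toy_action_key(argv, input_digests, env, output_paths):
    """Hash everything that describes an action into one hex digest.

    Illustration only: Bazel's real action key derivation is different.
    """
    payload = json.dumps(
        {
            "argv": argv,
            "inputs": sorted(input_digests.items()),
            "env": sorted(env.items()),
            "outputs": sorted(output_paths),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def javac_action(cpu):
    """Build a hypothetical javac action whose paths embed the host CPU."""
    out_dir = f"bazel-out/{cpu}-fastbuild/bin/app"
    return toy_action_key(
        argv=["javac", "-d", out_dir, "app/Main.java"],
        input_digests={"app/Main.java": "abc123"},  # made-up content digest
        env={"LC_ALL": "C"},
        output_paths=[f"{out_dir}/libapp.jar"],
    )


linux_key = javac_action("k8")
mac_key = javac_action("darwin_arm64")
print(linux_key == mac_key)  # False: the source file is identical, the paths are not
```

The source file and its digest are the same on both hosts, yet the keys differ because the CPU name leaks into the paths. That is the kind of difference the rest of this article works to remove.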
This can be a massive bottleneck in a monorepo if your CI system runs on one platform (Linux x86_64) and developers work on a different one (MacOS being very common). In an ideal scenario, your CI builds populate the Bazel remote cache (either through local execution or remote execution targeting Linux workers) and developers leverage it directly to get fast builds. However, the limitations mentioned above mean your developers on MacOS can’t leverage the same remote cache.
A Few Choices
In this context, there are typically a few solutions:
- Build remote caching infrastructure for builds from MacOS: this is possible but can prove prohibitively expensive, as you’ll need MacOS CI/remote execution workers alongside your Linux workers so that the Bazel cache is “warm” when developers get to use it.
- Move development to the target platform (Linux) with Cloud Development Environments (CDEs): developers connect to a Linux host with their development environment already set up and Bazel running in the remote environment. This works too, but it changes the developer’s loop and requires supporting their Integrated Development Environment (IDE) experience in a remote environment. The CDE ecosystem is fairly nascent, especially with regard to IDEs, which in some cases are not at parity with the local experience.
- Update your Bazel builds on your Mac to use remote execution on Linux workers, leveraging the same remote cache used in CI: this requires a number of changes to the build toolchain so that the cache can be safely shared with CI builds, and it has a few caveats, but it works and is somewhat less documented. It is powerful even with its limitations, as you avoid both the burden of maintaining MacOS build infrastructure and the need to build a remote development platform.
Using Remote Execution on MacOS targeting Linux workers
With the above context, we’ll discuss the last option in more detail in this article and how you can enable it.
Assumptions:
- You’re using Bazel in a monorepo with JVM toolchain (Kotlin/Java/Scala) with perhaps other toolchains too (Protobuf). It’s likely you have other languages too due to Bazel’s polyglot support but we keep it simple for this article.
- You’re on Bazel 5.0 or higher.
- Your JVM code doesn’t have native dependencies, especially those that are not compatible with remote execution. A common example is using Python pip packages that have C extensions via rules_python. These packages are installed by Bazel repository rules on the host and don’t respect the toolchain resolution mechanism that is important for choosing the right platform for build actions.
- You either use local execution with a Bazel remote cache (a gRPC implementation like bazel-remote) or remote execution with Linux workers in CI. Remote execution at scale in CI can be quite expensive in terms of cost and operational burden for a DevOps team, so you may want to stick to local execution for CI builds while using remote execution for developer builds. Note that the container environment for CI actions and for remote execution actions must be the same to avoid breaking caching behavior.
Choosing the right C++ toolchain
You might not be using C++ directly in your code base, but if you’re using Protobuf it’s very likely you use the C++ toolchain indirectly, as protoc is written in C++. Further, there are several internal tools within Bazel, like ijar, that are written in C++. The autoconfigured C++ toolchain in Bazel uses the host platform @local_config_platform//:host, which defaults to the Clang that comes with Xcode on Mac and to gcc on Ubuntu-based distros.
We need to force builds triggered on a Mac to use a C++ toolchain for Linux. We can use the bazel-toolchains repository to generate the toolchain configuration. If you’re already using remote execution in CI, you can skip this step, as the configuration is already generated or maintained. Otherwise, you can run this command:
./rbe_configs_gen \
--bazel_version=6.2.0 \
--toolchain_container=<my-ubuntu-docker-image> \
--output_src_root=<path/to/my/repo> \
--output_config_path=tools/bazel/rbe_configs \
--exec_os=linux \
--target_os=linux \
--cpp_env_json=ubuntu2004.json \
--generate_java_configs=false
where ubuntu2004.json can be adapted from the example in Bazel’s continuous-integration repository. Once the command has run, you will find the generated build files in tools/bazel/rbe_configs in your repository.
Update the platform definition in tools/bazel/rbe_configs/config/BUILD.bazel:
platform(
    name = "platform",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)
and remove the parents attribute, as it points to @local_config_platform//:host. This is required because the constraints for the host platform on Mac and Linux are different and, as you’ll see, we’ll be overriding --host_platform anyway for builds in this configuration.
Another thing to update here is to ensure the Linux C++ compiler is used for these builds. Make sure cc_toolchain_suite in tools/bazel/rbe_configs/cc/BUILD.bazel is set up like this:
# This is the entry point for --crosstool_top. Toolchains are found
# by lopping off the name of --crosstool_top and searching for
# the "${CPU}" entry in the toolchains attribute.
cc_toolchain_suite(
    name = "toolchain",
    toolchains = {
        "k8|gcc": ":cc-compiler-k8",
        "k8": ":cc-compiler-k8",
        "armeabi-v7a|compiler": ":cc-compiler-armeabi-v7a",
        "armeabi-v7a": ":cc-compiler-armeabi-v7a",
        "darwin_arm64": ":cc-compiler-k8",
        "darwin_x86_64": ":cc-compiler-k8",
    },
)
Notice that darwin_arm64 and darwin_x86_64 are mapped to the C++ compiler for Linux. These keys correspond to the host CPU value determined by Bazel’s autodetection mechanism. Alternatively, this value can be overridden with Bazel’s --host_cpu flag.
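The effect of that mapping can be pictured with a small Python model. This is a deliberate simplification, not Bazel’s implementation: the helper select_cc_toolchain is hypothetical, and it assumes the lookup key is the CPU name, optionally followed by "|" and a compiler name, as the keys in the suite above suggest.

```python
# Mirror of the toolchains attribute from the cc_toolchain_suite above.
TOOLCHAINS = {
    "k8|gcc": ":cc-compiler-k8",
    "k8": ":cc-compiler-k8",
    "armeabi-v7a|compiler": ":cc-compiler-armeabi-v7a",
    "armeabi-v7a": ":cc-compiler-armeabi-v7a",
    "darwin_arm64": ":cc-compiler-k8",
    "darwin_x86_64": ":cc-compiler-k8",
}


def select_cc_toolchain(cpu, compiler=None):
    """Look up a toolchain label by CPU, or by "cpu|compiler" if given.

    Simplified model (an assumption): the real resolution logic lives in Bazel.
    """
    key = f"{cpu}|{compiler}" if compiler else cpu
    try:
        return TOOLCHAINS[key]
    except KeyError:
        raise LookupError(f"no toolchain entry for {key!r}") from None


# A Linux host and both Mac CPU flavours land on the same Linux compiler.
for cpu in ("k8", "darwin_arm64", "darwin_x86_64"):
    print(cpu, "->", select_cc_toolchain(cpu))
```

Whatever CPU the Mac host reports, the lookup ends at the same :cc-compiler-k8 entry, so C++ actions are described with the same toolchain inputs as on Linux.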
Note: The above instructions are based on Bazel’s legacy mechanism for C++ builds, which doesn’t use Bazel’s regular toolchain resolution. If you have toolchain resolution for C++ enabled (it’s not by default), you likely have build --incompatible_enable_cc_toolchain_resolution in your configuration and may be using a hermetic C++ toolchain from toolchains_llvm, in which case you can skip setting up a Crosstool configuration and rely on --extra_execution_platforms to pass/register C++ toolchains for Linux.
Command Line Configuration (.bazelrc)
We need to turn on various flags and define a configuration that can be easily used for these builds.
# Turn on configs for linux/darwin
build --enable_platform_specific_config
build:remote --remote_executor=grpc://<remote-executor-endpoint>
build:remote --host_platform=//tools/bazel/rbe_configs/config:platform
# Use our custom C++ toolchain targeting Linux
build:remote --crosstool_top=//tools/bazel/rbe_configs/cc:toolchain
build:remote --extra_toolchains=//tools/bazel/rbe_configs/config:cc-toolchain
build:remote --extra_execution_platforms=//tools/bazel/rbe_configs/config:platform
# Removes host specific fragments in bazel-out directories
build --experimental_platform_in_output_dir
build:remote --remote_instance_name=main
build:remote --spawn_strategy=remote
# required for using ijar/singlejar to target the right platform
build:remote --define=EXECUTOR=remote
build --remote_instance_name=main
# More jobs as remote executor runs on a remote container/host
build:remote --jobs=32
# Disable writes to the remote cache for locally executed actions
build:remote --remote_upload_local_results=false
# Force linux builds to use the same host platform as exec platform for better remote caching
# This is required if you use local execution in CI on Linux hosts
build:linux --host_platform=//tools/bazel/rbe_configs/config:platform
build:linux --extra_execution_platforms=//tools/bazel/rbe_configs/config:platform
# Use a hermetic remote JDK
build --java_language_version=17
build --java_runtime_version=remotejdk_17
build --tool_java_language_version=17
Some important flags above that are worth noting:
- experimental_platform_in_output_dir ensures that the paths in the inputs/outputs to an action in bazel-out use exactly platform-<compilation_mode> and not bazel-out/<cpu>-<compilation_mode>.
- host_platform is overridden to force it to Linux and disable autodetection.
- crosstool_top overrides the C++ toolchain for these builds and avoids using the autoconfigured toolchain from the host.
- extra_execution_platforms is set to the same value as host_platform to ensure host=exec behavior is maintained.
The above configuration can be invoked with bazel build --config=remote ... on your Mac, which will then use the same configuration and inputs as bazel build ... running on Linux (with either local or remote execution), meaning the action keys are shared. You can confirm this by running
bazel aquery //path/to/my:target --config=remote | grep ActionKey
on your Mac and then the same command on Linux without --config=remote; you should see exactly the same action keys.
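For targets with many actions, comparing the grep output by eye gets tedious. Below is a minimal Python sketch for comparing two saved aquery outputs. It assumes each action in the text output carries a line of the form "ActionKey: <hex>", which is what the grep above relies on; the helper names and the sample text are made up for illustration.

```python
import re

# Assumption: each action in the text output has a line "ActionKey: <hex digest>".
ACTION_KEY_LINE = re.compile(r"^\s*ActionKey:\s*(\S+)\s*$")


def action_keys(aquery_text):
    """Collect the action keys found in `bazel aquery` text output."""
    keys = set()
    for line in aquery_text.splitlines():
        match = ACTION_KEY_LINE.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def compare_action_keys(mac_text, linux_text):
    """Split keys into those shared by both hosts and those unique to one."""
    mac, linux = action_keys(mac_text), action_keys(linux_text)
    return {
        "shared": mac & linux,
        "mac_only": mac - linux,
        "linux_only": linux - mac,
    }


# Made-up sample output; real keys are much longer hex digests.
mac_sample = """
action 'Compiling Java //app:lib'
  Mnemonic: Javac
  ActionKey: aaa111
action 'Building source jar //app:lib'
  Mnemonic: JavaSourceJar
  ActionKey: bbb222
"""
linux_sample = """
action 'Compiling Java //app:lib'
  Mnemonic: Javac
  ActionKey: aaa111
action 'Building source jar //app:lib'
  Mnemonic: JavaSourceJar
  ActionKey: ccc333
"""

report = compare_action_keys(mac_sample, linux_sample)
for name, keys in report.items():
    print(name, sorted(keys))
```

To use it on real builds, redirect each aquery invocation to a file (for example mac.txt and linux.txt, both hypothetical names) and pass the file contents to compare_action_keys. Any key that shows up as mac_only or linux_only points at an action whose inputs, paths, or environment still differ between the two hosts.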
Linux Remote Execution Workers
If you already use remote execution in CI targeting Linux, then you can just use --config=remote for builds both in CI and on Mac and skip this step. However, if you’re using local execution in CI with a remote cache (--remote_cache), then you may need to set up Linux RBE workers specifically for developer builds from Mac and use them in conjunction with the remote cache you already use in CI. There are various implementations of the Remote Execution APIs that can be leveraged in this mode. I’ve had good success doing this with NativeLink, which can be brought up quite easily and can proxy the remote cache to an existing gRPC server acting as the Action Cache (AC)/Content Addressable Storage (CAS). NativeLink itself can be configured as a scheduler/worker and as a proxy to an external AC/CAS store.
Ready to go!
Now you have all the pieces in place and can start building your JVM code on Mac with cache hits from CI builds targeting Linux. An example looks like this, and you should see nearly all, if not all, actions being cache hits.
bazel build //path/to:target --config=remote
..
..
INFO: 1699 processes: 1670 remote cache hit, 29 internal.
INFO: Build completed successfully, 1699 total actions
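If you want to track this over time, the summary line above can be turned into a hit rate. The following Python sketch assumes the "N processes: count kind, count kind." shape shown in the output above; the function name parse_process_summary is made up for illustration.

```python
import re


def parse_process_summary(line):
    """Parse 'INFO: N processes: a kind, b kind.' into (total, {kind: count})."""
    match = re.search(r"(\d+) process(?:es)?: (.*?)\.?$", line.strip())
    if not match:
        raise ValueError(f"not a process summary line: {line!r}")
    total = int(match.group(1))
    counts = {}
    for part in match.group(2).split(", "):
        count, kind = part.split(" ", 1)
        counts[kind] = int(count)
    return total, counts


line = "INFO: 1699 processes: 1670 remote cache hit, 29 internal."
total, counts = parse_process_summary(line)
hit_rate = counts.get("remote cache hit", 0) / total
print(f"{hit_rate:.1%}")  # 98.3%
```

A rate that drops sharply after a toolchain or .bazelrc change is a hint that Mac and CI builds have stopped producing the same action keys.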
Caveats
As mentioned earlier, there are a few caveats with this approach:
- Works for JVM targets without any native dependencies: Make sure your JVM code has no native dependencies. If it does, make sure the rules those targets use are compatible with remote execution. Otherwise the actions will fail. An example is using pip packages with C extensions (like numpy) as data dependencies in the transitive closure of your JVM targets.
- bazel build and bazel test should work for all languages that are compatible with remote execution, but bazel run won’t work, as the binary actually gets run on the host (Mac in this case). This is a problem even for JVM binaries, as the Linux JDK will be used on Mac when running bazel run <java-binary> --config=remote. There is a workaround that forces it to use the MacOS JDK:
host_java_home=$(bazel cquery @bazel_tools//tools/jdk:current_java_runtime --output starlark --starlark:expr="providers(target)['ToolchainInfo'].java_runtime.java_home" | uniq)
export JAVABIN=$(bazel info output_base)/$host_java_home/bin/java
bazel run <java-binary> --config=remote
JAVABIN is used in the generated executable and defaults to the target JDK (Linux); by setting it, we force the executable to use the MacOS JDK. Since JVM bytecode is platform independent, it should continue to work in this context.
- If you use a different --compilation_mode in builds, or user-defined transitions on any of the flags affecting the path, then actions in those builds will likely not be shared, even for JVM artifacts. This is a much broader and more challenging problem space in Bazel, and work to address it is under way.