Bazel Hermetic Toolchain and Tooling Migration

Published in Tinder Tech Blog · 13 min read · Jun 13, 2022

Authors: Maxwell Elliott, Connor Wybranowski

Taking on a build system migration is a lengthy endeavor. As our company doubled in size, our existing iOS build stack left a lot to be desired. Moreover, our team was tasked with maintaining systems that we inherited and did not entirely understand, making support for a growing team of iOS engineers extremely difficult and unsustainable. Our rapidly growing codebase was also significantly increasing build times, which in turn was making system maintenance difficult. We knew that we needed a technology that would allow us to solve these challenges while increasing scale, and Bazel was a natural choice. Here’s how we tackled the problem and successfully developed a defensible foundation as a team, while also creating open source libraries during the migration.

Before

  • Developer experience and build stack were unowned; support was provided by a handful of motivated domain experts
  • Constant stream of inbound questions, clarifications, issues with our tools and build stack
  • Local developer issues were viewed as a problem with that developer’s machine
    — “Works On My Machine” issues with key parts of our build toolchain, usually taking a team of engineers to resolve
  • Toolchain stack
    — Shell scripts (largely understood only by the original author)
    — Ruby scripts (only a handful of engineers knew Ruby)
    — Python
    — Java
    — Kotlin
    — Swift scripts
    — Homebrew
    — Cocoapods
  • Using Cocoapods as our dependency provider
    — All dynamic frameworks; hesitancy to add more modules due to growth in app launch time and app size
  • Committing all XcodeProject / Workspace and Cocoapods artifacts to the repo
  • Could not simply clone the primary repository and start developing (had to learn how to bootstrap via documentation)
  • Monolithic tooling entry points:
    — tinder_pod_install (lengthy Ruby script; no testing and no ownership)
    — tinder setup (a mix of Shell logic, Ruby logic, and a vended binary in the repo)
  • Committing all internal tooling binaries to our repository
  • Xcodebuild used in CI/CD and locally
  • All Jenkins CD jobs contained out of band configuration from the Tinder iOS codebase
  • No test coverage or validation possible over our build and release stack

[Figures: code stats as of 12/29/2020; module stats; app size and launch time; build performance stats]

After

  • Build stack is owned end-to-end with full support by a dedicated team
  • Relative inbound volume and velocity have decreased despite significant growth in responsibilities of this dedicated team
  • Local developer issues are viewed as a symptom of a larger problem / an opportunity to automate or improve existing systems
  • Toolchain stack
    — Shell scripts (code generated, gitignored)
    — All third-party binaries and libraries provided by Bazel Hermetic Toolchain
    — Most first-party binaries provided by Bazel Hermetic Toolchain
    — One committed first-party binary to serve as pre-Bazel entry point
    — All tooling is written in Swift
  • Using Bazel as our dependency provider (still use Cocoapods as a version constraint solver)
  • All underlying build system artifacts / configurations are code generated and therefore gitignored
  • Developed full DSL to describe all targets and apps, developers use this DSL exclusively; this DSL performs codegen of the corresponding Bazel configuration
  • Developers can now simply clone the primary repository, execute a single entry point binary to install Bazel and other binaries, and start developing
  • Internal tooling binaries are provided by Bazel Hermetic Toolchain
  • Bazel used in CI/CD and locally
  • Jenkins logic has been converted to Swift and is run via our internal tooling
  • Full test coverage over our build and release stack

[Figures: code stats as of 3/25/22; module stats; app size and launch time; build performance stats]

Phases

Hermetic Toolchain

“Hermetic” is defined as “complete and airtight.” In our case, it describes a system in which all tasks (pre- and post-build steps, build, codegen, static analysis) are deterministic regardless of the user’s environment. Prior to the Bazel effort, significant developer resources were tied up in troubleshooting systems that we did not fully understand. Our previous toolchain system (Homebrew) had very soft guarantees around versioning, which resulted in a brittle local developer experience. We frequently experienced issues that we could not replicate ourselves, since the environment was non-hermetic. On top of this, we often observed that the “fix” was unique to a particular developer’s machine, and thus never involved a paper trail; there was no infrastructure change that could be made to prevent this going forward. As the team continued to grow, the cost of supporting this class of issues was becoming non-trivial and ultimately prevented us from addressing other existential issues in our development experience. Finally, after triaging a multi-hour, overnight outage with a team of several engineers due to the legacy toolchain’s versioning system, we decided to develop a hermetic toolchain for our developers and move away from this class of issues.

We knew that all external toolchain executables needed to be versioned, dynamically fetched (as needed, with caching support), and run in a hermetic sandbox that could not be influenced by a developer’s environment. Bazel and Buck stood out as candidates for this functionality. In the end, we chose Bazel for the strong and active community; the Bazel Slack channel was a huge factor in the success of this effort.

After adopting Bazel and providing all external binary dependencies via remote archive rules, we were able to remove our external usage of Homebrew. This migration was largely done in place and had no impact on developers whatsoever. Given the success of this approach and the ease of migration, we then decided to apply these learnings to our internal toolchain to supercharge the team’s impact with this Bazel foundation.
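The three requirements above (versioned, fetched on demand with caching, immune to the local environment) can be illustrated with a small Python sketch. This is not Bazel's implementation and not our tooling; the function name, the cache layout, and the `download` callback are all hypothetical, chosen only to show why pinning by digest removes "works on my machine" drift.

```python
import hashlib
from pathlib import Path
from typing import Callable


def fetch_pinned(name: str, sha256: str, download: Callable[[], bytes], cache_dir: Path) -> Path:
    """Return a cached copy of a tool archive, downloading only on a cache miss.

    The cache key is the expected digest, so bumping a version (new digest)
    can never reuse a stale artifact. All names here are hypothetical.
    """
    cached = cache_dir / sha256 / name
    if cached.exists():
        return cached  # cache hit: no network access needed

    data = download()
    actual = hashlib.sha256(data).hexdigest()
    if actual != sha256:
        # Refuse anything that does not match the pinned digest.
        raise ValueError(f"{name}: expected sha256 {sha256}, got {actual}")

    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return cached
```

Every developer who asks for the same digest gets byte-identical contents or a hard failure, which is the property a package manager with soft version guarantees cannot offer.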

Internal Tools

Prior to this effort, our internal toolchain was largely composed of SPM-built binary executables that were committed directly to the consuming repository. Additionally, the source code for these tools lived alongside the source code for the primary Tinder iOS application, thus inheriting its CI performance characteristics. It was at this moment our team needed to define the values we stood for in building these tools:

  • We created a new repository that was both pragmatic and aspirational; it provided an extremely fast development environment isolated from the performance consequences of the Tinder iOS repository, with the added benefit of establishing a new threshold for code quality, CI performance, and overall developer experience
    — This point bears repeating: we did not simply strive to introduce something that just worked to address our suboptimal state; we set out to redefine the way that code development could work on our team and beyond. In practice, these aspirational goals were realized through documented team values (such as exhaustive test coverage, exhaustive completeness of changes), extensive onboarding and troubleshooting documentation, and the creation of foundational developer tools (Xcode project generator, performant CI / CD). We wanted to achieve or exceed the quality bar established in the primary Tinder iOS application repository, in addition to proposing an alternate manner for approaching iOS development at Tinder.
  • All tools would be built exclusively in Swift, making DevEx systems and infrastructure more resilient to organizational change and allowing for external contribution
  • Everything would be built exclusively via Bazel to demonstrate the efficacy of this build system for our use-case

Over the course of this effort, we learned a good deal about Bazel:

Things we got right:

  • Choosing to use vanilla Bazel over jumping straight to a DSL / wrapper: at this point, it was important for all DevEx engineers to get up to speed with Bazel, and introducing an abstraction early would have made the inevitable future support burden much larger (with fewer domain experts)
  • Automating Tulsi-backed XcodeProject generation via Swift
  • Creating a very granular build graph
  • Consuming bazel-diff and using target selection from the start
  • Instrumenting all tools with metrics
  • Introducing a repeatable way to depend on anything (Bazel Hermetic Toolchain)
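For readers unfamiliar with target selection, here is a rough Python sketch of the general idea: hash every target from its sources and its dependencies' hashes at two revisions, then build and test only the targets whose hash changed. It is a simplification for illustration, not bazel-diff's actual implementation, and the in-memory graph shape is a hypothetical stand-in for a real Bazel query.

```python
import hashlib
from typing import Dict, Set

# Hypothetical graph shape: label -> {"srcs": {path: content}, "deps": [labels]}
Graph = Dict[str, dict]


def target_hashes(graph: Graph) -> Dict[str, str]:
    """Hash each target from its own sources plus the hashes of its deps."""
    memo: Dict[str, str] = {}

    def visit(label: str) -> str:
        if label not in memo:
            h = hashlib.sha256(label.encode())
            for path, content in sorted(graph[label]["srcs"].items()):
                h.update(path.encode())
                h.update(content.encode())
            for dep in sorted(graph[label]["deps"]):
                # A changed dependency changes the hash of every dependent.
                h.update(visit(dep).encode())
            memo[label] = h.hexdigest()
        return memo[label]

    for label in graph:
        visit(label)
    return memo


def impacted_targets(before: Graph, after: Graph) -> Set[str]:
    """Targets that are new or whose hash differs between two revisions."""
    old, new = target_hashes(before), target_hashes(after)
    return {label for label, digest in new.items() if old.get(label) != digest}
```

A granular build graph is what makes this pay off: the smaller each target, the fewer dependents a typical change invalidates.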

Things we got wrong:

  • Opting for prebuilt binary executables in the tooling repository rather than building from source, a strategy carried over from the primary Tinder iOS codebase
    — The use of binary executables often hid non-hermetic assumptions related to execution paths
  • Creating a bespoke test runner over direct use of bazel test / coverage
    — By populating variables with absolute paths to artifacts instead of using runfiles, we degraded cache performance and forced atomic execution of tests
    — Delayed figuring out how to read / write test artifacts from a test bundle in a hermetic fashion (runfiles)
  • Avoiding using the Build Event Protocol in our initial efforts; creating a robust Build Event Protocol parser has unlocked more reliable and stable parsing capability than parsing Bazel STDOUT/STDERR logs
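To make the Build Event Protocol point concrete, here is a minimal Python sketch that reads the newline-delimited JSON written by Bazel's `--build_event_json_file` flag and reports which targets completed successfully. The field names (`targetCompleted`, `completed`, `success`) reflect our reading of the BEP JSON encoding and should be checked against `build_event_stream.proto` for your Bazel version; the function name is hypothetical, and this is not the parser we built.

```python
import json
from typing import Dict, Iterable


def target_results(bep_lines: Iterable[str]) -> Dict[str, bool]:
    """Map each completed target label to whether it built successfully."""
    results: Dict[str, bool] = {}
    for line in bep_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        label = event.get("id", {}).get("targetCompleted", {}).get("label")
        if label is None:
            continue  # not a target-completion event
        # proto3 JSON omits false booleans, so a missing "success" means failure.
        results[label] = event.get("completed", {}).get("success", False)
    return results
```

Compared with scraping STDOUT/STDERR, this depends on a structured, documented event stream rather than on log formatting that can change between releases.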

At this point, we had migrated the source code for all Swift tooling into our new repository, built said tools via Bazel, and populated our Tinder iOS hermetic toolchain with prebuilt binary executables. We cut over from using committed binaries to our new executables without incident; in many cases this was simply a matter of adjusting invocations from a tool in the user’s local path to one from the Bazel hermetic toolchain (example command becomes @example//:example -- command). With the tooling in place and the migration completed, we were now ready to begin scoping the effort required to build all of Tinder iOS via Bazel.

[Figure: ios-tooling metrics as of 3/25/22]

Building Tinder

Even before work began on building the Tinder target with Bazel, we knew that parallel builds were an absolute requirement; we had to prove that this new system was more stable, performant, and capable than what we used at the time (Cocoapods). Knowing this, we realized that our configurations across Podspecs, Podfiles, and BUILD files would have to be synchronized. Additionally, there was a significant burden of proof required before adopting Bazel as our default build system, and therefore the entire migration process (including parallel builds) had to be done in a non-blocking fashion, particularly because Cocoapods was still the only source of truth from a release perspective. This meant that our migration pattern had to be fully automated because we could not require all developers to learn and understand Bazel as part of the validation of this technology, the adoption of which was still far from certain.

During the development of our internal toolchain, we discovered that we could consume human-readable configurations (YAML) and automate complex, repeatable tasks that would otherwise be error-prone if performed by a developer manually. This strategy of developing a domain-specific language (DSL) and pairing it with code-generation and automation had already proven successful with several efforts prior to Bazel, so this was a natural solution to our issue of automating the creation of BUILD files. We had also defined a repeatable recipe for module creation in earlier tooling efforts, though it critically lacked the leverage of a DSL. Given the rigid requirement that Cocoapods and Bazel configuration remain completely synchronized during the migration, it was clear that a unifying DSL was required to define modules/apps going forward. These insights resulted in the creation of xc-cli, a tool for dynamically rendering configuration for any build system given a YAML DSL.

Developers run a single command, build, that code-generates all configuration files in the repo.
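The core idea, one module description rendered into configuration for more than one build system, can be sketched in a few lines of Python. Everything here is hypothetical: the module fields, the `//Modules/...` layout, and the function names are illustrative and do not reflect xc-cli's real schema or output. The dictionary stands in for a parsed YAML file, and the podspec is a fragment (a real one needs more fields).

```python
from typing import Any, Dict

# Hypothetical module description, standing in for a parsed YAML file.
Module = Dict[str, Any]


def render_build_file(module: Module) -> str:
    """Render a Bazel BUILD file for a Swift module (illustrative layout)."""
    name = module["name"]
    deps = "".join(f'        "//Modules/{dep}",\n' for dep in module["deps"])
    return (
        'load("@build_bazel_rules_swift//swift:swift.bzl", "swift_library")\n'
        "\n"
        "swift_library(\n"
        f'    name = "{name}",\n'
        '    srcs = glob(["Sources/**/*.swift"]),\n'
        f"    deps = [\n{deps}    ],\n"
        '    visibility = ["//visibility:public"],\n'
        ")\n"
    )


def render_podspec(module: Module) -> str:
    """Render a podspec fragment from the same description."""
    name, version = module["name"], module["version"]
    deps = "".join(f"  s.dependency '{dep}'\n" for dep in module["deps"])
    return (
        "Pod::Spec.new do |s|\n"
        f"  s.name = '{name}'\n"
        f"  s.version = '{version}'\n"
        "  s.source_files = 'Sources/**/*.swift'\n"
        f"{deps}"
        "end\n"
    )
```

Because both outputs derive from the same input, the two build systems cannot drift apart, and the generated files can be gitignored rather than reviewed.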

After xc-cli was deployed, we were able to run parallel CI jobs without interfering with the current Cocoapods workflow. However, we still observed a gap of synchronization between the dual build systems that we would work to resolve.

The primary synchronization issue between Cocoapods and Bazel configurations concerned external dependencies. Bazel requires us to define all dependencies, including transitive ones, up front. Conversely, Cocoapods uses a SemVer constraint solver system which can automatically resolve transitive dependencies. To bridge this gap, we developed automations that could take a Podfile.lock and generate the corresponding Bazel rules for our external dependencies (think code-generated toolchain containing numerous {http,git}_archive definitions). Even today, after completing the migration to Bazel, Cocoapods’ version solver is used as our source of truth, but no artifacts are consumed except Podfile.lock.
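As a rough illustration of that automation, the Python sketch below reads resolved versions from the PODS section of a Podfile.lock and renders one `http_archive` stanza per pod. It is not our generator: the line-based parsing handles only the simple, unquoted entry format, and the `sources` lookup of download URL and checksum per pinned version is a hypothetical input that a real tool would have to derive elsewhere.

```python
import re
from typing import Dict, Tuple

# Top-level PODS entries look like "  - Name (1.2.3)" with an optional trailing colon.
POD_LINE = re.compile(r"^  - ([^\s(]+) \(([^)]+)\):?$")


def locked_versions(lock_text: str) -> Dict[str, str]:
    """Extract root pod name -> resolved version from the PODS section."""
    versions: Dict[str, str] = {}
    in_pods = False
    for line in lock_text.splitlines():
        if not line.startswith(" "):
            # Any unindented line (another key or a blank line) ends the section.
            in_pods = line.strip() == "PODS:"
            continue
        match = POD_LINE.match(line)
        if in_pods and match:
            root = match.group(1).split("/")[0]  # fold subspecs into their root pod
            versions[root] = match.group(2)
    return versions


def render_http_archives(
    versions: Dict[str, str],
    sources: Dict[Tuple[str, str], Tuple[str, str]],
) -> str:
    """Render http_archive stanzas; `sources` maps (name, version) -> (url, sha256)."""
    stanzas = []
    for name in sorted(versions):
        url, sha256 = sources[(name, versions[name])]
        stanzas.append(
            "http_archive(\n"
            f'    name = "{name}",\n'
            f'    url = "{url}",\n'
            f'    sha256 = "{sha256}",\n'
            ")\n"
        )
    return "\n".join(stanzas)
```

Archives that ship without Bazel configuration would also need a BUILD file supplied alongside each stanza, which is omitted here for brevity.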

After closing the remaining synchronization gaps between Cocoapods and Bazel, we were able to quickly validate the stability of our Bazel build, and gather performance metrics. As expected, our Bazel build was highly performant and deterministic, and the caching layer was critical to this new performance frontier. Despite still undergoing the validation process, however, we had to turn our attention to the first App Store release of a Bazel-built IPA.

Our First Release

Luckily, creating an IPA is very straightforward with Bazel, given that we had already paid the cost of exhaustively defining all required configuration as a prerequisite to building the Tinder application (configuration that prior to Bazel was fragmented across tens of configuration files, build phases, Jenkins jobs, raw xcodebuild commands, Shell scripts, Ruby scripts, Groovy scripts, etc.). Since the bulk of the work was already done, creating an IPA was as simple as invoking bazel build.

Our first Bazel-built IPA was submitted to App Store Connect in October 2021, 18 months after starting our adoption of Bazel. Although our team was very excited about having a fully QA-verified Bazel build, we were still fearful that there was something we missed. We had even prepared a hotfix backup plan in which we would have reverted to Cocoapods if needed. Upon releasing this IPA for download on the App Store and monitoring the rollout, our fears disappeared; the Bazel IPA was smaller, faster to launch, and worked as expected. The only issue we faced concerned unused localization files, which Bazel helped us to discover and promptly remove in the following release, reducing our IPA size by another 9 MB.

Now that we could successfully build and deploy Bazel-built IPAs, we focused our attention on a process that involved multiple stakeholders and fragmented team-wide understanding: release. We had produced this first Bazel-built IPA in a world that behaved very differently from the new Bazel world that we were creating, and given the friction that this introduced, we decided to codify this part of the development cycle via Swift, as we did previously with xc-cli.

Through all of this work, we have arrived at a guiding principle for our team: every mutation to a system must have a paper trail. Without this guarantee, we jeopardize the maintainability and extensibility of the system. We achieved this principle by moving the release processes into code. What was once a dark debt is now simply a tool we can execute, modify, and monitor as we see fit.

Developer Experience

In our focused pursuit of solving the untenable growth in build and test times in the Tinder codebase, we neglected parts of the local development experience. In retrospect, we grossly underestimated the attachment of our greater engineering team to the legacy workflows, and ultimately to Cocoapods. We had to adjust our worldview of what our developer experience could be to meet the expectations of the greater team. One concrete example of such an adjustment was opting for larger, monolithic Xcode projects over highly scoped / focused projects.

As part of the development of these tools in our tooling repository, we championed the use of focused projects to open, edit, and test a small subset of targets. Additionally, we utilized lighter text editors to obtain global search and replace functionality, so as not to incur the cost of indexing a monolithic set of source files within a generated project. While this workflow was highly effective in tooling, it faced several challenges in Tinder iOS. First and foremost, developers desired a monolithic Xcode project since code organization was very loose, and indexer-backed global search (symbol lookup, jump to definition, etc.) was a deeply ingrained practice. Asking developers to use new text editing tools was a non-starter. After a period of time, it was clear that we could not move forward with a focused Xcode project approach and therefore went back to a large monolithic project. While we had not made the progress we had hoped for in this effort, we were able to accomplish git-ignoring all generated Xcode projects to enforce their use as ephemeral components of the developer experience.

Looking back, one regret of this phase of the project was not involving developers earlier. In the end, we spent a great deal more time working on the XcodeProject experience than any other phase, developing buy-in, assuaging developer fears, and understanding the complexities of Xcode itself. From the start of this effort, we leaned heavily into automation over process as it was the only way we saw to accomplish this effort in a non-blocking way (given our team’s resources at the time). For many developers this approach was a nonissue; their workflows were not impacted. Some developers, however, felt left out of the conversation, since previously there was some level of ownership over the build system by various engineers on the team.

We struggled to strike a balance between involving developers in the decision-making process and making progress against the greater Bazel migration effort. While we are proud of the results of this project, we would advise others looking into these efforts to focus as much on the developer experience as the pure performance metrics. Anecdotal evidence, particularly that which is counter to raw performance numbers, is yet another opportunity for improvement.

We’re Hiring

Connor and Maxwell are members of the iOS Developer Experience team at Tinder. If these challenges and problems are interesting to you, we’re hiring!

Using Bazel at your company? Check out bazel-diff to start using target selection in your builds.
