Developing fast & reliable iOS builds at Pinterest (Part one)

By Rahul Malik | iOS Platform Tech Lead

At Pinterest we’re focused on helping people discover inspiring ideas, from dinner recipes to try, to home and style products to buy, to places to travel. Building the best products for mobile is a critical part of that, with 80% of all Pinners accessing Pinterest via mobile apps. On the iOS team specifically, we’re constantly working to improve that experience as efficiently and quickly as we can, and giving our team the best development and testing environment is a key step in that.

We recently looked into ways to streamline that process, and set out to improve the speed and reliability of our iOS builds in local and continuous integration environments. In addition, we began modularizing our application into standalone frameworks and needed a system to support that migration. We reviewed multiple tools, including Xcode, CocoaPods, Buck, and Bazel. We wanted to introduce a more stable foundation for the future, which is central to our ability to rapidly iterate and release new features to Pinners.

After comparing these tools, we identified Bazel as the best fit for our goals: building a foundation for an order-of-magnitude improvement in performance, eliminating variability in build environments and adopting the new system incrementally. As a result, we’re now shipping all our iOS releases using Bazel, which has already resulted in wins, including:

Local Development

  • Faster builds: Reduced clean build time from 4m 38s to 3m 38s, a 21% improvement.
  • Local disk caches allow for instant rebuilds for anything you’ve built before (other branches, commits, etc).
  • Build environments are identical between CI and local machines, so build issues are easy to reproduce.
  • Increased automation: Tasks like code generation are included as part of the build graph.

Continuous Integration

  • Every build is an incremental build: Since Bazel is reproducible, we haven’t performed a single clean build on CI in over a year.
  • Build once, reuse everywhere: After introducing remote build caching, build times dropped to under a minute, and as low as 30 seconds, since we don’t need to rebuild anything that has already been built on any machine.
  • Reduced time to land code: Reduced build time from 10m 24s to 7m 34s, a 27% improvement.
  • Reduced time to get changes to beta testers: Beta build time went from 14m 32s to 7m 52s, a 45% improvement.
  • Faster test execution: Test runs are instant if the modified code does not affect the test.
  • Higher build success rate: Success rate of builds improved from around 80% to 97%-100% when running build tasks with Bazel.
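
The percentages quoted in the two lists above can be checked with a little arithmetic. The following is a minimal sketch that recomputes them from the stated durations; the function names are illustrative and not part of any Pinterest tooling.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a duration given as minutes and seconds into total seconds."""
    return minutes * 60 + seconds


def percent_reduction(before_s: int, after_s: int) -> float:
    """Return how much shorter after_s is than before_s, as a percentage."""
    return (before_s - after_s) / before_s * 100


# (before, after) durations taken from the lists above.
timings = {
    "local clean build": (to_seconds(4, 38), to_seconds(3, 38)),
    "CI build": (to_seconds(10, 24), to_seconds(7, 34)),
    "beta build": (to_seconds(14, 32), to_seconds(7, 52)),
}

for name, (before, after) in timings.items():
    reduction = percent_reduction(before, after)
    print(f"{name}: {before}s -> {after}s, {reduction:.1f}% reduction")
```

The results come out at roughly 21.6%, 27.2% and 45.9%, which matches the 21%, 27% and 45% figures above once the decimals are dropped.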

Moving to a future of fast and reliable builds

Build speed is a constant bottleneck for developers since we’re using compiled languages (Objective-C/C++). But build speed is hard to quantify. It includes builds in different environments, like continuous integration or local development. We also work with a variety of workflow scenarios, like clean builds, incremental builds, branch switching, rebasing, reverting changes, and others. You can’t improve what you don’t measure, so improving build speed requires tracking a variety of scenarios to allow us to pinpoint regressions and focus our performance efforts.

We can make builds faster by doing less work, by performing work more efficiently, or both. This might involve using different tools, improving parallelization, or updating the architecture of the project to require fewer source files. Strong practices around maintaining a modular architecture and cleaning up dead code, whether unreferenced or related to completed experiments, help maintain or improve build speeds. We use a variety of in-house tools and scripts to identify dead code. For experiments, we use automation that adds clang annotations to deprecate the methods and constants related to an experiment, which allows the compiler to warn developers that the experiment has ended and the code should be removed. Developers identify unreferenced code ad hoc by periodically running tools that inspect the header include graph of our build and look for files that recursively have zero references.
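
To make the "zero references recursively" idea concrete, here is a minimal sketch of one way such a check could work. It assumes the include graph is already available as a mapping from each file to the files it includes, and it treats a file as unreferenced when no root file reaches it, directly or transitively. The file names and the `find_unreferenced` helper are hypothetical and do not describe the in-house tools mentioned above.

```python
def find_unreferenced(include_graph: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Return files that no root reaches through the include graph."""
    reachable: set[str] = set()
    stack = list(roots)
    while stack:
        current = stack.pop()
        if current in reachable:
            continue
        reachable.add(current)
        # Follow every include of the current file.
        stack.extend(include_graph.get(current, ()))

    # Every file that appears anywhere in the graph, as includer or included.
    all_files = set(include_graph)
    for included in include_graph.values():
        all_files |= included
    return all_files - reachable


# Hypothetical include graph: file -> files it includes.
graph = {
    "AppDelegate.m": {"FeedView.h"},
    "FeedView.h": {"Pin.h"},
    "Pin.h": set(),
    "OldExperimentView.h": {"OldExperimentHelper.h"},
    "OldExperimentHelper.h": set(),
}

print(sorted(find_unreferenced(graph, roots={"AppDelegate.m"})))
```

In this example `OldExperimentHelper.h` is still included by one file, but that file is itself unreachable, so both are reported. That is the recursive part of the check.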

Our build process needs to be both fast and reliable. Builds are reliable if they are reproducible. Reproducible builds are important not just for reproducing bugs, but also for ensuring we ship the exact version of the app that we’ve developed and tested against. We can only achieve that if the build environment (its inputs and outputs) is consistent.

Changes to the environment can greatly affect the end product and introduce variability. A consistent environment guarantees the application behaves the same regardless of whether it was built on a developer’s machine or via continuous integration, and it eliminates time spent figuring out why a build succeeds in one environment but fails elsewhere.
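
One way to picture what "consistent inputs" means is to derive a fingerprint for a build step from everything that can influence its output. The sketch below is a simplified illustration of that idea and is not how Bazel computes its keys internally. The `action_key` function, the command line and the `XCODE_VERSION` variable are all hypothetical.

```python
import hashlib


def action_key(command: list[str], input_files: dict[str, bytes], env: dict[str, str]) -> str:
    """Fingerprint a build step from its command, input contents and environment."""
    digest = hashlib.sha256()
    for arg in command:
        digest.update(arg.encode() + b"\0")
    digest.update(b"--inputs--")
    # Sort paths so the key does not depend on dictionary ordering.
    for path in sorted(input_files):
        digest.update(path.encode() + b"\0")
        digest.update(hashlib.sha256(input_files[path]).digest())
    digest.update(b"--env--")
    for name in sorted(env):
        digest.update(f"{name}={env[name]}".encode() + b"\0")
    return digest.hexdigest()


command = ["clang", "-c", "Pin.m"]
sources = {"Pin.m": b"int pin_count;", "Pin.h": b"extern int pin_count;"}

local = action_key(command, sources, {"XCODE_VERSION": "9.0"})
ci_same = action_key(command, sources, {"XCODE_VERSION": "9.0"})
ci_drifted = action_key(command, sources, {"XCODE_VERSION": "9.1"})

print(local == ci_same)     # identical inputs and environment
print(local == ci_drifted)  # environment differs
```

Under these assumptions, two machines agree on the fingerprint only when the command, the file contents and the relevant environment all match. A drifted tool version is exactly the kind of difference that lets a build pass in one place and fail in another.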

While the ideas and explorations are focused on iOS, the goals of fast and reproducible builds are ones that we all share and will allow us to scale client engineering.

Challenges

The decision to focus on improving our build process was rooted in the impact it was having on developer productivity. As we grow our team and product, it’s paramount that we invest in our developers’ ability to work with a consistent and fast build system.

  • Scale: As we scale client engineering, the amount of time spent supporting developers, maintaining or reducing build times and improving reliability scales as well. The number of engineers that support developers does not necessarily scale proportionally with the number of developers, and Xcode doesn’t contain tools to profile builds when performance degrades.
  • Modular architecture: We’ve begun refactoring the core frameworks that compose our platform out of our app in order to improve our overall architecture, documentation and quality. This adds complexity because it requires a build system that can manage a dependency graph of build targets which need to be configured and compiled in a specific order. While not impossible in Xcode, configuring such a graph and maintaining it over time would be prohibitively difficult due to the lack of an expressive configuration API.
  • Build instability: Outside of our codebase, there are a number of tools written in different languages (Ruby, Python, Bash, etc.) that require specific versions and toolchains, which must be identical across machines to create consistent builds. Variations in these versions can result in errors that are hard to reliably reproduce. It was not uncommon for developers to have a build pass locally but fail on continuous integration, and vice versa. Only certain machines met the requirements necessary to create a release candidate. Local state could become corrupted, which required performing clean builds and wasted time.
  • Task automation & code generation: We rely on code generation to create our immutable models (via Plank) and logging infrastructure (via Thrift). While it has support for run script phases, Xcode can’t make dynamic workflows like code generation or general task automation part of the build process. Instead, it requires manually integrating generated sources, which puts more work on developers and adds to onboarding education. This also requires adding generated artifacts to version control, which increases our repository size and slows down git clone.
  • Shared resources: The integration path for external repositories has not been clear and has historically resulted in periodically copying resources from other repositories. We have explored options like git subtree or git submodule, but these required an increased investment in employee education and a change to developer workflows. That introduced confusion and, again, wasted time. Xcode does not have any support for declaring external build dependencies, so we would have to rely on external tooling to provide this integration.
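
The modular architecture challenge above comes down to ordering: a target can only be compiled after the targets it depends on. As a rough sketch of what a build system has to work out, the following uses Python's standard `graphlib` module to derive a valid build order from a dependency graph. The target names are hypothetical and are not Pinterest's actual frameworks.

```python
from graphlib import TopologicalSorter

# Hypothetical targets: each key depends on the targets in its set.
dependencies = {
    "App": {"FeedUI", "Networking"},
    "FeedUI": {"Models"},
    "Networking": {"Models"},
    "Models": set(),
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the targets in an order where dependencies always come first."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)
```

`Models` always comes first and `App` last. The relative order of `FeedUI` and `Networking` is not fixed, because neither depends on the other, and this independence is also what lets a build system compile them in parallel.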

Solutions

We wanted solutions that would allow us to overcome these challenges with tooling and automation, rather than with an increased load on developer education and process, and that would waste less time. We primarily optimize for:

  • Rapid iteration: Our solution should provide functionality to greatly improve and maintain build speed and developer velocity over time, likely achieved through better parallelism and advanced tooling features.
  • Sandboxed development: A consistent environment that allows us to have reliable builds and minimizes variability and impact on developer productivity.
  • Monorepo-like development: All sources should still remain in one repository. This minimizes the amount of work and context switching required to make changes across the application.
  • Profiling, Monitoring, Analysis: We need tools that give us insight into our build system to identify issues. Our solution needs to allow us to visualize the actions performed throughout the build and their respective durations. Assuming we have this, we will be able to track detailed changes on a frequent basis.
  • Incremental Compilation: After we build the client once, we should be able to safely build incrementally through all workflows. That should include switching branches, reverting changes, or other parts of the workflow. Clean builds are by far the most expensive builds and are usually performed when local state is corrupted or the developer is trying to diagnose an unknown build issue.
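
As an illustration of the profiling goal above, this minimal sketch shows the kind of summary that per-action timing data makes possible. It assumes a hypothetical record format of (action kind, target, duration in seconds). Real build tools emit richer profiles, and all names and values here are made up.

```python
from collections import defaultdict


def summarize(actions: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Total the duration per action kind, slowest kind first."""
    totals: dict[str, float] = defaultdict(float)
    for kind, _target, seconds in actions:
        totals[kind] += seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical timing records for a single build.
recorded = [
    ("compile", "Models", 12.5),
    ("compile", "FeedUI", 30.0),
    ("codegen", "Models", 2.0),
    ("link", "App", 8.0),
]

for kind, seconds in summarize(recorded):
    print(f"{kind}: {seconds:.1f}s")
```

Tracking a summary like this on every build makes it easier to see which kind of work a regression came from.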

Extensible for the future

As our application grows in complexity and our needs evolve, we must ensure that our build system is extensible enough to accommodate change. It must not be so specific that it hinders further dynamic automation in our build process. This may range from automating tasks to integrating third-party static analysis, custom toolchains and the in-house tools we develop at Pinterest.

Changing a build system is a significant undertaking, and we cannot support an approach that can’t be introduced incrementally. An all-or-nothing solution would potentially require pausing development, or maintaining a long-lived fork and performing a risky migration atomically across developer environments and CI systems.

Stay tuned for more to come in part two!