Hate to wait: how Skyscanner used module caching to cut iOS app build times in half

CI/CD specialist Aron Pammer helps Skyscanner app developers satisfy their serious need for speed. In this post he describes how module caching saved our developers a lot of the most valuable commodity we have: time.

Skyscanner Engineering
13 min read · Jun 28, 2019
Modular architecture of a different kind — the exterior of the Plaza Carso building in Mexico City, Mexico (photo: Federica Rho)

Our users are Skyscanner’s own app developers, and we want them to have an awesome experience while developing. When we held sessions to figure out our developers’ main pain points, the most frequently mentioned one was… build speed.

Our build speed was extremely slow. It was not unusual to wait hours for an iOS validation to complete.

Fortunately, the local development flow is a bit faster because:

  1. xcodebuild’s incremental build time (you already built the app once locally, changed something, and pressed rebuild) helps a lot, because you don’t need to rebuild every part of the app
  2. other dependencies, such as React Native, Ruby gems, and bundle packages, are already installed locally

Even with the incremental build, though, the situation is still not ideal, which is why we started to introduce the improvements I’m about to describe to you into our local development workflow (following thorough testing in the CI pipeline).

The reason we test first in CI is that there we use Anka, a tool that gives us the same environment every time we start a new build. It’s a lot easier to debug when the environment is always the same.

Data, operational metrics, dashboards

We’d like to improve the iOS stage validation time, but how do we even get started? First let’s take a look at the big picture:

By far the biggest contributors to the stage validation time are the building and testing parts.

The project setup time includes downloading caches, creating the Xcode project (config as code), and so on.

We decided to focus on improving the build time first, because:

  1. We knew employing caching here would be possible, as we’d seen implementations of it before
  2. It could later be easily modified to fit into the local development workflow

Project architecture

Our architecture is best described as modular. We have ~150 module targets in total. Each module target is compiled individually as a static library, and then all of them are linked together. A module target can have multiple module target dependencies, and circular dependencies are not allowed.

We use rake and the xcodeproj Ruby gem to generate the same Xcode project every time. This process deserves its own blog post, but in short: every module target has its own directory, and inside that directory we have a yml file:

---
name: ModuleTargetFoo
pod_dependencies:
- ExternalPodBar
- ExternalPodFoo
dependencies:
- ModuleTargetBar
- ModuleTargetFooCommon
frameworks: []
libraries: []
owner: foobar
slack: "#foobar-squad"

Then our setup script does the following:

  1. Creates a new Xcode project file
  2. Adds the module targets for every yml file
  3. Adds every dependency for every module target
  4. Checks for circular dependencies
  5. Generates the Podfile for CocoaPods based on the pod_dependencies specified in the yml files, then installs the pods into the Xcode project via pod install
  6. Adds the correct Swift and Obj-C header search paths for every module target
  7. Saves the Xcode project file

Improving the build time

Currently, the build time depends heavily on the number of virtual machines running concurrently on the same physical computer. On average, a clean build on an 8-core Mac Pro with 32 GB of RAM takes 40 minutes to complete. We use ccache to help with the build time; with a very high ccache hit rate we were able to reduce this to 15 minutes.

When we analysed the development process we noticed two things pretty much straight away:

  1. We only add a new external CocoaPod dependency about once every two weeks
  2. Each squad at Skyscanner works on their own modules — multiple modules are rarely changed in a single pull request

External pod cache

As a first step we decided to investigate how we could avoid rebuilding every external pod. If the Podfile doesn’t change, then the generated Pod Xcode project won’t change either, which means we can safely cache the whole build output.

This is the script we have created to build the pods, and then create a zip which we can store and reuse:

Building the CocoaPods
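A minimal sketch of the idea behind that script: build the generated Pods project once, then archive its build products under a name derived from the Podfile.lock hash. The paths, configuration, and naming below are illustrative, not our exact script.

# Sketch only: paths, configuration, and cache naming are illustrative
PODLOCK_HASH=$(md5 -q Podfile.lock)
CONFIGURATION=Debug
SDK=iphonesimulator

# Build every target in the generated Pods project into a known derived data folder
xcodebuild build \
  -project Pods/Pods.xcodeproj \
  -alltargets \
  -configuration "$CONFIGURATION" \
  -sdk "$SDK" \
  -derivedDataPath pods-derived-data

# Archive the built pod products so they can be stored and reused
PODS_PRODUCTS="pods-derived-data/Build/Products/$CONFIGURATION-$SDK"
ZIP_NAME="pods-$PODLOCK_HASH-$CONFIGURATION-$SDK.zip"
(cd "$PODS_PRODUCTS" && zip -qr "$OLDPWD/$ZIP_NAME" .)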

This zip is then uploaded to a shared storage.

If we see that there is a cache already stored for a given hash, configuration, and target, then we download that, extract it, and finally set the correct Xcode build configuration values:

PODS_CONFIGURATION_BUILD_DIR => "${SRCROOT}/podcaches/<PODLOCK_HASH>/${CONFIGURATION}${EFFECTIVE_PLATFORM_NAME}"
LIBRARY_SEARCH_PATHS => "${SRCROOT}/podcaches/<PODLOCK_HASH>/${CONFIGURATION}${EFFECTIVE_PLATFORM_NAME}"
SWIFT_INCLUDE_PATHS => "${SRCROOT}/podcaches/<PODLOCK_HASH>/${CONFIGURATION}${EFFECTIVE_PLATFORM_NAME}"
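The restore path is the mirror image of the build script: look the archive up by hash, download it, and unzip it into the podcaches folder that the settings above point to. A rough sketch, again with a placeholder storage URL:

# CONFIGURATION and SDK as in the build sketch above
PODLOCK_HASH=$(md5 -q Podfile.lock)
ZIP_NAME="pods-$PODLOCK_HASH-$CONFIGURATION-$SDK.zip"
CACHE_URL="https://ios-build-cache.internal.example/$ZIP_NAME"   # placeholder storage URL

if curl --fail --silent --output "$ZIP_NAME" "$CACHE_URL"; then
  # Cache hit: extract into the folder referenced by the build settings above
  mkdir -p "podcaches/$PODLOCK_HASH/$CONFIGURATION-$SDK"
  unzip -q "$ZIP_NAME" -d "podcaches/$PODLOCK_HASH/$CONFIGURATION-$SDK"
else
  echo "No pod cache found for $PODLOCK_HASH, building the pods from source"
fi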

It’s important to note that it’s also possible to cache pods individually and only rebuild the ones whose version changed. We decided to cache every pod together, because our external pods don’t change very frequently.

The result

Pod caching successfully reduced our build time by over 25% on average.

Internal module cache

The next step in improving the build time was to introduce internal module caching.

Internal module: a build target in Xcode created and maintained by Skyscanner employees

There were several questions that came up:

When to invalidate a certain module cache?

We already know that if ModuleA’s hash has changed we need to rebuild ModuleA. Is that enough, though?

ModuleA.h
ModuleB.m

Now let’s imagine we have both ModuleA and ModuleB in cache, and that ModuleB calls ModuleA’s setSomeParam method, as in the snippets above. If we change the implementation of ModuleA’s setSomeParam, we have to rebuild ModuleA, but do we have to rebuild ModuleB? The answer is: it depends.

In this case, no, because the function signature didn’t change. One could argue that if we changed the signature of ModuleA’s function, we would have to update the call site in ModuleB as well, thus invalidating both ModuleA’s and ModuleB’s caches anyway. That’s true, but what if:

ModuleA.h
ModuleB.m

Now let’s change setSomeParam to: - (void)setSomeParam:(BOOL)someParam paramB:(BOOL)paramB;.

In this case the change invalidates ModuleA, so rebuilding ModuleA is required. However, because ModuleB inherits the setSomeParam method from ModuleA’s class, linking will fail if we pull ModuleB from cache. Therefore, in this case we need to invalidate ModuleB as well.

Our current solution’s limitation is that it always behaves in the safest way: we invalidate the cache of every module that depends on a changed module, even if the signature stayed the same. We intend to change this in the future to achieve better cache hit rates.

What should be the cache key for a module?

We have over two hundred module targets; each module has its own folder under the Modules directory.

For generating the cache key, these were the options we came up with:

  1. Calculating the hash of the contents of the files inside a folder:
    find Modules/ModuleToHash -type f \( -exec md5 -q "$PWD"/{} \; \) | awk '{print $1}' | sort | md5
    Time to run: on average 1 second per module (150 modules = 150 seconds ~= 2.5 minutes)
  2. Checking only the modification time of each module folder
    Time to run: instant
  3. Getting the git hash of a folder:
    git ls-tree HEAD Modules/ModuleToHash | awk '{print $3}'
    Time to run: instant

Option 2 was out of the question, because the modification time for a folder is different for each computer.

Option 1 and option 3 remained viable, but each has its own advantages and disadvantages. The git hash is fast, because git has already calculated it, but it only reflects committed changes: if a change is not yet committed, the hash stays the same. Option 1 is slow, but always accurate.

For building on CI we went with the third option, because the build machines never modify any files, so the git hash is always accurate there.

Where to store these caches?

Storing these caches as close as possible to the build machines is crucial. A module’s build artifact can range from 5 KB to 85 MB in size, so a 100% module cache hit rate means downloading gigabytes of data. This is why we decided to set up a storage server near our build machines; this way, copying gigabytes of data to a build machine only takes seconds.

Implementation

We knew from the beginning that it wouldn’t be easy to cache modules containing Swift source files, so as a first step of internal module caching we decided to focus on Obj-C-only modules.

Obj-C module caching

When xcodebuild runs, it creates a derived data folder for the build. What we need are the built products inside this folder for a given target and configuration, i.e. the $DERIVED_DATA_PATH/Build/Products/$TARGET_CONFIGURATION-$TARGET_SDK folder. This folder contains the module build products, namely the files ending in .a.

There is a build product for every module. We cache these files, and upload them to our shared cache storage, with the following name: $TARGET_CONFIGURATION-$TARGET_SDK-$MODULE_NAME-$MODULE_HASH.
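A hedged sketch of the upload step for a single module, assuming the static library is named lib<module_name>.a and using a placeholder storage URL:

# Example module; $DERIVED_DATA_PATH, $TARGET_CONFIGURATION and $TARGET_SDK come from the build
MODULE_NAME=ModuleTargetFoo
MODULE_HASH=$(git ls-tree HEAD "Modules/$MODULE_NAME" | awk '{print $3}')
PRODUCTS_DIR="$DERIVED_DATA_PATH/Build/Products/$TARGET_CONFIGURATION-$TARGET_SDK"
CACHE_NAME="$TARGET_CONFIGURATION-$TARGET_SDK-$MODULE_NAME-$MODULE_HASH"

# Upload the module's static library under the cache name described above
# (the URL is a placeholder for our shared cache storage)
curl --fail --upload-file "$PRODUCTS_DIR/lib$MODULE_NAME.a" \
  "https://ios-build-cache.internal.example/$CACHE_NAME.a"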

Then the next time we are building the application and we find that there is already a cached module for a specific module hash, we do the following:

  1. Create a Cache group in the Xcode project
  2. Download the cached files and add them to this new Cache group
  3. Remove the cached module targets from the project’s module target list, so that xcodebuild doesn’t build them
  4. Add references to the cached files to the main target’s Link Binary With Libraries build phase

With the module targets replaced by the cached files, xcodebuild only needs to build the non-cached modules and link everything together at the end.

Doing this manually is a lot of work, so if you’d like to use the same approach, I’d suggest generating the Xcode project from code.

Swift compiler overview

Unfortunately things weren’t this easy when it came to caching module targets with a mix of Swift and Obj-C. For Obj-C-only modules it was relatively easy to find the built static library, save it, then reuse it. For Swift we had to dig a little bit deeper into the xcodebuild process.

To compile Swift sources xcodebuild calls the swiftc command. This command outputs four files:

  1. A swiftmodule file
  2. A swiftdoc file
  3. A Swift compatibility header file (an Obj-C header)
  4. An object file

To get the full list of swiftc's parameters you can run swiftc --help-hidden. This was useful for us when we were getting more familiar with the Swift compiler.
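For a single Swift file, a rough illustration of producing these outputs directly with swiftc (the file and module names are made up, and the exact flags can vary between Swift versions):

# File and module names below are invented for illustration.
# Compile one Swift file to an object file, emitting the module, its swiftdoc
# and the Obj-C compatibility header as supplementary outputs.
swiftc -c Foo.swift \
  -module-name FooModule \
  -emit-module-path FooModule.swiftmodule \
  -emit-objc-header-path FooModule-Swift.h \
  -o Foo.o
# FooModule.swiftdoc is written next to FooModule.swiftmodule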

Let’s go over them one-by-one first to understand what these files are, and why they are needed.

Swiftmodule file (.swiftmodule)

A binary file format, roughly equivalent to a collection of header files for a C library. It contains the public interface of the module’s Swift files.

Swiftdoc file (.swiftdoc)

A binary file format as well, this contains the documentation for your code. While in Obj-C you’d write the comments in the header files, in Swift you write them alongside the implementation, and the Swift compiler generates a separate swiftdoc file.

A partial swiftmodule and swiftdoc file is created for each Swift file, and once all of the Swift files in a module target have been compiled, they are merged into one big swiftmodule and one big swiftdoc file.

It’s important to note that these files, just like clang object files, are architecture dependent. If you set Xcode to build for two architectures, then two swiftmodule and swiftdoc files will be generated, with different names.

You can find more information about the swiftmodule and swiftdoc files here.

Swift compatibility header file (.h)

In short, it’s the same as the bridging-header, but in the other direction. This file is generated so that you are able to use the Swift resources (classes, functions, etc.) in Obj-C.

Object file (.o)

Like clang, the Swift compiler creates an object file for each Swift file. These can then be merged, with the help of libtool, with the object files generated by clang. libtool takes a list of object files as input and creates a library for use with the linker.
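As an illustration of that last step (the object file names are invented):

# Bundle Swift- and clang-produced object files into one static library
libtool -static -o libModuleTargetFoo.a Foo.o SomeObjCClass.o AnotherObjCClass.o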

Compilation process summary

In short, the following happens:

xcodebuild generates a build dependency tree, then for each module target compiles the input files (.swift, .m) and outputs the following for each Swift file:

  1. partial swiftmodule
  2. partial swiftdoc
  3. Swift compatibility header file
  4. object file

Then the process is as follows:

  1. Merges the partial swiftmodule and swiftdoc files into one swiftmodule and one swiftdoc file
  2. Compiles the Obj-C files, outputting an object file for each
  3. Creates a library (the .a file) out of the previously generated object files with libtool
  4. Links the module targets’ library files (.a) together and generates the app bundle (.app)

Of course, a lot more happens in the background, such as copying the app resources and compiling the asset catalogs with actool; for our current use case, however, knowing the above is enough.

Swift vs Obj-C

What do you do in Obj-C when you want to use an Obj-C class that is in another module?

You simply include it with #include <Module/HeaderFoo.h>, and add the correct entry to the HEADER_SEARCH_PATHS list, so the compiler knows where to find the header you just included.

Importing Swift in Swift

Here the process is almost the same.

You import it with import FooSwiftModule, and add a new SWIFT_INCLUDE_PATHS entry. What is that entry?

As I mentioned above, the swiftmodule files are Swift’s equivalent of Obj-C headers, so in this case you add the directory that contains the <module_name>.swiftmodule directory. By default, for each module target the Swift compiler creates a folder named <module_name>.swiftmodule inside <derived data path>/Build/Products/<target_configuration>-<target_sdk> and puts the architecture-dependent swiftmodule and swiftdoc files inside it. So in this case the Swift search path would be, and by default is, the <derived data path>/Build/Products/<target_configuration>-<target_sdk> folder.
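Concretely, using the same placeholders as above, the default layout looks like this:

<derived data path>/Build/Products/<target_configuration>-<target_sdk>/
    FooSwiftModule.swiftmodule/
        <architecture_type>.swiftmodule
        <architecture_type>.swiftdoc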

Back to our original situation: when you import FooSwiftModule the Swift compiler looks into the SWIFT_INCLUDE_PATHS and tries to find the FooSwiftModule.swiftmodule/<architecture_type>.swiftmodule file. If it cannot find this module then the compilation will fail with a No such module FooSwiftModule message.

Importing Swift in Obj-C

If we want to use FooSwiftModule’s resources in Obj-C, all we have to do is add a new HEADER_SEARCH_PATHS entry pointing at the Swift compatibility header file.

Mixed Obj-C and Swift caching

In order to reuse modules that contain Swift source files, we need to store and download the swiftmodule and Swift compatibility header files as well (the swiftdoc is not necessary for building). This is where we ran into issues.

Swiftmodule file contents as seen in a hex viewer

As you can see above, the swiftmodule file contains absolute paths, so we cannot reuse a swiftmodule on machines where the codebase sits at a different base path. This isn’t an issue on the build machines, since they always use the same path.
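One quick way to spot those embedded paths yourself (the module and architecture names here are just examples):

# Dump printable strings from a compiled swiftmodule and look for absolute paths
strings FooSwiftModule.swiftmodule/x86_64.swiftmodule | grep '/Users/'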

One “solution/hack” we have for this is to create a symlink from the directory where the swiftmodule was created to the actual repository. This could cause a lot of issues (e.g. what if someone has multiple copies of the same repository on their computer?), so I wouldn’t recommend going down this route. Fortunately, though, Swift 5 will bring a better workaround.

Results

Back to our operational metrics, this is the end result:

Build time with high ccache hit rate and high internal module caching hit rate
Build time with high ccache hit rate and external pod caching on
Build time with ccache off and external pod caching on

The reason the clean build still takes a lot of time with a 70%+ cache hit rate is that we always rebuild all of the tests. This might not be necessary, because in theory you would only need to rebuild the tests for modules that didn’t come from cache, but for now we decided to keep building all of them to be on the safe side.

Furthermore, we also always need to rebuild the main module, the module that depends on every other module (it contains the AppDelegate).

Before we modularised the app this was the only module target, and it is still the biggest one: it accounts for around 20% of our whole codebase. This means that until we modularise this main module further, we will never be able to reach an internal module cache hit rate higher than 80%.

The same applies to the other modules as well: the fewer dependencies a module has, the more likely it is to come from cache.

Closing thoughts

To summarise, we managed to reduce the clean build time drastically, but we are far from the finish line. We still need to further modularise the source to achieve even better cache hit rates, and we also have to figure out a way to avoid running unnecessary test cases without compromising reliability.

Tips

To help with the build time, there are a few other quick wins you could try:

  • Like I mentioned before, we use ccache to improve the Obj-C build times. ccache is incredibly easy to set up with Xcode. (https://pspdfkit.com/blog/2015/ccache-for-fun-and-profit/)
  • xcodebuild indexes while the compilation process runs, which in turn slows down the build itself. You most probably don’t need the indexing feature while building on the build nodes; to turn it off, set COMPILER_INDEX_STORE_ENABLE=NO
  • xcodebuild has a -jobs option where you can set the maximum number of concurrent build operations. YMMV, but we ran some experiments and found that sometimes a lower-than-default value is better (see the sketch below)
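A hedged example combining the last two tips; the workspace and scheme names and the job count are purely illustrative:

# Workspace/scheme names and job count are examples only.
# Disable index-while-building and cap the number of concurrent build operations.
xcodebuild build \
  -workspace SkyscannerApp.xcworkspace \
  -scheme SkyscannerApp \
  -configuration Debug \
  -jobs 4 \
  COMPILER_INDEX_STORE_ENABLE=NO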

Join Skyscanner, see the world

Life-enriching travel isn’t just for our customers — it’s for our employees too! Skyscanner team members get £500 (or their local currency equivalent) towards the travel trip of their choice in 2019 — and that’s just one of the great benefits we offer. Read more about our benefits and have a look at all of our open roles right here.

We’re hiring!

About the author: Aron Pammer

Hi, I am a Software Engineer at Skyscanner. I joined two years ago, and ever since that time I’ve been working on the continuous integration and continuous delivery of iOS and Android apps.

Skyscanner’s Aron Pammer

