Hate to wait: how Skyscanner used module caching to cut iOS app build speed in half
CI/CD specialist Aron Pammer helps Skyscanner app developers satisfy their serious need for speed. In this post he describes how module caching saved our developers a lot of the most valuable commodity we have: time.
Our users are Skyscanner’s own app developers, and we want them to have an awesome experience while developing. When we held sessions to figure out what the main pain point was for our developers the most mentioned one was…build speed.
Our build speed was extremely slow. It was not unusual to wait hours for an iOS validation to complete.
Fortunately, the local development flow is a bit faster because:

- `xcodebuild`'s incremental build time (you already built the app once locally, changed something, and pressed rebuild) helps a lot, because you don't need to rebuild every part of the app
- other dependencies, such as React Native, Ruby gems, and bundle packages are already installed locally
Even with the incremental build, though, the situation is still not ideal, which is why we started to introduce the improvements I’m about to describe to you into our local development workflow (following thorough testing in the CI pipeline).
The reason we test in the CI first is that there we use Anka, a tool that presents us with the same environment every time we start a new build. It's a lot easier to debug when the environment is the same.
Data, operational metrics, dashboards
We’d like to improve the iOS stage validation time, but how do we even get started? First let’s take a look at the big picture:
The project setup time includes downloading caches, creating the Xcode project (config as code), and so on.
We decided to focus on improving the build time first, because:
- We knew employing caching here would be possible, as we’d seen implementations of it before
- It could later be easily modified to fit into the local development workflow
Project architecture
Our architecture can be best described as a modular architecture. We’ve got ~150 module targets in total. These module targets are compiled as static libraries individually and then all of them are linked together. Each module target can have multiple module target dependencies - and circular dependencies are not allowed.
We use rake and the xcodeproj Ruby gem to deterministically generate the same Xcode project. This process could have its own blog post, but in short, every module target has its own directory, and inside that directory we have a `yml` file:
```yaml
---
name: ModuleTargetFoo
pod_dependencies:
  - ExternalPodBar
  - ExternalPodFoo
dependencies:
  - ModuleTargetBar
  - ModuleTargetFooCommon
frameworks: []
libraries: []
owner: foobar
slack: "#foobar-squad"
```
Then our setup script does the following:

- Creates a new Xcode project file
- Adds a module target for every `yml` file
- Adds every dependency for every module target
- Checks for circular dependencies
- Generates the `Podfile` for CocoaPods based on the `pod_dependencies` specified in the `yml` files, then installs pods via `pod install` into the Xcode project
- Adds the correct Swift and Obj-C header search paths for every module target
- Saves the Xcode project file
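For illustration, the circular-dependency check from the list above can be sketched as a depth-first search over the per-module dependency lists. The function and module names here are hypothetical, not Skyscanner's actual code:

```python
# Sketch of a circular-dependency check over module yml configs.
# A "back edge" to a module currently being visited means a cycle.

def check_no_cycles(dependencies):
    """dependencies maps module name -> list of module dependencies.
    Raises ValueError if a dependency cycle is found."""
    WHITE, GREY, BLACK = 0, 1, 2            # unvisited / in progress / done
    state = {name: WHITE for name in dependencies}

    def visit(name, path):
        if state.get(name, BLACK) == BLACK:  # done, or an external dep
            return
        if state[name] == GREY:              # back edge => cycle
            raise ValueError("circular dependency: " + " -> ".join(path + [name]))
        state[name] = GREY
        for dep in dependencies.get(name, []):
            visit(dep, path + [name])
        state[name] = BLACK

    for name in dependencies:
        visit(name, [])

# Mirrors the yml example above: Foo depends on Bar and FooCommon.
modules = {
    "ModuleTargetFoo": ["ModuleTargetBar", "ModuleTargetFooCommon"],
    "ModuleTargetBar": ["ModuleTargetFooCommon"],
    "ModuleTargetFooCommon": [],
}
check_no_cycles(modules)  # passes; a FooCommon -> Foo edge would raise
```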
Improving the build time
Currently, the build time is highly dependent on the number of concurrently running virtual machines on the same physical computer. On average, a clean build on an 8-core Mac Pro with 32 GB of RAM takes 40 minutes to complete. We use ccache to help with the build time; with a very high ccache hit rate we were able to reduce this number to 15 minutes.
When we analysed the development process we noticed two things pretty much straight away:
- We only add a new external CocoaPod dependency about once every two weeks
- Each squad at Skyscanner works on their own modules — multiple modules are rarely changed in a single pull request
External pod cache
As a first step we decided to investigate how we could avoid rebuilding every external pod. If the Podfile doesn’t change, then the generated Pod Xcode project won’t change either, which means we can safely cache the whole build output.
We created a script that builds the pods and then creates a zip which we can store and reuse.
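In outline, the script does something like the following. This Python sketch is an illustration only; the project path, `xcodebuild` arguments, and output locations are assumptions, not the exact script:

```python
# Illustrative pod-cache packaging step: key the cache on the hash of
# Podfile.lock, build the Pods project, then zip the build output so
# it can be uploaded and reused. Paths and arguments are assumptions.

import hashlib
import shutil
import subprocess

def podlock_hash(lockfile="Podfile.lock"):
    """The cache key: if Podfile.lock is unchanged, the pods build
    output is unchanged too, so it is safe to reuse."""
    with open(lockfile, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def build_and_pack_pods(configuration="Debug", sdk="iphonesimulator"):
    cache_name = f"pods-{podlock_hash()}-{configuration}-{sdk}"
    # Build only the generated Pods project into a known output dir.
    subprocess.run(
        ["xcodebuild", "-project", "Pods/Pods.xcodeproj",
         "-alltargets", "-configuration", configuration, "-sdk", sdk,
         "SYMROOT=podcache-build"],
        check=True,
    )
    # Zip the built products for upload to the shared storage.
    return shutil.make_archive(cache_name, "zip", "podcache-build")
```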
This zip is then uploaded to a shared storage.
If we see that there is a cache already stored for a given hash, configuration, and target, then we download it, extract it, and finally set the correct Xcode build configuration values:

```
PODS_CONFIGURATION_BUILD_DIR => "${SRCROOT}/podcaches/<PODLOCK_HASH>/${CONFIGURATION}${EFFECTIVE_PLATFORM_NAME}"
LIBRARY_SEARCH_PATHS         => "${SRCROOT}/podcaches/<PODLOCK_HASH>/${CONFIGURATION}${EFFECTIVE_PLATFORM_NAME}"
SWIFT_INCLUDE_PATHS          => "${SRCROOT}/podcaches/<PODLOCK_HASH>/${CONFIGURATION}${EFFECTIVE_PLATFORM_NAME}"
```
It’s important to note that it’s possible to cache only individual pods, and only rebuild the ones where there is a version change. We decided to cache every pod, because our external pods aren’t changing very frequently.
The result
Internal module cache
The next step in improving the build time was to introduce internal module caching.
Internal module: a build target in Xcode created and maintained by Skyscanner employees
There were several questions that came up:
When to invalidate a certain module cache?
We already know that if `ModuleA`'s hash has changed we need to rebuild `ModuleA`. Is that enough, though?

Now let's imagine we have both `ModuleA` and `ModuleB` in cache. As can be seen above, we are calling `ModuleA`'s function in `ModuleB`. If we change the implementation of `ModuleA`'s `setSomeParam`, then we have to rebuild `ModuleA`, but do we have to rebuild `ModuleB`? The answer is: it depends.
In this case, no, because the function signature didn't change. One could say then that if we change the signature of `ModuleA`'s function, then we would have to change that in `ModuleB` as well, thus invalidating both `ModuleA`'s and `ModuleB`'s cache. That's true, but what if:

Now let's change `setSomeParam` to `- (void)setSomeParam:(BOOL)someParam paramB:(BOOL)paramB;`.
In this case, this change will invalidate `ModuleA`, so rebuilding `ModuleA` will be required. However, as `ModuleB` inherits the `setSomeParam` method, if we pull `ModuleB` from cache, linking will fail. Therefore, in this case we need to invalidate `ModuleB` as well.
Our current solution’s limitation is that it always behaves in the safest way — we always invalidate the cache of a module that depends on a changed module, even if the signature is the same. We intend to change this in the future to achieve better cache hit rates.
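This safest-way policy amounts to invalidating the transitive closure of dependents. A minimal sketch, assuming a simple map of module dependencies (names are illustrative):

```python
# Safe invalidation: a changed module invalidates itself and,
# transitively, every module that depends on it.

def modules_to_rebuild(dependencies, changed):
    """dependencies maps module -> list of modules it depends on.
    Returns the set of modules whose cache must be invalidated."""
    invalid = set(changed)
    # Sweep until no new dependents are found; the graph is acyclic,
    # so this terminates quickly even for ~150 modules.
    grew = True
    while grew:
        grew = False
        for module, deps in dependencies.items():
            if module not in invalid and invalid.intersection(deps):
                invalid.add(module)
                grew = True
    return invalid

deps = {
    "ModuleA": [],
    "ModuleB": ["ModuleA"],   # ModuleB calls into ModuleA
    "ModuleC": ["ModuleB"],
    "ModuleD": [],
}
# Changing ModuleA invalidates ModuleB and ModuleC too, but not ModuleD.
assert modules_to_rebuild(deps, {"ModuleA"}) == {"ModuleA", "ModuleB", "ModuleC"}
```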
What should be the cache key for a module?
We have around 150 module targets; each module has its own folder under the Modules directory.
For generating the cache key, these were the options we came up with:

1. Calculating the hash of the contents of the files inside a folder:
   `find Modules/ModuleToHash -type f \( -exec md5 -q "$PWD"/{} \; \) | awk '{print $1}' | sort | md5`
   Time to run: on average 1 second per module (150 modules = 150 seconds, roughly 3 minutes)
2. Checking only the modification time of each module folder.
   Time to run: instant
3. Getting the git hash of a folder:
   `git ls-tree HEAD Modules/ModuleToHash | awk '{print $3}'`
   Time to run: instant
Option 2 was out of the question, because the modification time for a folder is different for each computer.
Options 1 and 3 remained viable, but each has its own advantages and disadvantages. The git hash is fast, because the hash is already calculated by git, but if a change is not committed, then the hash remains unchanged. Option 1 is slow, but always accurate.
For building on the CI, we decided to go with the third option, because the build machines aren’t changing any files, and therefore retrieving the git hash will always be accurate.
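For local machines, where uncommitted edits are common, option 1's content hash is the accurate choice. A pure-Python equivalent of that `find | md5` pipeline might look like this (a sketch, not the actual tooling):

```python
# Content-based module hash: hash every file's contents under the
# module folder, sort the digests, and hash the sorted list. This is
# deterministic across machines, unlike folder modification times.

import hashlib
import os

def module_hash(module_dir):
    digests = []
    for root, _dirs, files in os.walk(module_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                digests.append(hashlib.md5(f.read()).hexdigest())
    # Sorting makes the result independent of filesystem walk order.
    combined = "".join(sorted(digests)).encode()
    return hashlib.md5(combined).hexdigest()
```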
Where to store these caches?
Storing these caches as closely as possible to the build machines is crucial. A module’s build artifact’s size can range from 5 KB to 85 MB. For a 100% module cache hit rate it would mean downloading gigabytes of data. This is why we decided to create a data storage server near our build machines; this way, copying gigabytes of data to the build machine only takes seconds.
Implementation
From the beginning we were aware that it wouldn't be easy to cache modules that have Swift source files in them, so as a first step of internal module caching we decided to focus on Obj-C-only module caching.
Obj-C module caching
When `xcodebuild` runs it creates a derived data folder for the build. What we need is the built products inside this folder for a given target and configuration, i.e. the `$DERIVED_DATA_PATH/Build/Products/$TARGET_CONFIGURATION-$TARGET_SDK` folder. This folder contains the module build products, namely the files ending in `.a`.

There is a build product for every module. We cache these files and upload them to our shared cache storage with the following name: `$TARGET_CONFIGURATION-$TARGET_SDK-$MODULE_NAME-$MODULE_HASH`.
Then the next time we build the application, if we find that there is already a cached module for a specific module hash, we do the following:

- Create a `Cache` group in the Xcode project
- Download the cached files and add them to this new `Cache` group
- Remove the module targets coming from cache from the project's module target list, so that `xcodebuild` doesn't build these modules
- Add the cached files' references to the main target's `Link Binary With Libraries` build phase

By replacing the module targets with the cached files, `xcodebuild` only needs to build the non-cached modules and do the linking at the end.
Doing this process manually is a lot of work, so if you'd like to use this same approach I would suggest generating the Xcode project programmatically.
Swift compiler overview
Unfortunately, things weren't this easy when it came to caching module targets with a mix of Swift and Obj-C. For Obj-C-only modules it was relatively easy to find the built static library, save it, then reuse it. For Swift we had to dig a little deeper into the `xcodebuild` process.
To compile Swift sources `xcodebuild` calls the `swiftc` command. This command outputs four files:

- A `swiftmodule` file
- A `swiftdoc` file
- A Swift compatibility header file (an Obj-C header file)
- An object file

To get the full list of `swiftc`'s parameters you can run `swiftc --help-hidden`. This was useful for us when we were getting a bit more familiar with the Swift compiler.
Let’s go over them one-by-one first to understand what these files are, and why they are needed.
Swiftmodule file (.swiftmodule)
A binary file format, equivalent to a collection of header files for a C library. It contains the public interface for the swift files.
Swiftdoc file (.swiftdoc)
A binary file format as well, this contains the documentation for your code. While in Obj-C you'd write the comments in the header files, in Swift you write them alongside the implementation, and the Swift compiler generates a separate `swiftdoc` file.

One partial `swiftmodule` and `swiftdoc` file is created for each Swift file, and when all of the Swift files in a module target are compiled they are merged together into one big `swiftmodule` and one big `swiftdoc` file.
It's important to note that these files, just like clang object files, are architecture-dependent. If you set Xcode to build for two architectures, then two `swiftmodule` and `swiftdoc` files will be generated, with different names.

You can find more information about the `swiftmodule` and `swiftdoc` files here.
Swift compatibility header file (.h)
In short, it’s the same as the bridging-header, but in the other direction. This file is generated so that you are able to use the Swift resources (classes, functions, etc.) in Obj-C.
Object file (.o)
Like clang, the Swift compiler also creates an object file for each Swift file. This file can then be merged, with the help of `libtool`, with the object files clang generated. `libtool` takes a list of object files as input and creates a library for use with the linker.
Compilation process summary
In short, the following happens: `xcodebuild` generates a build dependency tree, then for each module target compiles the input files (`.swift`, `.m`), and outputs the following for each Swift file:

- a partial `swiftmodule`
- a partial `swiftdoc`
- a compatibility bridging header file
- an object file

Then the process is as follows:

- Merges the partial `swiftmodule` and `swiftdoc` files into one file each
- Compiles the Obj-C files, and outputs an object file for each file
- Creates a library out of the previously generated object files (the `.a` file) with `libtool`
- Links the module targets' library files (`.a`) together and generates an app file (`.app`)
Of course, a lot more happens in the background, like copying the app resources and compiling the asset catalogs with `actool`; however, for our current use case knowing the above is enough.
Swift vs Obj-C
What do you do in Obj-C when you want to use an Obj-C class that is in another module?
You simply include it with `#include <Module/HeaderFoo.h>` and add the correct entry to the `HEADER_SEARCH_PATHS` list, so the compiler knows where to find the header you just included.
Importing Swift in Swift
Here the process is almost the same.
You include it with `import FooSwiftModule`, and add a new `SWIFT_INCLUDE_PATHS` path entry. What is the entry?

As I mentioned above, Swift's Obj-C header equivalents are the `swiftmodule` files, so in this case you'd add the directory that contains the directory holding the `swiftmodule` and `swiftdoc` files. For example, by default, for each module target the Swift compiler creates a folder in `<derived data path>/Build/Products/<target_configuration>-<target_sdk>` with the name `<module_name>.swiftmodule` and puts the architecture-dependent `swiftmodule` and `swiftdoc` files inside this folder. So in this case the Swift search path would be, and is by default, the `<derived data path>/Build/Products/<target_configuration>-<target_sdk>` folder.

Back to our original situation: when you `import FooSwiftModule` the Swift compiler looks into the `SWIFT_INCLUDE_PATHS` entries and tries to find the `FooSwiftModule.swiftmodule/<architecture_type>.swiftmodule` file. If it cannot find this module then the compilation will fail with a `No such module FooSwiftModule` message.
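The lookup the compiler performs can be mimicked with a simple path probe; this sketch assumes each search path holds `<module>.swiftmodule` folders, as described above:

```python
# Mimic the compiler's module lookup: probe each SWIFT_INCLUDE_PATHS
# entry for <Module>.swiftmodule/<arch>.swiftmodule.

import os

def find_swiftmodule(module, arch, swift_include_paths):
    for base in swift_include_paths:
        candidate = os.path.join(
            base, f"{module}.swiftmodule", f"{arch}.swiftmodule"
        )
        if os.path.isfile(candidate):
            return candidate
    # Mirrors the compiler's "No such module" failure.
    raise FileNotFoundError(f"No such module {module}")
```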
Importing Swift in Obj-C
In case we want to use the `FooSwiftModule` resources in Obj-C, all we have to do is add a new `HEADER_SEARCH_PATHS` entry for the Swift compatibility header file.
Mixed Obj-C and Swift caching
In order to be able to reuse modules with Swift source files in them we need to store and download the `swiftmodule` and Swift compatibility header files as well (`swiftdoc` is not necessary for building). This is where we ran into issues.

The `swiftmodule` file, as you can see above, has absolute paths inside it, and because of this we cannot reuse a `swiftmodule` file on machines where the base path of the codebase is different. This isn't an issue when running on the build machines, since they will always have the same path.

One "solution/hack" we have for this is to create a symlink from the directory where the `swiftmodule` was created to the actual repository. This could cause a lot of issues (e.g. what if someone has multiple copies of the same repository on their computer?), so I wouldn't recommend going down this route. Fortunately, though, Swift 5 will bring a better workaround.
Results
Back to our operational metrics, this is the end result:
The reason the clean build still takes a lot of time with a 70%+ cache hit rate is that we are always rebuilding all of the tests. This might not be necessary, because in theory you would only need to rebuild the tests that test a module not coming from cache, but we decided to still build them for now to be on the safe side.
Furthermore, we always need to rebuild the main module, the module that depends on every other module (this module contains the `AppDelegate`).
Before we modularised the app this was the only module target, and it still is the biggest module - it accounts for around 20% of our whole codebase. This means that until we further modularise this main module we will never be able to reach an internal module cache hit rate higher than 80%.
The same is true for the other modules as well: the fewer dependencies a module has, the more likely it is that it can come from cache.
Closing thoughts
To summarise, we managed to reduce the clean build time drastically, but we are far from the finish line. We still need to further modularise the source to achieve even better cache hit rates, and we also have to figure out a way to avoid running unnecessary test cases without compromising reliability.
Tips
To help with the build time there are a few other quick wins you could try:

- As I mentioned before, we use `ccache` to improve the Obj-C build times. `ccache` is incredibly easy to set up with Xcode (https://pspdfkit.com/blog/2015/ccache-for-fun-and-profit/).
- `xcodebuild` indexes while the compilation process runs, which in turn slows down the build itself. You most probably don't need the indexing feature while building on the build nodes. To turn it off, set `COMPILER_INDEX_STORE_ENABLE=NO`.
- `xcodebuild` has a `-jobs` option where you can set the maximum number of concurrent build operations. YMMV, but we ran some experiments and found that sometimes a lower-than-default value here is better.
Join Skyscanner, see the world
Life-enriching travel isn’t just for our customers — it’s for our employees too! Skyscanner team members get £500 (or their local currency equivalent) towards the travel trip of their choice in 2019 — and that’s just one of the great benefits we offer. Read more about our benefits and have a look at all of our open roles right here.
About the author: Aron Pammer
Hi, I am a Software Engineer at Skyscanner. I joined two years ago, and ever since that time I’ve been working on the continuous integration and continuous delivery of iOS and Android apps.