iOS CI, supercharged

When Skyscanner saw a big increase in the number of developers contributing to its iOS codebase, it embarked on a rebuild of its private Mac cloud. Here’s what happened.

Developing for iOS — not as easy as Apple pie

For the past five years I have been an iOS engineer at Skyscanner, involved in shaping the features of our mobile application. Later on I joined the mobile DevOps team. This allowed me to use my mobile experience to improve our iOS platform infrastructure. Many people in the industry feel that developing for iOS could be simplified. This is even more pronounced when it comes to building an iOS CI system which is consistent and scalable.

Back when I first joined our DevOps team, Skyscanner used Jenkins CI with on-premise Mac Minis and Mac Pros. These were responsible for running both Android and iOS jobs, and we managed them one by one. If you wanted to increase the number of executors for the jobs, you needed to buy a new Mac. This is because Apple's EULA states that you can only build iOS code on genuine Mac hardware. If you also want to make the executors consistent, you can create NetRestore images or use JAMF. Either way you have to shut down your Macs manually and put up with not being able to use them while they are being provisioned. If you recognise your current CI system in all this, or face similar issues, please read on. We have a solution for you to consider.

Stairway to the cloud

Early in 2018 we saw a steady increase in the number of developers contributing to the iOS codebase. More developers with fixed CI resources means longer queues and waiting times.

Queueing for a journey into the cloud (image: Supawat Punnanon)

We needed to implement dynamically load-balanced resources as soon as possible. For Android our task was a relatively straightforward one. The Skyscanner web stack is on AWS, so we already had the necessary cloud knowledge and tools. During a previous investigation we had already defined the commands needed to provision a virtual machine. Using Packer by HashiCorp we could generate an Amazon Machine Image. Jenkins could instantiate these with the Jenkins EC2 plugin. In the process we learned a lot about how to present the necessary states (e.g. caches) on the virtual machines. In the end we could scale up from 4 concurrent jobs to 30.

For iOS we had to feel our way more: we had a bit of a knowledge gap and needed to add some new tools to our belt. Soon after beginning the project we came across Anka, a virtualisation technology provided by Veertu. Thanks to its performance and ease of use, it became the core of our private Mac cloud. Here’s how you get it working:

  1. Set up an Anka controller and a registry Docker container. The former accepts requests for new Mac virtual machines. The latter manages the different versions of the VMs.
  2. Install the client on the Mac hardware.
  3. Connect the clients to the controller and the registry.
  4. Integrate the controller via the Anka Jenkins plugin and configure it. For example, you need to set the Jenkins label for which the controller should trigger a new VM.
  5. Start using the label in your freestyle and pipeline jobs.
Overview of the infrastructure

There are no silver bullets here; the changes we made involved trade-offs:

🤩 Multiple VMs can be run on a single Mac Pro, meaning more resources are available in the CI environment. With Anka we did not have to sacrifice speed during the transition. A full project archive took ~18 mins to execute on a bare metal Mac Pro and also takes ~18 mins in a VM.

🤩 CI environment creation is consistent and automated. We have a Jenkins pipeline that creates our VMs and one that updates their caches.

🤩 Easier to deploy new environments. If the new environment performs badly, we can easily roll back to the previous stable one.

🤔 For each job, Anka clones a VM from a template and destroys the VM once the job finishes. You need to save any state (e.g. caches) to external storage (e.g. S3) and download it on the next start. This can take time, so to cut the transfer time we create a new VM template each night with daily cache baselines using the Anka-Jenkins slave template builder plugin. This way the nodes only pull caches and commits created on that day.
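The daily-baseline idea can be illustrated with a minimal Python sketch. It assumes each cache object in external storage carries a last-updated timestamp and that the nightly template records when its baseline was taken. The names CacheEntry, entries_to_pull and the cache keys are hypothetical, chosen for illustration only; they are not part of Anka, Jenkins or our actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CacheEntry:
    key: str              # hypothetical object key in external storage
    updated_at: datetime  # when the cache object was last written


def entries_to_pull(entries, baseline_time):
    """Return only the entries written after the template's baseline.

    Everything older is assumed to be baked into the nightly VM template
    already, so a freshly cloned VM can skip downloading it.
    """
    return [e for e in entries if e.updated_at > baseline_time]


# Illustrative data: the template was built at 02:00 UTC.
baseline = datetime(2018, 6, 1, 2, 0, tzinfo=timezone.utc)
remote = [
    CacheEntry("pods-a", datetime(2018, 5, 31, 15, 0, tzinfo=timezone.utc)),
    CacheEntry("pods-b", datetime(2018, 6, 1, 9, 30, tzinfo=timezone.utc)),
    CacheEntry("derived-data", datetime(2018, 6, 1, 11, 0, tzinfo=timezone.utc)),
]

fresh = entries_to_pull(remote, baseline)
print([e.key for e in fresh])  # ['pods-b', 'derived-data']
```

The later in the day a job starts, the more there is to pull, which is why rebuilding the template every night keeps the download small.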

🤔 A VM template is usually 60–70 GB in size (macOS, Xcode and caches preloaded). You need storage for the VM templates in Anka Registry (around 1 TB should be enough). This Anka Registry storage should be easily accessible from all nodes in the cluster. If you have Mac hardware in more than one location, you need to set up Anka Registry in each location and keep them in sync. The nodes don’t fetch VM templates from the registry storage for every VM provisioning request, only when an updated VM template is pushed to the nodes.
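As a rough sizing check, the arithmetic behind the 1 TB figure can be sketched in Python. This assumes 1 TB is counted as 1000 GB and that every template version is stored in full; the registry may well store versions more efficiently, so treat the result as a conservative lower bound. The function name templates_that_fit is hypothetical.

```python
def templates_that_fit(registry_gb, template_gb):
    """How many full-size template versions fit into the registry storage."""
    return registry_gb // template_gb


# 1 TB of registry storage with templates at both ends of the 60-70 GB range.
print(templates_that_fit(1000, 70))  # 14
print(templates_that_fit(1000, 60))  # 16
```

With one new template per night, that is roughly two weeks of history before older versions need to be cleaned up.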

There is no integration without hiccups and Anka is no exception. For example, the new Xcode build system was crashing due to the Anka client updating the time in the VM. Veertu provided fast support and so we were able to solve this problem very quickly. In the end we could scale out our 8 iOS resources to 20.

Before we implemented our fix, queue waiting time could be up to 80 minutes. With the current setup we’ve managed to reduce this to 7 minutes. Our team can also now manage iOS CI environments consistently. With 40–50 executors running simultaneously, we also had to update Jenkins and move it to AWS, but that is another story.
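To see why the size of the executor pool matters so much for waiting time, here is a small Python sketch of a first-come, first-served queue. It is a toy model, not a reconstruction of our measurements: the burst of 40 jobs is made up, and every job is assumed to take 18 minutes (the archive time mentioned earlier). The function name max_wait is hypothetical.

```python
import heapq


def max_wait(arrivals, executors, job_minutes):
    """Longest time any job waits in a FIFO queue with a fixed executor pool."""
    # Min-heap holding the time at which each executor becomes free.
    free_at = [0.0] * executors
    heapq.heapify(free_at)
    worst = 0.0
    for arrival in sorted(arrivals):
        next_free = heapq.heappop(free_at)
        start = max(arrival, next_free)      # wait if no executor is idle
        worst = max(worst, start - arrival)
        heapq.heappush(free_at, start + job_minutes)
    return worst


# Hypothetical burst: 40 jobs queued at the same moment.
burst = [0.0] * 40
print(max_wait(burst, executors=8, job_minutes=18))   # 72.0
print(max_wait(burst, executors=20, job_minutes=18))  # 18.0
```

In this model the worst-case wait drops from four job lengths to one when the pool grows from 8 to 20 executors. Real queues depend on arrival patterns and job mix, so the absolute numbers will differ.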

In conclusion…

Our old Mac hardware park could not handle the increase in the number of developers contributing to our app code. To solve this, we first moved our Android executors to the AWS cloud. Then we created a private Mac cloud. We learnt about the trade-offs and ended up with a scaled and consistent CI system. Nowadays our developers can integrate their changes without waiting hours to do so.

Are you up for tackling similar challenges? Could you improve on our solution? We are hiring; please see our jobs page.

About the author: Péter Adam Wiesner

Hi, I’m a Senior Software Engineer at Skyscanner. I was part of the team who created the new Skyscanner iOS application, and I worked on numerous features of the app during my five years on the project. Nowadays I strengthen the Mobile DevOps team, where I use my mobile knowledge to help other developers deliver products to our users.

Peter Wiesner; Senior Software Engineer at Skyscanner