iOS Build Infrastructure

How we configured our Mac minis to run builds

Written by Michael Tauraso.

Square has dozens of iOS engineers working on Square Register. Additionally, Square publishes multiple iOS apps to the App Store and distributes several iOS apps internally. To support the work of our application programmers, we’ve invested in a continuous integration and test cluster.

We’ve recently replaced the backend of this cluster so that it could scale beyond 8 machines. As a direct result of this commitment, over the last year we’ve increased the reliability of builds and have nearly quadrupled the quantity of tests we’re able to run for every engineer’s contribution to the Register repository. This has had a noticeable effect on the velocity of our feature teams and has alerted us to many show-stopper regressions before we shipped code.

iOS applications are a bit of a special snowflake when it comes to continuous integration. This post is about the backend infrastructure choices we made to support our iOS team.

Jenkins

We use Jenkins, along with a custom plugin to Stash, to form the backbone of our CI system. This plugin is similar to Palantir’s Stashbot; it acts as a notifier to Jenkins when various sorts of git pushes occur. The build slaves in our build cluster connect to the Jenkins master using the swarm plugin. All builds run on the slave machines. The master Jenkins instance is responsible for serving the Jenkins web interface internally and moving artifacts and logs to and from the build slaves.

Hardware

iOS applications must be built on OS X. We briefly considered using virtualized build infrastructure due to its many advantages; in practice, however, we found that it did not perform well enough for us. Builds usually require many reads and writes of small files. This pushes the scheduler and the filesystem driver rather hard. With a virtualization system, there are two schedulers and two disk drivers. On Linux these systems work together well, but we found this not to be the case on OS X.

After some testing we settled on the following bare-metal configuration, which is currently ~$1200 per build slave:

  • 2.6GHz Dual Core Intel Core i5
  • 16GB RAM
  • 256GB PCIe storage

We also purchase fake HDMI monitors so that tests using the iOS simulator will use the graphics card rather than software rendering. This optimization provides a 10% speedup to our tests.

We wish it were still possible to purchase the 2012 quad-core configuration from the Apple store. Currently only the dual-core configuration is available to purchase new. Our workload is highly parallelizable, and having more compute units in the mini form-factor is desirable.

Configuration Management

In order to get reliable and repeatable build results out of the machines, we want to make sure they are configured identically. We use DeployStudio to image new Mac minis over the network, and then do final setup with Ansible. Ansible allows us to have a checked-in record of our machines’ configuration, as well as of which machines we have in service at any given time. This is invaluable when we’re testing a new configuration or trying to understand problems that occur on the build slaves.

Changes to the Ansible repo also undergo code review and testing. Even with complex playbooks, we insist on writing them in idempotent pieces, so we can quickly run the whole playbook against the entire cluster and cause no changes on machines that are already configured correctly.
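To make "idempotent" concrete, here is a minimal Python sketch of a single idempotent setup step. It is not taken from our playbooks; the function name, the file path, and the setting are hypothetical and exist only for illustration. The step reports whether it changed anything, so a second run against an already-configured machine reports no change.

```python
import os
import tempfile


def ensure_line(path, line):
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was changed, False if it was already correct.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False  # already in the desired state, so do nothing
    # Avoid gluing the new line onto an unterminated last line.
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a") as handle:
        handle.write(separator + line + "\n")
    return True


if __name__ == "__main__":
    # Hypothetical file and setting, used only to exercise the function.
    target = os.path.join(tempfile.mkdtemp(), "example.conf")
    print(ensure_line(target, "example_setting=1"))  # first run changes the file
    print(ensure_line(target, "example_setting=1"))  # second run changes nothing
```

A playbook built from pieces with this property can be re-run at any time, and any step that does report a change points at a machine that has drifted.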

Software

We’ve found 10.10.4 Yosemite to be the most stable version of OS X with Xcode 6 and its simulators. At time of writing, we’re using Xcode 6.1.1 on our cluster and are upgrading to Xcode 6.3.2 piecemeal, testing reliability as we go.

Beyond installing Xcode, OS X, and the Xcode command line tools, there are a number of system setup tweaks that make Xcode and the iOS simulators more reliable on the command line than they are out of the box. We want to be able to get reliable behavior out of OS X and Xcode, and these configuration tweaks have helped us do so.

Login and the GUI

We have all of our build slaves configured to automatically log in the build user. This provides us with a GUI context for running the iOS simulator and means that we can start the Jenkins agent as a user Launch Agent. This gives any process started by Jenkins access to that GUI context. Many issues with running the iOS simulator remotely stem from the lack of API facilities found in a usual OS X login.

We augment this by running caffeinate on the Jenkins slave agent, so that the mini never goes to sleep. There are many sleep settings on OS X, and new settings are added in each release. We’ve found setting a power assertion using caffeinate to be the most reliable way to keep a build slave from going to sleep.
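One way to combine the Launch Agent and caffeinate is sketched below in Python. This is an illustration under assumptions, not our exact setup: the label, the jar path, and the choice to wrap the agent command in caffeinate (rather than attaching caffeinate to a running process) are all hypothetical. The sketch builds a launchd job description whose program is caffeinate followed by the agent command, so the power assertion lasts exactly as long as the agent runs.

```python
import plistlib


def launch_agent_plist(label, agent_command):
    """Return LaunchAgent plist bytes that run `agent_command` under caffeinate."""
    job = {
        "Label": label,
        # caffeinate holds its assertion while the wrapped command runs:
        # -i prevents idle sleep, -s prevents system sleep on AC power.
        "ProgramArguments": ["/usr/bin/caffeinate", "-i", "-s"] + list(agent_command),
        "RunAtLoad": True,  # start when the build user logs in
        "KeepAlive": True,  # restart the agent if it exits
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Hypothetical label and jar path; substitute your own agent command.
    data = launch_agent_plist(
        "com.example.jenkins-agent",
        ["/usr/bin/java", "-jar", "/Users/build/swarm-client.jar"],
    )
    print(data.decode("utf-8"))
```

The resulting file would typically live in the build user's ~/Library/LaunchAgents directory, which is what ties the agent to the automatically logged-in GUI session.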

Because the build user is logged in automatically, the build user’s password is easily discoverable on the filesystem. For build slaves that create release binaries, we disable password-based SSH access and turn off VNC entirely. A trusted group at the company can access the code-signing machines to deploy our code-signing certificates, but no other remote access is permitted. The build user has sudo access to the slave itself, and we turn on DevToolsSecurity so that Xcode can function normally.

OS X Accessibility Access

Xcodebuild uses the accessibility access hooks in OS X to control the iOS simulator. The way OS X grants access to these hooks gained application-level granularity with Mavericks (10.9). Previously it was a system-level parameter that allowed all processes to access one another. If access is not granted, it can result in maddening issues where xcodebuild will run flawlessly when run from Terminal.app, but will fail inexplicably when launched from Jenkins or SSH.

Mavericks (10.9) and Yosemite (10.10) determine if a process can access accessibility hooks via the parentage of the accessing process. By putting launchd in the list of allowed processes, processes launched via SSH or Jenkins have access to the accessibility hooks across the system. To do this you can modify the TCC database, per this gist. A reboot is required to make the change take effect.
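For readers who want a feel for what such a change looks like, here is a hedged Python sketch. It does not reproduce the gist. The database path, the table and column names, the service name, the meaning of client_type 1 (an executable path rather than a bundle ID), and the launchd path are all assumptions about the 10.9 and 10.10 TCC database and should be checked against a real machine before use. The demo runs against a throwaway in-memory database rather than the system one.

```python
import sqlite3

# Assumed location of the system-wide TCC database on 10.9 and 10.10.
TCC_DB_PATH = "/Library/Application Support/com.apple.TCC/TCC.db"


def create_demo_db():
    """Create an in-memory database with the assumed `access` table layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access ("
        "service TEXT NOT NULL, client TEXT NOT NULL, "
        "client_type INTEGER NOT NULL, allowed INTEGER NOT NULL, "
        "prompt_count INTEGER NOT NULL, csreq BLOB, "
        "PRIMARY KEY (service, client, client_type))"
    )
    return conn


def allow_accessibility(conn, client_path):
    """Mark the executable at `client_path` as allowed to use accessibility hooks."""
    conn.execute(
        "INSERT OR REPLACE INTO access "
        "(service, client, client_type, allowed, prompt_count) "
        "VALUES ('kTCCServiceAccessibility', ?, 1, 1, 1)",
        (client_path,),
    )
    conn.commit()


if __name__ == "__main__":
    demo = create_demo_db()
    allow_accessibility(demo, "/sbin/launchd")  # assumed launchd path
    print(demo.execute("SELECT * FROM access").fetchall())
```

Against a real machine the connection would be opened on TCC_DB_PATH with root privileges instead of the in-memory demo. Using INSERT OR REPLACE keeps the step safe to run repeatedly from configuration management.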

Connecting to Jenkins

As mentioned before, we use the Jenkins Swarm plugin to configure our slave machines. This is the opposite of the normal Jenkins usage, in that the slaves each connect to the master node rather than being managed from the Jenkins configuration. This keeps us from having to maintain configuration on the master when adding or removing slave nodes, or when changing their IPs or DNS names.

The main thing to watch out for in the swarm plugin is a mismatch in the Jenkins remoting component. This often occurs during Jenkins upgrades and requires a push of a new swarm jar to the slaves. Because slaves that fail to connect to Jenkins simply don’t appear in the list, it’s advisable to monitor the machines externally to make sure they remain connected.
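External monitoring of this kind can be as simple as comparing an inventory of expected machines against what Jenkins reports. The Python sketch below assumes the Jenkins JSON API at /computer/api/json, which lists nodes with displayName and offline fields. The host names and the sample data are hypothetical, and a real deployment would likely need authentication and would need node names in Jenkins to match the inventory.

```python
import json
import urllib.request


def fetch_computers(jenkins_url):
    """Fetch the node list from the Jenkins JSON API (no authentication)."""
    url = jenkins_url.rstrip("/") + "/computer/api/json"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def disconnected_slaves(expected_names, computers):
    """Return expected slaves that are missing from Jenkins or reported offline."""
    online = {
        node["displayName"]
        for node in computers.get("computer", [])
        if not node.get("offline", True)
    }
    return sorted(set(expected_names) - online)


if __name__ == "__main__":
    # Hypothetical sample data standing in for fetch_computers(...).
    sample = {
        "computer": [
            {"displayName": "mini-01", "offline": False},
            {"displayName": "mini-02", "offline": True},
        ]
    }
    print(disconnected_slaves(["mini-01", "mini-02", "mini-03"], sample))
```

Because the expected list comes from outside Jenkins (for example, the configuration-management inventory), a slave that never connected shows up as missing instead of silently not appearing.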

Current Work

We maintain a high bar of build stability for the cluster, and we’re currently operating 24 Mac minis. There are some aspects of maintaining the cluster that aren’t quite smooth yet.

Xcode upgrades are still a bit of work. It’s difficult to predict the level of stability we can expect from new releases. Most often, the higher the point-release number, the more reliable Xcode is across a wider range of situations. We often find ourselves filing radars and figuring out workarounds for the behavior of new upgrades. When Xcode 6.1 was new, we decided to only have a single configuration of Xcode on a given machine (as there were some bugs with the beta interoperating with old versions of Xcode). This support has gotten better, and we’re changing our config to allow multiple versions of Xcode on a single build slave. The support for multiple iOS simulated devices on a single host has also gotten easier over time, and we’re working to make that airtight in our installation as well.

We hope this information is useful to folks building continuous integration systems for iOS. As our setup continues to change, we’ll be updating The Corner with our current configuration practices. We’re also interested in your experiences with Apple’s software construction platform. Stack Overflow questions and blog posts have been some of our best teachers through this setup process, and we’re grateful to everyone who’s taken the time to educate us. Also, if you’re interested in build and release tools and passionate about Apple’s platform, drop us a line. We’re hiring.