Our Journey Migrating the iOS CI to GitHub Self-Hosted Runners on AWS Mac ARM Instances

Embracing change for improved performance and dramatic cost reduction

Kévin Darcel
bitso.engineering
6 min read · Jul 13, 2023


In today’s fast-paced and constantly evolving world of software development, selecting the right tools can make all the difference in your team’s efficiency, productivity, and, ultimately, your overall success. With countless options available, making informed choices is crucial to stay ahead of the competition.

At Bitso, we recently embarked on an ambitious project that resulted in a dramatic 60% improvement in our build times, from 30 minutes to around 12 minutes, and a staggering 68% reduction in costs! Our project involved migrating our iOS build Continuous Integration (CI) pipelines from Bitrise to GitHub’s Self-Hosted Runners (SHR) on Amazon Web Services (AWS), specifically using ARM instances equipped with Apple Mac mini machines with M1 CPUs; see Figure 1.

Figure 1. An Apple Mac Mini integrated into a rack at an AWS data center.

While GitHub’s VM images are starting to support the ARM architecture, official support is not complete yet, and adapting the images requires considerable modification to Packer templates, scripts, and packages. Furthermore, the powerful and energy-efficient M1 chips provide a compelling advantage. Our desire to reduce costs, consolidate vendors, and leverage the increased processing power and energy efficiency of Apple’s M1 processors drove the change of our build environment.

This post describes our journey, the challenges we faced, the victories we celebrated, and the immense learning we acquired along the way.

The Engine Behind Our Transition: GitHub’s Packer Template

Before starting this journey, we experimented with the GitHub Hosted Runners (GHR) using the x86_64 architecture. These runners, while effective, proved to be quite costly. After the initial 3,000 free minutes offered by GitHub each month, the charge is $0.32/minute for the 12 vCPUs. Moreover, the build times were far from our target of 10-15 minutes, often reaching close to an hour. These factors compelled us to seek alternatives, leading us to transition to Self-Hosted Runners (SHR) on AWS. We specifically opted for the ARM instances, attracted by their promise of better performance and cost efficiency. This selection marked the beginning of our journey toward a more optimized CI process.
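The arithmetic behind the cost concern is easy to sketch. Using the figures above ($0.32/minute after 3,000 free minutes), and a purely hypothetical build volume for illustration (the volume below is not our actual usage):

```python
# Back-of-envelope monthly cost of GitHub Hosted Runners at $0.32/minute,
# after the 3,000 free minutes included each month.
# The build volume passed in is a hypothetical figure for illustration.

RATE_PER_MIN = 0.32   # 12-vCPU hosted runner, per the pricing above
FREE_MINUTES = 3000   # monthly free tier

def monthly_cost(builds_per_day: int, minutes_per_build: int,
                 working_days: int = 21) -> float:
    """Estimated monthly bill for hosted runners."""
    total_minutes = builds_per_day * minutes_per_build * working_days
    billable = max(0, total_minutes - FREE_MINUTES)
    return billable * RATE_PER_MIN

# e.g. 20 builds per day at the ~50-minute build times we were seeing:
print(f"${monthly_cost(20, 50):,.2f}")  # $5,760.00
```

At that rate, the bill adds up quickly, which is what pushed us toward self-hosted hardware.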

With our objectives in mind, our journey began by manually setting up a test instance and runner to gauge feasibility and understand the nuances of the build process. Once we had successful builds, we created an Amazon Machine Image (AMI) from that instance and replicated it for broader-scale testing.

We forked GitHub’s runner-image repository after successfully testing the iOS application building jobs on the test instances. We also modified the Packer template to build an AMI compatible with AWS instances and the ARM64 architecture of the Apple M1 processors. This process demanded a deep dive into the inner workings of macOS and a fine understanding of the nuances of iOS builds.

We actively follow GitHub’s ongoing efforts to adapt the Packer template to the ARM architecture. While we have made significant strides with our adaptations, we believe that, given its broad control and access, GitHub can eventually create a cleaner and more efficient version of this template. It’s an exciting development we eagerly anticipate, knowing it will further enhance our CI/CD pipeline and efficiencies.

Trials, Triumphs, and Key Challenges

While building the AMI with Packer, we began by solving the difficulties macOS keychain permissions presented and the limitations Apple’s System Integrity Protection (SIP) posed. SIP, while designed to protect the system, hindered our ability to customize certain system files, such as the TCC.db (Transparency, Consent, and Control) file, which manages the privacy of the applications in the system. Navigating these restrictions was our first major hurdle. Unfortunately, we haven’t found any workaround for this in macOS 12, and permissions like granting access to disk or accessibility features have to be given directly via macOS UI, to which we connect remotely during the build.

The main challenges emerged from the time-intensive testing process of our modified Packer template and ARM-adapted provisioning scripts. With each build taking around 3–4 hours, the testing phase turned out to be lengthy and demanded patience.

We acknowledge that our testing process could have been more efficient, and we faced additional complexity with AWS instance termination. When we terminated an instance, the dedicated host running it took a few hours to become available again. This delay extended our testing cycle, further increasing the time it took to get results and feedback.

Efficiently managing AWS resources such as dedicated hosts was a learning moment that pushed us to deepen our understanding of macOS, AWS, and iOS build processes.

The Migration: Strategy and Results

After successfully building the AMI with GitHub’s Packer template, our next task was establishing the necessary infrastructure to deploy new instances.

Terraform, an Infrastructure as Code (IaC) tool used extensively at Bitso for infrastructure management, emerged as the perfect solution. It facilitates defining and provisioning the requisite infrastructure using a declarative configuration language, enabling us to effectively leverage the newly created AMI. Terraform integrates smoothly with our existing practices, ensuring our infrastructure is reliably and repeatedly reproducible, thereby reducing errors from manual configurations. This was an essential step to streamline the setup process and make it as smooth as possible.
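As a rough illustration of what that configuration involves (resource names and values here are simplified placeholders, not our production code): an EC2 Mac instance always requires a dedicated host, so the Terraform pairs an `aws_ec2_host` with an `aws_instance` launched from the custom AMI:

```hcl
# Sketch: a dedicated Mac host plus one runner instance placed on it.
resource "aws_ec2_host" "mac_host" {
  instance_type     = "mac2.metal"   # Apple silicon (M1) Mac mini
  availability_zone = "us-east-1a"   # placeholder zone
}

resource "aws_instance" "ios_runner" {
  ami           = var.runner_ami_id  # AMI built with the Packer template
  instance_type = "mac2.metal"
  tenancy       = "host"
  host_id       = aws_ec2_host.mac_host.id

  tags = {
    Name = "ios-ci-runner"
  }
}
```

Because the AMI already contains the runner software and toolchain, spinning up an additional runner is just another instance block (or a count) against a new host.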

The transition to self-hosted runners was remarkably straightforward with the new AMI and Terraform setup. GitHub runners registered and seamlessly took up jobs; see Figure 2. We kept Bitrise alongside for a month to ensure everything worked correctly before disconnecting it from our repositories.
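On the workflow side, pointing jobs at the new machines was mostly a matter of changing the `runs-on` labels. Self-hosted runners come with the default labels `self-hosted`, `macOS`, and `ARM64`, so a job can target them like this (a minimal sketch with a placeholder scheme name, not our actual workflow):

```yaml
jobs:
  build:
    # Route the job to our registered Mac mini runners
    # instead of GitHub-hosted macOS runners.
    runs-on: [self-hosted, macOS, ARM64]
    steps:
      - uses: actions/checkout@v3
      - name: Build the app
        run: xcodebuild build -scheme MyApp   # placeholder scheme
```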

Figure 2. List of some of the runners registered with GitHub.

The Learning Curve and the Team Impact

This transition was not only a technological shift but also a significant learning experience. Although I have years of experience with macOS systems, this project presented a unique challenge that required me to delve much deeper into the system than ever before. As the leading engineer on this project, I had to broaden and deepen my knowledge in various aspects, including Packer, AWS, AMIs, iOS builds, and GitHub runners. Benefiting from our senior and staff engineers’ advice, I navigated this complex project and gained a deeper understanding of these tools and processes.

I look forward to sharing this knowledge with my team in the upcoming days. Meanwhile, our iOS team benefits from the lower build times, enhancing their efficiency and productivity.

Future Horizon: Planned Improvements

As we continue to adapt and improve our setup, we look forward to AWS improvements on the Mac mini instances and GitHub’s work to adapt the Packer template to ARM architecture. We are confident these advancements will enable us to optimize our setup further.

Our ambition to enhance and streamline our setup has led us to plan further automation of the AMI creation process for AWS instances. We expect this automation plan, based on GitHub’s Packer template, to increase efficiency significantly. Alongside these in-house improvements, we are exploring either open-sourcing our custom template or directly contributing to GitHub’s official repository. This way, we can lend our support to other teams encountering similar challenges, fostering a sense of community and driving the continuous improvement of these tools.

We are also focused on enhancing our autoscaling capabilities. This aligns with our cost-saving measures and makes our build environment safer and cleaner. We can further reduce costs by dynamically adapting the number of runners to the job queue size. However, an essential factor is that AWS dedicated hosts have a minimum rental period of 24 hours. This requirement is a constraint we must factor into our autoscaling strategy. Still, we are confident that we can balance cost efficiency and optimal performance with ongoing refinements. Additionally, we are looking into implementing Spotify’s XCRemoteCache. This tool could allow us to cache some build objects for dev builds, meaning we wouldn’t have to rebuild everything every time, only what has changed in the code.
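The scaling rule we have in mind can be sketched in a few lines (names and thresholds are illustrative, not production values): scale the runner pool with the queue, but never release a dedicated host before its 24-hour minimum rental period has elapsed.

```python
# Sketch of a queue-based autoscaling decision for Mac dedicated hosts.
# Pool bounds and names are illustrative, not our production values.

MIN_RUNNERS = 1
MAX_RUNNERS = 8
MIN_RENTAL_HOURS = 24  # AWS dedicated hosts bill for at least 24 hours

def desired_runners(queued_jobs: int) -> int:
    """One runner per queued job, clamped to the pool bounds."""
    return max(MIN_RUNNERS, min(MAX_RUNNERS, queued_jobs))

def releasable(host_age_hours: float, idle: bool) -> bool:
    """A host may only be released once idle AND past the 24h minimum."""
    return idle and host_age_hours >= MIN_RENTAL_HOURS

print(desired_runners(0))     # 1  (keep a warm runner)
print(desired_runners(12))    # 8  (clamped to the pool maximum)
print(releasable(3.0, True))  # False (still inside the 24h minimum)
```

The queue length itself could be read periodically from the GitHub API (queued workflow runs) by whatever process drives the scaling loop.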

A Note on Co-Authorship

This article has been a collaborative effort, co-written with ChatGPT, OpenAI’s language model. The purpose of this collaboration was to test the capabilities of AI in drafting technical content, a testament to the potential power of AI in various facets of our work. We are continually exploring how this powerful tool can be integrated into our operations at Bitso to drive efficiency and innovation.

We welcome your thoughts and experiences on similar transitions and encourage you to follow Bitso for more updates on our journey.
