If you’re an iOS developer, you’re probably familiar with the difficult task of building CI pipelines for your app. Most of the dark corners involved in developing CI pipelines for iOS can be avoided by outsourcing the work to SaaS CI/CD solutions such as Travis, Circle, or Bitrise.
But what happens if you have to build 30 applications? Or more? At Soluto, we’re building a lot of applications, and I do mean A LOT. Having to build more than 30 apps, with that number rising rapidly, meant that using a SaaS would be costly and would mean giving up control of our testing environments. To support this kind of scale, we decided to use TeamCity as our CI tool, with on-premise Mac VMs as its build agents.
We had, regrettably, been configuring our Mac agents manually. This meant a greater chance of mistakes and a greater likelihood of inconsistencies between the various agents.
Manual work caused undocumented and unreliable changes to the agent configuration, and in more severe cases introduced drift between the agents (e.g. when someone accidentally updated only some of the agents, but not all of them).
In order to solve these serious issues, we needed to find a more reliable process for managing configurations. That’s exactly what we did. But not before taking a few wrong turns…
Into the woods:
The first approach that came to mind was classic configuration management. We were already using Puppet heavily for our Linux machines, so we figured we could do the same for our Mac agents. After all, they’re both *nix systems. So what could go wrong?
The answer was everything.
This approach failed miserably, as the Puppet modules for macOS turned out to be very poorly maintained. The only module that worked for us was puppet-homebrew, and even then, it only worked after days of pulling our hair out, eventually reaching out to the (very helpful) module maintainer, and ultimately changing a system-wide parameter on our Puppet master.
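For the part that did work, the manifest was straightforward. The following is a minimal sketch based on the puppet-homebrew module’s documented usage; the user name and package list are illustrative, not our actual configuration:

```puppet
# Sketch: manage Homebrew and a few iOS build dependencies with
# puppet-homebrew. The 'ci' user and the package names are assumptions.
class { 'homebrew':
  user => 'ci',
}

package { ['carthage', 'swiftlint']:
  ensure   => present,
  provider => 'brew',
}
```

This is roughly the extent of what we could make declarative; anything beyond package installation was where things started to fall apart.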
The most painful example was trying to upgrade the machines’ Ruby version using RVM and rbenv. Both are well maintained, widely used, easily configured manually, and already ported to a Puppet module like this one. But we needed them to support Darwin, and in the end RVM didn’t deliver on that promise, and rbenv didn’t even claim to.
After the various modules needed to configure the agents failed to accommodate our needs, we resorted to running raw bash commands, which produced flaky results that were inconsistent with the manually configured agents.
Conclusion — Darwin is a bitch when it comes to configuration management. Reddit agrees.
We decided to cut our losses and stop the current iteration. The fruits of our labor seemed disproportionate to the sweat of our brow, and we figured there has to be an easier way to do this.
Into the light:
Our next approach consisted of separating our problem into two, more easily manageable subproblems.
- Using configuration-as-code for configuring the machines
- Making sure the agents are identical to each other
We realized that the configuration for the agents is fairly simple and rarely changes. Therefore, we decided to start with the second, more painful problem: eliminating configuration drift between the agents.
We took a snapshot of the current agent state and used the TeamCity VMware plugin to spawn agents on demand whenever a build enters the queue.
The plugin worked great with our Linux and Windows agents, so we naturally assumed it would work just fine on Mac agents… right?
It turns out that a recent upgrade of vmware-tools changed the structure of its CLI. The update removed a file crucial to the proper functioning of the plugin, and sadly the plugin’s code did not accommodate this change.
We contacted the maintainer of the plugin. He confirmed the issue and implemented a fix, but that just brought us to another error.
A very long “it works on my machine” correspondence ensued, which ended with the JetBrains developer suggesting a simple workaround: a bash script that simulated the old vmware-tools entry point and passed the calls on to the upgraded version. It finally worked and allowed us to use the plugin.
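The workaround looked roughly like the following sketch. The shim name, its location, and the vmtoolsd path are assumptions based on our setup; on a real agent the script would live at whatever legacy path your vmware-tools version removed:

```shell
# Recreate the legacy vmware-guestd entry point that the plugin shells
# out to, and forward every call to the modern vmtoolsd CLI.
# NOTE: paths are illustrative; a temp dir is used here for the sketch.
shim_dir="$(mktemp -d)"
shim="$shim_dir/vmware-guestd"

cat > "$shim" <<'EOF'
#!/bin/sh
# Forward legacy vmware-guestd invocations to vmtoolsd.
exec /usr/local/bin/vmtoolsd "$@"
EOF
chmod +x "$shim"
```

The idea is simply to restore the file the plugin expects to find, so neither the plugin nor vmware-tools needs to be patched further.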
So what did we achieve?
Our agents are now all spawned from a single image. Whenever we need to add a dependency or upgrade a component, we bake it into the image and change the version launched by the CI server. Goodbye, drift between agents!
So, what are we still lacking?
We still need to deal with our first problem: configuration-as-code for our agents. The ideal candidate for the job is probably Packer, which would let us maintain the image and swap out components without having to repeat the entire image-creation process manually.
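Conceptually, the template might look something like the sketch below, which clones an existing base VMX and provisions it with a shell step. Every path, credential, and package here is a placeholder, not our working configuration:

```json
{
  "builders": [
    {
      "type": "vmware-vmx",
      "source_path": "output/macos-base/macos-base.vmx",
      "ssh_username": "ci",
      "ssh_password": "{{user `ssh_password`}}",
      "shutdown_command": "sudo shutdown -h now"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "brew install carthage",
        "sudo xcode-select -s /Applications/Xcode.app"
      ]
    }
  ]
}
```

With something like this, upgrading a dependency would mean editing one provisioner line and rebuilding, instead of hand-crafting a new snapshot.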
We dabbled with this a bit using this template for osx packer images, but quickly found out that the task is not as simple as it is with Linux images (the motif of our tale). For now, the task is quietly lurking in our backlog, waiting for the right moment to strike. Stay tuned for updates from our next foray into Mac VMs. Hope you enjoyed the read!
Originally published at https://blog.solutotlv.com on September 10, 2017.