CI Workflow with Puppet, Packer and Vagrant

Requiem for a CI Workflow

Unfortunately, this post arrives just as we deprecate our current workflow and move to Docker containers managed by Kubernetes. The upside is that the comparison makes the shortcomings of this solution very easy to identify. So before I get to those shortcomings, I’ll detail the components that enabled our current platform to be released.

Vagrant

A technology that was an absolute saviour when we first introduced it at Rightster. Its support for multiple operating systems makes it very hard to replace within the company. We’ve investigated replacements involving Salt, LXC containers and Docker, but none of them makes building a complex development environment as easy as Vagrant does.

The eventual gain from switching from manually managed VirtualBox VMs to Vagrant is that the time needed to onboard new contractors fell from up to 3 days to around 10 minutes. With a VM that quick to bring up, if someone ‘borks’ their VM and a ‘vagrant provision’ doesn’t fix it, destroying it and starting again isn’t a big deal.
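The ‘provision first, rebuild if that fails’ routine is easy to script. Below is a minimal Python sketch, assuming it runs from the directory containing the Vagrantfile; `repair_vm` and `run_command` are hypothetical helpers, not part of Vagrant, while `vagrant provision`, `vagrant destroy -f` and `vagrant up` are standard Vagrant CLI commands.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner executes a command and returns its exit code. It is injectable so
# the decision logic can be exercised without Vagrant installed.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command in the current directory (where the Vagrantfile lives)."""
    return subprocess.run(list(cmd)).returncode


def repair_vm(runner: Runner = run_command) -> List[str]:
    """Re-provision the VM; if that fails, destroy it and build a fresh one.

    Returns the steps attempted, in order, for logging.
    """
    steps = ["provision"]
    if runner(["vagrant", "provision"]) == 0:
        return steps

    # Re-provisioning didn't fix it: throw the VM away and start again.
    steps.append("destroy")
    if runner(["vagrant", "destroy", "-f"]) != 0:
        raise RuntimeError("vagrant destroy failed")

    # `vagrant up` on a fresh machine runs the provisioners automatically.
    steps.append("up")
    if runner(["vagrant", "up"]) != 0:
        raise RuntimeError("vagrant up failed")
    return steps
```

Calling `repair_vm()` with no arguments shells out to Vagrant; passing a fake runner lets the decision logic be checked without a VM.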

Puppet

Another technology that has proved difficult to replace, thanks to its suitability for building complex VMs that do ‘lots of things’. Sure, we had our issues, mostly around package management and project tags whose dependencies weren’t pinned to fixed versions, but it was never painful enough for us to up sticks and move to another solution (although we did investigate alternatives).
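One cheap guard against unpinned dependencies is a lint step that flags modules without a fixed version. This is a rough sketch that assumes dependencies are declared in an r10k-style `Puppetfile` (the post doesn’t say how ours were declared). It uses regular expressions rather than a real Ruby parser, and it treats `:ref`, `:tag` and `:commit` as pinned even though `:ref` can also name a branch. `unpinned_modules` is a hypothetical helper.

```python
import re
from typing import List

# Module name: the first quoted argument after `mod`.
_NAME = re.compile(r"""^\s*mod\s+['"]([^'"]+)['"]""")
# Forge module pinned to an explicit version, e.g. mod 'puppetlabs/stdlib', '4.25.1'
_VERSION = re.compile(r"""^\s*mod\s+['"][^'"]+['"]\s*,\s*['"]\d+(\.\d+)*['"]""")
# Git module pinned with :ref, :tag or :commit (assumption: :ref names a tag or SHA).
_GIT_PIN = re.compile(r""":(ref|tag|commit)\s*=>""")


def unpinned_modules(puppetfile: str) -> List[str]:
    """Return names of `mod` entries that don't pin a version, tag or commit."""
    statements: List[str] = []
    in_mod = False
    for raw in puppetfile.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # naive comment stripping
        if not line.strip():
            continue
        if re.match(r"\s*mod\s", line):
            statements.append(line.strip())
            in_mod = True
        elif in_mod and raw[:1].isspace():
            # Indented continuation of a multi-line `mod` declaration.
            statements[-1] += " " + line.strip()
        else:
            in_mod = False  # forge, moduledir, etc.

    unpinned: List[str] = []
    for stmt in statements:
        if _VERSION.match(stmt) or _GIT_PIN.search(stmt):
            continue
        match = _NAME.match(stmt)
        unpinned.append(match.group(1) if match else stmt)
    return unpinned
```

Run against a project’s `Puppetfile` before tagging a release, a non-empty result would be a reason to fail the build.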

Packer

Packer has been absolutely essential to our development workflow, although most engineers have never had to look it in the eye. Before Packer, provisioning took roughly 30 minutes to an hour and failed far more often.

Packer allowed us to test our Puppet and Kickstart changes, and to create VMs that were already provisioned, so Vagrant essentially just needs to mount the shared folders for the development environment to start running.
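A pre-provisioned box of this kind could be described by a Packer template like the one generated below. This is a sketch under several assumptions: a RHEL-family guest installed via Kickstart (with `ks.cfg` in a local `http/` directory), Puppet already present in the guest after install (the `puppet-masterless` provisioner expects it), and Packer’s legacy JSON template format. The ISO URL, checksum, paths and credentials are placeholders, and `packer_template` and `write_template` are hypothetical helpers.

```python
import json
from typing import Any, Dict


def packer_template(iso_url: str, iso_checksum: str, manifest: str,
                    box_name: str) -> Dict[str, Any]:
    """Build a legacy-JSON Packer template for a pre-provisioned Vagrant box."""
    return {
        "builders": [{
            "type": "virtualbox-iso",
            "guest_os_type": "RedHat_64",
            "iso_url": iso_url,
            "iso_checksum": iso_checksum,
            # Packer serves this directory over HTTP so the installer can fetch ks.cfg.
            "http_directory": "http",
            "boot_command": [
                "<tab> text ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/ks.cfg<enter><wait>"
            ],
            "ssh_username": "vagrant",
            "ssh_password": "vagrant",
            "ssh_timeout": "30m",
            "shutdown_command": "echo 'vagrant' | sudo -S /sbin/halt -h -p",
        }],
        "provisioners": [{
            # Apply the Puppet manifests at build time, without a Puppet master.
            "type": "puppet-masterless",
            "manifest_file": manifest,
            "module_paths": ["modules"],
        }],
        "post-processors": [{
            # Package the provisioned VM as a Vagrant box.
            "type": "vagrant",
            "output": f"{box_name}.box",
        }],
    }


def write_template(path: str, template: Dict[str, Any]) -> None:
    """Write the template as JSON for `packer build`."""
    with open(path, "w") as fh:
        json.dump(template, fh, indent=2)
```

After writing the result to, say, `megabox.json`, `packer build megabox.json` would produce `megabox.box`, which `vagrant box add megabox megabox.box` registers locally.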

Jenkins Slaves

With our ‘MegaBox’ VMs able to test any project, it was an easy move to create Jenkins slaves based on these Vagrant boxes. To reduce some of the complexity of managing a fleet of CI slaves, we used ElasticBox to manage these machines.

Rolling out an update to the slaves is essentially a matter of clicking a button in the ElasticBox UI and watching as the latest Puppet manifests are applied.

Criticisms

As mentioned at the beginning, we’re currently changing this workflow (particularly the Jenkins slaves). Here are a few of the major shortcomings of this approach:

Issue One — Slow to Add New Machines

Resources are definitely not unlimited, and adding a physical machine is rarely quick. Remembering the magic bullet it took to remove UEFI from that HP Windows box a month ago isn’t something you can automate. And that’s just the tip of the iceberg: if you’re not in the cloud, be prepared to lose days learning about bespoke errors that you’ll never see again.

Issue Two — Not all slaves are equal

If one of the slave hosts was lucky enough to have 16GB of RAM and an i7, we could generally run three VMs on it, which between them could handle the majority of our build pipeline, or a single VM configured to run the regression tests in parallel.

Combined with issue one, this leads to very clunky machines that are basically superheroes. You start picking favourites, and eventually you reach the point where one slave is configured to do all the heavy lifting, until it all ends in tears as the OS starts killing processes at different stages of the build.
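The capacity arithmetic behind ‘three VMs on a 16GB host’ is worth making explicit, because exceeding it is what leads to processes being killed. A minimal sketch, assuming roughly 2GB reserved for the host OS and 4GB per build VM; both figures are illustrative, not numbers from our setup, and `max_vms` and `is_overcommitted` are hypothetical helpers.

```python
from typing import List


def max_vms(host_ram_gb: float, vm_ram_gb: float,
            host_reserve_gb: float = 2.0) -> int:
    """How many VMs of a given size fit on a host without overcommitting RAM."""
    if vm_ram_gb <= 0:
        raise ValueError("vm_ram_gb must be positive")
    usable = host_ram_gb - host_reserve_gb  # RAM left after the host OS
    return max(0, int(usable // vm_ram_gb))


def is_overcommitted(host_ram_gb: float, vm_ram_gb: List[float],
                     host_reserve_gb: float = 2.0) -> bool:
    """True if the VMs' allocations plus the host reserve exceed physical RAM."""
    return sum(vm_ram_gb) + host_reserve_gb > host_ram_gb
```

Under those assumptions, `max_vms(16, 4)` gives 3, while `is_overcommitted(16, [4, 4, 4, 4])` is true: a fourth VM pushes total allocations past physical RAM.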

Issue Three — Scaling is Everything

This is similar to issue one, but here’s the situation: you need to burst out to 200 concurrent Jenkins jobs to get your regression tests to complete in under 5 minutes. Having 10 VMs that live for months will already cause you headaches; go beyond that and you’re realistically going to need a team to maintain your CI environments.
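The arithmetic is unforgiving: 200 workers running for 5 minutes provide at most 1,000 worker-minutes, so the suite only finishes in time if its total runtime fits that budget and is split evenly. One standard way to split it is greedy longest-processing-time (LPT) scheduling, sketched below; it is not necessarily how our Jenkins jobs were divided, and the per-test durations are assumed to come from a previous run. `shard_tests` and `makespan` are hypothetical helpers.

```python
import heapq
from typing import Dict, List, Tuple


def shard_tests(durations: Dict[str, float], workers: int) -> List[List[str]]:
    """Greedy LPT: assign each test, longest first, to the least-loaded worker."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Min-heap of (seconds assigned so far, worker index); a sorted list is a valid heap.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(workers)]
    shards: List[List[str]] = [[] for _ in range(workers)]
    # Longest tests first; ties broken by name so the result is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return shards


def makespan(durations: Dict[str, float], shards: List[List[str]]) -> float:
    """Wall-clock time of the slowest shard, ignoring per-job overhead."""
    return max((sum(durations[t] for t in shard) for shard in shards), default=0.0)
```

For example, six tests of 5, 4, 3, 3, 2 and 1 seconds split across two workers into two 9-second shards, the best possible for an 18-second total.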

Ideally, you want these images to be managed by your developers. Burning these on a regular basis (the machines, not the developers...) makes life a lot easier. This is where having a Docker container ‘That Works’ is perfect for scaling CI.
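Burning images regularly can be as simple as a scheduled job that rebuilds anything older than a fixed age. A minimal sketch of that check; the image names, build timestamps and 7-day threshold are hypothetical, and in practice the timestamps might come from your image registry.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def images_to_rebuild(built_at: Dict[str, datetime], max_age: timedelta,
                      now: Optional[datetime] = None) -> List[str]:
    """Return the names of images older than max_age, sorted alphabetically."""
    if now is None:
        now = datetime.now(timezone.utc)
    return sorted(name for name, ts in built_at.items() if now - ts > max_age)
```

A nightly job could feed each returned name into its image build, so no image lives long enough to drift.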

Written by Paul Sellars.