A Clone Army of the Jenkins Republic

Dave North
Signiant Engineering
May 11, 2017

I was talking to some folks a few weeks ago about scaling a Jenkins build system. I was asked if I’d ever had any experience managing hundreds of Jenkins build nodes. I said I hadn’t, but I did have experience running an infinitely scalable set of Jenkins build nodes. We had a long conversation about how this was accomplished, and I figured it might be worth a quick write-up on how we do this.

In the beginning….

When we first moved our build system to Jenkins at Signiant, we stuck with the formula that was used before: “static” nodes for Linux/Mac/Windows. This all worked fine, but the time to spin up a new build node was way too long, and the build nodes could become “contaminated”. By contaminated, I mean software was installed over time and the node became somewhat organic.

The other issue was that the build system became somewhat of a “black box” to the engineering teams. There were many “it builds on my machine but not in Jenkins” incidents, and in many cases how the build worked and what versions of tooling were on the machines was a mystery.

Finally, we found that expanding the capacity of the build system under Linux became a chore. Most of our projects are nodejs or Java and build under Linux, and with greater adoption of a microservices architecture, the number of concurrent builds was always on the rise. We wanted to make it easy to add capacity.

Round 1 — Docker containers running on Docker hosts

We did some testing with the Jenkins Docker plugin, which was very promising and is what we ended up using (and are still using today). This plugin essentially allows you to define Docker images and attach Jenkins labels to them. You then assign jobs to the labels, and the plugin intercepts the use of the label and creates a new node on the fly. Further, when you define the Docker images to use, you can do things like mount volumes, set launch configs, etc.

In our plugin configuration, we pull an image (hosted on Docker Hub), give it a node label of docker_centos6_java7_plugin and mount in some volumes (these are under the container settings). One great thing is that the container image itself really doesn’t contain any proprietary information. Everything specific to your organization (secrets, configs, etc.) can be mounted into the container. In fact, all of our build images are public on GitHub (example image)
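To make that concrete, here is a rough, hand-run equivalent of what the plugin does when a job requests that label. The image name and host paths below are placeholders rather than our real values; in practice the plugin also handles connecting the container back to Jenkins as a build node.

```
# Pull the public build image; it contains nothing proprietary.
docker pull example/jenkins-centos6-java7

# Start a container the way the plugin would, mounting org-specific
# secrets/config and shared caches at run time (host paths are illustrative).
docker run -d \
  -v /mnt/build/secrets:/secrets:ro \
  -v /mnt/build/cache/.m2:/home/jenkins/.m2 \
  example/jenkins-centos6-java7
```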

As for where these images run, we set up a static set of hosts (VMware VMs in our case) and then had to define each Docker image as being able to run on each of those hosts. It all worked well, but it became a pain when we had to add a new image, since we had to add it to each Docker host definition in Jenkins. Still, we essentially had an auto-scaled set of Linux build nodes with self-documenting contents.

Round 2 — Docker containers running on Docker Swarm

Things were running well, but it was becoming painful to make changes to the container definitions in Jenkins since they were essentially duplicated for each Docker host. Further, adding a new Docker host to increase capacity was easy enough at the VM level (using templates), but in Jenkins it took some time to duplicate the definitions.

Enter Docker Swarm

We’d had some experience with Docker on clusters, as we were early adopters of the AWS EC2 Container Service (which I highly recommend), but my previous experience setting up Docker Swarm was less than smooth sailing. However, the Docker folks had really simplified the cluster setup in 1.11 (and have gone even further in 1.12), so I set up a Swarm cluster. In Jenkins, the Swarm cluster can then be referenced as one big Docker host.
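For anyone curious what that looks like, here is a minimal sketch of a standalone (pre-swarm-mode) Swarm setup with a Consul discovery backend; the hostnames, IPs and ports are illustrative, not our actual configuration.

```
# Discovery backend and Swarm manager (run once, on the manager host).
docker run -d -p 8500:8500 --name consul progrium/consul -server -bootstrap
docker run -d -p 4000:4000 swarm manage -H :4000 \
    --advertise 10.0.0.10:4000 consul://10.0.0.5:8500

# In the Jenkins Docker plugin, the whole cluster is then configured as a
# single Docker host, e.g. Docker Host URI: tcp://10.0.0.10:4000
```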

For the actual nodes in the Swarm cluster, we used a simple VMware template, so adding a new node to the cluster is as simple as starting a new VM from the template, changing the name and telling it to join the cluster. We don’t have to touch Jenkins at all now to expand our build node capacity.
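The join step a freshly cloned VM runs would look roughly like this, assuming the Consul-backed Swarm setup sketched above (addresses are again illustrative, and the Docker engine on the node must be listening on TCP 2375):

```
# Run on a newly cloned node after setting its hostname.
docker run -d swarm join --advertise=10.0.0.21:2375 consul://10.0.0.5:8500
```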

For common resources, we use a set of NFS shares mounted on each Swarm node. These hold things like npm and Maven caches, config files, etc. This way, the nodes are truly disposable and all have the same resources available to them.
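As a sketch of what that means in practice (the server name and paths are made up), every node built from the template carries the same mounts, which can then be passed into build containers as volumes:

```
# Example /etc/fstab entries baked into the Swarm node template:
#   nfs01:/export/build-cache   /mnt/build-cache   nfs   defaults  0 0
#   nfs01:/export/build-config  /mnt/build-config  nfs   defaults  0 0

# A build container then sees the shared npm/Maven caches via bind mounts:
docker run -d -v /mnt/build-cache/.m2:/home/jenkins/.m2 example/jenkins-centos6-java7
```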

We’ve found that the requirements for different build types have grown over the years, but this is now easy to satisfy. All of the “machines” that do the builds are Docker images, with the Dockerfile in GitHub and built using Docker Hub. Anyone in the company can run a build on their own machine by pulling the same Docker image that Jenkins runs. Further, the question of “what version of node is on the build machine?” has gone away; it’s right there in the Dockerfile.
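For example, reproducing a CI build locally is just a matter of running the project inside the same image (the image name and build commands here are placeholders for a typical nodejs project):

```
# Run the project's build in the exact image Jenkins uses.
docker pull example/jenkins-nodejs-build
docker run --rm -v "$PWD":/workspace -w /workspace \
    example/jenkins-nodejs-build sh -c "npm install && npm test"
```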

Round 3 — Docker future

So what’s next? While all of this works very well for builds on Linux (basically, anything nodejs, Java or native Linux), we do have to build on Windows and Mac. Luckily, Docker Windows containers are now available on Windows Server 2016. This is one of our next areas of investigation. We also want to “dockerize” our Jenkins master and move to having a pool of master servers (a project we have tagged “Make Jenkins great again”).
