Running Distributed Jenkins’ Build Executors on GKE
Background
Instead of installing and running both the Jenkins master and build agents on a single server, it’s common to execute build agents on different servers to help distribute workloads. A project our team is working on is being hosted in Google Cloud (GCP). So, although not required, it made sense to install Jenkins in GCP as well. I wrote another article outlining our CI/CD process, but did not go into any detail on the infrastructure behind our Jenkins environment.
At first we ran Jenkins on a single VM using the Bitnami image. This was a great start for our team, but we found ourselves increasing disk space and memory too frequently, so we looked to GKE to help us scale.
There is already helpful documentation on the cloud.google.com website for getting this all set up. However, at the time of this writing, constant enhancements to the Kubernetes Plugin for Jenkins have left these instructions a little out of date. My purpose in writing this is to help others, including myself, who may have struggled and wondered why their Jenkins setup isn’t working after following the documented instructions.
We followed the documentation on both https://cloud.google.com/solutions/jenkins-on-kubernetes-engine-tutorial and https://cloud.google.com/solutions/configuring-jenkins-kubernetes-engine. To start testing the new Jenkins infrastructure, we began porting over a few of our smaller services, which were set up as Multibranch Pipeline builds.
Our project tech stack required certain products to be installed on our build executors.
Tech Stack
Additionally, we needed to add plugins for NodeJS, Slack and some additional Pipeline utilities. It is also worth mentioning that many of the plugins in the Helm chart referenced in the documentation were well behind the current versions, so we upgraded them.
Everything seemed to be set up according to the documentation, with the same plugins and tools we had installed on our prior Jenkins VM. The build executors seemed to be provisioning new pod workloads for concurrent builds, which was fun to see.
Unfortunately our builds were still failing.
Initial Symptoms
We ran into the following symptoms:
- ClosedChannelExceptions when using various declarative pipeline plugin commands.
- Some hard-coded paths that I had been forced to use on my original Bitnami Jenkins image, for things like “curl”, no longer worked.
- mvn not found on builds requiring Maven.
- node not found on builds requiring npm.
- Unable to find the Cloud SDK installation folder when running the mvn appengine:stage command (this appeared after the Maven issues were resolved).
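Several of these symptoms come down to a tool not being where the build expects it on the executor. As a rough illustration (not something from our actual pipeline), here is a minimal Python sketch of a preflight check that could run as an early build step. The tool list is taken from the symptoms above, the function names are made up for this example, and having gcloud on the PATH is only a hint that the Cloud SDK is installed, not a guarantee that the Maven plugin will find it.

```python
import shutil

# Tools taken from the symptoms listed above; adjust to your own stack.
REQUIRED_TOOLS = ["curl", "mvn", "node", "npm", "gcloud"]


def find_tools(tools, path=None):
    """Map each tool name to its resolved location on PATH, or None if missing.

    `path` is an optional PATH-style string; when omitted, the PATH of the
    current process (i.e. the build executor) is searched.
    """
    return {tool: shutil.which(tool, path=path) for tool in tools}


def missing_tools(tools, path=None):
    """Return the tools that cannot be resolved, in the order they were given."""
    found = find_tools(tools, path=path)
    return [tool for tool in tools if found[tool] is None]
```

Calling `missing_tools(REQUIRED_TOOLS)` at the start of a build and failing fast when the list is non-empty gives a clearer message than a "not found" error halfway through. Resolving tools through the PATH this way also avoids the hard-coded locations that stopped working when we moved off the Bitnami image.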
Resolution
Perhaps the most important thing to note is that the Helm chart that is part of the Google Cloud documentation refers to an older version of the Kubernetes Plugin for Jenkins. When you upgrade this plugin, it adds a second mapping to the Cloud configuration section in Configure System.
This section appears above the section outlined in the documentation and caused a lot of confusion. With the way the configuration screen is laid out, it isn’t obvious that there are two similar sections. This duplication would have occurred if I had upgraded the Kubernetes Plugin first, before following the instructions on filling out the Kubernetes section.
The other odd thing was that the plugin implicitly adds a container image, jenkins/jenkins:alpine, in a container named jnlp. The Docker Image value specified in the documentation refers to a gcr.io/cloud-solutions-images/jenkins-k8s-slave image. The documentation is a little confusing because the screenshots don’t match the more up-to-date version of the Kubernetes plugin, but I chose to name that container template “default”.
I found out that, by default, my build executors were ignoring that container image and using the implicit one. The only way I found to override this was to rename my Container Template to default and to make sure the namespace was also set to default, because that is the namespace the K8s services in our cluster run under.
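One way to confirm which image an executor pod is really running is to look at the pod specs directly rather than trusting the configuration screen. The following is a small Python sketch, under the assumption that you can capture the output of `kubectl get pods --namespace default -o json`. The function name, the sample pod name and the sample data are hypothetical; only the image and container names come from the setup described above.

```python
import json


def container_images(pods_json):
    """Return {pod name: {container name: image}} from `kubectl get pods -o json` output."""
    pods = json.loads(pods_json)
    return {
        pod["metadata"]["name"]: {
            container["name"]: container["image"]
            for container in pod["spec"]["containers"]
        }
        for pod in pods.get("items", [])
    }


# Hypothetical sample shaped like kubectl's JSON list output, trimmed to the
# fields used here. The pod name is invented for illustration.
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"name": "example-agent-pod"},
            "spec": {
                "containers": [
                    {"name": "jnlp", "image": "jenkins/jenkins:alpine"},
                ]
            },
        }
    ]
})

for pod_name, images in container_images(SAMPLE).items():
    print(pod_name, images)
```

If an executor pod lists only the implicit image and not the gcr.io/cloud-solutions-images/jenkins-k8s-slave image from the documentation, the container template is not being picked up, which is the situation described above.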
In the end, we found we needed a combination of what was in this section and what was outlined in the documentation (and a few other small changes).
I’m sure there are other configurations that work, but I’m also sure there are many that don’t. I hope this helps anyone who is having trouble getting Jenkins up and running on GKE!
Please feel free to post questions in the comments.