This is a tale of Dailymotion’s journey from Jenkins to Jenkins X, the issues we had, and how we solved them.
At Dailymotion, we strongly believe in DevOps best practices, and we are investing heavily in Kubernetes. Some of our products are already deployed on Kubernetes, but not all of them. So, when the time came to migrate our ad-tech platform, we wanted to fully embrace “the Kubernetes way” (or cloud-native, to be buzzword-compliant!). This meant redefining our whole CI/CD pipeline and moving away from static, permanent environments in favour of dynamic, on-demand environments. Our goal was to empower our developers, reduce our time to market, and reduce our operating costs.
Our initial requirements for the new CI/CD platform were:
- avoid starting from scratch if possible: our developers are already used to using Jenkins and the declarative pipelines, and those are working just fine for our current needs
- target a public cloud infrastructure — Google Cloud Platform — and Kubernetes clusters
- be compatible with the gitops methodology — because we love version control, peer review, and automation
There are quite a few actors in the CI/CD ecosystem, but only one matched our needs: Jenkins X, which is based on Jenkins and Kubernetes and has native support for preview environments and gitops.
Jenkins on Kubernetes
The Jenkins X setup is fairly straightforward, and already well documented on their website. As we’re already using Google Kubernetes Engine (GKE), the jx command-line tool created everything by itself, including the Kubernetes cluster. Cue the little wow effect here: obtaining a complete working system in a few minutes is quite impressive.
Jenkins X comes with lots of quickstarts and templates that add to the wow effect. However, at Dailymotion we already have existing repositories with Jenkins pipelines that we’d like to reuse. So we decided to do things “the hard way” and refactor our declarative pipelines to make them compatible with Jenkins X.
Actually, this part is not specific to Jenkins X, but to running Jenkins on Kubernetes, based on the Kubernetes plugin. If you are used to “classic” Jenkins, with static slaves running on bare metal or VMs, the main change here is that every build will be executed on its own short-lived custom pod. Each step of the pipeline can then specify on which container of the pod it should be executed. There are a few examples of pipelines in the plugin’s source code. Our “challenge” here was to define the granularity of our containers, and which tools they’d contain: enough containers so we can reuse their images between different pipelines, but not too many either to keep maintenance under control — we don’t want to spend our time rebuilding container images.
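To make the pod-per-build model concrete, here is a minimal sketch of a declarative pipeline using the Kubernetes plugin. It is not one of our real pipelines: the label, container names, images and commands are illustrative assumptions. The pod template declares two containers, and each stage picks the one it runs in with the container step.

```groovy
// Jenkinsfile (sketch): one short-lived pod per build, one container per kind of tool.
pipeline {
  agent {
    kubernetes {
      label 'example-build' // illustrative label
      // Pod template for this build; the plugin adds the jnlp agent container itself.
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: golang
    image: golang:1.11
    # Keep the container idle so that pipeline steps can be executed in it.
    command: ['cat']
    tty: true
  - name: alpine
    image: alpine:3.8
    command: ['cat']
    tty: true
"""
    }
  }
  stages {
    stage('Build') {
      steps {
        // Runs inside the "golang" container of the build pod.
        container('golang') {
          sh 'go build -o app .'
        }
      }
    }
    stage('Checksum') {
      steps {
        // Runs inside the "alpine" container, on the same workspace.
        container('alpine') {
          sh 'sha256sum app > app.sha256'
        }
      }
    }
  }
}
```

Both containers share the build workspace, so the binary produced in the first stage is visible to the second one.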
Previously, we ran most of our pipeline steps in Docker containers, and when we needed a custom one, we built it on the fly in the pipeline, just before running it. It was slower, but easier to maintain, because everything was defined in the source code: upgrading the version of the Go runtime could be done in a single pull request, for example. So having to pre-build our container images sounded like adding more complexity to our existing setup. But it also has a few advantages: less duplication between repositories, faster builds, and no more build errors because some third-party hosting platform is down.
Building images on Kubernetes
Which brings us to an interesting topic these days: building container images in a Kubernetes cluster.
Jenkins X comes with a set of build packs that use “Docker in Docker” to build images from inside containers. But with the new container runtimes coming, and Kubernetes pushing its Container Runtime Interface (CRI), we wanted to explore other options. Kaniko was the most mature solution, and it matched our needs and our stack. We were thrilled…
…until we hit two issues:
- the first one was a blocking issue for us: multi-stage builds didn’t work. Thanks to Google we quickly found that we were not the only ones affected, and that there was no fix or work-around yet. However, Kaniko is developed in Go, and we are Go developers, so… why not have a look at the source code? Turns out that once we understood the root cause of the issue, the fix was really easy. The Kaniko maintainers were helpful and quick to merge the fix, so one day later a fixed Kaniko image was already available.
- the second one was that we couldn’t build two different images using the same Kaniko container. This is because Jenkins isn’t quite using Kaniko the way it is meant to be used: we need to start the container first, and then run the build later. This time, we found a workaround on Google, which was to declare as many Kaniko containers as we have images to build, but we didn’t like it. So back to the source code we went, and once again, after we understood the root cause, the fix was easy.
We tested a few solutions to build our custom “tools” images for the CI pipelines. In the end, we chose to use a single repository, with one image (one Dockerfile) per branch. Because we are hosting our source code on GitHub, and using the Jenkins GitHub plugin to build our repositories, Jenkins can build all our branches and create new jobs for new branches on webhook events, which makes it easy to manage. Each branch has its own Jenkinsfile declarative pipeline, which uses Kaniko to build the image and push it to our container registry. It’s great for quickly adding a new image, or editing an existing one, knowing that Jenkins will take care of everything.
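A branch's Jenkinsfile could look roughly like the sketch below. It assumes the Kaniko "debug" image (which ships a busybox shell, so the container can be started first and the build run later), a hypothetical registry path, and registry credentials already available to the pod. None of these values come from our actual setup.

```groovy
// Jenkinsfile (sketch): build this branch's Dockerfile with Kaniko and push the image.
pipeline {
  agent {
    kubernetes {
      label 'tools-image-build' // illustrative label
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: kaniko
    # The debug image includes busybox, which lets the container idle on cat.
    image: gcr.io/kaniko-project/executor:debug
    command: ['/busybox/cat']
    tty: true
"""
    }
  }
  stages {
    stage('Build and push image') {
      steps {
        // In the debug image the shell lives under /busybox, so point the step at it.
        container(name: 'kaniko', shell: '/busybox/sh') {
          // Single quotes: BRANCH_NAME is expanded by the shell, not by Groovy.
          // The destination is hypothetical, and the branch name must be a valid image tag.
          sh '''#!/busybox/sh
/kaniko/executor --context="$(pwd)" --dockerfile="$(pwd)/Dockerfile" --destination="gcr.io/example-project/ci-tools:${BRANCH_NAME}"
'''
        }
      }
    }
  }
}
```

With this layout, adding a new tools image comes down to pushing a new branch that contains a Dockerfile and a Jenkinsfile like this one.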
The importance of declaring the requested resources
One of the major issues we encountered with our previous Jenkins platform came from the static slaves/executors, and the sometimes-long build queues during peak hours. Jenkins on Kubernetes makes it easy to solve this issue, especially when running on a Kubernetes cluster that supports the cluster autoscaler. The cluster will simply add or remove nodes based on the current load. But this is based on the requested resources, not on the observed resource usage. It means that it’s our job, as developers, to define the requested resources, in terms of CPU and memory, in our build pod templates. The Kubernetes scheduler will then use this information to find a matching node to run the pod, and if no node has enough room, the cluster autoscaler will add a new one. This is great, because we no longer have long build queues. But in exchange, we need to be careful to request the right amount of resources, and to update the requests when we update our pipelines. Resources are defined at the container level, not the pod level, which makes things a little more complex to handle. But we don’t care about limits, only requests, and a pod’s requests are simply the sum of the requests of all its containers. So we just write the resource requests for the whole pod on the first container, or on the jnlp one, which is the default.
In our Jenkinsfiles, the requested resources are declared in the pod template of the build agent.
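A minimal sketch of such a declaration follows. The label, image and numbers are illustrative rather than the values we actually use, and it assumes that the Kubernetes plugin fills in its default agent image for a jnlp entry that omits the image field.

```groovy
// Jenkinsfile (sketch): requests for the whole build pod, declared on a single container.
pipeline {
  agent {
    kubernetes {
      label 'go-build' // illustrative label
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  # Requests sized for the whole pod go on the jnlp container;
  # the other containers declare none.
  - name: jnlp
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
  - name: golang
    image: golang:1.11
    command: ['cat']
    tty: true
"""
    }
  }
  stages {
    stage('Test') {
      steps {
        container('golang') {
          sh 'go test ./...'
        }
      }
    }
  }
}
```

Since the scheduler and the autoscaler only look at the sum of the requests, it does not matter which container carries them; keeping them in one place just makes them easier to update.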
Preview environments on Jenkins X
Now that we have all our tools, and we’re able to build an image for our application, we’re ready for the next step: deploying it to a “preview environment”!
Jenkins X makes it easy to deploy preview environments by reusing existing tools, mainly Helm, as long as you follow a few conventions, such as the names of the values for the image tag. It’s best to copy/paste from the Helm charts provided in the “packs”. If you are not familiar with Helm, it’s basically a package manager for Kubernetes applications. Each application is packaged as a “chart”, which can then be deployed as a “release” by using the helm command-line tool.
The preview environment is deployed by using the jx command-line tool, which takes care of deploying the Helm chart and commenting on the GitHub pull request with the URL of the exposed service. This is all very nice, and it worked well for our first POC using plain HTTP. But it’s 2018, nobody does HTTP anymore. Let’s encrypt! Thanks to cert-manager, we can automatically get an SSL certificate for our new domain name when creating the ingress resource in Kubernetes. We tried to enable the tls-acme flag in our setup, to do the binding with cert-manager, but it didn’t work. This gave us the opportunity to have a look at the source code of Jenkins X, which is developed in Go too. A little fix later we were all good, and we can now enjoy a secured preview environment with automatic certificates provided by Let’s Encrypt.
The other issue we had with the preview environments is related to the cleanup of said environments. A preview environment is created for each opened pull request, and so should be deleted when the pull request is merged or closed. This is handled by a Kubernetes Job set up by Jenkins X, which deletes the namespace used by the preview environment. The issue is that this job doesn’t delete the Helm release, so if you run helm list, for example, you will still see a big list of old preview environments. For this one, we decided to change the way we used Helm to deploy a preview environment. The Jenkins X team already wrote about these issues with Helm and Tiller (the server-side component of Helm), so we decided to use the helmTemplate feature flag to use Helm as a template rendering engine only, and process the resulting resources using kubectl. That way, we don’t “pollute” our list of Helm releases with temporary preview environments.
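What rendering with Helm and applying with kubectl amounts to can be sketched by hand as follows. This is not what jx runs internally, only an illustration. It assumes Helm 2 syntax (helm template with the --name flag, whereas Helm 3 takes the release name as a positional argument), a hypothetical image containing both binaries, a chart under ./charts/preview, and a namespace that already exists.

```groovy
// Jenkinsfile (sketch): render the chart locally and apply it with kubectl,
// so that no Helm release is recorded for the preview environment.
pipeline {
  agent {
    kubernetes {
      label 'preview-deploy' // illustrative label
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: deploy
    # Hypothetical image providing both the helm and kubectl binaries.
    image: registry.example.com/ci/helm-kubectl:latest
    command: ['cat']
    tty: true
"""
    }
  }
  stages {
    stage('Deploy preview') {
      // Only run for pull-request builds, where Jenkins sets CHANGE_ID.
      when { changeRequest() }
      steps {
        container('deploy') {
          // Single quotes: the variables are expanded by the shell, not by Groovy.
          sh '''
            NAMESPACE="preview-pr-$CHANGE_ID"
            helm template ./charts/preview --name "pr-$CHANGE_ID" --namespace "$NAMESPACE" |
              kubectl apply --namespace "$NAMESPACE" -f -
          '''
        }
      }
    }
  }
}
```

Because nothing is registered with Tiller, deleting the namespace is enough to clean up the preview environment.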
Gitops applied to Jenkins X
At some point during our initial POC, we were happy enough with our setup and pipelines, and wanted to transform our POC platform into a production-ready platform. The first step was to install the SAML plugin to set up our Okta integration, so that our internal users could log in. It worked well, and then a few days later, I noticed that our Okta integration was not there anymore. I was busy doing something else, so I just asked my colleague if he’d made some changes, and moved on. But when it happened a second time a few days later, I started investigating. The first thing I noticed was that the Jenkins pod had recently restarted. But we had persistent storage in place, and our jobs were still there, so it was time to take a closer look! It turns out that the Helm chart used to install Jenkins has a startup script that resets the Jenkins configuration from a Kubernetes configmap. Of course, we can’t manage a Jenkins running in Kubernetes the same way we manage a Jenkins running on a VM!
So instead of manually editing the configmap, we took a step back and looked at the big picture. This configmap is itself managed by the jenkins-x-platform, so upgrading the platform would reset our custom changes. We needed to store our “customization” somewhere safe and track our changes.
We could go the Jenkins X way, and use an umbrella chart to install/configure everything, but this method has a few drawbacks: it doesn’t support “secrets” — and we’ll have some sensitive values to store in our git repository — and it “hides” all the sub-charts. So, if we list all our installed Helm releases, we’ll only see one. But there are other tools based on Helm, which are more gitops-friendly. Helmfile is one of them, and it has native support for secrets, through the helm-secrets plugin, and sops. I won’t go into the details of our setup right now, but don’t worry, it will be the topic of my next blog post!
Another interesting part of our story is the actual migration from Jenkins to Jenkins X, and how we handled repositories with two build systems. At first, we set up our new Jenkins to build only the “jenkinsx” branches, and we updated the configuration of our old Jenkins to build everything except the “jenkinsx” branch. We planned to prepare our new pipelines in the “jenkinsx” branch, and merge it to make the move. For our initial POC it worked nicely, but when we started playing with preview environments, we had to create new PRs, and those PRs were not built on the new Jenkins because of the branch restriction. So instead, we chose to build everything on both Jenkins instances, but use the Jenkinsfile filename for the old Jenkins, and the Jenkinsxfile filename for the new Jenkins. After the migration, we’ll have to update this configuration and rename the files, but it’s worth it, because it gives us a smooth transition between both systems, and each project can migrate on its own without affecting the others.
So, is Jenkins X ready for everybody? Let’s be honest: I don’t think so. Not all features and supported platforms (git hosting platforms or Kubernetes hosting platforms) are stable enough. But if you’re ready to invest enough time to dig in, and to select the stable features and platforms that work for your use cases, you’ll be able to improve your pipelines with everything required to do CI/CD and more. This will improve your time to market, reduce your costs, and, if you’re serious about testing too, make you confident about the quality of your software.
At the beginning, we said that this was the tale of our journey from Jenkins to Jenkins X. But our journey isn’t over: we are still traveling. Partly because our target is still moving: Jenkins X is still in heavy development, and it is itself on its own journey towards Serverless, using the Knative build road for the moment. Its destination is Cloud Native Jenkins. It’s not ready yet, but you can already have a preview of what it will look like.
Our journey also continues because we don’t want it to finish. Our current destination is not meant to be our final destination, but just a step in our continuous evolution. And this is the reason why we like Jenkins X: because it follows the same pattern. So, what are you waiting for? Embark on your own journey!