One year ago, we wrote about our journey from Jenkins to Jenkins X. It’s time to take a step back and see where we are now and how this journey has impacted the way we write and deliver software at Dailymotion.
We’ve been using Jenkins X for more than one year now to handle all the build and delivery of our ad-tech platform at Dailymotion — with great results. Applying the practices described in the Accelerate book and implemented in Jenkins X allowed us to break the silos, to “shift left” some responsibilities and move faster. According to the State of DevOps Report for 2019, before our transformation, we were part of the medium performing teams, and we’re now somewhere between the high and the elite performing teams.
But none of this came for free. We experienced technical challenges as part of our adoption of Jenkins X, and while fixing them, we ended up with a “custom implementation” of Jenkins X, which received the “Most Innovative Jenkins X Implementation” Jenkins Community Award in 2019.
The first limitation we found was related to the Preview Environments feature of Jenkins X. We are using the Kubernetes Cluster Autoscaler, which allows us to run just the minimal number of nodes at night and on weekends, and to scale up during work hours. That worked until we hit the upper limit of our node pool, because all our open preview environments required too many pods. To fix this, we used Osiris to automatically scale down our idle pods. You can read more about it in the Zero cost preview environments on Kubernetes with Jenkins X and Osiris article.
We also had specific requirements, such as deploying our applications in multiple regions — Europe, US West & East, Asia — with some applications deployed in all regions, and others in a single one, as well as different application settings per region. When we started using Jenkins X in 2018, it only supported a single Kubernetes cluster, while we needed to deploy on multiple clusters. Our solution was to look at open-source software built on top of Helm — which is one of the core components of Jenkins X — and we quickly found Helmfile, which can be used to do Gitops and declarative configuration management of Helm charts. Helmfile also supports multiple “environments” — regions for us — meaning we can easily configure an application to be deployed on a single cluster or multiple ones, with a mix of global and per-region settings.
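As a sketch of what this looks like (the release names, file paths and region names below are illustrative, not our actual configuration), a helmfile.yaml can declare one Helmfile “environment” per region and use templating to decide where each release is installed and which settings it receives:

```yaml
# helmfile.yaml — hypothetical per-region setup
environments:
  europe:
    values:
      - environments/europe.yaml
  us-east:
    values:
      - environments/us-east.yaml
  asia:
    values:
      - environments/asia.yaml

releases:
  # Deployed in every region, with global plus per-region settings
  - name: ad-server
    chart: charts/ad-server
    values:
      - config/ad-server/common.yaml
      - config/ad-server/{{ .Environment.Name }}.yaml

  # Deployed in a single region only
  - name: reporting
    chart: charts/reporting
    installed: {{ eq .Environment.Name "europe" }}
    values:
      - config/reporting/values.yaml
```

Running `helmfile --environment us-east apply` then reconciles everything declared for that region, which maps naturally to one invocation per cluster.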
Helmfile has native support for secrets management too, using the Helm Secrets plugin, which itself uses Sops, an open-source tool made by Mozilla to encrypt/decrypt YAML or JSON files using PGP, AWS/GCP KMS or Azure Key Vault. This means that we can store all our secrets safely encrypted in our git repositories and leverage the KMS API to give people permission to either encrypt or decrypt secrets on a per-environment and per-application basis. We found Sops easy to use, and it allows us to go full Gitops, storing everything in git.
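For illustration, the per-environment key mapping lives in a .sops.yaml file like the following (the project and key names are made up): Sops picks the encryption key from the path of the file being encrypted, which is what makes per-environment and per-application permissions possible.

```yaml
# .sops.yaml — hypothetical creation rules, one KMS key per environment
creation_rules:
  # Production secrets: only holders of the production KMS key can decrypt
  - path_regex: .*/production/secrets\.yaml$
    gcp_kms: projects/my-project/locations/global/keyRings/prod/cryptoKeys/prod-key

  # Staging secrets use a separate key with broader IAM access
  - path_regex: .*/staging/secrets\.yaml$
    gcp_kms: projects/my-project/locations/global/keyRings/staging/cryptoKeys/staging-key
```

With this kind of split, granting a developer the encrypter role on the staging key lets them add staging secrets without being able to read production ones.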
The impact of using different tools than the ones bundled by default within Jenkins X is that we had to do a little extra work to glue them into our pipelines. We couldn’t use the default jx promote command anymore, which handles the “promotion” of a release by deploying it to the staging or production environment, because this command only knows about Helm, not about Helmfile. So it doesn’t know how to update our helmfile.yaml file to upgrade the version of a component. Instead, we had to find a different tool to upgrade the version in the file, push the change to GitHub automatically, and create a Pull Request for review. Fortunately for us, such a tool already exists: updatebot, whose job is exactly to create Pull Requests from version changes, using different strategies. In our case, we use the “regex” strategy to update our helmfile.yaml files.
There is another impact of using updatebot instead of the jx promote command: we can’t use the centralized configuration of the environments. Jenkins X relies on a Kubernetes Custom Resource Definition (CRD) to define and store the configuration of the different environments and their promotion rules — automatic or manual. With updatebot, we use a per-application configuration of the environments to define all the git repositories to update, although we could derive this configuration from the CRD if we wanted to.
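To make the “regex” promotion concrete, here is a minimal sketch of the kind of in-place edit it performs on a helmfile.yaml. We use sed purely to illustrate the substitution (the file layout and version numbers are invented); updatebot additionally pushes the change and opens the Pull Request.

```shell
# Create a toy helmfile.yaml (layout is illustrative)
cat > helmfile.yaml <<'EOF'
releases:
  - name: ad-server
    chart: charts/ad-server
    version: 1.2.3
EOF

# Promote ad-server to 1.3.0 by rewriting the version line —
# the same kind of regex substitution a promotion applies
sed -i.bak -E 's/(version:[[:space:]]*)1\.2\.3/\11.3.0/' helmfile.yaml

# Show the bumped version line
grep 'version:' helmfile.yaml
```

In a real pipeline, the new version number comes from the release that was just built, and the resulting diff is what reviewers see in the promotion Pull Request.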
In the end, our custom implementation of Jenkins X consists of custom steps in our Pipelines and extra Helm charts installed on the Kubernetes cluster — all based on open-source software. And because we take open-source seriously at Dailymotion, in one year we made more than 50 contributions — bug fixes or new features — to open-source projects such as Jenkins X, Helm and its ecosystem, Helmfile, Osiris, Kaniko, and more.
It’s important to note that the Jenkins X project has not been idle in the past year. Since April 2019, Jenkins X comes with an “environment controller” that you can use to deploy on multiple clusters. And it now comes with Hashicorp Vault to manage your secrets — which is a great alternative to sops, with the downside being that your secrets are no longer stored alongside your configuration in your git repositories. The Jenkins X team is working on a proof of concept to integrate Helmfile — to allow more customization of the environments — and they plan to package a solution to automatically scale down the idle pods from the Preview Environments. Thanks to the Jenkins X team for listening to our feedback, and improving the product!
One of the topics we focused on was breaking the silos. Initially, we had different teams, each with different responsibilities: the developers would write code and create releases, the testers would validate the releases, and the operations team would deploy and operate the software. In a team of our size — around 40 people — this is not optimal, because it creates silos. Introducing Kubernetes and cloud-native CI/CD helps you “shift left” responsibilities such as ownership of the build environment, the integration tests, and the runtime environment. Our new goal is to do as much validation as possible before creating a release, thus moving more responsibilities to the developers. They are now autonomous in their build environment — which is defined with their pipelines in their git repositories — and in the definition of the runtime environment — which is defined as a Helm chart in each git repository. The integration tests are now defined alongside each application and automatically executed in the Preview Environments before new releases are created. With these new practices, we’re getting closer and closer every day to the “you build it, you run it” mantra.
We also found that moving faster, with different lifecycles for each component, introduced its own challenges. Maintaining backward compatibility between components is now mandatory, and we’re using feature flags more and more to decouple production deployment from feature activation.
One practice that really changed the way we work is Gitops. We are embracing Gitops for everything we do, using Terraform and Atlantis to define all our cloud infrastructure, and Jenkins X and Helmfile for all our application deployments. This enables more code reviews, knowledge sharing, and self-service: instead of asking and waiting for someone else on Slack or Jira, we can now write our own Pull Requests for anything, including asking for more permissions on a cloud API, creating a new cloud resource, upgrading the version of a component, and so on. We have a new mantra in the team: “if it’s not in git, it doesn’t exist”.
But the real benefit is the improved velocity. We went from 1 release every 2 weeks, to 10–15 releases per day. And from 1 deployment every 2 weeks, to 5–10 deployments in production per day. Our lead time for changes — time to go from code committed to code successfully running in production — went from weeks to hours/days. According to the State of DevOps Report for 2019, we went from being a medium performing team to somewhere between a high and elite performing team.
Even though we are quite happy with our current state, we don’t plan to stop here. Our industry is moving fast, and so are we. When we started using Jenkins X, it was built on top of Jenkins to run the pipelines. Jenkins X is now built on top of Tekton, a cloud-native pipeline execution engine for Kubernetes. So we are now working on our next migration: from Jenkins pipelines to the new Jenkins X pipelines, to remove the single point of failure that is Jenkins, and to benefit from the flexibility of Tekton.
Recently, CloudBees announced the SaaS version of Jenkins X: CloudBees CI/CD powered by Jenkins X. This might simplify Jenkins X maintenance a lot and allow us to focus on our core business. We are also introducing progressive delivery, deploying new releases to canary instances first and shifting traffic to them progressively and automatically, using open-source tools such as Istio and Flagger. This allows us to gain more confidence in our automated delivery pipeline and further reduce our lead time for changes.
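To give an idea of what progressive delivery looks like in practice, a Flagger Canary resource along these lines (the names, port and thresholds are hypothetical, not our actual configuration) tells Flagger to shift traffic to a new release in small steps and roll back automatically if the success-rate metric degrades:

```yaml
# Hypothetical Flagger canary definition for one of our deployments
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: ad-server
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ad-server
  service:
    port: 8080
  analysis:
    interval: 1m        # how often metrics are checked
    threshold: 5        # failed checks before automatic rollback
    stepWeight: 10      # traffic shifted to the canary per step (%)
    maxWeight: 50       # max canary traffic before full promotion
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99       # roll back if success rate drops below 99%
        interval: 1m
```

The rollback safety net is what makes it reasonable to deploy many times per day: a bad release affects only a fraction of traffic, briefly, instead of everyone at once.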