By Howard Lo & Amrik Ajimal
Although a plethora of build tools have come into vogue and faded into obscurity with the times, Jenkins continues to capture our collective mindshare. When it comes to software integration, automation, and deployment, its flexibility in adapting to all manner of workflows makes it one of the better options on the market.
However, as most who have had the pleasure of working with Jenkins at their organizations can attest, this flexibility and extensibility often come at a price paid in tears, sweat, and caffeine.
In hindsight, the cause of this dichotomy is actually very simple. After setting up our tools and infrastructure, we rarely apply Continuous Integration/Delivery (CI/CD) principles to them.
We propose a more generalized, meta-application of CI/CD principles to the foundational tools and infrastructure responsible for providing CI/CD to wider product and engineering teams. We will also demonstrate how such an approach not only facilitates holistic improvement of the software delivery and support ecosystems we depend on, but also lets us reap benefits ranging from high confidence in infrastructure upgrades to increased peace of mind for those managing such tools.
Many factors contributed to our decision to apply a CI/CD approach to our Jenkins setup. For example, Jenkins increasingly became a focal point in our tooling, with more tasks coming under its umbrella of responsibilities. While this was helpful in unifying many jobs from disparate systems into a more cohesive experience, it also meant that more teams were dependent on this critical piece of infrastructure. We quickly realized that upgrades, even exclusively to Long-Term Support (LTS) versions of Jenkins, were not as stable as one would expect.
Oftentimes, we would discover after an upgrade that some pipelines had started to exhibit unexpected behaviors. Because Jenkins is central to our software delivery process, such behaviors could block various teams from getting their work done. Once, a minor upgrade to a single plugin even prevented Jenkins from starting at all! We spent considerable time and effort finding the culprit, and eventually we ended up rolling back the change and restoring Jenkins's data from a previous snapshot just to get it working again.
After that horrid experience, it became clear that deploying Jenkins was similar to deploying the services that Jenkins itself was responsible for building and deploying. For this reason, it made sense to treat Jenkins the same way we treat all microservices at Earnest: by applying CI/CD principles and only deploying a new version after it has been verified to work in a separate, safe environment.
With these frustrations motivating us to address the shortcomings we had experienced, we outlined a few key requirements as we embarked on this project.
One of the critical objectives we wanted to address with our new setup was the “what-if-I-get-hit-by-a-bus” problem. To avoid this pitfall, we made it a priority to democratize the infrastructure setup for this critical system, so that anyone (with the correct permissions/privileges, of course) could perform an upgrade as easily as pushing a button. This would also open up the process so our team could contribute improvements with confidence!
We wanted to enforce a blue/green deployment model for all proposed changes to our build system. Doing so would allow us to roll out a change with the flip of a network switch. It would also give us the substantial benefit of rolling back an upgrade quickly and seamlessly if catastrophic issues were found after the rollout.
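To make the blue/green idea concrete, here is a minimal sketch of the switching logic in Python. It is an illustration only, not our actual setup: the `BlueGreenRouter` class, the `is_healthy` callback, and the version labels are all hypothetical, and a real "network switch" would be a load balancer or DNS change rather than an in-memory field.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Tracks which of two environments currently receives traffic (hypothetical)."""

    environments: Dict[str, str]  # colour -> version label deployed there
    live: str = "blue"
    previous: str = field(default="", init=False)

    def idle(self) -> str:
        """Return the colour that is not serving traffic right now."""
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, is_healthy: Callable[[str], bool]) -> bool:
        """Install `version` on the idle side and switch only if it verifies."""
        target = self.idle()
        self.environments[target] = version
        if not is_healthy(version):
            return False  # the live side was never touched
        self.previous, self.live = self.live, target
        return True

    def rollback(self) -> None:
        """Flip traffic back to the side that was live before the last switch."""
        if not self.previous:
            raise RuntimeError("nothing to roll back to")
        self.live, self.previous = self.previous, ""


router = BlueGreenRouter({"blue": "lts-old", "green": "lts-old"})
switched = router.deploy("lts-new", is_healthy=lambda version: True)
```

The point of the design is that a failed verification leaves the live environment untouched, and a rollback is only a pointer flip rather than a rebuild.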
When addressing the core set of problems above, it is important that everything happening under the hood is not only reliable, but also easy to use, so that it can be performed with minimal guidance.
To ensure that we were going down the right path as we evolved our setup, we delivered each process of the upgrade in separate pipelines, with minimal documentation. An entire upgrade would be all of these discrete pipelines executed one after another.
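The idea of an upgrade as a chain of discrete steps can be sketched in a few lines of Python. This is a simplified model under our own assumptions: the stage names are hypothetical placeholders, and each stage is a plain function standing in for a separate Jenkins pipeline.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that reports success or failure.
Stage = Tuple[str, Callable[[], bool]]


def run_upgrade(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, run in stages:
        if not run():
            break  # later stages must not run on top of a failed one
        completed.append(name)
    return completed


# Hypothetical stage names, for illustration only.
stages: List[Stage] = [
    ("provision", lambda: True),
    ("verify", lambda: False),
    ("switch-traffic", lambda: True),
]
completed = run_upgrade(stages)
```

Keeping each stage separate means a novice can run, observe, and repeat one step at a time instead of debugging a single monolithic job.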
In the end, there is only one way to really validate any of this if you want respect from the Most Interesting Man In The World. We would have a teammate with no prior experience in Jenkins administration perform each stage of an upgrade while we sat back with a beer and observed. If usability problems surfaced during this process, we would address them and have our teammate repeat it all over again. This practice is known as dogfooding.
We wanted to ensure any novice Jenkins administrator could perform a Jenkins upgrade and rollback if a domain expert was unavailable. To achieve this, we required infrastructure-as-code so that the configuration of Jenkins and its plugins would always be available in source control. Doing so would also help us avoid the snowflake problem. Even though the initial implementation would take considerable time and effort, we believed it would benefit the team in the long run.
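One practical payoff of keeping plugin configuration in source control is that drift becomes detectable. The sketch below compares a pinned plugin list against what is installed. It is an assumption-laden illustration: the `plugin_drift` function, the plugin names, and the version strings are hypothetical, and how the two dictionaries are obtained (for example, from a file in the repository and from the running server) is left out.

```python
from typing import Dict, List


def plugin_drift(pinned: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """Describe every difference between pinned and installed plugin versions."""
    problems: List[str] = []
    for name, version in sorted(pinned.items()):
        actual = installed.get(name)
        if actual is None:
            problems.append(f"{name}: missing (expected {version})")
        elif actual != version:
            problems.append(f"{name}: {actual} installed, {version} pinned")
    # Anything installed by hand but absent from source control is a snowflake.
    for name in sorted(set(installed) - set(pinned)):
        problems.append(f"{name}: installed but not in source control")
    return problems


# Hypothetical plugin names and versions, for illustration only.
pinned = {"example-git": "1.0", "example-pipeline": "2.0"}
installed = {"example-git": "1.1", "example-extra": "0.1"}
report = plugin_drift(pinned, installed)
```

An empty report means the running instance matches what is in source control, which is the property that lets someone other than the original administrator rebuild it.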
Avoid Unanticipated Downtime
The goal of all of the above requirements was to fulfill a single need: avoid downtime during any Jenkins upgrade. By combining blue/green deployment, infrastructure-as-code, and a focus on usability, we could ensure that releases to production would be easy and would not interrupt our engineering team’s day-to-day activities.
Given these requirements, our expectations for this project were pretty high. If we did everything correctly, we could perform an upgrade as soon as a new Jenkins LTS version was released, or any time a new security patch dropped. We could ensure our engineering team was always able to use the latest Jenkins features as soon as they became available, without worrying about stability. In our next blog post, we will take a deep dive into how we accomplished such a feat, so don’t forget to check back for an article on the details of building that pipeline!