Jenkins as Code

John Smilanick
apree health (Castlight) Engineering
9 min read · Apr 22, 2022

Castlight’s Infrastructure Automation team has embraced Infrastructure as Code, Configuration as Code and GitOps to empower almost 30 software engineering teams to build and maintain over 130 applications in 14 different Kubernetes environments spread across GCP, AWS and two private data centers. We accomplished this by creating a foundation of reusable tools and configuration that teams can use to self-service the full application life cycle: onboarding new applications, deprecating old ones, testing, refactoring, redeploying, scaling and monitoring. Each team knows the needs of its applications and defines those requirements in code; the automation infrastructure we built reads those requirements and creates the resources needed to support the application in whichever environment it is deploying to.

A cornerstone of this process is the CI/CD automation we built in Jenkins.

Automating Jenkins

We deploy and manage Jenkins itself in a fully automated way. Jenkins and its agents run in Docker, deploy to Kubernetes, configure global settings at startup and build jobs using JobDSL. This is made possible by following Configuration as Code principles for Jenkins and keeping everything under git version control. This setup lets us run multiple identical Jenkins servers for backup and job testing, minimizing downtime and regressions and making it easy to roll back to a previous working version. We can also run Jenkins locally to quickly test Jenkins updates, update plugins or modify seed scripts. If the servers were somehow corrupted, we would lose only the job histories; Jenkins and all its jobs can easily be rebuilt back to a known good state.

Installation and Runtime

We install Jenkins and plugins in a Docker image and maintain the Dockerfile and supporting files in a git repo. When running Jenkins locally, we use docker-compose to run the server and a local agent. From the local Jenkins environment, we can build Kubernetes clusters in the cloud or in our data center and deploy Jenkins to Kubernetes, where it becomes available to all teams.

Agents Run Docker in Docker

To avoid issues related to versioning, backward compatibility, etc., our Jenkins agents run Docker inside of them (Docker in Docker). When a job needs a custom tool that isn’t already installed, it must run its script inside a docker image. Docker images are portable and isolated from other tools, so compatibility issues when installing and running tools largely disappear. Jenkins allows running shell scripts inside a docker image:

docker.image('mysql:5').inside {
    sh 'mysqladmin --version'
}

Global Configuration

At startup, Jenkins runs a series of post-initialization scripts that we built into our docker image to provide all the initial system configuration. It is also possible to use the Jenkins Configuration as Code plugin, but we have a lot of dynamic configuration that we built before that plugin was available and we never felt the need to switch.

You can create a Groovy script file $JENKINS_HOME/init.groovy, or any .groovy file in the directory $JENKINS_HOME/init.groovy.d/, to run additional setup right after Jenkins starts up. These scripts are executed at the end of Jenkins initialization and can access classes in Jenkins and all the plugins.
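
As a minimal sketch of the pattern (the file name and settings below are illustrative, not our actual configuration), a post-initialization script manipulates the Jenkins object model directly:

// init.groovy.d/basics.groovy -- hypothetical post-initialization script
import jenkins.model.Jenkins

def jenkins = Jenkins.get()
// Force all builds onto agents; never build on the controller itself.
jenkins.setNumExecutors(0)
// Remind anyone tempted to click around that config is rebuilt at startup.
jenkins.setSystemMessage('Configured as code; manual changes are overwritten on redeploy.')
jenkins.save()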

We configured our Jenkins servers to use the Kubernetes Cloud plugin, which automatically creates agents on demand and stops them when idle. Agents can run in the same Kubernetes cluster or in other Kubernetes clusters closer to the application environments. For instance, we run our primary Jenkins in our data center, but when we test and deploy to AWS and GCP we run the workers local to those cloud environments. Running distributed workers increases our agent capacity, since the data center cannot scale like the cloud.

// Post initialization script init.groovy.d/clouds.groovy
import jenkins.model.Jenkins
import org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud

def jenkins = Jenkins.instanceOrNull
jenkins.clouds.clear()
// OpsEnvironments is an internal class describing our environments
OpsEnvironments.getAll().each { opsEnv ->
    def kubernetesCloud = new KubernetesCloud("kubernetes-${opsEnv.name}")
    kubernetesCloud.with {
        serverUrl = opsEnv.kubeApiAddress()
        credentialsId = opsEnv.kubeconfigCredentialID()
        connectTimeout = 0
        readTimeout = 0
        containerCapStr = "${opsEnv.jenkinsContainerCaps().default + opsEnv.jenkinsContainerCaps().large}"
        jenkinsUrl = opsEnv.jenkinsUrl()
        webSocket = true
    }
    jenkins.clouds.add(kubernetesCloud)
    opsEnv.agentTemplates.each { instanceTemplate ->
        kubernetesCloud.addTemplate(instanceTemplate)
    }
}
jenkins.save()

Running the agents inside AWS and GCP allows the workers to act as proxies and have access to AWS and GCP services and permissions that our data center Jenkins does not have. From a security perspective, this minimizes the number of open ports in a remote environment and allows us to restrict access to local services to only the agent’s network.

When a job runs, a worker is selected for the appropriate environment using a Pipeline groovy node block. Jobs can also define additional agent selectors to ensure the agent has enough memory, CPU, etc.

// Select an agent by label expression: location AND size
agentLocation = selectAgentLocation(params.ENV)
node( "${agentLocation}&&${agentSize}" ) {
    ...
}

Credentials Management

Credentials are also defined by the post-initialization scripts, but we must refresh some secrets periodically because they expire. We use HashiCorp Vault for secrets, and Java client libraries make it easy to write Groovy scripts that read secrets and generate certificates from Vault and populate Jenkins credentials. We abstracted this credential seeding into a reusable Groovy script that runs on a cron schedule.

job('seed-credentials') {
    description('Seeds credentials from Vault.')
    disabled(false)
    blockOnUpstreamProjects()
    logRotator {
        daysToKeep(60)
        numToKeep(40)
    }

    configure { project ->
        project.remove(project / scm)
        project / scm(class: 'hudson.plugins.filesystem_scm.FSSCM') {
            path("/usr/share/jenkins/lib")
        }
    }
    triggers {
        cron('H */6 * * *')
    }
    steps {
        systemGroovyCommand('''
            def cl = new GroovyClassLoader(this.getClass().classLoader)
            cl.addURL(new File("\${build.project.workspace}/src").toURI().toURL())
            def shell = new GroovyShell(cl, binding)
            shell.evaluate(new File(build.project.workspace.child('init/seed_vault.groovy').toURI()))
        ''') {
            sandbox(false)
        }
    }
}
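
The seed script itself is not shown here, but a simplified sketch of the idea looks like this. The Vault address, secret path, credential ID and the choice of the vault-java-driver library are illustrative assumptions, not our exact implementation:

// Hypothetical sketch of init/seed_vault.groovy (not our actual script)
import com.bettercloud.vault.Vault
import com.bettercloud.vault.VaultConfig
import com.cloudbees.plugins.credentials.CredentialsScope
import com.cloudbees.plugins.credentials.SystemCredentialsProvider
import com.cloudbees.plugins.credentials.domains.Domain
import hudson.util.Secret
import org.jenkinsci.plugins.plaincredentials.impl.StringCredentialsImpl

// Authenticate to Vault and read a secret (address, token file and path are examples).
def vault = new Vault(new VaultConfig()
    .address('https://vault.example.com')
    .token(new File('/var/run/secrets/vault-token').text.trim())
    .build())
String apiKey = vault.logical().read('secret/jenkins/deploy').getData().get('api-key')

// Upsert the secret into the global Jenkins credentials store.
def store = SystemCredentialsProvider.instance.store
def cred = new StringCredentialsImpl(
    CredentialsScope.GLOBAL, 'deploy-api-key', 'Seeded from Vault', Secret.fromString(apiKey))
def existing = store.getCredentials(Domain.global()).find { it.id == cred.id }
if (existing) {
    store.updateCredentials(Domain.global(), existing, cred)
} else {
    store.addCredentials(Domain.global(), cred)
}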

Jenkins Deployment

We have Jenkins jobs that can deploy Jenkins to any one of our cloud or data center environments. Deploying an instance of Jenkins from itself is a bad idea: the deployment job would fail or exit prematurely when the server restarts. Instead, we deploy our primary Jenkins from our backup server and vice versa. The Jenkins server runs in Kubernetes, so we deploy it using Helm. The initial secrets needed to authenticate to Vault are provided by a Kubernetes Secrets resource and are available on the filesystem at runtime.
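
A trimmed-down sketch of such a deploy job might look like the following; the chart path, values layout and parameters are hypothetical:

// Hypothetical Pipeline that deploys Jenkins via Helm from the peer server
node(selectAgentLocation(params.ENV)) {
    stage('Deploy Jenkins') {
        // Run helm inside a container so the agent needs no tools preinstalled.
        docker.image('alpine/helm:3.11.1').inside {
            sh """
                helm upgrade --install jenkins ./charts/jenkins \
                    --namespace jenkins \
                    --values values/${params.ENV}.yaml \
                    --set image.tag=${params.JENKINS_IMAGE_TAG}
            """
        }
    }
}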

Test, Backup and Restore Jenkins Server

All components are committed to git repos and deployed by branch/tag name, so they can easily be rolled back and redeployed. Running everything in docker and relying on external secrets servers allows us to run Jenkins and workers anywhere, including on our local workstations using docker-compose. We keep multiple identical Jenkins servers that operate as backups in case of emergency, or as test servers for trying new job configurations or Jenkins upgrades.

Self-Service Principle

Jenkins is useful for so many use cases that we didn’t want to lock it down; we want job configuration to be self-serviceable. However, Configuration as Code requires discipline: manual configuration sneaking in is what makes a system brittle and hard to reproduce. When we start Jenkins it rebuilds the configuration from scratch, which forces us to put everything in code or lose it after the next deploy.

To solve both of these problems we designed a standalone job configuration git repo and allow all engineers to contribute to it. We still need safeguards to avoid breaking things, so all code must be reviewed and a test suite must pass before merging. Engineers can also test their own branch on a Test Jenkins server. The Test Jenkins cannot deploy to production or other sensitive environments, but it allows testing the same jobs in lower environments.

Job Configuration

We made job configuration a separate step so we can run it as frequently as necessary. We use the JobDSL seed job pattern so we can re-run the seed job at any time, and we configure git commit hooks to trigger a CI/CD pipeline that runs the seed job whenever changes are pushed to the master branch of our job config repo. Our Test Jenkins server lets us specify a custom branch when running the seed job, so we can test code before it is merged into master and applied to the primary Jenkins server.
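
For illustration, a seed pipeline along these lines checks out the job config repo and applies every JobDSL script in it; the repo URL and script layout here are assumptions:

// Hypothetical seed pipeline run on every push to the job config repo
node('default') {
    stage('Checkout job config') {
        git url: 'https://git.example.com/infra/jenkins-job-config.git',
            branch: params.BRANCH ?: 'master'
    }
    stage('Run JobDSL') {
        // Creates, updates and prunes jobs to match what is in git.
        jobDsl targets: 'jobs/**/*.groovy',
               removedJobAction: 'DELETE',
               removedViewAction: 'DELETE'
    }
}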

Application teams can update the job configuration git repo themselves to add/remove/update jobs for their own team. When it comes to shared code and reusable components the Infrastructure Automation team creates and maintains Job Templates and reusable libraries to ensure all services get updated when major changes must be made.

Job Templates

Most of our jobs are written using the Jenkins Pipeline plugin with the scripted Groovy syntax. One of the first issues we encountered using the Groovy DSL directly was the lack of reusability in the code we wrote. We could copy and paste changes from one job to another, but with hundreds of nearly identical jobs deploying Kubernetes applications we quickly felt the need for a template system. Writing that common code in Shared Libraries made the complex components much more reusable and composable.

The shared libraries made it possible to write template code for “build-time” JobDSL Job creation:

// Instantiates template using custom JobDSL code
new DockerImageTemplate(
    args: [
        imageName: 'docker-kube-dashboard',
        gitUrl: '...',
        dockerUrl: '...'
    ]
).build()

and composable “runtime” code for the Pipeline scripts:

// Pipeline groovy runtime code
@Library('jenkins-config') _
Map vars = [:]
Map args = [
    imageName: '...',
    gitUrl: '...',
    dockerUrl: '...'
]
node( "${agentLocation}&&default" ) {
    stepHelpers.stepStage("Git Clone") {
        vars.gitOutputs = gitSteps.gitCloneStep(
            url: args.gitUrl,
            dir: "git_service",
            branch: params.BRANCH
        )
    }
    stepHelpers.stepStage("docker build") {
        dockerSteps.dockerBuildStep(
            url: args.dockerUrl,
            dir: "git_service",
            tag: vars.gitOutputs.dockerBranchWithSha,
            useProxy: false,
            buildArgs: [
                REVISION: vars.gitOutputs.dockerBranchWithSha,
                "GIT_SHA": vars.gitOutputs.commit,
                "SERVICE_NAME": args.imageName,
            ]
        )
    }
    stepHelpers.stepStage("docker push") {
        dockerSteps.dockerPushStep(
            url: args.dockerUrl,
            dir: "git_service",
            sourceTag: vars.gitOutputs.dockerBranchWithSha,
            tags: [
                vars.gitOutputs.dockerBranch,
                vars.gitOutputs.dockerBranchWithSha,
            ]
        )
    }
}

both backed by reusable Shared Library code:

// vars/dockerSteps.groovy
Map dockerBuildStep(Map args) {
    Map outputs = [:]
    ...
    return outputs
}

Map dockerPullStep(Map args) {
    Map outputs = [:]
    ...
    return outputs
}

Map dockerPushStep(Map args) {
    Map outputs = [:]
    ...
    return outputs
}

Templates make it easier for application teams to add their own services without having to maintain the underlying Job code.

Dynamic Job Parameters

Creating jobs from templates also allows us to dynamically add new choices to job params. We can use one job to deploy to any environment by using an ENV choice parameter whose values come from a data source in the configuration. When we add a new environment to that data source, all jobs are updated simultaneously to support it. Similarly, some jobs are generic and can operate on any application, so we have a data source of applications to populate that param.

import com.castlight.Environments
...
def envs = Environments.getAll().collect { it.name }
job.choiceParam('ENV', envs, "Environment name")

Environment Promotions

Since some environments are more important than others, we have a promotion model that even our infrastructure must follow. Things like docker images or helm charts that are still being tested or aren’t ready for some environments can be pinned to different versions in different environments:

args.chartVersion = envObj.getHelmChartVersion(chartName)
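
Under the hood, this can be as simple as a per-environment version map resolved when the job runs; the data below is made up to show the shape:

// Hypothetical per-environment chart pins; release candidates stay in lower envs.
Map chartVersions = [
    dev  : ['kube-dashboard': '1.3.0-rc1'],
    stage: ['kube-dashboard': '1.2.4'],
    prod : ['kube-dashboard': '1.2.4'],
]
args.chartVersion = chartVersions[params.ENV]['kube-dashboard']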

Drawbacks

Getting to where we are took a long time, and there were quite a few missteps along the way. Even now there are things we want to change and improve, and that will take time. As with all automation, most of the time is spent on the first instance; once a job template exists or a Jenkins server is deployed, the next one is easy.

A self-service model does put a burden on application teams to understand the technology and the process before they can use it, but we have thorough documentation and it doesn’t take much effort to educate them. The benefit is greater control over their applications and jobs; most teams appreciate that independence far more than waiting for someone else to do the work for them.

Takeaways

Jenkins as Code has eliminated many problems related to upgrades and compatibility, since it is easy to test and roll back changes and, in a worst-case scenario, rebuild everything from scratch and have it fully functional. Running in Kubernetes helps with scalability, and using Configuration as Code with self-service principles allows application teams to develop, test and use Jenkins jobs themselves.

The Infrastructure Automation team is now able to support any number of applications and can deploy to any new environment with minor configuration changes in our Job Configuration git repo.
