Jenkins G: Customized CI/CD for cloud-native applications on Kubernetes

Jose Beneyto · Published in Geoblink Tech blog · 14 min read · Oct 30, 2020

The story of how we went from Jenkins to Jenkins X, what we liked and what we changed, and finally how we created our own CI/CD that we have named Jenkins G.

From Jenkins towards Jenkins X

Overview of the situation

At Geoblink we reached a point where we had to improve our CI/CD solution, which was neither scalable nor optimal in the long term.

At that point we had a Jenkins service running on a single EC2 instance. All the workload of the jobs had to be supported by that instance, so it had to be a large machine with a lot of CPU and RAM. On the other hand, the heaviest workloads were sporadic, and the load was low most of the time. It was not an optimal solution and we started looking for a better one.

We started migrating our instance based workloads to containers orchestrated by Kubernetes.

We tried to build our own Kubernetes CI/CD pipelines within our existing Jenkins (running on one EC2 instance), and we sort of managed it, but the result was far from good and didn’t scale. We needed something more powerful: an out-of-the-box technology able to scale.

Our requirements

There were things our old CI/CD solution already did that we wanted to keep, and many others that made sense to add when thinking about cloud-native applications.

We considered a CI/CD flow that mainly follows these steps:

  • A Feature Branch (FB) is created from master. The developer adds their code and periodically synchronizes and merges the changes from master into their branch
  • A Pull Request (PR) is created when all the work in a FB is done
  • The PR is reviewed and validated once it passes all the tests: unit, integration, e2e, etc.
  • Finally, if the PR gets the green light after passing all the validation, the FB is merged to master and the new version is deployed in production

With that flow we wanted CI/CD to be transparent to developers so they would not have to worry about anything except the PR status: a green flag means ready to merge.

We also wanted to take into account the following for each PR:

  • A new CI pipeline is triggered to build the code and run all the tests, ensuring the feature branch stays in a ready-to-merge state
  • The PR is deployed to a Preview Environment (more on this later)

When a PR is merged to the master branch:

  • A new semantic version number is generated
  • New versioned artifacts are published including: docker images, helm charts and any language specific artifacts (e.g. pypi libraries, jar files, npm packages, go binaries, etc)
  • The new version is promoted to Environments (more on this later)
  • The new version is tagged in git repository

After many spikes evaluating different services, and taking into account our budget, our bandwidth as a small team, and our tech requirements, Jenkins X (JX) was our choice.

Jenkins G

Why the change

For a while we were happy and JX seemed like the right fit for us. We liked Jenkins, as it has a huge community behind it and an endless number of plugins to do practically everything, and we had managed to run CI/CD in Kubernetes. But as we used it more and more, we realized that something did not fit for us, for several reasons:

  • JX development goes super fast and our small two-member infra team can’t keep up with the pace. Each time we wanted to upgrade the JX platform we basically had to redo everything from scratch, after many hours of reading and understanding the new documentation.
  • We had several problems with the integration for our Git solution (Bitbucket Cloud). e.g. https://github.com/jenkins-x/jx/issues/2149
  • We didn’t use the majority of JX’s added functionalities
  • JX is tending toward the serverless model (not using Jenkins) while for the moment we want to keep using it.
  • We felt that we didn’t own the pipeline, because there was a lot of code inside the Jenkinsfile that we didn’t fully grasp.

On the other hand, we found several situations in which we were blocked and which JX had not yet addressed, such as support for multi-clusters, using HTTPS by default for URLs, etc.

After some time, several workarounds to adjust it to our purposes, and several discussions, we decided that it would cost us less to assemble our own solution. In the end, we really only wanted Jenkins running well in Kubernetes. Nothing else.

Philosophy

We started to build our own CI/CD system in Kubernetes with Jenkins, but always respecting some minimum principles:

  • Simplicity (KISS)
  • Immutability
  • Configuration as Code
  • Add functionality with Jenkins’ own existing plugins and not new external tools/apps
  • Scalability
  • Elasticity
  • Modularity
  • Security

How does it work?

We borrowed many concepts from JX, improving them or adding new ones where needed to better fit our needs.

1. Kubernetes Plugin

This plugin¹ runs dynamic agents in a Kubernetes cluster, automating the scaling of Jenkins agents.

It creates a Kubernetes Pod for each agent started, defined by the Docker image to run, and stops it after each build.

In our case we use a file named KubernetesPod.yaml for each job to define the behavior of the pod used to do the build. Although we use external pod templates with KubernetesPod.yaml, we also contributed a new feature to the Jenkins Helm chart (https://github.com/helm/charts/pull/21671) to centralize these Pod Templates in Jenkins’s own configuration. Before that, changing a Pod Template meant forking JX’s Pod Templates repo, waiting for a merge and then upgrading the JX platform, which could take some time.
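
As a minimal sketch of how a job picks this up (assuming a declarative pipeline; the make target is illustrative):

// Minimal declarative sketch: the build agent is a pod created from the
// KubernetesPod.yaml stored next to the Jenkinsfile
pipeline {
    agent {
        kubernetes {
            yamlFile 'KubernetesPod.yaml'  // pod spec lives in source control
        }
    }
    stages {
        stage('Build') {
            steps {
                container('base') {  // run inside the "base" container of the pod
                    sh 'make build'  // illustrative build command
                }
            }
        }
    }
}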

2. Multibranch Pipelines

This enables you to implement different Jenkinsfiles for different branches of the same project. In a Multibranch Pipeline² project, Jenkins automatically discovers, manages and executes Pipelines for branches which contain a Jenkinsfile in source control.

This eliminates the need for manual pipeline creation and management.
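
For instance (a sketch with illustrative make targets), Jenkins injects BRANCH_NAME into every multibranch build, so a single Jenkinsfile can adapt its behavior per branch:

// Sketch: one Jenkinsfile, branch-dependent behavior; Multibranch Pipeline
// creates and removes the per-branch jobs automatically
node {
    checkout scm
    stage('Build & Test') {
        sh 'make build test'  // illustrative targets
    }
    if (env.BRANCH_NAME == 'master') {
        stage('Release') {
            sh 'make release'  // only master releases
        }
    } else {
        echo "Branch/PR build: ${env.BRANCH_NAME}"  // PRs stop after the tests
    }
}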

3. Preview Environments

This is something we liked about JX: it lets you spin up a Preview Environment³ (PE) for your PR so you can get fast feedback on changes before they are merged and released, and it allows you to avoid having human approval inside your release pipeline, speeding up delivery of changes merged to master.

A typical directory tree with the basic Helm stuff needed to create a PE:

charts
├── myapp-sample
│   ├── Chart.yaml
│   ├── templates
│   │   ├── deployment.yaml
│   │   ├── _helpers.tpl
│   │   ├── ingress.yaml
│   │   └── service.yaml
│   └── values.yaml
└── preview
    ├── Chart.yaml
    ├── Makefile
    ├── requirements.yaml
    ├── templates
    │   └── secrets.yaml
    └── values.yaml

Then, in the Jenkinsfile, you can use this to create the PE:

// Create Preview Environment
sh "PREVIEW_VERSION=$RELEASE_VERSION make preview -C charts/preview"

We also modified the original charts/preview/Makefile that creates the preview to fit our requirements:

OS := $(shell uname)

preview:
ifeq ($(OS),Darwin)
	sed -i "" -e "s/version:.*/version: $(PREVIEW_VERSION)/" Chart.yaml
	sed -i "" -e "s/version:.*/version: $(PREVIEW_VERSION)/" ../*/Chart.yaml
	sed -i "" -e "s/tag:.*/tag: $(PREVIEW_VERSION)/" values.yaml
else ifeq ($(OS),Linux)
	sed -i -e "s/version:.*/version: $(PREVIEW_VERSION)/" Chart.yaml
	sed -i -e "s/version:.*/version: $(PREVIEW_VERSION)/" ../*/Chart.yaml
	sed -i -e "s|repository:.*|repository: $(DOCKER_REGISTRY)\/$(ORG)\/myapp-sample|" values.yaml
	sed -i -e "s/tag:.*/tag: $(PREVIEW_VERSION)/" values.yaml
else
	echo "platform $(OS) not supported to release from"
	exit -1
endif
	echo "  version: $(PREVIEW_VERSION)" >> requirements.yaml
	helm lint
	helm init -c
	helm repo update
	helm dependency update
	helm install --wait --name $(PREVIEW_NAMESPACE) --namespace $(PREVIEW_NAMESPACE) -f values.yaml .

And our charts/preview/requirements.yaml:

# !! File must end with empty line !!
dependencies:
- alias: preview
  name: myapp-sample
  repository: file://../myapp-sample

One of the things we didn’t like about JX is that PE URLs were public endpoints, and we were forced to route the traffic outside our VPC. This was not optimal and posed some security problems, so we started using the Kubernetes ClusterIP services for accessing each PE.

This is what we defined in our Jenkinsfile (environment block):

// URL for accessing the PE
PREVIEW_URL = "http://${APP_NAME}.${PREVIEW_NAMESPACE}.svc.cluster.local".toLowerCase()
// Port for accessing the PE
// Corresponds to a port in a container
// Multiple ports can be declared, like PREVIEW_PORT_2, etc.
PREVIEW_PORT = "8080"

With that feature we were able to use this URL for end-to-end testing completely inside the cluster, which brings us several benefits. When we do need to expose a PE outside the cluster, we use Xposer.
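
For example (a sketch: the test container and command are illustrative, not our exact setup), an e2e stage can hit the PE service directly from inside the cluster:

// Sketch: run e2e tests from a build-pod container against the in-cluster PE
stage('e2e tests') {
    steps {
        container('python37') {
            sh "BASE_URL=${PREVIEW_URL}:${PREVIEW_PORT} pytest tests/e2e"
        }
    }
}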

4. Deployments

JX uses the GitOps strategy by default, but it was intended to be deployed in the same cluster where JX lives, keeping things isolated by namespaces. At Geoblink we have one EKS cluster for PRODUCTION and another for QA to separate workloads, and JX should live in a separate cluster with the rest of the platform tools. Because of this multi-cluster setup we had problems trying to use the GitOps way (https://github.com/jenkins-x/jx/issues/479) and decided to use Helm directly.

Nowadays, this is an example of how we deploy to both clusters, QA and PRODUCTION, in our Jenkinsfile:

// Deploy release in EKS qa (qa namespace)
withCredentials([file(credentialsId: 'jenkinsg-eks-qa-kubeconfig', variable: 'KUBECONFIG')]) {
    sh "helm upgrade ${APP_NAME}-qa ${APP_NAME}-${RELEASE_VERSION}.tgz --kubeconfig $KUBECONFIG --atomic -i --cleanup-on-fail --version $RELEASE_VERSION --namespace qa"
}

// Deploy release in EKS production (prod namespace)
withCredentials([file(credentialsId: 'jenkinsg-eks-prod-kubeconfig', variable: 'KUBECONFIG')]) {
    sh "helm upgrade ${APP_NAME} ${APP_NAME}-${RELEASE_VERSION}.tgz --kubeconfig $KUBECONFIG --atomic -i --cleanup-on-fail --version $RELEASE_VERSION --namespace prod"
}

5. Builder Image

A Docker image with all the necessary tools for running CI/CD pipeline steps in JG, inspired by the JX builder images.

In contrast with JX, which has a builder image per language (Go, Python, Scala, NodeJS, etc.) with the CI/CD-specific software packed along with the language interpreters and related tools, we opted for a single builder-base image with all the CI/CD-specific software, and let developers use extra upstream images (from DockerHub or a private registry) by declaring them in KubernetesPod.yaml.
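
As a hedged fragment of what that declaration looks like (the node image is just an example pulled from DockerHub; the full file is shown in the Job examples section below):

containers:
# CI/CD tooling: docker, helm, kubectl, jg, etc.
- name: base
  image: myregistry.domain.tld/myorg/builder-base:0.0.28
  command:
  - cat
  tty: true
# Extra upstream image chosen by the developer, straight from DockerHub
- name: node12
  image: node:12-alpine
  command:
  - cat
  tty: true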

This is the software contained in our builder-base image:

- git 2.27.0
- git-remote-codecommit 1.15.1
- jq 1.5
- yq 2.4.1
- docker 18.09.9
- helm 2.14.3
- helm3 3.0.3
- jg 0.0.4
- skaffold 1.10.1
- container-structure-test 1.9.0
- kubectl 1.16.0
- awscli 1.18.125

6. jg command line tool

One of the things we didn’t like about JX was that they had a jx⁴ command that was supposed to do all the magic, but it turned out it didn’t suit our needs. We did not like having a single binary used to install a JX cluster, to manage it, and also inside pipelines.

We found that we only used a couple of the features provided by the jx command. It didn’t cover, or only partially covered, our requirements (e.g. auto-versioning), and we had some problems it could not resolve, especially when we switched everything to HTTPS by default; at that point jx import stopped working.

This is why we decided to create a small bash script that would do what we needed and thus reduce boilerplate code in pipelines.

Among other things, with our jg⁵ command we can get the new release version string that a project should use, or tag the repo with the same format (nothing fancy, but very useful).

Here is an example for a project called myapp-sample that has an existing git repository and tags. If there were no tags, the first version that we specify would be created.

We start from the following situation where there are these last tags:

$ git tag -l | sort -V | tail -3
4.2.14
4.2.15
4.2.16

Then we need to have the version variable in the proper form: A.B.dev

NOTE: In this example we use a Makefile for the myapp-sample project, but we could also use pyproject.toml, build.sbt or package.json.

$ grep -i version Makefile
VERSION := 4.2.dev

This would be the result of calculating the next available version.

$ jg release-version
4.2.17
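
Inside a pipeline, this boils down to something like the following sketch (the variable handling is ours to choose; jg release-version is the only jg call assumed here):

// Sketch: compute the next semantic version once and reuse it everywhere
script {
    env.RELEASE_VERSION = sh(script: 'jg release-version', returnStdout: true).trim()
}
echo "Releasing version ${env.RELEASE_VERSION}"  // e.g. 4.2.17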

Job examples

As we said before, we rely on the Jenkins Kubernetes Plugin with the modifications that allow us to define the behavior of the pod used to build each job. This is configured in the KubernetesPod.yaml file, where we can indicate which images the containers will use, the resource limits for each container, and the pod volumes and volumeMounts for the containers that require them. In this file we can also define pod annotations such as the IAM role that the pod will assume to interact with our resources, like an S3 bucket, operations with EC2 instances, etc.

This is one example of KubernetesPod.yaml, with commented lines, for a Python-based project:

apiVersion: v1
kind: Pod
metadata:
  # Name of the pod
  name: myapp-sample
  # Assign any labels to the pod below, "kind" is useful for commands like "kubectl get po -l kind=myapps-group"
  labels:
    kind: myapps-group
spec:
  serviceAccount: jenkins-admin
  volumes:
  - name: workspace-volume
    emptyDir: {}
  # The two following volumes are mandatory for operations with docker (build, push)
  - name: docker-daemon
    hostPath:
      path: /var/run/docker.sock
  - name: volume-0
    secret:
      secretName: jenkins-docker-cfg
  # Define as many containers as you may need
  containers:
  # The "base" container is mandatory for all CD operations: image building and pushing, and deploying to EKS
  - name: base
    image: myregistry.domain.tld/myorg/builder-base:0.0.28
    command:
    - cat
    tty: true
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /home/jenkins
      name: workspace-volume
    # Mandatory volumeMounts for using the docker daemon
    - name: docker-daemon
      mountPath: /var/run/docker.sock
    - name: volume-0
      mountPath: /home/jenkins/.docker
    env:
    - name: DOCKER_CONFIG
      value: /home/jenkins/.docker/
    - name: XDG_CONFIG_HOME
      value: /home/jenkins
    # Define below the resources for your build pod
    resources:
      requests:
        cpu: 500m
        # !!! CRITICAL !!!
        # Assign a memory request that will suffice for your build demands, including peaks, otherwise your build may suffer an occasional OOM kill
        memory: 512Mi
      limits:
        cpu: 500m
        # At your judgment, maybe 20% more than your memory request, but you may configure the same amount for memory requests and limits
        memory: 700Mi
  # Another example of a container within the build pod where tests could be run
  - name: python37
    image: myregistry.domain.tld/myorg/python-generic-37:0.0.20
    command:
    - cat
    tty: true
    # Define below the resources for your build pod
    resources:
      requests:
        cpu: 1
        # !!! CRITICAL !!!
        # Assign a memory request that will suffice for your build demands, including peaks, otherwise your build may suffer an occasional OOM kill
        memory: 256Mi
      limits:
        cpu: 1
        # At your judgment, maybe 20% more than your memory request, but you may configure the same amount for memory requests and limits
        memory: 512Mi

Platform Setup

Instead of using jx create as in JX, we decided to use a simple Makefile running kops commands, since it was easier for us to customize, maintain and manage the cluster itself without having to deal with the new-version problems we ran into with JX.

For the rest of the platform we use helm commands to replace the jx install functionality. That allows us to customize everything.

This is the Makefile we have right now:

#
# jenkinsg/Makefile
#
SHELL = /bin/bash

# Variables for kops
KOPS_STATE_STORE = s3://location-of-kops-state-store
KOPS_CLUSTER_NAME = jg.cluster.k8s.local
KOPS_CLUSTER_YAML = k8s-cluster-kops.yaml
KOPS_PUBKEY_FILE = kops.pub

# Variables for helm
HELM_COMMON_OPTS = -i --atomic --cleanup-on-fail

# Include vars.mk and overlay variables
ifneq ("$(wildcard vars.mk)", "")
include vars.mk
endif

# Export environment variables required for kops command and confirm steps
export KOPS_STATE_STORE KOPS_CLUSTER_NAME KOPS_CLUSTER_YAML KOPS_PUBKEY_FILE
unexport AWS_SESSION_TOKEN
unexport AWS_PROFILE

ifndef AWS_ACCESS_KEY_ID
$(error "Must set AWS_ACCESS_KEY_ID")
endif
ifndef AWS_SECRET_ACCESS_KEY
$(error "Must set AWS_SECRET_ACCESS_KEY")
endif

# Check required files
ifeq ("$(wildcard $(KOPS_CLUSTER_YAML))", "")
$(error "File does not exist: $(KOPS_CLUSTER_YAML)")
endif
ifeq ("$(wildcard $(KOPS_PUBKEY_FILE))", "")
$(error "File does not exist: $(KOPS_PUBKEY_FILE)")
endif

# Shows environment variables used by kops
env:
	env | grep KOPS

# Confirms before running a command
confirm:
	@echo -ne "These are your current kops variables\n\n"
	@env | grep ^KOPS_
	@echo -ne "\nYou are about to run \`make $(MAKECMDGOALS)\`. Continue? [y|N] "
	@read line; if [ "$$line" != "y" ]; then exit 1; fi

# Creates cluster
cluster:
	kops create -f $(KOPS_CLUSTER_YAML)
	kops create secret --name $(KOPS_CLUSTER_NAME) sshpublickey admin -i $(KOPS_PUBKEY_FILE)
	kops update cluster $(KOPS_CLUSTER_NAME) --yes
	while ! kops validate cluster >/dev/null; do sleep 60; done
	kops validate cluster

# Updates cluster without rolling
update:
	kops replace -f $(KOPS_CLUSTER_YAML)
	kops update cluster $(KOPS_CLUSTER_NAME) --yes

# Updates cluster with rolling (no-force)
update-rolling:
	kops replace -f $(KOPS_CLUSTER_YAML)
	kops update cluster --yes
	kops rolling-update cluster --yes

# Updates cluster forcing the rolling update
update-force:
	kops replace -f $(KOPS_CLUSTER_YAML) --force
	kops update cluster $(KOPS_CLUSTER_NAME) --yes
	kops rolling-update cluster --force --yes

# Deletes cluster
delete: confirm
	helm delete $$(helm ls --short --all 2>/dev/null) --purge || true
	kops delete cluster $(KOPS_CLUSTER_NAME) --yes

# Deletes platform
delete-platform:
	helm delete $$(helm ls --short --all 2>/dev/null) --purge || true
	kubectl delete -f manifests/
	kubectl delete clusterrolebinding jenkins-cluster-admin
	kubectl delete serviceaccount --namespace jg jenkins-admin

# Installs platform
platform:
	# kubectl ad-hoc commands
	kubectl apply -f manifests/
	kubectl create serviceaccount --namespace jg jenkins-admin
	kubectl create clusterrolebinding jenkins-cluster-admin --clusterrole=cluster-admin --serviceaccount=jg:jenkins-admin
	# Initialize Helm and add repos
	helm init --service-account tiller --wait
	helm repo add stakater https://stakater.github.io/stakater-charts
	helm repo add netdata https://netdata.github.io/helmchart/
	helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
	helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
	helm repo update
	# Install or upgrade charts
	# namespace: kube-system
	helm upgrade nginx-ingress stable/nginx-ingress $(HELM_COMMON_OPTS) --namespace kube-system --version 1.33.5 -f values/ingress.yaml
	helm upgrade kube2iam stable/kube2iam $(HELM_COMMON_OPTS) --namespace kube-system --version 2.5.0 -f values/kube2iam.yaml
	helm upgrade cluster-autoscaler stable/cluster-autoscaler $(HELM_COMMON_OPTS) --namespace kube-system -f values/cluster-autoscaler.yaml
	helm upgrade metrics-server stable/metrics-server $(HELM_COMMON_OPTS) --namespace kube-system -f values/metrics-server.yaml
	helm upgrade kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard $(HELM_COMMON_OPTS) --namespace kube-system --version 2.0.1 -f values/kubernetes-dashboard.yaml
	helm upgrade netdata netdata/netdata $(HELM_COMMON_OPTS) --namespace kube-system --version 2.0.0 -f values/netdata.yaml
	helm upgrade node-exporter prometheus-community/prometheus-node-exporter $(HELM_COMMON_OPTS) --namespace kube-system --version 1.11.2 -f values/node-exporter.yaml
	# namespace: jg
	helm upgrade xposer stakater/xposer $(HELM_COMMON_OPTS) --namespace jg -f values/xposer.yaml
	helm upgrade chartmuseum stable/chartmuseum $(HELM_COMMON_OPTS) --namespace jg --version 2.13.0 -f values/chartmuseum.yaml
	sh tools/build-jcasc-config
	helm upgrade jenkins stable/jenkins $(HELM_COMMON_OPTS) --namespace jg --version 1.25.0 -f values/jenkins.yaml -f values/jenkins-jcasc.yaml

Configuration as Code

This is one of the important things we wanted to cover. For that we use a Jenkins plugin called JCasC⁷ (Jenkins Configuration as Code), nowadays integrated into the core Jenkins codebase. It allows us to reproduce and redeploy our JG setup from scratch as many times as we want.
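
As a minimal, hedged sketch of what such a config can look like (the values are illustrative, not our real setup; in the Makefile above the generated config ends up in values/jenkins-jcasc.yaml):

# Minimal JCasC sketch, loaded by the plugin when the controller boots
jenkins:
  systemMessage: "Jenkins G - managed by Configuration as Code"
  numExecutors: 0  # all builds run in Kubernetes pods, none on the controller
  clouds:
  - kubernetes:
      name: "kubernetes"
      namespace: "jg"
      jenkinsUrl: "http://jenkins.jg.svc.cluster.local:8080"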

Scalability

One of our goals was to have a fully scalable solution adapted to each workload. To fit our purpose, we have various types of nodes for different pipelines requirements.

1. OnDemand nodes

For business-critical builds we have a special nodegroup with on-demand instances. This is because we can’t afford builds being interrupted by the possible termination of spot instances.
In order to create build pods on these nodes we have these lines in KubernetesPod.yaml:

nodeSelector:
  lifecycle: ondemand
tolerations:
- key: "lifecycle"
  operator: "Equal"
  effect: "NoSchedule"
  value: "ondemand"

2. Spot-based nodes

These types of nodes support most of the work. With them we can adapt our workload to the most suitable EC2 instances and pay a low cost for them. The most important thing in this case is to correctly define the limits for each container of the pod; this way the autoscaler⁸ will be able to better manage and place pods on a new node within the nodegroup if necessary.
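
Assuming the spot nodegroup labels its nodes analogously to the on-demand one above (the label value here is our assumption for illustration), a build pod targets it with just a selector in KubernetesPod.yaml:

# Assumption: the spot nodegroup labels its nodes with lifecycle=spot
nodeSelector:
  lifecycle: spot
# Accurate per-container requests/limits (as in the KubernetesPod.yaml
# examples above) are what lets the autoscaler size the nodegroup correctly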

Conclusions

We now have a CI/CD system 100% tailored to Geoblink, one we fully understand in every aspect, both in concept and in code.

This adventure from Jenkins to Jenkins X, and later to our custom solution, let us learn about the internals and come up with Jenkins G, which we are able to maintain, evolve and scale as a small infra team.

We will shortly share this project via GitHub, so if you wish to read more about it, subscribe to our Medium account for future announcements ;-D

We have no doubt that Jenkins X is a great piece of engineering, and surely all the problems that we had in the beginning have been solved by now. We like both Jenkins and Jenkins X, and that’s why we discarded other CI/CD alternatives. Maybe with greater bandwidth in our small infra team we would have taken more advantage of its features and been able to keep up to date.

Thanks for reading!
