Infrastructure CI with Kubernetes: A Practical Example

W. Jenks Gibbons
Mar 31, 2023


Recently, in “Lightweight, Fast Kubernetes for Dev,” we explored kind. Now, let’s explore continuous integration for dev using Python, kind and GitHub Actions. The focus is on dev, so let’s define the goals and requirements.

Goals

  • test feature branches in a local dev environment prior to check-in
  • test feature branches after check-in
  • notify when an issue occurs
  • notify and merge a feature branch into main when the run is successful

Requirements

  • ability to test in a dev environment (e.g. a laptop)
  • ability to automate the stand-up/teardown of a local k8s cluster
  • ability to test the manifests (e.g. a deployment)
  • ability to test the deployed application
  • ability to test with synthetics using localhost

Local Testing

As I update my infrastructure (e.g. k8s manifests), I need to be able to test it often, on demand and prior to a check-in. The focus at this point is on infrastructure. The application CI happens earlier in the process; here we use application containers that have already been tested. To begin, I will test a simple Java application locally using a Python script. Let’s look at an example:

  1. I create a feature branch:
$ git branch
* dev-app-java
main

2. I configure my test environment (e.g. secrets, manifests, etc.) using config.yaml.

3. I make changes to the manifest (e.g. update the container or some other change):

- image: jenksgibbons/app-java:new-tag

4. I run a test to ensure my changes work as expected:

$ python3 dev.py
Creating cluster "kind-k8" ...
✓ Ensuring node image (kindest/node:v1.25.3) 🖼
✓ Preparing nodes 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind-k8"
You can now use your cluster with:

kubectl cluster-info --context kind-kind-k8 --kubeconfig config

Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/
Deleting cluster "kind-k8" ...

5. I check my notification, for example a log:

Error from test in a local dev environment

6. I update the image tag (e.g. latest) and test again, this time seeing a different issue:

API error

These are basic examples outlining a process.

7. I fix the apiVersion field and test again:

Success

The idea is to make small, incremental changes, testing each one along the way, to decrease risk and increase quality. With the latest successful run, we now have a working change staged in the feature branch.

$ git diff ../../../app-java/kubernetes/app-java.yaml
diff --git a/app-java/kubernetes/app-java.yaml b/app-java/kubernetes/app-java.yaml
index 16319f6..05605ec 100644
--- a/app-java/kubernetes/app-java.yaml
+++ b/app-java/kubernetes/app-java.yaml
@@ -27,7 +27,7 @@ spec:
ad.datadoghq.com/app-java.logs: '[{"source":"tomcat","service":"tomcat","log_processing_rules":[{"type":"multi_line","name":"log_start_with_date","pattern":"\\d{4}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])"}]}]'
spec:
containers:
- - image: jenksgibbons/app-java
+ - image: jenksgibbons/app-java:latest
name: app-java
imagePullPolicy: IfNotPresent
env:

At this point, we can check in the change and kick off the CI, or stage further changes. Before continuing, however, let’s explore what we are doing here.

dev.py

This script takes its configuration from config.yaml. Here, secrets, manifests, a synthetic browser test and other configurations are set up. When run locally (tested on macOS) it:

  • creates a local k8s cluster using kind and loads the kubeconfig
  • creates secrets
  • deploys manifests (e.g. create -f [deploy, service])
  • waits for pods to (un)successfully deploy
  • creates a port forward to test the service
  • runs a synthetics browser test using a tunnel
  • deletes the cluster
  • logs the result and forwards it to an aggregator
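
The core of the steps above can be sketched as a thin wrapper around the kind and kubectl CLIs. This is a minimal sketch and not the actual dev.py: the function names, the kubeconfig path and the manifest list are assumptions for illustration, and secrets creation, the port forward, the synthetics test and log forwarding are left out. The `run` parameter defaults to `subprocess.run` so the sequence can be exercised without a real cluster.

```python
import subprocess
from typing import Callable, List, Sequence

Command = List[str]


def build_commands(cluster: str, kubeconfig: str, manifests: Sequence[str]) -> List[Command]:
    """Return the stand-up, deploy and wait commands in the order they run."""
    commands: List[Command] = [
        # Create the local cluster and write its kubeconfig to a known path.
        ["kind", "create", "cluster", "--name", cluster, "--kubeconfig", kubeconfig],
    ]
    for manifest in manifests:
        # Deploy each manifest (e.g. a deployment and a service).
        commands.append(["kubectl", "--kubeconfig", kubeconfig, "create", "-f", manifest])
    # Block until every pod reports Ready, or fail after the timeout.
    commands.append(
        ["kubectl", "--kubeconfig", kubeconfig, "wait",
         "--for=condition=Ready", "pods", "--all", "--timeout=120s"]
    )
    return commands


def run_pipeline(
    cluster: str,
    kubeconfig: str,
    manifests: Sequence[str],
    run: Callable = subprocess.run,
) -> bool:
    """Run each command in order; stop at the first failure; always delete the cluster."""
    ok = True
    try:
        for command in build_commands(cluster, kubeconfig, manifests):
            if run(command).returncode != 0:
                ok = False
                break
    finally:
        # Tear the cluster down whether the run succeeded or not.
        run(["kind", "delete", "cluster", "--name", cluster, "--kubeconfig", kubeconfig])
    return ok
```

Calling `run_pipeline("kind-k8", "config", ["app-java.yaml"])` would shell out to kind and kubectl, so both need to be on the PATH. Keeping the teardown in a `finally` block means a failed deploy does not leave a cluster behind on the laptop.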

So, how are we looking on our goals and requirements thus far?

  • test feature branches in a local dev environment prior to check-in ✅
  • notify when an issue occurs ✅
  • ability to test in a dev environment (e.g. a laptop) ✅
  • ability to automate the stand-up/teardown of a local k8s cluster ✅
  • ability to test the manifests (e.g. a deployment) ✅
  • ability to test the deployed application ✅
  • ability to test with synthetics using localhost¹ ✅

Looking good so far. Let’s now check in our staged change.

CI

$ git add ../../../app-java/kubernetes/app-java.yaml; git commit -m "ci demo"; git push --set-upstream origin dev-app-java
[dev-app-java eb183ef] ci demo
1 file changed, 1 insertion(+), 1 deletion(-)
Enter passphrase for key '/Users/jenks.gibbons/.ssh/id_rsa':
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 8 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 426 bytes | 426.00 KiB/s, done.
Total 5 (delta 4), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To github.com:jgibbons-cp/datadog.git
1a726c1..eb183ef dev-app-java -> dev-app-java
branch 'dev-app-java' set up to track 'origin/dev-app-java'.

The check-in triggers a GitHub Actions workflow:

Triggered by a check-in

We validated our change in a local dev environment and now have confirmed it with CI. The logs note a successful run and that the change was merged into main:

CI success for checked-in change, merge change to main

Now I can update main and continue developing:

$ git checkout main
...
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
$ git pull
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
Unpacking objects: 100% (1/1), 225 bytes | 225.00 KiB/s, done.
From github.com:jgibbons-cp/datadog
efaa0b2..3df34ec main -> origin/main
Updating efaa0b2..3df34ec
Fast-forward
app-java/kubernetes/app-java.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

So, let’s take a look at a failure:

CI failure after check-in

Bummer, but the upside is that, since the run failed, we did not merge.

Log stream for CI

Here are logs for success and the merge, followed by an error. So, what does the workflow job do?

GitHub Actions Workflow

The workflow can be found in .github/workflows/deploy_test.yml in the repository (Cloudflare is blocking the full URL). It does the following:

  • triggers on a push to a branch
  • configures the Ubuntu host for the tests (e.g. installs Python, installs libraries with pip)
  • checks out the feature and main branches
  • sets up the config for dev.py
  • sets up the config for a synthetics browser test
  • runs the dev script
  • merges

If any step fails, a log will be sent and the merge will not happen.
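
The gating behavior described above (stop at the first failing step, log it, and merge only after every step has passed) can be illustrated in Python. This is a hedged sketch of the control flow only, not the contents of deploy_test.yml; the `run_workflow` helper and the step names in the usage note are hypothetical.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("ci")

# A step is a (name, callable) pair; the callable returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_workflow(steps: List[Step], merge: Callable[[], None]) -> bool:
    """Run steps in order and call merge() only if every step succeeds."""
    for name, step in steps:
        if not step():
            # A failed step is logged and ends the run before the merge.
            logger.error("step failed: %s", name)
            return False
    merge()
    logger.info("all steps passed, feature branch merged into main")
    return True
```

For example, `run_workflow([("configure host", lambda: True), ("run dev.py", lambda: False)], merge=do_merge)` would log the failure of the second step and return False without ever calling the hypothetical `do_merge`.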

This completes the remaining goals:

  • test feature branches after check-in ✅
  • merge the feature branch to main when the run is successful ✅

This wraps up our example of infrastructure CI with k8s. Since we started partway down the pipeline, next we will take a step back to CI of the application, then move down the pipeline through this stage to pre-prod and prod.

Footnote

¹ We use CI for dev, pre-prod, UAT, prod, etc. because the environments differ. I reach my app in my local environment on localhost (the loopback interface) using a port forward. Nobody else can reach it there, so to use a SaaS synthetics test I open a tunnel. In dev this works, but in reality it is only a smoke test. For example, I can change this in my application manifest:

ports:
- containerPort: 443
  hostPort: 443

and still test successfully, because the port forward bypasses the infrastructure setting. If I create a load balancer, however, as I do in later environments, the test fails because the application is not listening on this port.
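
To make the footnote concrete, a localhost smoke test can be as small as an HTTP GET against the forwarded port. This is a minimal sketch under assumptions: the `smoke_test` helper and the URL in the usage note are hypothetical, and a port forward (for example one created with `kubectl port-forward`) is assumed to be running already. Because the request travels through the port forward, a passing result says nothing about whether a hostPort or load balancer setting is correct.

```python
import urllib.error
import urllib.request

# The target is localhost, so skip any proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def smoke_test(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

With a port forward listening on a hypothetical local port 8080, `smoke_test("http://localhost:8080/")` returns True when the application answers and False when nothing is listening.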

