Integration Tests in Kubernetes and Cloud Build

Ivan N.
Datasparq Technology
5 min read · Apr 3, 2020

Three simple steps to build a CI/CD pipeline using your existing Kubernetes cluster

TLDR:

  • We run our tests as Kubernetes jobs so that they are as close to production as possible
  • A Python script running in Cloud Build monitors the job until completion. If the job fails, so does the rest of the build
  • We used a custom Docker container that had all the required packages and could automatically authenticate with most GCP resources

A diagram and the source code are at the bottom of the page.

Background

Houston is our home-grown serverless solution for workflow management. It has a few moving parts: an API and a web app living as Kubernetes deployments; a managed database; and a managed Redis instance. There are several namespaces (production, staging, testing, etc.) on the K8s cluster, and they all connect to the same DB and Redis.

Simplified architecture of callhouston.io

When deciding the architecture, it was deemed that separate DB and Redis instances per deployment (production, staging, testing…) would be massive overkill and, honestly, a waste of money.

Spinning them up on demand was the next logical option, but it did add 15–20 minutes to the build time, which did not sit very well with my borderline ADHD. It also added extra layers of complexity, such as:

  • Running Terraform scripts in Cloud Build (and Terraform was already provisioning Cloud Build, so I did not want to be the one breaking the Internet with some wacky recursion)
  • Managing the destruction of those resources. That had to be done manually because (quite often) I wanted to be able to debug problems by SSH-ing into the culprit container and smashing things around until one of us rolled over.
  • The fact that GCP will throw a fit if you try to create a new Cloud SQL instance with the same name as one you deleted not half an hour ago. I spent 3 days and 1000 hair follicles before I came to the above conclusion. It was an (un/poorly) documented “error” of GCP, and Terraform handled it like a bad joke at a Christmas party: pretending not to have heard it. I never did find out what the maximum timeout of Terraform was; I just Ctrl-C'd it after leaving it for 2 hours.

So it was decided to go with a “single” (as in single endpoint; GCP can scale horizontally behind the scenes) instance each for the DB and Redis. Inside, we would have different databases for production, staging, testing, etc. With SQL the sky is the limit; Redis, on the other hand, allows only 16 databases. But we would never need that many (…famous last words).
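
To make the idea concrete, here is a minimal sketch of how each environment can point at its own logical database on the shared instances. The environment names, variable names and database numbers are my own illustration, not the actual Houston code:

    # Illustration only: map each environment to its own logical database
    # on the shared Redis and SQL instances. Names and numbers are made up.
    import os
    import redis

    REDIS_DB_BY_ENV = {"production": 0, "staging": 1, "testing": 2}  # Redis ships with 16 databases (0-15)

    env = os.environ.get("APP_ENV", "testing")
    r = redis.Redis(host=os.environ["REDIS_HOST"], db=REDIS_DB_BY_ENV[env])

    # The SQL side works the same way: a single instance with one database per
    # environment, e.g. a connection string ending in f"/myapp_{env}".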

Integration Tests inside Kubernetes?

I will spare you the lecture on why you need to test your code as close to production as possible. If you didn’t know that already, you wouldn’t be reading this in the first place.

In the beginning I was running the full test suite from my dev machine, with a very elaborate and over-engineered solution for connecting to the remote DB and Redis instances. Things were going quite smoothly, with all production tables being accidentally dropped only once. But then one fine day we added another developer, and they also needed the ability to run tests.

At that time there was already a Kubernetes job that ran the tests. But there were two major problems:

  • The job would run the tests, but you needed to use the Kubernetes Engine web interface to see whether it had failed or not. Also, the failure of a job would not fail the pipeline
  • The jobs were not being “cleaned up” when they failed, so the next time you ran kubectl apply ....., it would not recreate them. You had to delete the jobs every time before pushing your changes (a sketch of that cleanup follows this list)
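
For reference, that cleanup can be scripted with the same kubernetes Python client that shows up later in this post. The snippet below is only a sketch with placeholder job and namespace names, not the exact code we ended up with:

    # Sketch: delete a finished test job so the next `kubectl apply` can recreate it.
    # Job and namespace names are placeholders.
    from kubernetes import client, config

    config.load_kube_config()  # e.g. the kubeconfig written by `gcloud container clusters get-credentials`
    batch = client.BatchV1Api()

    batch.delete_namespaced_job(
        name="integration-tests",
        namespace="testing",
        body=client.V1DeleteOptions(propagation_policy="Foreground"),  # also remove the job's pods
    )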

That is why I was running the tests locally. Between you and me, I also prefer doing it in PyCharm, as the offending line of code is usually one click away.

Obviously I never ran tests in prod; that’s what staging is for (but I do love a good Austin Powers meme)

Any self-respecting DevOps person would, at that point, have turned to something like CircleCI, Travis or even Jenkins. I myself have used Semaphore in the past and wasn’t particularly excited at the opportunity to deal with yet another service provider; getting payment approvals; managing access credentials; or babysitting an on-premise CI (even if it’s self-managed).
I needed a simpler solution that would take advantage of the existing infrastructure, namely Cloud Build and Kubernetes. And soon a cunning plan was hatched.

The Solution

Cloud Build was already responsible for running the kubectl apply, so all I needed was to add another step that would wait for the unit test jobs to finish and then exit(0) if all passed, or exit(1) if a test had failed.

I could have smashed something together in Bash using kubectl and jq, but I hate debugging bash scripts. Also, speaking from experience, no one wants to maintain those, and they are destined to become that mystical entity that future developers just run as part of the CI ritual, praying it doesn’t break.

Enter Python and the kubernetes library. It provides all the functionality one could ask for, and it is also very well written. The only foreseeable problem was authentication. Luckily, this is all running on Cloud Build, which (once you set up the IAM permissions) can magically access any Kubernetes cluster in your GCP project.
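
The waiting itself is just a polling loop over the job’s status. The sketch below shows the general shape, assuming the kubeconfig is already in place; the job name, namespace and timeout are placeholders rather than the real values:

    # Sketch of the "wait for the test job, then pass or fail the build" step.
    # JOB_NAME, NAMESPACE and TIMEOUT are placeholders.
    import sys
    import time
    from kubernetes import client, config

    config.load_kube_config()
    batch = client.BatchV1Api()

    JOB_NAME, NAMESPACE, TIMEOUT = "integration-tests", "testing", 600

    deadline = time.time() + TIMEOUT
    while time.time() < deadline:
        status = batch.read_namespaced_job_status(JOB_NAME, NAMESPACE).status
        if status.succeeded:
            print("Tests passed")
            sys.exit(0)  # Cloud Build moves on to the next step
        if status.failed:
            print("Tests failed")
            sys.exit(1)  # a non-zero exit code fails the whole build
        time.sleep(10)

    print("Timed out waiting for the test job")
    sys.exit(1)

In real life you would probably also look at the job’s conditions and account for retries, but that is the gist of it.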

I first tried using the default gcloud container (as it already has the credentials sorted out) and installing kubernetes on each run. For better or worse, it refused to cooperate due to some package dependency issue. So I went ahead and created a custom container that had all the packages I needed and authenticated the same way the gcloud one did.
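
For the authentication part, my assumption of the minimal wiring looks roughly like this (cluster, zone and project names are placeholders); the custom container just needs gcloud and the Python packages installed:

    # Sketch: fetch GKE credentials inside the build step, then let the kubernetes
    # client read the freshly written kubeconfig. All names are placeholders.
    import subprocess
    from kubernetes import config

    subprocess.run(
        ["gcloud", "container", "clusters", "get-credentials", "my-cluster",
         "--zone", "europe-west2-a", "--project", "my-project"],
        check=True,
    )
    config.load_kube_config()  # the Cloud Build service account's IAM roles do the rest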

Diagram of the Cloud Build steps

The code

Disclaimer: While I am a DataSparQ employee and I am actively working on Houston, the views and opinions expressed in this work are entirely mine.
