E2E Kubernetes CI/CD with Google Cloud Build

Uzziah Eyee
Nov 4 · 8 min read

Google Cloud Build is a service for provisioning build agents to run Continuous Integration and Deployment (CI/CD) tasks [1]. The image above shows the architecture of a CI/CD pipeline for deploying a Kubernetes application. This article summarizes all the components and actions involved in setting up this Cloud Build-based CI/CD pipeline. It provides a 1000-ft view without going into the step-by-step details of any one specific task.

There are lots of articles about doing specific tasks, but I seldom see ones describing the big picture of what is required to set up an end-to-end solution. I believe that if you understand what you are building, it becomes easier to source individual parts…not the other way round.

Before we proceed: this article covers intermediate to advanced concepts in Kubernetes, infrastructure security, and DNS, so I assume familiarity with these topics or that you can read up on them later. The entire infrastructure is provisioned with Terraform [2] and all source code is available on Github [3]. Let’s dive in.

A note on the multi-cloud architecture

Having various components split across different providers makes inter-operation a bit more challenging, but it also provides benefits like using the most cost-effective provider for a specific service, avoiding vendor lock-in, and gaining a deeper understanding of how the various pieces fit together. The latter is my main reason for designing the architecture this way.

The Change Management Process

A common idea is to roll out changes first to a Staging environment, then promote them to a Production environment. In my case, the triggers for moving changes from one environment to another are Git operations (like merging a Pull Request)…a principle known as GitOps [4].

The diagram below shows how our system will propagate change from commit to production.

  1. First, the Developer creates 3 Triggers on Cloud Build, one for each pipeline.
  2. Creating a Pull Request triggers the Development Pipeline which runs tests and reports a Commit Status to Github. Assuming all tests pass, the PR is approved and merged into master.
  3. Changing master triggers the Staging Pipeline which builds a Docker image, pushes the image to a Google Container Registry and deploys the application to the Staging Cluster (by applying its K8s YAML files). Then developers can sanity-check the app running on Staging.
  4. If deemed correct, the master commit running on Staging can be tagged with a version like v1.1.2. This action triggers the Production Pipeline which applies the K8s files to the Production Cluster.
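As a sketch of what the Development Pipeline’s config might contain (the builder image and test command here are assumptions, not the repo’s actual file — see [3] for the real one):

```yaml
# app/dev.cloudbuild.yaml — run the test suite on every Pull Request.
# The Go builder image and test invocation are placeholders for your stack.
steps:
  - id: run-tests
    name: golang:1.13
    dir: app
    args: ['go', 'test', './...']
```

If any step exits non-zero, Cloud Build reports a failed Commit Status to Github and the PR cannot be merged.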

With that, let’s walk through each piece of implementing the pipeline.

Setting up the Github Repository

We also commit 2 KUBECONFIG files, app/k8s/staging.kubeconfig.yaml and app/k8s/prod.kubeconfig.yaml [6], containing the connection parameters for the Staging and Production clusters respectively. Each file is used by the corresponding pipeline to deploy the application to its cluster. Importantly, the User Token is removed from the KUBECONFIG files and provided to each Cloud Build pipeline as an environment variable, so it is safe to commit these files to source control.
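For illustration, a committed staging.kubeconfig.yaml might look like the sketch below: the cluster endpoint, names, and CA placeholder are hypothetical, and the crucial point is that the user entry carries no token.

```yaml
# app/k8s/staging.kubeconfig.yaml — safe to commit: no secrets inside.
apiVersion: v1
kind: Config
clusters:
  - name: staging
    cluster:
      server: https://<staging-cluster-endpoint>
      certificate-authority-data: <base64-ca-cert>
contexts:
  - name: staging
    context:
      cluster: staging
      user: deployer
current-context: staging
users:
  - name: deployer
    user: {}   # token intentionally omitted; the pipeline supplies it at deploy time
```

At deploy time, the pipeline passes the secret via kubectl’s `--token` flag instead of baking it into the file.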

Setting up Google Cloud Build

We then create 3 Triggers, each configured to run one of our 3 pipelines [9]:

  1. Feature Branch Trigger: Watches Pull Requests, and runs the steps in app/dev.cloudbuild.yaml.
  2. Deploy to Staging Trigger: Watches the master branch and runs the steps in app/staging.cloudbuild.yaml.
  3. Deploy to Prod Trigger: Watches for Tags of the form v1.2.3 and runs the steps in app/prod.cloudbuild.yaml.

Also, you can configure a Trigger to expose substitution variables to the pipeline processes [10]. So, we set the variable _DEPLOYER_TOKEN=<User Token> in the Deploy to Staging and Deploy to Prod triggers.
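Putting these pieces together, a sketch of app/staging.cloudbuild.yaml might look like this. The image name, tag scheme, and kubectl invocation are assumptions; the key idea is pointing kubectl at the committed kubeconfig and injecting the token via the $_DEPLOYER_TOKEN substitution.

```yaml
# app/staging.cloudbuild.yaml — build, push, and deploy on every change to master.
steps:
  - id: build-image
    name: gcr.io/cloud-builders/docker
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/polite:$SHORT_SHA', 'app']
  - id: push-image
    name: gcr.io/cloud-builders/docker
    args: ['push', 'gcr.io/$PROJECT_ID/polite:$SHORT_SHA']
  - id: deploy
    name: gcr.io/cloud-builders/kubectl
    args:
      - apply
      - -f=app/k8s/
      - --kubeconfig=app/k8s/staging.kubeconfig.yaml
      - --token=$_DEPLOYER_TOKEN   # substitution set on the Trigger, never committed
```

Note that the kubectl builder is normally pointed at a GKE cluster via environment variables; here the committed kubeconfig targets the external DigitalOcean cluster instead.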

Lastly, Cloud Build comes with a default Service Account authorized to push images to the GCR registry of the same project [11]. However, our DigitalOcean clusters lack such authorization to pull images from the Registry. So, we create a dedicated GCP IAM Service Account with Storage Object Viewer role on the GCP Project. Then we download a JSON Key for this Service Account [12]…later this file will be used to create an ImagePullSecret in both Staging and Production clusters for pulling images from the registry.
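Since the infrastructure is provisioned with Terraform, this dedicated puller account might be sketched as follows (resource and variable names are illustrative, not taken from the repo):

```hcl
# Dedicated service account that the DigitalOcean clusters use to pull from GCR.
resource "google_service_account" "gcr_puller" {
  account_id   = "gcr-puller"
  display_name = "GCR image puller for external clusters"
}

# Storage Object Viewer is enough to read images from the GCR storage bucket.
resource "google_project_iam_member" "gcr_puller_viewer" {
  project = var.project_id
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:${google_service_account.gcr_puller.email}"
}

# JSON key, later turned into an ImagePullSecret in each cluster.
resource "google_service_account_key" "gcr_puller_key" {
  service_account_id = google_service_account.gcr_puller.name
}
```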

Setting up the Kubernetes Clusters

When a Deployment is created in a namespace, the Kubelet needs authorization to pull images from GCR. So, we create an ImagePullSecret from the GCP Service Account Key we downloaded earlier, then add it as an imagePullSecret on the default Service Account in the cluster namespace [14]. Consequently, every Deployment in that namespace can pull images from GCR.
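Concretely, the two objects involved might be sketched like this. The secret name and base64 payload are placeholders; `_json_key` is GCR’s convention for authenticating with a service account JSON key.

```yaml
# ImagePullSecret holding GCR credentials, built from the service account key.
apiVersion: v1
kind: Secret
metadata:
  name: gcr-pull-secret
type: kubernetes.io/dockerconfigjson
data:
  # base64 of: {"auths":{"gcr.io":{"username":"_json_key","password":"<contents of key.json>"}}}
  .dockerconfigjson: <base64-encoded-docker-config>
---
# Listing the secret on the namespace's default Service Account lets every
# Pod in the namespace pull GCR images without naming the secret itself.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
imagePullSecrets:
  - name: gcr-pull-secret
```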

Finally, we run a Reverse Proxy in each cluster to load-balance incoming requests to the appropriate application service. Traefik is one such reverse proxy, with support for automatically provisioning Letsencrypt TLS Certificates [15]. Importantly, Traefik is run as a LoadBalancer-type K8s service, so that the Cloud Provider (in this case DigitalOcean) provisions an actual load balancer device with a public static IP address [16]. You can then access the cluster — and consequently your application — through that IP address.
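For instance, exposing Traefik this way might look like the following sketch (labels and ports are illustrative):

```yaml
# A Service of type LoadBalancer makes DigitalOcean provision a real
# load balancer with a public static IP in front of the Traefik pods.
apiVersion: v1
kind: Service
metadata:
  name: traefik
spec:
  type: LoadBalancer
  selector:
    app: traefik
  ports:
    - name: web
      port: 80
      targetPort: 80
    - name: websecure
      port: 443
      targetPort: 443
```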

Setting up DNS

To route requests for our domain to the DigitalOcean clusters, we add NS Records in the Namecheap account to delegate query resolution to the DigitalOcean nameservers [17]. Then we create 2 A Records that map to the static IP addresses of our Staging and Production clusters respectively.

  1. Namecheap NS Records: opsolute.com. → DigitalOcean Nameservers
  2. DigitalOcean A Record: polite.opsolute.com. → (Production Cluster Traefik LoadBalancer IP) → Application
  3. DigitalOcean A Record: staging.polite.opsolute.com. → (Staging Cluster Traefik LoadBalancer IP) → Application

Testing the entire setup

  1. Edit the Deployment file to set --greeting="Bonjour!". Make a Pull Request for this change and watch the Cloud Build Development Pipeline kick in to run tests and report a success status. Then approve and merge the PR.
  2. On merging to master, the Staging Pipeline should activate to build and deploy the application image to the Staging Cluster.
  3. From your browser visit the staging URL https://staging.polite.opsolute.com and you should see the greeting Bonjour!.
  4. Now tag the master branch like so: git tag v1.0.0 origin/master && git push --tags. This should trigger the Deploy to Prod Pipeline which will deploy the image to the Production cluster. Visit the URL https://polite.opsolute.com and you should see the greeting Bonjour!.

We now have one version of our app running on both Staging and Production. Next, we introduce a change that makes our greeting Spanish, then deploy that change to Staging.

  1. Repeat steps 1 and 2 above but this time, set --greeting="Hola!".
  2. Visit https://staging.polite.opsolute.com and you should see Hola!. However, the Production URL https://polite.opsolute.com should still return Bonjour!.

And that’s it! Our pipeline can deploy different versions to Staging and Production.

Challenges I faced

  • Every time you deploy a K8s LoadBalancer service, DigitalOcean provisions a Load Balancer with a random public static IP address, so you need to update your DNS A Records to point to this new IP. This is very inconvenient. The solution would be to purchase what they call a Floating IP (a static IP with a different lifecycle from your service) and specify it in your service’s loadBalancerIP field…but this is sadly not supported [19].
  • Cloud Build does not support triggering on Github Pull Merge branches. Whenever you make a Pull Request, Github simulates the merge of the source and destination branches in a branch named refs/pull/ID/merge, so you could test and build the result of merging your PR without actually merging it. Sadly, Cloud Build cannot trigger on these refs [20].
  • I tried using the Kubernetes Provider and the Helm Provider to manage Kubernetes resources in Terraform. The aim was to declaratively manage important cluster resources like Namespaces and the Traefik ingress controller as Infrastructure as Code. However, these providers had poor support for Kubernetes object specifications and had resource-dependency issues. So for now, it’s best to keep Kubernetes stuff outside Terraform until these providers become more mature.

Overall, I found Cloud Build great for quickly setting up CI/CD pipelines for Kubernetes applications.

[1]: Cloud Build — Automated builds for continuous integration | Cloud Build | Google Cloud

[2]: Terraform by HashiCorp

[3]: eyeezzi/k8s-cloud-build-cicd (https://github.com/eyeezzi/k8s-cloud-build-cicd)

[4]: GitOps what you need to know

[5]: eyeezzi/k8s-cloud-build-cicd

[6]: eyeezzi/k8s-cloud-build-cicd

[7]: Running builds with GitHub Checks | Cloud Build Documentation | Google Cloud

[8]: Mirroring a GitHub repository | Cloud Source Repositories Documentation | Google Cloud

[9]: Automating builds with Cloud Build | Cloud Source Repositories Documentation | Google Cloud

[10]: Substituting variable values | Cloud Build | Google Cloud

[11]: Setting service account permissions | Cloud Build | Google Cloud

[12]: Creating and managing service account keys | Cloud IAM Documentation | Google Cloud

[13]: How to Create Kubernetes Clusters Using the Control Panel

[14]: Google Cloud Registry (GCR) with external Kubernetes

[15]: Let’s Encrypt | Traefik (https://docs.traefik.io/https/acme/)

[16]: How to Add Load Balancers to Kubernetes Clusters

[17]: How To Point to DigitalOcean Nameservers From Common Domain Registrars (https://www.digitalocean.com/community/tutorials/how-to-point-to-digitalocean-nameservers-from-common-domain-registrars)

[18]: eyeezzi/k8s-cloud-build-cicd

[19]: How to set static IP for loadbalancer in Kubernetes? (https://www.digitalocean.com/community/questions/how-to-set-static-ip-for-loadbalancer-in-kubernetes)

[20]: https://issuetracker.google.com/issues/119662038

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Written by Uzziah Eyee, Site Reliability Engineer @ Dapper Labs. What I cannot build, I do not understand.
