GKE: Making Kubernetes Upgrades (Extremely) Easy

Tariq Islam
Google Cloud - Community
5 min read · Jun 22, 2018
Ever feel like your IT operations are something out of the movie Speed?

At some point, you have to upgrade. And invariably, upgrades are dicey. So much so that entire organizations would rather continue running old (and often vulnerable or unsupported) software than deal with upgrading to the latest (and safest) stable version of a given platform. And that’s just traditional software.

Kubernetes is not (yet) traditional software, and it includes numerous moving parts that, in aggregate, provide one of the most compelling operational deployment platforms we’ve seen in tech history. This inherent complexity has two implications:

  1. Multi-tenancy is an almost forced decision because of how difficult Kubernetes distributions are to manage. Operations teams today are simply not built to support at-will provisioning of Kubernetes clusters. This is precisely why GKE exists: to give organizations the inherited capability to run like Google and provision at will.
  2. Upgrading Kubernetes distributions is operationally error-prone. Artifacts are left over, configurations go stale, components are missed, deployments break, and so on.

The second you enter a multi-tenant environment, upgrades become a decision by committee at best and a non-starter at worst. But this isn’t a discussion of multi-tenancy (that’s another post). This is about the fact that an upgrade is often a traumatic event, and it really doesn’t have to be. Google Kubernetes Engine (GKE) provides the means to operate Kubernetes, upgrades included, effectively and safely.

There are two primary ways, in my mind, to safely and reliably perform cluster upgrades without putting your workloads at risk:

  1. You can programmatically provision a new cluster in GKE on demand at the target Kubernetes version, deploy and test your workloads there, and then sunset the older cluster in a blue/green fashion.
  2. GKE uniquely lets you define a new node pool within your existing cluster, with nodes provisioned at the target Kubernetes version. This allows you to run a blue/green deployment inside a single cluster; once satisfied, you can upgrade or remove the original nodes (a short sketch follows this list).
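
As a rough sketch of that second option: once the master is at the target version, a newly created node pool comes up at that version by default, and you can drain the old nodes onto it. The pool and node names below are illustrative (default-pool is simply GKE’s default pool name):

# Create a new pool; it is provisioned at the cluster's current master version
gcloud container node-pools create new-pool --cluster=cluster1 --num-nodes=3

# Cordon and drain each old node so workloads reschedule onto the new pool
kubectl cordon <old-node-name>
kubectl drain <old-node-name> --ignore-daemonsets

# Once everything is healthy, remove the old pool
gcloud container node-pools delete default-pool --cluster=cluster1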

There is a third option for executing an in-place upgrade in GKE, which can and will work for most use cases with little disruption:

# Upgrade the control plane (master) to the latest available version first
gcloud container clusters upgrade cluster1 --cluster-version=latest --master

# Then upgrade the nodes, which by default brings them up to the master's version
gcloud container clusters upgrade cluster1 --zone=us-east4-a
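
If you want to watch progress along the way, GKE surfaces each upgrade as a long-running operation you can list (the operation IDs and their statuses will, of course, be specific to your project):

gcloud container operations list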

Which after a measly 5 minutes results in a fully upgraded cluster.

That was unnervingly easy.

And in the space of in-place upgrades, CoreOS Tectonic provides a similar push-button capability in its K8s distribution (done by deploying the control plane components a bit differently than usual). However, for those of you who want more separation and assurance, we’ll focus on the two options above, as provided out of the box in GKE.

Let’s look at the first option where we have a deployed application in cluster1 that’s at Kubernetes version 1.9.6, and we want to upgrade to version 1.10, safely and reliably:

Here’s what we have deployed. Nothing exciting.
gcloud container clusters list
NAME      LOCATION     MASTER_VERSION  MACHINE_TYPE  NODE_VERSION  STATUS
cluster1  us-central1  1.9.6-gke.1     n1-highmem-2  1.9.6-gke.1   RUNNING

To perform the upgrade, let’s provision our new cluster. The following command will provision a cluster at the latest available Kubernetes version (n), but you may select n-1 or n-2 as well:

gcloud container clusters create cluster2 --zone=us-east4-a --cluster-version=latest
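
If you’d rather pin an explicit version than pass latest, you can first ask GKE which master and node versions are currently on offer in the zone:

gcloud container get-server-config --zone=us-east4-a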

After a few minutes we have our new production cluster ready to go, and our kubeconfig is already contextualized against it.
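
You can verify this with kubectl; GKE context names follow the gke_<project>_<location>_<cluster> convention:

kubectl config current-context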

And listing out our clusters:

gcloud container clusters list

We find both:

NAME      LOCATION     MASTER_VERSION  MACHINE_TYPE   NODE_VERSION  STATUS
cluster1  us-central1  1.9.6-gke.1     n1-highmem-2   1.9.6-gke.1   RUNNING
cluster2  us-east4-a   1.10.4-gke.0    n1-standard-1  1.10.4-gke.0  RUNNING

Now we can deploy our application using Skaffold:

skaffold run

This will leverage our current Kubernetes context to build, tag, push, and deploy our code to the new cluster.
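
Under the hood, that is roughly the build-tag-push-deploy sequence you would otherwise script by hand. A purely illustrative equivalent, assuming a gcr.io image path and a k8s/ manifests directory (both hypothetical names):

# Build and tag the container image
docker build -t gcr.io/my-project/my-app:v1 .

# Push it to the registry
docker push gcr.io/my-project/my-app:v1

# Apply the manifests against the current kubectl context (cluster2)
kubectl apply -f k8s/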

If you’d like, here is a 5 minute getting-started on Skaffold.

Looking at our deployment in cluster2:
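
A quick check from the command line, against our current context, is enough to see the workloads running (the resource names will of course be whatever your app deploys):

kubectl get deployments,pods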

Now we’re ready to test our application(s), whatever that means to you. This part may or may not take some time, depending on how good your testing and QA processes are. Moving service and request traffic over to the application(s) on the target cluster is out of scope for this post, since it entails application-level concerns like session management and multi-cluster ingress. Once you’ve made the switch and confirmed that cluster1 is no longer in use or needed (per your retention policies), you can decommission it:

gcloud container clusters delete cluster1 --region=us-central1
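
Before running that delete, by the way, one last sanity check against the old cluster’s context is cheap insurance (the context name below is illustrative):

kubectl --context=gke_my-project_us-central1_cluster1 get pods --all-namespaces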

I know it seems a bit jarring to just delete an entire cluster. But the key thing to realize is that with GKE, in just a few commands, you’ve completely upgraded your deployment environment, deployed your apps/workloads accordingly, and kept serving traffic, without ever having to worry about the actual upgrade process. Your provisioning commands will most certainly be more complex, using the plethora of configuration options available in GKE to make your cluster(s) production-ready per your needs.

Enterprise Kubernetes clusters can and should be treated as cattle in many circumstances, not pets.

Treating our clusters as multi-tenant, immovable, monolithic structures leads our organizations into the same operational trap as every other tech adoption misstep: we gain a bit of capability but remain inhibited in how we manage what we’ve adopted.

It should be no surprise that nothing in this post is complex. It’s rather simple, and that’s how enterprise Kubernetes at any scale should be.

Note: I’ve skipped the node pool method of upgrading because it’s covered quite well by Sandeep Dinesh in this highly related post on the official Google Cloud Blog: https://cloudplatform.googleblog.com/2018/06/Kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime.html
