Kubernetes Zero-Downtime Upgrade using Opt-in Approach — On GKE (Part-I)

There are quite a few blogs that advocate a Zero-Downtime Upgrade of GKE, including an official blog from GKE. The approach these blogs introduce to ensure a Zero-Downtime Upgrade works well on most occasions, but not all the time (we will see this in detail in the "Problem and Motivation" section below). In this blog, we will look at the Kubernetes upgrade from a different dimension, by considering certain concepts introduced in recent Kubernetes releases. And as the title suggests, we will upgrade Kubernetes (GKE) with Zero-Downtime by using the Opt-in approach.

What is an Opt-In approach?

An Opt-In approach to upgrading Kubernetes (GKE) can be defined as a general practice where individuals and teams migrate only the workloads (deployments etc.) that they are responsible for to a newer GKE version, at their convenience. This approach allows K8s Administrators and K8s Users (Developers) to migrate to a newer version of GKE at different times, thus avoiding the need to bring all the stakeholders (K8s administrators and K8s users) together at the same time and place. In short, this means that within a given K8s cluster, each individual and team can run their workloads on the K8s version of their choice. Please continue reading to see how.

Problem and Motivation

Before moving further, let us clearly understand the problem with the current GKE upgrade mechanism. As is well known, Kubernetes is a Container Orchestration System: it can manage a huge number of pods at any given point in time. Regardless of whether your GKE cluster hosts tens or thousands of apps, for obvious reasons, you would want to upgrade it with zero downtime.

Most of the blogs that explain the process to achieve a Zero-Downtime upgrade ask you to perform the following steps (a command sketch for the cordon-and-drain portion follows the list):-

  1. Upgrade the master to the latest Kubernetes version (done with one click).
  2. Create a new node pool with the latest Kubernetes version.
  3. Cordon the old node pool.
  4. Drain the nodes of the old node pool one after the other, until the applications are up and running in the new node pool.
  5. Delete the old node pool.

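For concreteness, steps 3 to 5 translate roughly into the commands below. The node and pool names are placeholders, not from the original post:

# mark the old pool's nodes unschedulable (repeat for each node of the old pool)
kubectl cordon <old-node-name>
# evict pods from each old node in turn; the flags shown match kubectl of that era
kubectl drain <old-node-name> --ignore-daemonsets --delete-local-data
# once everything runs on the new pool, remove the emptied old pool
gcloud container node-pools delete old-pool --cluster my-cluster
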
Let's discuss the problems with the above approach.

  1. The first and foremost problem with the above approach is the need to bring the whole infrastructure team, application team, client team and every other related team together at one point in time and place. You bring all these teams together because, when you upgrade a Kubernetes cluster, there is a high chance that you may be caught off guard by a feature change, or simply because you might also be upgrading your applications. In other cases, you may accidentally upgrade an application running on Kubernetes simply by using the ":latest" tag in its container image (this happens often during a K8s upgrade).
  2. The second most important problem with the above approach arises when you cordon all the nodes, as mentioned in the 3rd step above. When you cordon all the nodes in GKE, the GLBC (the GCE L7 load balancer controller) considers all the nodes to be "unready"; hence, if you are using an Ingress for your traffic on Kubernetes, you lose all the traffic, which leads to downtime for your entire Kubernetes cluster. For more details, check out the issue that I have created on the Kubernetes repo:- https://github.com/kubernetes/kubernetes/issues/65013

Our approach — Zero-Downtime Upgrade with Opt-in approach

Now, finally, let's discuss our approach to upgrading GKE, which has the advantage of allowing every team to migrate to the latest Kubernetes version at a time of their own choosing. Rollback is also much simpler and faster compared to the approach mentioned in the previous section. Let us consider that we want to upgrade the K8s version from 1.11.1-gke.0 to 1.12.1-gke.0. The following are the steps that we use in our approach:-

1). Upgrade the master to the latest Kubernetes version (done with one click), in our case to version 1.12.1-gke.0 of Kubernetes.
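
If you prefer the command line over the console, the master upgrade looks roughly like this; the cluster name and zone are placeholders:

# upgrade only the master, leaving node pools untouched
gcloud container clusters upgrade my-cluster --zone us-central1-a \
  --master --cluster-version 1.12.1-gke.0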

2). Create a new node pool with the latest Kubernetes version and also add a Taint to your new node pool (you can also use a node label instead of a Taint). Let the values for the Taint be the following (a gcloud sketch follows):- Effect = NO_SCHEDULE, Key = kubernetes_version, Value = 1.12.1-gke.0
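
A minimal sketch of this step with gcloud, assuming a hypothetical pool name pool-1-12. Note that gcloud spells the effect NoSchedule, which GKE maps to the NO_SCHEDULE API value:

# on older gcloud releases the --node-taints flag may live under "gcloud beta"
gcloud container node-pools create pool-1-12 \
  --cluster my-cluster \
  --node-version 1.12.1-gke.0 \
  --node-taints kubernetes_version=1.12.1-gke.0:NoSchedule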

3). Now, because GKE allows node pools to run versions behind the Kubernetes master (the upstream version skew policy permits kubelets up to two minor versions behind the API server), it is absolutely fine to run node pool versions of 1.11.1-gke.0 (old pool version) and 1.12.1-gke.0 (new pool version) together on a GKE cluster.
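
You can verify that both pools coexist, and on which versions, with a quick listing (the cluster name is a placeholder):

gcloud container node-pools list --cluster my-cluster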

4). Now, add Tolerations to your workloads (i.e., statefulsets, deployments, replicasets, replication controllers etc.) matching the Taint that you specified in step 2) right above, and deploy them to the K8s cluster. The Toleration looks like below:-


tolerations:
- key: "kubernetes_version"
  operator: "Equal"
  value: "1.12.1-gke.0"
  effect: "NoSchedule"

5). If your workloads with the correct Tolerations (i.e., statefulsets, deployments, replicasets, replication controllers etc.) successfully run on the new (Tainted) node pool, then you have successfully migrated your application. You can repeat this for other workloads as well. A fuller manifest sketch follows below.
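
For context, here is a minimal sketch of where the Toleration sits inside a Deployment manifest; the app name, image, and pool name are placeholders. Note that a Toleration only permits scheduling onto the Tainted pool; to actively steer pods there, you can additionally select the pool via the cloud.google.com/gke-nodepool label that GKE puts on its nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      tolerations:                # allows scheduling onto the tainted 1.12 pool
      - key: "kubernetes_version"
        operator: "Equal"
        value: "1.12.1-gke.0"
        effect: "NoSchedule"
      nodeSelector:               # steers pods onto the new pool (label added by GKE)
        cloud.google.com/gke-nodepool: pool-1-12
      containers:
      - name: my-app
        image: gcr.io/my-project/my-app:1.0.0   # pin a tag; avoid ":latest"

After redeploying, kubectl get pods -o wide shows which nodes each pod landed on, so you can confirm the workload is running in the new pool.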

6). Rollback is pretty straightforward and faster compared to the other approach. To roll back, you only delete/modify the Tolerations in your workloads (so that they match the old node pool version) and redeploy. Hence, in a matter of seconds you roll back the upgrade for your application (unlike rolling back the whole Kubernetes cluster).
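
As a sketch, assuming the hypothetical my-app Deployment above, the rollback can even be a one-liner that flips the Toleration value back to the old pool version (if you also added a nodeSelector, flip that too):

kubectl patch deployment my-app --type json \
  -p '[{"op": "replace", "path": "/spec/template/spec/tolerations/0/value", "value": "1.11.1-gke.0"}]'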

The main advantage of the above approach is that the K8s administrator is less involved and every team performs its own upgrade in a simple manner, thus bringing a decentralised/distributed approach to the K8s upgrade.

An important tip for the above approach is to use the cluster autoscaler on the new node pool, so that it scales from a minimum to a maximum number of nodes as and when the workloads arrive. This helps to avoid unnecessary resource waste.
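
A rough sketch of enabling this, again with placeholder names and bounds; a minimum of 0 keeps the new pool empty until the first opted-in workload arrives:

gcloud container clusters update my-cluster \
  --node-pool pool-1-12 \
  --enable-autoscaling --min-nodes 0 --max-nodes 10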

Conclusion

In the above approach, we have seen how a K8s upgrade can be further simplified and decentralised/distributed. An important issue that arises out of this approach is the need to add Tolerations to every workload in K8s, which can be overwhelming for an administrator to do alone. So how do we communicate this to the individual teams responsible for their own workloads? We will address this in Part-2 of this blog, using the validating and mutating admission webhook features of K8s. These webhooks intercept requests to the K8s apiserver, which allows you to validate and mutate the workloads respectively; more on that in Part-2.

Thank you so much; please feel free to correct me with your suggestions and advice.