Kubernetes Engine (GKE) multi-cluster life cycle management series

Part III: GKE Cluster lifecycle management

Ameer Abbas
Google Cloud - Community
4 min read · Apr 21, 2020


Distributed Service Foo on GKE

In the previous two parts (part I and part II), I discussed the need for multi-cluster architectures and defined the concept of a Distributed Service, which is a Kubernetes Service running on multiple clusters. In this part, let’s dive into what we mean by GKE cluster lifecycle management. Cluster lifecycle management can be defined as strategies and planning required to maintain a healthy and updated fleet of Kubernetes clusters. Specifically, this refers to keeping the entire fleet of Kubernetes clusters updated without violating service SLOs. With proper strategies and planning in place, cluster lifecycle management should be routine, expected and “uneventful”.

Even though this blog series specifically focuses on Kubernetes Engine (GKE) lifecycle management, the concepts covered may be applied to other distributions of Kubernetes.

GKE versioning and upgrades

Before discussing strategies and planning for cluster lifecycle management, it is important to understand what constitutes a cluster upgrade.

A cluster is composed of two pieces: the master(s) and the nodes. A Kubernetes cluster upgrade requires all nodes and masters to be upgraded to the desired version. Kubernetes follows a semantic versioning scheme: versions are expressed as X.Y.Z, where X is the major version, Y is the minor version, and Z is the patch version. Minor releases occur approximately every three months (quarterly), and the Kubernetes project maintains release branches for the most recent three minor releases. This means a nine-month-old Kubernetes minor release is no longer maintained. Beyond losing support, lagging too far behind Kubernetes releases also means absorbing more accumulated API changes when a cluster is eventually upgraded, which can end up being more challenging and risky than keeping up with releases as they ship. Needless to say, Kubernetes upgrades must be planned at a regular cadence; we suggest planning GKE upgrades every one to two quarters.
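To see which versions are available before planning an upgrade, you can query GKE's server config. A minimal sketch, assuming the gcloud CLI is installed and authenticated; the zone is illustrative:

```shell
# List the Kubernetes versions GKE currently offers for masters and nodes
# in a given zone. Requires an authenticated gcloud environment.
gcloud container get-server-config --zone us-central1-a \
  --format="yaml(validMasterVersions, validNodeVersions)"
```

Comparing this list against your fleet's current versions tells you how far behind a cluster is and which minor releases are still supported.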

GKE clusters support running Kubernetes versions from any supported minor release. At least two, if not three, minor versions are available at any given time.

GKE offers three types of clusters.

  1. Single zone clusters — Single master and all node-pools in a single zone in a single region
  2. Multi zonal clusters — Single master in one zone and node-pools in multiple zones in a single region
  3. Regional clusters — Multiple masters and node-pools in multiple zones in a single region
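For example, a regional cluster (type 3 above) can be created with a single command. A sketch, assuming an authenticated gcloud environment; the cluster name and region are illustrative:

```shell
# Create a regional cluster: control plane replicas and node pools are
# spread across three zones in the region by default.
gcloud container clusters create my-regional-cluster \
  --region us-central1 \
  --num-nodes 1   # one node per zone, so three nodes total
```

Because the control plane is replicated across zones, regional clusters are the option most relevant to the upgrade availability discussion below.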

GKE is a managed service and offers auto-upgrades for both masters and nodes.

GKE auto-upgrades

GKE auto upgrades are a popular and widely used cluster lifecycle strategy. They provide a no-ops way to keep your GKE clusters updated to supported versions. GKE auto upgrades upgrade masters and nodes separately.

Master auto upgrades — By default, GKE masters are automatically upgraded. Single zone and multi-zonal clusters have a single control plane (master). During a master upgrade, workloads continue to run; however, you cannot deploy new workloads, modify existing workloads, or make other changes to the cluster’s configuration until the upgrade is complete.

Regional clusters have multiple replicas of the control plane, and only one replica is upgraded at a time. During the upgrade, the cluster remains highly available, and each control plane replica is unavailable only while it is being upgraded.

Node upgrades — Node pools are upgraded one at a time. Within a node pool, nodes are upgraded one at a time, in an undefined order. You can change the number of nodes upgraded at a time. This process might take several hours depending on the number of nodes and their workload configurations.
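The number of nodes upgraded at a time is controlled through surge upgrade settings on the node pool. A hedged sketch, assuming an authenticated gcloud environment; the pool, cluster, and zone names are illustrative:

```shell
# Tune node pool upgrade parallelism. --max-surge-upgrade adds temporary
# extra nodes during the upgrade; --max-unavailable-upgrade caps how many
# existing nodes may be down at once.
gcloud container node-pools update default-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --max-surge-upgrade 2 \
  --max-unavailable-upgrade 0
```

With max-unavailable set to 0, capacity never drops below the pool's original size during the upgrade, at the cost of briefly paying for the surge nodes.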

GKE auto upgrade lifecycle strategy

We recommend using GKE auto upgrades where possible. GKE auto upgrades prioritize convenience over control. Even so, they provide several ways to influence when and how your clusters get upgraded, within certain parameters: maintenance windows and exclusions control timing, release channels influence version selection, and surge upgrades control how many nodes are upgraded at a time. Despite these controls, and even for regional clusters (with multiple Kubernetes control plane replicas), GKE auto upgrades do not guarantee service uptime.
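The knobs mentioned above are all set on the cluster itself. A sketch of enrolling a cluster in a release channel and pinning auto-upgrades to a recurring off-peak maintenance window, assuming an authenticated gcloud environment; the cluster name, zone, and schedule are illustrative:

```shell
# Enroll the cluster in the regular release channel, which determines
# which versions auto-upgrades will roll out.
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --release-channel regular

# Restrict automatic maintenance (including upgrades) to a recurring
# weekend window using an RFC 5545 recurrence rule.
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --maintenance-window-start 2020-05-02T02:00:00Z \
  --maintenance-window-end 2020-05-02T06:00:00Z \
  --maintenance-window-recurrence "FREQ=WEEKLY;BYDAY=SA,SU"
```

Maintenance exclusions can be layered on top of the window to block upgrades entirely during critical business periods.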

You may choose not to utilize the GKE auto upgrade feature if you require one or more of the following:

  1. Control the exact version of GKE clusters
  2. Control the exact time to GKE upgrade
  3. Control the upgrade strategy (discussed in the next blog) for your GKE fleet. For example, using multi-cluster as part of the cluster lifecycle management.

If any of the above requirements apply to you, the GKE auto-upgrade strategy alone will not suffice. In the next part, I will dive into a few multi-cluster upgrade strategies as well as the planning associated with each.

Up next… Part IV: GKE multi-cluster lifecycle management

Ameer Abbas

I do Cloud architecture, Kubernetes, Service Mesh, CI/CD and other cloud native things. Solutions Architect at Google Cloud. Opinions stated here are my own.