Design your Landing Zone — Design Considerations Part 2 — Kubernetes and GKE (Google Cloud Adoption Series)

Dazbo (Darren Lester)
Google Cloud - Community
16 min read · Mar 13, 2024

Welcome to Part 2 of Landing Zone Design Considerations. This is part of the Google Cloud Adoption and Migration: From Strategy to Operation series.

In the previous part, we covered:

  • What is a landing zone?
  • Why do you need one?
  • The LZ design process overview.
  • Seven categories of design considerations that are important when designing your landing zone, and the key design decisions you need to make.

Today we’ll cover considerations relating to running Kubernetes in your landing zone.

8. Google Kubernetes Engine (GKE)

As I’ve said previously in this series, containers are great! They are lightweight, fast, portable, and easily scaled. They are well-suited to microservices architectures.

Google Cloud offers a couple of different ways to run containers. But inevitably, if you’re running modern container-based workloads in the cloud, you’re going to need Kubernetes. And if you’re running Kubernetes on Google Cloud, you should be doing so with Google Kubernetes Engine.

GKE is Google’s managed Kubernetes platform. It allows you to run Kubernetes in Google Cloud, but offers a number of benefits over self-managed Kubernetes, including:

  • GKE takes care of deploying clusters, installing Kubernetes, and registering nodes. A new cluster can be deployed in minutes!
  • The Kubernetes control plane nodes are fully-managed and completely abstracted from you as a consumer.
  • Hosts run a hardened, pre-configured container-optimised OS that is transparently managed and patched by Google.
  • Google transparently manages Kubernetes upgrades, through release channels.
  • Clusters automatically scale elastically, to meet the demands of the workloads. (Without you having to care about creating managed instance groups and load balancers with autoscalers.)
  • Automatic non-disruptive repair and replacement of unhealthy cluster nodes.
  • Out-of-the-box regional high availability.
  • GKE is natively integrated into Google Cloud Operations, meaning that access to monitoring, logging and metrics is super-easy.
  • With GKE Autopilot, clusters are preconfigured with out-of-the-box best practices.
  • With GKE Autopilot, you pay for the workloads that are running (called “per-pod billing”), rather than paying for the clusters you deploy. This is a game-changer!
  • Allocate specific machine types to workloads that need them, through GKE node pools and — in Autopilot — with compute classes. You can even allocate GPUs to workloads that need them, such as AI/ML training.
  • Seamless integration with Google Anthos Service Mesh, which can itself be consumed as a fully-managed service.

(As an aside, I’ve covered the various reasons why trying to run a self-managed Kubernetes environment on GCE is a terrible idea, here.)

There are a few design decisions and best practices to consider. It would be possible to do a series on this one topic alone! But for brevity, I’ll just provide summaries of the considerations here.

GKE Autopilot or GKE Standard?

Autopilot from the movie: Airplane

Autopilot has been the default cluster deployment mode for some time now. Out of the box, it applies a number of best practices, such as:

  • Mandatory enrollment into a release channel. This means that your clusters are automatically patched and maintained, and it’s simply not possible to run an old, insecure version of Kubernetes.
  • Clusters are regional, ensuring cluster high availability.
  • Cluster autoscaling — where GKE automatically adjusts the number of nodes in the cluster — is enabled by default.
  • Node auto-repair is enabled by default.
  • Nodes are built using shielded instances with secure boot.
  • Workload identity is preconfigured. This allows Kubernetes service accounts to act as Google Cloud IAM service accounts. Thus, we can provide fine-grained access control from GKE workloads to Google Cloud APIs.

And as I mentioned previously, since you pay for the actual deployed and running pods, you avoid the common problem of wasting spend on underutilised GKE clusters. (This is a common challenge when running GKE Standard.)
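
If it helps to make this concrete, here is a minimal sketch of creating an Autopilot cluster with gcloud. The project ID, cluster name and region are placeholders, and in practice you would drive this through IaC rather than ad-hoc commands (more on that later).

```
# Create a regional GKE Autopilot cluster, enrolled in the Regular release channel.
# Project ID, cluster name and region below are illustrative placeholders.
gcloud container clusters create-auto "demo-autopilot" \
  --project "my-project-id" \
  --region "europe-west2" \
  --release-channel "regular"

# Fetch credentials so that kubectl can talk to the new cluster.
gcloud container clusters get-credentials "demo-autopilot" \
  --project "my-project-id" \
  --region "europe-west2"
```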

I would recommend Autopilot in most use cases. There may be some niche use cases where you want to use Standard. For example:

  • If you want to deploy nodes with TPUs.
  • If you want to run a specific version of Kubernetes, and you want to opt-out of automatic upgrades. (Which is generally a bad idea, and somewhat defeats the point of using a managed service.)
  • If you want to run privileged pods (i.e. workloads that require elevated privileges) that are not on the Autopilot allowlist.
  • If you want to deploy experimental workloads to a single-zone cluster, and you don’t care about ensuring high availability.

Multitenant versus Single Tenant Clusters?

This is less black and white than it used to be. Before Autopilot, the answer was pretty simple: use multitenant clusters as much as possible. The idea is that you have very few clusters, and each cluster hosts workloads from multiple tenants. Here, the tenants are typically different teams within the organisation. And each tenant will own one (or more) namespaces.

With a multitenant GKE cluster:

  • There is only one cluster to manage; so the overhead of managing the cluster is relatively small.
  • Individual tenants / applications do not need to provision any clusters. They can simply deploy their workloads to the existing cluster.
  • You can delegate responsibility for managing the cluster itself to a cloud platform team, so that the application teams don’t need to care about any GKE management responsibility.
  • We achieve isolation between tenants through use of namespaces. Tenants deploy to their own namespaces.

Conversely, if you have many small single-tenant GKE (Standard) clusters — i.e. where each cluster only hosts the services for a single application — then this rapidly gets costly:

  • Each application has to manage its own cluster. This creates a relatively significant amount of cluster management overhead for each team.
  • Furthermore, in order to ensure high availability, each cluster will have to deploy a minimum number of nodes across a region. For single tenant applications, this frequently results in a minimum cluster size that is FAR LARGER than the needs of the application. And thus, we end up with a large number of significantly overprovisioned, underutilised clusters. And because — with GKE Standard — we pay for the cluster we’ve deployed rather than the pods we’re running, this results in throwing away a huge amount of cash!
Setting fire to your money is quicker and easier than running many single-tenant GKE clusters

With Autopilot, the impact of having many single tenant clusters is not so bad:

  • Because we’re only paying for the pods we’re running, we’re not burning cash through the deployment of a large number of underutilised clusters.
  • Because a lot of the best-practice is pre-configured out of the box, the amount of cluster management overhead per tenant is relatively small.

But to wrap-up: I would generally recommend using multitenant GKE Autopilot clusters where possible. Use single-tenant clusters only for exceptional scenarios.

You can always set up a few multitenant clusters, deployed at a level of granularity you are comfortable with in your organisation. For example, you might choose to have a multitenant cluster aligned to each line of business.

Assuming you’ve built your landing zone around a shared VPC design (as discussed in the previous part), then you’ll typically want to deploy your multitenant GKE cluster(s) within the shared VPC. This might be:

  • A shared VPC that hosts the multitenant GKE cluster, but which is peered to a “hub” VPC.
  • The same shared VPC that hosts other shared services.

Regardless, your platform team will own the host project where the shared VPC lives, and where the multitenant GKE cluster lives. And the tenants will own service projects. They can deploy workloads directly to the multitenant cluster (in a namespace), and they can deploy non-GKE resources to their service project.

The Google Cloud documentation illustrates this approach like this:

Tenants own service projects, and can deploy resources to a shared VPC where the MT GKE cluster lives
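
As a rough sketch of what tenant onboarding can look like on the shared cluster (the team name and quota values here are purely illustrative, and in practice you would likely manage namespaces with IaC, as discussed later), the platform team typically creates a namespace per tenant and caps what it can consume:

```
# Create a namespace for a tenant team on the shared multitenant cluster.
kubectl create namespace team-payments

# Cap how much of the shared cluster this tenant can consume.
# The limits below are examples only; size them for your own tenants.
kubectl create quota team-payments-quota \
  --namespace team-payments \
  --hard=requests.cpu=20,requests.memory=64Gi,pods=200
```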

VPC-Native or Routes-Based Cluster?

A routes-based cluster is a cluster that relies on VPC custom routes for pod-to-pod traffic.

A VPC-native cluster is a cluster that uses alias IP address ranges for pod addressing. This means that the pods are natively routable within the cluster’s VPC network, as well as any peered VPC networks. Pod addresses do not require configuration of custom static routes. Network management overhead is relatively low.

VPC-native clusters also allow the use of IP addresses that are outside of the RFC 1918 range. This can be helpful if you expect to need a lot of pod IP addresses. For example, GKE can use the 240.0.0.0/4 CIDR range for nodes, pods and services, which provides an additional 268 million IP addresses!

Google recommends VPC-native clusters. It is the only mode available for GKE Autopilot, and it has been the default for GKE Standard since version 1.21.0-gke.1500 (which is quite old, having been released in 2021).

In short: use VPC-native clusters!
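
Autopilot gives you VPC-native networking automatically. For GKE Standard, the sketch below shows the relevant flags; the subnet and secondary range names are assumptions, and must already exist in your shared VPC.

```
# Create a VPC-native (alias IP) GKE Standard cluster that draws pod and
# service addresses from named secondary ranges on an existing subnet.
# Cluster, subnet and range names are placeholders.
gcloud container clusters create "demo-standard" \
  --region "europe-west2" \
  --enable-ip-alias \
  --subnetwork "gke-subnet" \
  --cluster-secondary-range-name "gke-pods" \
  --services-secondary-range-name "gke-services"
```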

Private clusters?

By default, GKE clusters are public. This means that the control plane and worker nodes have public IP addresses. However, it is best practice for an organisation to create private clusters. This allows us to isolate the cluster from the Internet.

In a private cluster:

  • Worker and control plane nodes have only private IP addresses; they are not exposed to the Internet. So, for example, a service of type NodePort will not be accessible to clients on the Internet, because the nodes do not have Internet-routable public IP addresses.
  • Nodes use Private Google Access to communicate with Google APIs and services.
  • Any outbound access to the Internet must be through specific controls you have implemented, such as Cloud NAT, or an Anthos Service Mesh egress gateway.
  • We allow inbound connections from external clients only through explicitly exposed services: typically a LoadBalancer service, an external Ingress, or an Anthos Service Mesh public ingress gateway.
  • Communication between the worker nodes and the GKE control plane is through a private endpoint on the control plane. And any additional networks requiring access to the control plane must be authorised.
Access to the GKE Control Plane in a private cluster
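
A minimal sketch of creating a private Autopilot cluster follows; the authorised network CIDR is a placeholder, and you would normally also route egress through Cloud NAT as described above.

```
# Create a private Autopilot cluster: nodes get internal IP addresses only,
# and only the listed CIDR may reach the control plane endpoint.
# The CIDR below is an example; use your own management ranges.
gcloud container clusters create-auto "demo-private" \
  --region "europe-west2" \
  --enable-private-nodes \
  --enable-master-authorized-networks \
  --master-authorized-networks "10.10.0.0/24"

# Optionally, disable the public control plane endpoint entirely:
#   --enable-private-endpoint
```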

Share IP Address Ranges Across Clusters

When we create a VPC-native cluster, we must allocate IP addresses for the nodes, for the pods, and for the services.

  • Node IP addresses: i.e. the addresses used for the host machines that are the nodes of the cluster. The cluster uses the subnet’s primary IPv4 range for assigning IP addresses to nodes. We should size the subnet based on the largest size we expect the cluster to grow to, in terms of nodes.
  • Pod IP addresses: the cluster uses a secondary IPv4 address range when assigning IP addresses to pods. This is the largest IP address requirement in GKE, since each node can host many pods. By default, GKE Autopilot sets the maximum number of pods per node to 32, and reserves 64 IP addresses (a /26) per node, to allow for pod churn. GKE carves each node’s /26 out of the pod secondary range, so that each pod can be allocated a unique IP address. By default, GKE Autopilot assigns a /17 secondary range for pods, which provides 32,768 pod addresses; at 64 addresses per node, that supports a cluster of up to 512 nodes.
  • Service (ClusterIP) IP addresses: the cluster uses a separate secondary IP address range for internal service addresses. Service IPs are assigned to ClusterIP services; they are virtual IPs, addressable only from within the same cluster. Since GKE 1.27, Autopilot assigns these by default from a Google-managed range (34.118.224.0/20), and this same range is reused in every cluster. Thus, each cluster gets over 4,000 service addresses, and organisations do not need to provision or allocate IP address space for services on GKE.
The IP address ranges we need to consider for our GKE cluster

You may want to run a few clusters in the same shared VPC. For example, you may want to define a cluster per line of business. (As I already mentioned: having many small single-tenant clusters is a bad idea.) In this case, you can share your primary and secondary IP address ranges between the clusters, which are hosted on the same subnet. This is a good thing to do because:

  • You don’t have to allocate a subnet per cluster.
  • You don’t have to worry about sizing the subnets per cluster.
  • You reduce the network administration overhead.
  • It is far less wasteful on IP addresses, and reduces the risk of IP address starvation.

If you choose to share your ranges between your clusters in the VPC, then you should:

  • Define your named subnets in advance.
  • Consider using the non-RFC 1918 private CIDR ranges, such as 100.64.0.0/10 (which allows for approximately 4.2 million pod addresses) and 240.0.0.0/4 (which allows for approximately 268 million pod addresses).
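
As an illustrative sketch (the host project, VPC, subnet names and CIDRs are all made up), the shared subnet and its secondary ranges might be defined like this in the host project:

```
# One subnet in the shared VPC, whose secondary ranges can be shared by
# multiple VPC-native clusters. All names and CIDRs are examples only.
gcloud compute networks subnets create "gke-subnet" \
  --project "host-project-id" \
  --network "shared-vpc" \
  --region "europe-west2" \
  --range "10.20.0.0/22" \
  --secondary-range "gke-pods=100.64.0.0/14,gke-services=100.68.0.0/20"
```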

Release Channels

Here you decide your strategy for how your GKE clusters will be upgraded, maintained and patched.

Kubernetes versions are expressed in the format x.y.z where x is the major version, y is the minor version, and z is the patch version. There are typically three to four significant (major or minor) Kubernetes releases per year, whereas patch releases typically occur weekly. Major and minor releases contain both new features and security patches. A given minor release is supported for approximately one year. So at any point in time, there will usually be around three minor releases in support.

GKE version nomenclature

My recommendation (and Google’s) is to always enroll your clusters in release channels. With release channels, Google manages cluster upgrades for you. Your worker nodes are auto-upgraded using a rolling surge upgrade strategy, in order to minimise impact on your workloads. Consequently, cluster administrators do not have to spend any time or effort upgrading clusters.

With GKE Standard you can choose not to use release channels. You may want to do this if you want to keep your cluster running a specific version of Kubernetes. However, in my experience, organisations that opt-out of release channels soon find themselves running an estate full of vulnerable clusters. Such organisations will have themselves a huge patching headache!

What you get if you don’t enroll your clusters in a release channel

However, with GKE Autopilot, it is not possible to opt-out of release channels. (Which is a good thing!)

(Side note: the GKE control plane nodes are always upgraded by Google, and there is no way to opt-out of this process. This is regardless of release channel enrollment.)

There are three release channels to choose from:

  • Rapid — Your Kubernetes cluster is upgraded a few weeks after the open source release has become generally available. This channel offers the newest features, but has the poorest stability. Furthermore, clusters on this release channel will NOT be supported by the GKE SLA.
  • Regular — Kubernetes is upgraded to the release version that has been running in Rapid for around 2–3 months, including any patches. This provides a balance between new features and stability. By default, GKE Autopilot enrolls your clusters in this release channel.
  • Stable — Kubernetes is upgraded to the release version that has been running in Regular for around 2–3 months. This release will have been available in the community for around 5–6 months, before hitting your cluster nodes. This channel will be the most proven and most stable, but will not have recent Kubernetes features.

When a new GKE version becomes the default in a release channel, clusters enrolled in that channel will typically be upgraded within 10 days.

Here are my general recommendations with respect to adoption of release channels:

  • Your production workloads should be enrolled into Stable or Regular, depending on your risk appetite and criticality of the cluster. I recommend that mission-critical workloads are enrolled into Stable.
  • Your final non-production environment (typically a Staging, Pre-Prod or UAT environment) should be enrolled into the same release channel as Production. Why? Because you don’t want your workloads to land in production on a version of Kubernetes that you’ve never tested with before.
  • Your upstream non-production environments (e.g. Dev or QA) should be deployed to the adjacent upstream release channel.
  • Only use the Rapid release channel for development environments where you want to experiment with new features. Remember that brand new features will reach the Stable release channel within a few months anyway.

For example:

Selecting GKE release channels for your environments
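
For existing clusters, channel enrollment is a one-line update per cluster; a sketch with placeholder cluster names and regions:

```
# Enroll existing clusters into release channels.
# Cluster names and regions are illustrative.
gcloud container clusters update "dev-cluster" \
  --region "europe-west2" \
  --release-channel "regular"

gcloud container clusters update "prod-cluster" \
  --region "europe-west2" \
  --release-channel "stable"
```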

One issue with release channels is that you cannot guarantee when your cluster will be upgraded. If you have enrolled your final non-prod environment and your Prod environment into the same release channel (which is Google-recommended best practice), then you will likely want to implement guardrails to ensure that your non-prod environment is upgraded before your Prod. This is easy to do with GKE fleets and scopes.

Here is an example of such a rollout sequence:

Example GKE rollout sequence

In this example:

  • The GKE clusters in the Staging environment belong to one fleet: Staging.
  • The clusters in Production belong to another fleet: Production.
  • All the clusters in Staging and Production are enrolled to the Stable release channel.
  • A soaking period of 7 days has been specified, such that the Production fleet will only be upgraded once the Staging fleet has been running for 7 days.

Workload Identity Federation

Workload identity federation allows Kubernetes service accounts to act as IAM service accounts. Pods that use the configured Kubernetes service account automatically authenticate as the corresponding IAM service account when accessing Google Cloud APIs. In this way, we can ensure that GKE application workloads are authenticated and authorised before they are allowed access to Google Cloud services.

Autopilot clusters enable workload identity federation for GKE by default.
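
As a sketch of the wiring involved (the project, namespace and service account names are all hypothetical): you bind the Kubernetes service account to the IAM service account, and annotate the Kubernetes service account so GKE knows which identity to impersonate.

```
# Allow the Kubernetes service account (team-payments/app-ksa) to impersonate
# the IAM service account. Project, namespace and account names are examples.
gcloud iam service-accounts add-iam-policy-binding \
  "app-gsa@my-project-id.iam.gserviceaccount.com" \
  --role "roles/iam.workloadIdentityUser" \
  --member "serviceAccount:my-project-id.svc.id.goog[team-payments/app-ksa]"

# Annotate the Kubernetes service account with the IAM service account to use.
kubectl annotate serviceaccount "app-ksa" \
  --namespace "team-payments" \
  "iam.gke.io/gcp-service-account=app-gsa@my-project-id.iam.gserviceaccount.com"
```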

Autoscaling Strategy

Autoscaling refers to your cluster’s ability to scale in order to meet the demand for the applications running on it.

GKE has four scaling dimensions:

  • Workloads are scaled horizontally, by adding and removing pods. This is managed by the horizontal pod autoscaler (HPA) in response to demand. The HPA responds to standard metrics like CPU utilisation, or custom metrics, like requests per second. The HPA is intended to rapidly scale in response to demand.
  • Infrastructure is scaled horizontally, by adding and removing cluster nodes. This is managed predictively by the cluster autoscaler (CA), to accommodate scheduled pods. For example, if there are no nodes available to schedule a newly created pod, then the cluster autoscaler will create a new node.
  • Workloads are scaled vertically, by increasing and decreasing pod size. This is managed by the vertical pod autoscaler (VPA). The VPA monitors the CPU and memory utilisation of pods over time, and then adjusts pod sizing accordingly. Over time, this results in a more optimal and cost effective pod size. Oversized pods are downsized, which reduces wasted spend. And undersized pods are upsized, which improves application performance and reliability. Note that the VPA can also be run in recommendation-only mode, where it does not resize anything itself; instead, it creates recommendations which you can then use to manually adjust pod sizes.
  • Infrastructure is scaled vertically, by deploying or deleting node pools with optimal node (VM) sizes. This is called node auto-provisioning (NAP) and it selects compute node sizes to achieve the most efficient bin packing of the pods. Note that this is relatively slow, compared to other types of scaling.
The four dimensions of GKE cluster autoscaling

Some pointers:

  • When using GKE Autopilot, Google configures the infrastructure autoscaling (CA and NAP) for you. You only need to consider the pod autoscaling configuration.
  • Don’t try to use the HPA and VPA at the same time, with the same resource metric. For example: avoid setting both the HPA and VPA to respond to CPU utilisation.
  • Workload scaling is simplified using the Multidimensional Pod Autoscaler (MPA), which manages both horizontal and vertical pod autoscaling.

To conclude:

  • With GKE Standard, you need to manage both infrastructure scaling, and workload (pod) scaling.
  • With GKE Autopilot, you only need to care about workload scaling, and you can simplify workload autoscaling using the MPA.
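
As a minimal example of the workload-scaling side (the deployment name, namespace and thresholds are placeholders), horizontal pod autoscaling on CPU utilisation can be enabled like this:

```
# Horizontally autoscale a deployment based on CPU utilisation.
# Deployment name, namespace, target and replica bounds are examples only.
kubectl autoscale deployment "frontend" \
  --namespace "team-payments" \
  --cpu-percent=70 \
  --min=2 \
  --max=20
```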

Deploying Infrastructure and Workloads

Google recommends that you deploy the clusters themselves using infrastructure-as-code (such as Terraform), as is always good practice for cloud infrastructure. So use IaC for deploying clusters and namespaces.

To deploy your workloads (including Deployments, Services, Jobs, StatefulSets, Ingress, and policies), you should use native Kubernetes API calls with declarative YAML manifests. Helm can help you manage your application deployments on Kubernetes.
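
Putting that together, a typical (illustrative) pipeline keeps the two concerns separate; the directory layout, chart and release names below are assumptions:

```
# Infrastructure: clusters, node pools and namespaces, managed with Terraform.
terraform -chdir=infra/gke init
terraform -chdir=infra/gke apply

# Workloads: declarative Kubernetes manifests, applied to the cluster...
kubectl apply -f k8s/

# ...or packaged and managed as a Helm release.
helm upgrade --install payments-app ./charts/payments-app \
  --namespace team-payments
```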

GKE Design Decisions Summary

To recap: here are the key Kubernetes design decisions you need to consider, along with my recommendation for each.

  • GKE, or self-managed? Recommendation: GKE.
  • GKE Autopilot vs GKE Standard. Recommendation: Autopilot.
  • Multi-tenant clusters? Clusters at what level? Recommendation: multi-tenant.
  • Ability to create single-tenant clusters. Recommendation: by exception only.
  • VPC-native or routes-based cluster? Recommendation: VPC-native.
  • Private clusters? Recommendation: use private clusters.
  • Share IP address ranges across clusters? Recommendation: whether you have a few clusters or many within a shared VPC, you should do this.
  • Release channels? Always enroll to release channels. Keep your Staging and Prod clusters on the same release channel. Use fleets and rollout sequences to ensure your clusters are upgraded in the correct sequence.
  • Workload identity? Yes — use it.
  • Autoscaling strategy? Use the cluster autoscaler, to underpin workload autoscaling. Cluster autoscaling is managed for you when you use GKE Autopilot. Use the MPA to simplify workload autoscaling.
  • Deploying infrastructure and workloads? Use IaC (e.g. Terraform) for cluster deployment and management. Use declarative Kubernetes manifests to deploy your application workloads.

Wrap-Up

That’s it for the key considerations of how to design GKE for success in your landing zone. In the next part, we’ll complete LZ design considerations, and cover topics including: logging and monitoring strategy, billing, and infra-as-code (IaC).

Before You Go

  • Please share this with anyone that you think will be interested. It might help them, and it really helps me!
  • Please give me claps! You know you clap more than once, right?
  • Feel free to leave a comment 💬.
  • Follow and subscribe, so you don’t miss my content. Go to my Profile Page, and click on these icons:
Follow and Subscribe

Links

Series Navigation
