GKE Enterprise: A platform engineered for success with Kubernetes

Olivepower
Google Cloud - Community
9 min read · Sep 11, 2024

Kubernetes has been widely adopted as the platform for running containerized workloads. Operationally, many organizations have adopted a multi-cluster strategy, mainly for reasons of isolation and reliability.

For many, what started as a small, DevOps-based approach to running containers has evolved into a centralized platform engineering team standardizing on Kubernetes at scale.

But managing multiple clusters is hard. From what I have seen, the projects that see the most success in this space are those that tame the complexity. If a diverse cluster population can be managed with common configurations and processes, then the intricacies of each cluster become less of a problem.

GKE Enterprise is a multi-cluster management platform with extensive features and functionality, several of which are aimed at simplifying cluster management. Four that I consider key are:

  • Visibility — providing a clear view of the cluster landscape, as it is and as it grows
  • Reducing operational toil and effort — by grouping resources for effective, centralized management
  • Desired state for compliance and governance — continuous configuration and remediation from a source of truth
  • Cost Optimisation — integrated platform usage data and optimisation recommendations

This post will delve a little deeper into each of these features.

1. Visibility: Understanding what is out there

The ability to visualize your container infrastructure is a must for its effective management.

GKE Enterprise supports onboarding any CNCF-conformant Kubernetes cluster. Once a cluster is onboarded into GKE-E, two key things happen:

  1. Clusters are linked to the GKE Enterprise console in Google Cloud using GKE Connect and the Connect Agent
  2. Metadata associated with the attached cluster is synchronized with Google Cloud, ensuring consistent visibility and management
GKE-E cluster onboarding
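As a rough sketch, the registration step looks like the following with the fleet CLI; the cluster name and location are hypothetical, and the commands assume a fleet host project is already set up.

```shell
# Register an existing GKE cluster to the fleet (this sets up the Connect
# agent link to the GKE Enterprise console). Names are hypothetical.
gcloud container fleet memberships register prod-cluster-1 \
    --gke-cluster=us-central1/prod-cluster-1 \
    --enable-workload-identity

# Confirm the membership has been created and synchronized.
gcloud container fleet memberships list
```

Attached (non-GKE) clusters follow a similar flow through the `gcloud container attached clusters` command group.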

Visibility of clusters through a single pane of glass is a "first principle" of multi-cluster management, yet it often gets overlooked, considered just a means to the more complicated steps of cluster management.

But in my experience, this simple, risk-free way of getting information about all the clusters in one place is powerful in GKE-E because:

  • Consolidated view — Eliminates the need to juggle multiple tools and consoles for different environments in multi- or hybrid-cloud setups. In the short term, this saves annoying context switching; in the long term, it should streamline platform operations and reduce OpEx.
GKE-E multi-cluster consolidated view
  • Insights — With a single step, information on workloads, services, capacity and utilization metrics, cost optimisation, logging and much more* is visible in the console (see sample below). This information is important for understanding what is out there, and goes a long way toward planning a consistent platform management strategy.
GKE-E Kubernetes resources insights
  • Status — A unified view enables administrators to quickly assess the overall health and status of their entire Kubernetes environment. They can easily identify performance bottlenecks, resource utilization patterns, potential issues across clusters and possible cost savings.
GKE-E cluster status
  • Lifecycle management — With minimal set-up effort, new clusters can be provisioned or deleted, and workloads can be deployed to clusters.
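As a sketch of the lifecycle point, provisioning a new cluster that is registered to a fleet can be a single command; the cluster name, region, and fleet host project below are hypothetical.

```shell
# Create an Autopilot cluster and register it to a fleet in one step.
# Cluster name, region and fleet host project are hypothetical.
gcloud container clusters create-auto staging-cluster-1 \
    --region=us-central1 \
    --fleet-project=my-fleet-host-project
```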

What this visibility represents is a quick win: platform engineers are up and running quickly, armed with insights into their cluster estate and out-of-the-box capability (see below), with little effort and risk.

*Here is the list of features available across different cluster types, at the time of writing

2. Reduce Operational load with Fleets — Configure once, apply to many

As stated previously, different clusters can be used to isolate environments (non-prod, prod), projects, or teams. With an increase in the number of clusters comes an increase in operational overhead for the platform engineers. It becomes more complicated to maintain security and governance requirements, end-user management and operational efficiency.

Here is where the ability to group clusters can help.

For example, if clusters from two different environments can be grouped effectively and managed together as one entity, then the differences between them become less relevant.

Platform engineers can build policies for each group type as opposed to individual clusters.

In GKE-E, there is a granular grouping mechanism called Fleets. With Fleets, a logical perimeter is placed around clusters, allowing them to be managed as a unit.

GKE-E cluster management using Fleets

Fleets can additionally group the Kubernetes constructs within their clusters, in particular cluster namespaces.

Using Fleets, a cluster namespace can be turned into a Fleet namespace, which makes that namespace span all clusters in the fleet.

Again, this is working to the concept of trying to reduce multi-cluster complexity — making the differences in the underlying clusters less of a factor from a management perspective.

The namespace is the unit of management here, not the cluster. Have a look at the example below.

Fleet namespaces in GKE-E

Clusters 1, 2 and 3 are registered to the Fleet F1. The namespace ns-web exists on cluster 1. By creating a logical Fleet namespace (FNS) of the same name, i.e. ns-web, the scope of this namespace is now extended across all three clusters. Configurations can be applied to this FNS as a single entity rather than to each cluster individually.
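A minimal sketch of the ns-web example using fleet team scopes, the mechanism that backs fleet namespaces; the scope, binding, and membership names are hypothetical, and the commands assume the clusters are already fleet members.

```shell
# Create a fleet scope and a fleet namespace within it.
gcloud container fleet scopes create web-team
gcloud container fleet scopes namespaces create ns-web --scope=web-team

# Bind a fleet member to the scope so ns-web spans it; repeat per cluster.
gcloud container fleet memberships bindings create web-team-binding-1 \
    --membership=cluster-1 \
    --scope=web-team \
    --location=global
```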

With GKE-E Fleets you can apply Fleet features across groups of clusters and namespaces, simplifying management across clusters. Two key features are Policy Controller and Config Sync. Let's have a look at these next.

3. Always ON — Desired state for Cluster Configuration and Compliance

I have never worked with an organisation that did not need to adhere to external and internal policies for compliance and security. These are never optional and are often the primary focus outside of running workloads. A constant challenge for platform engineers of multi-cluster environments is to continually maintain and accurately audit multiple cluster configurations and compliance posture.

Two key Fleet features in GKE-E address this: Policy Controller and Config Sync.

Policy Controller (based on OPA Gatekeeper) enables the enforcement of security policies for Kubernetes clusters.

Using GKE-E, platform engineers can set up policies at the Fleet level and have these applied to the clusters within the fleet. They can use pre-built policy bundles (like the CIS Kubernetes Benchmark or PCI-DSS compliance), or build their own custom policies.

These policies are audited, so that any violations can be viewed from the dashboard.
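As a rough sketch, an audited policy could look like the following: Policy Controller is enabled per fleet membership, and the constraint uses the standard K8sRequiredLabels template with dryrun enforcement so violations are reported in the dashboard rather than blocked. The membership name and label requirement are hypothetical.

```shell
# Enable Policy Controller on a fleet membership (name hypothetical).
gcloud container fleet policycontroller enable --memberships=prod-cluster-1

# Audit-only constraint: namespaces should carry an "owner" label.
kubectl apply -f - <<'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespaces-must-have-owner
spec:
  enforcementAction: dryrun   # report violations, do not block admission
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels:
      - key: owner
EOF
```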

In the screenshot below we can easily see which clusters have violations for both the Kubernetes and industry-standard policies that have been configured and applied.

GKE-E policy Controller — cluster policy status

In addition, the actual violations can be investigated, see below.

GKE-E Policy Controller — policy violation view

The affected resources can be identified…

GKE-E Policy Controller — detailed policy violation view

and ultimately remediated…

GKE-E Policy Controller — violation remediation

The overall Fleet compliance can also be viewed (in preview at the time of writing).

GKE-E Policy Controller — Fleet-based compliance

In addition to Policy Controller, the Config Sync fleet feature allows platform engineers to define cluster configurations in a source control repository and have them applied automatically across the cluster estate.

Consider the example below of a configuration that creates a namespace ns-app, with a namespace quota of 5 CPUs and 10Gi of memory.

GKE-E Config Sync

In GKE-E, the configuration file for this can be stored in source control and defined within Config Sync to be applied to the current clusters in a Fleet, in this case clusters 1, 2 and 3. It will also be applied to any new clusters registered to the Fleet. If the namespace is deleted, as in cluster 4 here, Config Sync remediates the drift and recreates ns-app as per the stored configuration.
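The Git-stored configuration described above might look like the following; the quota values mirror the example (5 CPUs, 10Gi of memory), while the quota object and file names are hypothetical.

```shell
# Write the manifest Config Sync would sync from Git to every cluster
# in the fleet. Namespace and quota values follow the example above.
cat > ns-app.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: ns-app
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ns-app-quota
  namespace: ns-app
spec:
  hard:
    requests.cpu: "5"
    requests.memory: 10Gi
EOF
```

Committing this file to the repository that Config Sync watches is what makes the desired state "always on": deleting ns-app on any cluster triggers its recreation.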

Config Sync audit information is visible in the GKE-E console, see below.

GKE-E Config Sync cluster status

By drilling into the failed packages, we can see that a configuration for a deployment failed…

GKE-E Config Sync detailed failure view

This means that a cluster is not in the desired state and remediation is needed.

Together, Policy Controller and Config Sync play a crucial role in ensuring the desired compliance and configuration of your clusters, which is especially important in multi-cluster environments.

4. Clarity on platform costs

I have not worked within any organisation where the cost of the infrastructure is not a key factor. It may not always be the responsibility of the platform engineer, but if that data is easily available, it helps to drive infrastructure optimization and can help drive some of the processes implemented within organisations. For example, I have worked on several projects that implemented a deletion policy on clusters based on usage metrics and associated costs.

GKE Enterprise helps here, by providing a central view of your platform spend — breaking down costs across clusters, namespaces, teams, and even individual workloads. See a sample from the GKE-E console below.

GKE-E cost optimisation

In this example, we can see the estimated monthly costs of four GKE standard clusters over the last 30 days.

Also presented are the potential cost savings for each cluster over the same 30-day period: the total cost of unallocated CPU and memory in that time, i.e. resources that were paid for but not used.
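To make the "unallocated" idea concrete, here is a back-of-the-envelope sketch with made-up numbers; the cluster sizes, the $0.033 per vCPU-hour rate, and the 730-hour month are all assumptions, not GKE pricing.

```shell
# Hypothetical figures: 10 allocatable vCPUs of which 6 are requested,
# over a 730-hour month at an assumed $0.033 per vCPU-hour.
awk 'BEGIN {
    allocatable = 10; requested = 6
    rate = 0.033; hours = 730
    printf "Unallocated CPU cost: $%.2f\n", (allocatable - requested) * rate * hours
}'
```

With these numbers, the sketch reports an unallocated CPU cost of $96.36 for the month, which is the kind of per-cluster figure the console surfaces.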

This is really useful information and can help drive cost reduction initiatives within the organisation.

In addition, we can even drill a bit further down into the actual workloads themselves to understand where the issues are with resource utilisation, see screenshot below.

GKE-E workload resource usage for cost optimization

This shows the Used (black), Requested (green) and Limit (grey) values for CPU and memory for the containers in the workload.

We can see which workloads (in this case, all of them) are using less capacity (black) than requested (green).

This information can be really useful for understanding where configuration mismatches may exist in workload resource allocations or node sizing, which can help drive operational behaviour. For example, a direct result of this data could be the creation of a policy or policies that ensure optimised resource limits and requests are applied across workloads, implemented using Policy Controller.
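Such a policy could be sketched with the standard K8sContainerLimits template from the constraint library, which caps container resource limits; the ceiling values below are made up for illustration.

```shell
# Require container limits at or below assumed ceilings (values hypothetical).
kubectl apply -f - <<'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sContainerLimits
metadata:
  name: container-limit-ceiling
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    cpu: "2"
    memory: 4Gi
EOF
```

Applied at the fleet level, a constraint like this keeps newly deployed workloads within the sizing envelope that the usage data suggests.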

For more information on this topic, Google provides detailed guidance on best practices for running cost-optimized Kubernetes applications on GKE.

Summary

Multi-cluster management is hard. In my experience, success comes when the complexity of the underlying cluster infrastructure is abstracted away, so that the management units are logical groupings like GKE-E's Fleets and Fleet namespaces.

GKE-E makes multi-cluster management easier at scale by:

  • Providing visibility and control of multiple clusters within one management plane
  • Providing a granular framework for grouping clusters and namespaces
  • Simplifying governance — forget about manually granting and revoking permissions for each namespace and cluster; with fleets and fleet namespaces, you define access once and apply it across the fleet, ensuring teams have just the right level of control
  • Simplifying configuration — requirements like cluster policies and namespace quotas can be configured once at the fleet and fleet namespace level and applied automatically
  • Simplifying tenancy through dedicated namespaces
  • Providing essential information on usage costs, with the ability to implement policies around resource requirements

If you are interested in exploring GKE-E more, this is a great resource. If you need another take on Fleets and namespaces, this resource is a great read. Lastly, have a look at the guide for getting started on GKE Enterprise.
