Scaling your GKE applications using external metrics

Pablo Filippi
Zencore Engineering
7 min read · Mar 20, 2023
Cover photo: https://unsplash.com/photos/WN9QRESOu5c

Intro

Google Kubernetes Engine (GKE) is a container orchestration platform that provides advanced capabilities for managing the application lifecycle, including deploying, managing, scaling, and self-healing applications.

One of the most important features of GKE is its scalability, which enables developers to adjust the resources allocated to their applications based on demand. This is particularly important for ensuring optimal application performance and for avoiding overprovisioning, bottlenecks, and other issues that can adversely affect operations.

With Cluster Autoscaling, GKE automatically adapts the number of nodes in a cluster to the current load. When pods cannot be scheduled because the existing nodes lack sufficient CPU or memory, the Cluster Autoscaler adds nodes to the cluster. Similarly, when nodes become underutilized, it removes them to save resources.

GKE’s scalability extends beyond the cluster itself, allowing developers to scale workloads vertically, by adjusting the CPU and memory allocated to the pods, or horizontally, by adding or removing replicas. This elasticity ensures that applications can efficiently handle fluctuations in demand without affecting performance.

One of the most important GKE features for scaling applications is the Horizontal Pod Autoscaler (HPA). This Kubernetes feature automatically scales the number of pods in a deployment based on observed metrics, such as CPU utilization, ensuring that the number of replicas is always appropriate for the current load on the application. Several kinds of metrics can be configured to scale workloads effectively:

  • Resource metrics: a pod’s CPU or memory consumption
  • Custom metrics: metrics from other resources in the cluster, such as network traffic
  • External metrics: metrics from services outside the cluster, such as the number of undelivered Pub/Sub messages

In this article, we will explore how to scale GKE workloads using an external metric that resides in a Cloud SQL database.

An example use case

In this walkthrough, we will implement a GKE HPA driven by an external metric stored in a Cloud SQL for MySQL database. The architecture for the implementation is the following:

High-level diagram of our HPA implementation

To retrieve the metric value from the database, we will write a small Python app responsible for reading the external metric, leveraging the Cloud SQL Auth Proxy for connectivity. The proxy will run as a sidecar container in the same pod as our Python application.

Our custom metrics application will read the metric and then write it to Cloud Monitoring. We will also deploy the Custom Metrics Stackdriver Adapter, so that the cluster can read these metrics from Cloud Monitoring. Lastly, we will configure the Horizontal Pod Autoscaler to use these metrics to scale our hello-server deployment up or down.

Detail of how our Custom Metrics Writer app works

You may be wondering why we are deploying a separate app for writing the metrics instead of embedding this functionality in the main deployment itself. Well, without a separate app, each replica of our hello-server deployment would be querying the database and saving metrics to Cloud Monitoring at the same time. By keeping the metrics writer in its own single-replica deployment, we guarantee that only one pod queries and saves the metric. The sketch below shows roughly how the writer works.
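This is a minimal sketch, not the repo’s exact code: the PyMySQL driver, the environment variable names, and the 60-second polling interval are illustrative assumptions; see the repo for the actual implementation. It reads the metric through the proxy sidecar and publishes it as a custom Cloud Monitoring metric:

import os
import time

import pymysql  # illustrative driver choice; the repo may use another MySQL client
from google.cloud import monitoring_v3

PROJECT_ID = os.environ["GCP_PROJECT_ID"]

def read_metric():
    # The Cloud SQL Auth Proxy sidecar listens on localhost, so the app
    # connects as if the database were local.
    conn = pymysql.connect(
        host="127.0.0.1",
        user=os.environ["DB_USER"],          # mapped from the custom-metrics-secrets secret
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    with conn, conn.cursor() as cur:
        cur.execute("SELECT metric_value FROM metrics WHERE metric_name = 'app-metric-1'")
        (value,) = cur.fetchone()
    return value

def write_metric(value):
    # Publish the value as a custom metric; the Stackdriver Adapter will later
    # expose it to the HPA as custom.googleapis.com|my-metric.
    client = monitoring_v3.MetricServiceClient()
    series = monitoring_v3.TimeSeries()
    series.metric.type = "custom.googleapis.com/my-metric"
    series.resource.type = "global"
    point = monitoring_v3.Point({
        "interval": {"end_time": {"seconds": int(time.time())}},
        "value": {"int64_value": value},
    })
    series.points = [point]
    client.create_time_series(name=f"projects/{PROJECT_ID}", time_series=[series])

if __name__ == "__main__":
    while True:
        write_metric(read_metric())
        time.sleep(60)  # illustrative polling interval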

For workload authentication, we will implement Workload Identity to avoid using credentials files, which are a security risk. With Workload Identity, you bind a Kubernetes Service Account (KSA) to a Google Service Account (GSA), and then grant the GSA the IAM permissions the workload requires. There is no credentials file involved.

The Metric Writer app needs to write metrics to Cloud Monitoring and also to connect to the external database via the Cloud SQL Auth Proxy. The Stackdriver Adapter, on the other hand, needs to read the metrics from Cloud Monitoring. Hence, we will grant the following roles to each GSA:

  1. Metric Writer: roles/cloudsql.client and roles/monitoring.metricWriter
  2. Stackdriver Adapter: roles/monitoring.viewer

Our setup will involve two separate namespaces. One of these, metrics-writer, will include both the Python Custom Metrics Writer and the hello-server deployment. The other namespace, custom-metrics, will contain the Stackdriver Adapter.

You can find all the needed files, including a readme with more details, on our Zencore repo: https://github.com/zencore-dev/gke-hpa-custom-metrics

Getting down to business

Cloud SQL setup

Let’s first create our metrics database: create a Cloud SQL for MySQL instance, and then a database named custom-metrics-db.
Create the metrics table and populate it with a row of sample data:

USE `custom-metrics-db`;

CREATE TABLE metrics (
  metric_name varchar(255),
  metric_value int
);

INSERT INTO metrics (metric_name, metric_value) VALUES ('app-metric-1', 100);

GKE setup

Create a GKE cluster or update an existing one, making sure you enable Workload Identity.
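For a new cluster, Workload Identity can be enabled at creation time, for example (the cluster name and zone are placeholders):

gcloud container clusters create <CLUSTER_NAME> \
--zone <ZONE> \
--workload-pool=<PROJECT_ID>.svc.id.goog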

Create the namespaces and KSAs:

kubectl create ns metrics-writer
kubectl create ns custom-metrics
kubectl create serviceaccount ksa-metrics-writer -n metrics-writer
kubectl create serviceaccount custom-metrics-stackdriver-adapter -n custom-metrics

Create the GSAs and grant them the required permissions:

gcloud iam service-accounts create gsa-metrics-writer
gcloud iam service-accounts create gsa-custom-metrics-adapter

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member serviceAccount:gsa-metrics-writer@<PROJECT_ID>.iam.gserviceaccount.com \
--role roles/cloudsql.client

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member serviceAccount:gsa-metrics-writer@<PROJECT_ID>.iam.gserviceaccount.com \
--role roles/monitoring.metricWriter

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member serviceAccount:gsa-custom-metrics-adapter@<PROJECT_ID>.iam.gserviceaccount.com \
--role roles/monitoring.viewer

Bind the KSAs and GSAs for Workload Identity:

gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:<PROJECT_ID>.svc.id.goog[metrics-writer/ksa-metrics-writer]" \
gsa-metrics-writer@<PROJECT_ID>.iam.gserviceaccount.com

gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:<PROJECT_ID>.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]" \
gsa-custom-metrics-adapter@<PROJECT_ID>.iam.gserviceaccount.com

Annotate the KSAs:

kubectl annotate serviceaccount -n metrics-writer ksa-metrics-writer \
iam.gke.io/gcp-service-account=gsa-metrics-writer@<PROJECT_ID>.iam.gserviceaccount.com

kubectl annotate serviceaccount -n custom-metrics custom-metrics-stackdriver-adapter \
iam.gke.io/gcp-service-account=gsa-custom-metrics-adapter@<PROJECT_ID>.iam.gserviceaccount.com

Create a Kubernetes secret with the database details, so the Metric Writer app can authenticate to the external metrics database through the Cloud SQL Auth Proxy:

kubectl create secret generic custom-metrics-secrets \
-n metrics-writer \
--from-literal=db_user=<YOUR-DATABASE-USER> \
--from-literal=db_password=<YOUR-DATABASE-PASSWORD> \
--from-literal=db_name="custom-metrics-db"

In Artifact Registry, create a US multi-region Docker repository named metric-writer.

Build the Docker image with the target tag and push it to the repository you just created:

cd metrics-writer/
docker build -t us-docker.pkg.dev/<PROJECT_ID>/metric-writer/latest .
docker push us-docker.pkg.dev/<PROJECT_ID>/metric-writer/latest

Update k8s-resources/custom-metrics-writer.yaml, replacing the following placeholders (an abridged sketch of the manifest follows the list):

  • <GCP_PROJECT_ID>
  • <CLOUDSQL_CONNECTION_NAME>
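For context, this is roughly what that manifest contains and where the placeholders land. This is an abridged sketch: the deployment name, the environment variable wiring, and the proxy image and tag are illustrative; the repo file is authoritative.

# Abridged sketch of k8s-resources/custom-metrics-writer.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-metrics-writer
  namespace: metrics-writer
spec:
  replicas: 1                                # a single writer, as discussed earlier
  selector:
    matchLabels:
      app: custom-metrics-writer
  template:
    metadata:
      labels:
        app: custom-metrics-writer
    spec:
      serviceAccountName: ksa-metrics-writer  # KSA bound to the GSA via Workload Identity
      containers:
      - name: metric-writer
        image: us-docker.pkg.dev/<GCP_PROJECT_ID>/metric-writer/latest
        env:
        - name: DB_USER                       # db_password and db_name are wired the same way
          valueFrom:
            secretKeyRef:
              name: custom-metrics-secrets
              key: db_user
      - name: cloud-sql-proxy                 # sidecar; image and tag illustrative
        image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
        args:
        - "<CLOUDSQL_CONNECTION_NAME>"        # e.g. project:region:instance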

Deploy the Metrics Writer:

kubectl apply -f k8s-resources/custom-metrics-writer.yaml

Deploy the Stackdriver Adapter:

kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole cluster-admin --user "$(gcloud config get-value account)"

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Deploy the sample hello-server deployment along with its HPA:

kubectl apply -f k8s-resources/hello-server.yaml
kubectl apply -f k8s-resources/hpa.yaml
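For reference, the external-metric HPA spec in hpa.yaml looks roughly like the sketch below, reconstructed from the values in the status output that follows; the repo file is authoritative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-server
  namespace: metrics-writer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-server
  minReplicas: 1
  maxReplicas: 30
  metrics:
  - type: External
    external:
      metric:
        # the adapter exposes custom.googleapis.com/my-metric with a pipe separator
        name: custom.googleapis.com|my-metric
      target:
        type: AverageValue
        averageValue: 20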

You can check the status of the HPA in the GCP Console or via the command line:

kubectl describe hpa hello-server -n metrics-writer

The output shows the current status of the HPA, including the metric being read and the number of pods currently deployed:

Reference:                                                   Deployment/hello-server
Metrics: ( current / target )
"custom.googleapis.com|my-metric" (target average value): 20 / 20
Min replicas: 1
Max replicas: 30
Deployment pods: 5 current / 5 desired

You may be wondering why we have 5 pods. That’s because we set averageValue in hpa.yaml to 20. For an external metric, the HPA calculates the required number of pods as desiredReplicas = ceil(metric value / averageValue); with our metric at 100, that gives ceil(100 / 20) = 5.

Testing the HPA

To test our setup, we can update the metric in the custom-metrics-db database and watch what happens in GKE. You can try scaling up and then scaling down:

UPDATE metrics SET metric_value = 2000;
UPDATE metrics SET metric_value = 5;

Bear in mind that we set maxReplicas to 30, so the HPA won’t create more than 30 pods even when the metric calls for more: with metric_value = 2000 it would want ceil(2000 / 20) = 100 replicas but is capped at 30, while with metric_value = 5 it scales down to the minimum of 1 replica.

You can check the deployment’s status:

kubectl describe hpa hello-server -n metrics-writer

Conclusion

Running applications on Google Kubernetes Engine provides numerous benefits, including the ability to easily scale applications. By utilizing the HPA, as shown in this article, we ensure that applications running on GKE scale appropriately, adapting to changes in demand. It also reduces resource consumption and costs by shrinking the replica count when pods are not needed.

Overall, the use of HPA and other scalability features on GKE ensures that applications are able to successfully adapt to demand while maintaining optimal performance. This means that companies can deliver fast, reliable services to their customers without worrying about scalability issues or downtime.

Furthermore, with GKE’s ease of use and integration with other Google Cloud services, developers can focus on building and deploying applications without getting bogged down by infrastructure concerns. GKE is the ideal platform for organizations that need a powerful and reliable solution for managing their containerized applications.
