Kubernetes HPA Autoscaling with External metrics — Part 1

Use GCP Stackdriver metrics with HPA to scale up/down your pods

Matteo Candido
9 min read · Mar 5, 2022

Kubernetes makes it possible to automate many processes, including provisioning and scaling. Instead of manually allocating the resources, the autoscaling process allows you to respond quickly to peaks in demand, and reduces costs by scaling down when resources are not needed.

Autoscaling in Kubernetes is divided into:

  1. Node-based scaling
  2. Pod-based scaling

Node-based scaling

When the cluster is not able to allocate new pods or existing nodes are under pressure, new nodes are automatically added to the cluster.

This feature is supported by the Cluster Autoscaler and is usually provided by default by the main CSPs (Google, AWS, Azure, etc.).

Pod-based scaling

The resources involved in the scaling process are Pods.

Pod-based scaling, in turn, is divided into:

  1. Vertical pod autoscaling
  2. Horizontal pod autoscaling

Vertical Pod Autoscaling

Vertical Pod Autoscaling automatically adapts the Pod resources (requests and limits). This way the values can be optimized, resulting in more efficient resource usage and scheduling inside the cluster.

It’s performed by the Vertical Pod Autoscaler and is usually only available with a CSP.

Horizontal Pod Autoscaling

Horizontal Pod Autoscaling is a mechanism that adapts the number of Pods in order to respond to peaks in demand. Autoscaling can add or remove Pods based on specific metrics.

Metrics


What’s a metric?

A metric is a meaningful measurement taken over a period of time that communicates vital information about a process or activity, leading to fact-based decisions. Metrics are usually specialized by the subject area. In business, they are sometimes referred to as key performance indicators (KPI).

In a Kubernetes cluster the metrics are collected and made available through a component named Metrics Server. This component, installed as a Pod in the Kubernetes cluster, collects metrics from:

  1. Pods
  2. Nodes

and makes them available for use by the HPA and other components.

Metrics are exposed via the metrics.k8s.io/v1beta1 API resource.

Metrics API Resources

Metric values can be retrieved using the kubectl command or directly from the Kubernetes API.

Metrics via kubectl
Metrics via API ( through Kubectl )
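
For the kubectl approach, a quick sketch (assuming the Metrics Server is installed) uses the top subcommand:

# Get CPU/RAM usage for all nodes
kubectl top nodes

# Get CPU/RAM usage for all pods in the current namespace
kubectl top pods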

Below is the list of commands available to obtain the metrics through the API. The first pair retrieves cumulative metrics.

# Get the metrics for all nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

# Get the metrics for all pods
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods

The second pair retrieves the metrics for a specific resource.

# Get the metrics for a specific node
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/NODE_NAME

# Get the metrics for a specific pod in a namespace
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/NAMESPACE/pods/POD_NAME
Metrics for a specific node by its name

How to use standard metrics with HPA

Prerequisites

Initially only one Nginx Pod is up and running and ready to serve requests.

Nginx resources

Let’s check if Nginx is serving requests correctly

Nginx output

When there are no ongoing requests, the CPU/RAM usage for the single Pod could be something like the picture below.

Nginx pod resources usage

The Pod resources (requests/limits) are configured as described in the code snippet below.

Nginx resources definition
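
A minimal sketch of that section; the CPU/memory values are illustrative assumptions, not necessarily the exact ones used in the test:

# Fragment of the nginx container spec (illustrative values)
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi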

Let’s stress the Pod with multiple requests using Locust. The locustfile.py is defined as follows.

Locust test definition
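
A minimal sketch of such a locustfile, repeatedly hitting the nginx root path (the exact tasks and wait times are assumptions):

# locustfile.py - minimal sketch of the load test
from locust import HttpUser, task, between

class NginxUser(HttpUser):
    # Illustrative pause between requests
    wait_time = between(1, 2)

    @task
    def index(self):
        # Request the nginx page exposed by the LoadBalancer (host is passed with -H)
        self.client.get("/")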

We can now launch the Locust process, which will simulate 1000 concurrent users, spawned one at a time.

locust --headless --users 1000 --spawn-rate 1 -H http://35.201.95.239.nip.io

We can see that CPU/RAM usage increases during the stress test and all the requests are served in around 110 ms, as shown below.

Locust stats

After a while the CPU usage reaches its limit. In this case the CPU will be throttled by the underlying system, so the Pod will no longer be able to respond as fast as before.

Nginx CPU/RAM usage

Consequently, the average response time of requests increases, as can be seen from the Locust statistics.

Locust stats

This means that clients are blocked waiting for a response; as we can see, they can wait up to 6 seconds.

How to react to peaks?

One solution to respond to traffic peaks is to increase the number of nginx Pods to distribute requests across all instances.

Obviously we can do it manually, but why!?!?!

Let’s define an HPA for the nginx deployment to do it for us!!

HPA definition

The HPA can scale up to 10 nginx instances when the average CPU usage among all current instances is greater than 80%.
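
A minimal sketch of this HPA, assuming the Deployment is named nginx and using the autoscaling/v2beta2 API:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx          # assumed Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80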

After applying the HPA, the status is as follows.

HPA Stats before test

Let’s stress test it again with Locust as we did before. After a while the target value will exceed the 80% threshold and the HPA will start scaling replicas to respond to the peak.

HPA Stats after test

When the target value falls back below the threshold, the HPA stops scaling.

HPA Stats when target is under control

And the average response time for customers is, in contrast to the previous test, much lower.

Locust Stats

If we stop the locust test we can see that the HPA, after a while, scales down the replicas according to the target value.

HPA Stats with no test in progress

This was a small demonstration of how HPA works. I don’t want to go into too much detail because there are already plenty of guides on how to use standard metrics with HPA.

I want to focus on how to use External Metrics from GCP Stackdriver.

External metrics

Sometimes, due to business or other requirements, CPU and RAM are not the right resources for deciding whether a Pod should scale.

In this case we need different metrics that are not natively scraped and exposed by the Kubernetes API. For example, we may want to scale the Pods based on the number of messages in a queue, the number of connections on the LoadBalancer, etc.

Natively, Kubernetes doesn’t collect metrics from outside the cluster, but in the case of GCP Stackdriver it’s possible with a metrics adapter developed by Google, named Custom Metrics — Stackdriver Adapter.

First of all, we should install the adapter in our Kubernetes cluster. Before deploying it, we should slightly modify the Deployment so that it also collects Distribution metrics and not only Counter metrics.

Download the yaml file

wget https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Then modify it, adding enable-distribution-support=true as follows.
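
Roughly, the change lands in the adapter container’s command inside the Deployment (the surrounding arguments come from the upstream manifest and may differ between versions):

# Fragment of the custom-metrics-stackdriver-adapter container spec
command:
  - /adapter
  - --use-new-resource-model=true
  - --enable-distribution-support=true   # the flag we add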

Follow the official guide to install it and wait until the Pod is up and running.

Custom metrics pod

If you have configured Workload Identity, you should configure the service account as described here.

Three new API versions are now available

Metrics API Versions
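
You can double-check them with kubectl; the output should contain entries along these lines (a sketch of the expected result):

kubectl api-versions | grep metrics

# custom.metrics.k8s.io/v1beta1
# custom.metrics.k8s.io/v1beta2
# external.metrics.k8s.io/v1beta1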

Two are related to custom metrics and one to external.

We will focus our attention on the external one. Let’s find out how to use those metrics.

First of all, open the Monitoring page in the GCP Console and go to the Metrics Explorer subsection. On this page we can explore all the metrics available in the project.

Metrics Explorer page

We have a lot of metrics, grouped by service type (here is the complete list). In our test we will use Google Cloud HTTP/S Load Balancer → Https → Request Count.

LoadBalancer metrics description

As described in the above picture, this metric collects the number of requests served by the External HTTP/S LoadBalancer.

When we select the metric the console shows us all the collected values in the selected time range.

LoadBalancer HTTP/S metrics in the past 6 hours

Now let’s check if the metrics are also available through the Kubernetes APIs. From the previous screenshot, or from the GCP list, we retrieve the right metric name:

loadbalancing.googleapis.com|https|request_count

Metric names use / as a separator; replace each / with a | before using them!

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/loadbalancing.googleapis.com|https|request_count" | jq .
LoadBalancer metrics from Kubernetes
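
Trimmed down, the response follows the ExternalMetricValueList format; the labels, timestamp, and value below are illustrative, based on this test:

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "items": [
    {
      "metricName": "loadbalancing.googleapis.com|https|request_count",
      "metricLabels": {
        "resource.labels.url_map_name": "k8s2-um-4vubmsmu-default-nginx-ingress-91yjparm"
      },
      "timestamp": "2022-03-05T10:00:00Z",
      "value": "490400m"
    }
  ]
}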

As you can see the metric value is available in the Kubernetes APIs and could be used with an HPA.

Metrics have a lot of information that we can use. In particular:

  • metricLabels: we can use these labels to filter the metric in case we get multiple results back.
  • value: the measurement of the metric that will be used by the HPA to scale up/down. It’s expressed as a milli-value, so you should divide it by 1000 to obtain the real value.

In this case we have 490400m; divided by 1000 it gives about 490.4 req/s, which is approximately the value we can see from the Locust stress test.

Locust stress test

How can we use it in an HPA?

Now that we have the metrics, let’s use them!

We will use the HPA as defined in the snippet below.
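
Here is a sketch of what this HPA could look like, reconstructed from the description that follows; the HPA name and replica bounds are assumptions, while the metric, selector, and target value match the article:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-external           # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx                  # assumed Deployment name
  minReplicas: 1                 # assumed replica bounds
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: "loadbalancing.googleapis.com|https|request_count"
        selector:
          matchLabels:
            resource.labels.url_map_name: "k8s2-um-4vubmsmu-default-nginx-ingress-91yjparm"
      target:
        type: AverageValue
        averageValue: 50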

Note: Kubernetes 1.23 introduces the API version autoscaling/v2. Please refer to the official documentation at https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2/

The allowed values of metrics.target.type are:

  • Value: is the target value of the metric (as a quantity).
  • AverageValue: is the target value of the average of the metric across all relevant pods (as a quantity)
  • Utilization: is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type

To obtain more information about HPA structure and fields refer to the offical documentation at https://v1-20.docs.kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2beta2/

We are going to create an HPA that uses the metric we chose before. In addition, we add a selector for the label that identifies our nginx backend (in case we have different backends served by the same LoadBalancer):

resource.labels.url_map_name: "k8s2-um-4vubmsmu-default-nginx-ingress-91yjparm"

We configured the HPA to scale the Pods if the average number of requests is greater than 50 per second. Let’s apply the HPA and see what happens.

External Metric HPA before

The demand is higher than the target a single Pod can handle, so, as expected, the Pods are scaled up.

External Metric HPA after

Cool, we have achieved our goal: using an external metric to scale the Pods up and down.


It is also possible to manipulate the metric values, in the same way the Metrics Explorer console does with its aggregators.

Metrics explorer aggregator

The custom adapter also has the option of defining an aggregator (called a Reducer). This is done by using a specific label in the HPA metric selector, which may take one of the following values (see the sketch after this list):

  • REDUCE_NONE
  • REDUCE_MEAN
  • REDUCE_MIN
  • REDUCE_MAX
  • REDUCE_SUM
  • REDUCE_STDDEV
  • REDUCE_COUNT
  • REDUCE_COUNT_TRUE
  • REDUCE_COUNT_FALSE
  • REDUCE_FRACTION_TRUE
  • REDUCE_PERCENTILE_99
  • REDUCE_PERCENTILE_95
  • REDUCE_PERCENTILE_50
  • REDUCE_PERCENTILE_05
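
As a sketch, and assuming the reducer is passed as a key in the metric selector (the exact label key is an assumption; verify it against the Custom Metrics — Stackdriver Adapter documentation), the external metric block above could become:

# Fragment of the HPA external metric; the 'reducer' selector key is an assumption
metric:
  name: "loadbalancing.googleapis.com|https|request_count"
  selector:
    matchLabels:
      resource.labels.url_map_name: "k8s2-um-4vubmsmu-default-nginx-ingress-91yjparm"
      reducer: "REDUCE_PERCENTILE_99"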

All example files can be found in my GitHub repository.

If all this is not enough for your application, there are two other solutions:

Part 1: this article

Part 2: https://medium.com/@matteo.candido/kubernetes-hpa-autoscaling-with-external-metrics-part-2-dffac36a1f4e

Part 3: coming soon

We will go into more detail in future guides. Stay tuned!


