Kubernetes HPA Autoscaling with External metrics — Part 1
Use GCP Stackdriver metrics with HPA to scale up/down your pods
Kubernetes makes it possible to automate many processes, including provisioning and scaling. Instead of manually allocating the resources, the autoscaling process allows you to respond quickly to peaks in demand, and reduces costs by scaling down when resources are not needed.
Autoscaling is divided into:
- Node-based scaling
- Pod-based scaling
Node-based scaling
When the cluster is not able to allocate new pods or existing nodes are under pressure, new nodes are automatically added to the cluster.
This feature is provided by the Cluster Autoscaler, which the main cloud service providers (Google, AWS, Azure, etc.) usually offer by default.
Pod-based scaling
The resources involved in the scaling process are Pods.
Pod-based scaling is, in turn, divided into:
- Vertical pod autoscaling
- Horizontal pod autoscaling
Vertical Pod Autoscaling
Vertical Pod Autoscaling automatically adapts the Pods' resources (requests and limits). In this way those values can be optimized, leading to more efficient resource usage and scheduling inside the cluster.
It’s performed by the Vertical Pod Autoscaler, an add-on that is not part of the core Kubernetes installation and is usually enabled through a CSP.
Horizontal Pod Autoscaling
Horizontal Pod Autoscaling is a mechanism that adapts the number of Pods in order to respond to peaks in demand. The autoscaler can add or remove Pods based on specific metrics.
Metrics
What’s a metric?
A metric is a meaningful measurement taken over a period of time that communicates vital information about a process or activity, leading to fact-based decisions. Metrics are usually specialized by the subject area. In business, they are sometimes referred to as key performance indicators (KPI).
In a Kubernetes cluster the metrics are collected and made available through a component named Metrics Server. This component, installed as a Pod in the Kubernetes cluster, collects metrics from:
- Pods
- Nodes
and makes them available to the HPA and other components.
Metrics are exposed via the metrics.k8s.io/v1beta1 API resource.
Metric values can be retrieved using the kubectl command or directly from the Kubernetes API.
Below is the list of commands available to obtain the metrics. The first pair returns the metrics for all the resources of a kind.
# Get the metrics for all nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
# Get the metrics for all pods
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
The second pair returns the metrics for a specific resource.
# Get the metrics for a specific node
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/NODE_NAME
# Get the metrics for a specific pod
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/NAMESPACE/pods/POD_NAME
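The raw endpoints return JSON. As a rough sketch of how such a response could be read programmatically, the Python snippet below parses a PodMetricsList payload and converts the usage quantities into millicores and bytes. The sample payload, the pod name nginx-abc and the helper names are made up for illustration, and only the common n/u/m CPU suffixes and Ki/Mi/Gi memory suffixes are handled.

```python
import json

# Illustrative payload shaped like the output of
# kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods (values are made up).
SAMPLE_RESPONSE = """
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {
      "metadata": {"name": "nginx-abc", "namespace": "default"},
      "containers": [
        {"name": "nginx", "usage": {"cpu": "250000000n", "memory": "20480Ki"}}
      ]
    }
  ]
}
"""

# CPU suffix -> divisor needed to express the value in millicores.
CPU_DIVISORS = {"n": 1_000_000, "u": 1_000, "m": 1}
# Memory suffix -> multiplier needed to express the value in bytes.
MEMORY_FACTORS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def cpu_to_millicores(quantity: str) -> float:
    """Convert a CPU quantity such as '250000000n' or '2' to millicores."""
    suffix = quantity[-1]
    if suffix in CPU_DIVISORS:
        return int(quantity[:-1]) / CPU_DIVISORS[suffix]
    return float(quantity) * 1000  # no suffix means whole cores


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '20480Ki' to bytes."""
    suffix = quantity[-2:]
    if suffix in MEMORY_FACTORS:
        return int(quantity[:-2]) * MEMORY_FACTORS[suffix]
    return int(quantity)  # no suffix means plain bytes


def summarize_pod_metrics(payload: str) -> dict:
    """Return {namespace/pod: (millicores, bytes)} summed over containers."""
    usage = {}
    for item in json.loads(payload)["items"]:
        key = f'{item["metadata"]["namespace"]}/{item["metadata"]["name"]}'
        cpu = sum(cpu_to_millicores(c["usage"]["cpu"]) for c in item["containers"])
        mem = sum(memory_to_bytes(c["usage"]["memory"]) for c in item["containers"])
        usage[key] = (cpu, mem)
    return usage


print(summarize_pod_metrics(SAMPLE_RESPONSE))
```

In practice the payload would come from the kubectl command above or from a Kubernetes client library rather than from a hard-coded string.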
How to use standard metrics with HPA
Prerequisites
- GKE Cluster
- Nginx running on Kubernetes
- Locust
Initially only one Nginx Pod is up and running, ready to serve requests.
Let’s check that Nginx is serving requests correctly.
When there are no ongoing requests, the CPU/RAM usage of the single Pod looks something like the picture below.
The Pod resources (requests/limits) are configured as described in the code snippet below.
Let’s stress the Pod with multiple requests using Locust. The locustfile.py is defined as follows.
We can now launch the Locust process, which ramps up to 1000 concurrent users, spawning one new user per second.
locust --headless --users 1000 --spawn-rate 1 -H http://35.201.95.239.nip.io
We can see that CPU/RAM usage increases during the stress test and that all requests are served in around 110 ms, as shown below.
After a while the CPU usage reaches its limit. At that point the CPU is throttled by the underlying system, so the Pod is no longer able to respond as fast as before.
Consequently, the average response time increases, as can be seen from the Locust statistics.
This means that clients are blocked waiting for a response. As we can see, they can wait up to 6 seconds.
How to react to peaks?
One solution to respond to traffic peaks is to increase the number of nginx Pods to distribute requests across all instances.
Obviously we can do it manually, but why!?!?!
Let’s define an HPA for the nginx deployment to do it for us!!
The HPA can scale up to 10 nginx instances when the average CPU usage among all current instances is greater than 80%.
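To make the scaling decision concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: the desired replica count is the current count multiplied by the ratio between the measured and the target value, rounded up. The function name and the default bounds (1 to 10 replicas, mirroring the HPA above) are illustrative, and the real controller applies extra rules (pod readiness, stabilization windows) that are ignored here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int = 1,    # assumed lower bound for this sketch
    max_replicas: int = 10,   # upper bound taken from the HPA described above
    tolerance: float = 0.1,   # documented default tolerance of the controller
) -> int:
    """Simplified HPA formula: ceil(current_replicas * current / target)."""
    ratio = current_value / target_value
    # Within the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped between the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# One Pod running at 160% average CPU utilization against an 80% target.
print(desired_replicas(1, 160, 80))
```

With a single Pod at twice the target utilization the sketch asks for two replicas; small deviations, such as 84% against an 80% target, stay inside the tolerance and trigger no change.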
After applying the HPA the status is as follows
Let’s stress test it again with Locust as we did before. After a while the measured value goes over the 80% target and the HPA starts adding replicas to respond to the peak.
When the measured value falls back below the target, the HPA stops scaling up.
And the average response time for clients is, in contrast to the previous test, much lower.
If we stop the Locust test we can see that the HPA, after a while, scales the replicas down according to the target value.
This was a small demonstration of how the HPA works. I don’t want to go into too much detail because there are already plenty of guides on how to use standard metrics with HPA.
I want to focus the attention on how to use External Metrics from GCP Stackdriver.
External metrics
Sometimes, due to business or other requirements, CPU and RAM are not the right resources for deciding whether a Pod should scale.
In this case we need different metrics that are not natively scraped and exposed by the Kubernetes API. For example, we may want to scale the Pods based on the number of messages in a queue, the number of connections on the load balancer, etc.
Kubernetes doesn’t natively collect metrics from outside the cluster but, in the case of GCP Stackdriver, this is possible with a custom metrics adapter developed by Google named Custom Metrics - Stackdriver Adapter.
First of all we need to install the adapter in our Kubernetes cluster. Before deploying it we modify the Deployment slightly, enabling it to collect Distribution metrics as well, not only Counter metrics.
Download the yaml file
wget https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
And modify it by adding enable-distribution-support=true, as follows.
Follow the official guide to install it and wait until the Pod is up and running.
If you have configured Workload Identity, you should configure the service account as described here.
Three new API versions are now available.
Two are related to custom metrics and one to external metrics.
We will focus on the external one. Let’s find out how to use those metrics.
First of all, open the Monitoring page in the GCP Console and go to the Metrics Explorer subsection. On this page we can explore all the metrics available in the project.
We have a lot of metrics grouped by service type (here is the complete list). In our test we will use Google Cloud HTTP/S Load Balancer → Https → Request Count.
As described in the picture above, this metric collects the number of requests served by the external HTTP/S load balancer.
When we select the metric, the console shows us all the values collected in the selected time range.
Now let’s check whether the metrics are also available through the Kubernetes APIs. From the previous screenshot, or from the GCP list, we retrieve the right metric name.
loadbalancing.googleapis.com|https|request_count
GCP lists metric names with / as the separator (loadbalancing.googleapis.com/https/request_count): replace each / with a | before using them!
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/loadbalancing.googleapis.com|https|request_count" | jq .
As you can see, the metric value is available in the Kubernetes APIs and can be used with an HPA.
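The name conversion and the API path used in the command above can be sketched in a few lines of Python. The helper names are hypothetical and the default namespace is simply the one used in this article; the snippet only builds strings and does not call the cluster.

```python
def to_adapter_metric_name(metric_type: str) -> str:
    """Turn a GCP metric type (/ separated) into the | separated form."""
    return metric_type.replace("/", "|")


def external_metric_path(metric_type: str, namespace: str = "default") -> str:
    """Build the raw external metrics API path for the given metric type."""
    return (
        "/apis/external.metrics.k8s.io/v1beta1/namespaces/"
        f"{namespace}/{to_adapter_metric_name(metric_type)}"
    )


print(external_metric_path("loadbalancing.googleapis.com/https/request_count"))
```

The printed path can be passed, quoted, to kubectl get --raw.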
Metrics have a lot of information that we can use. In particular:
- metricLabels: we can use these labels to filter the metric in case we get multiple results.
- value: the measurement of the metric that will be used by the HPA to scale up/down. It’s expressed in milli-units (the m suffix), so you should divide it by 1000 to obtain the real value.
In this case we have: 490400m → divided by 1000 → 490.4 req/s, which is approximately the value we can see in the Locust stress test.
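The same conversion can be written as a tiny Python helper. This is only a sketch: the function name is made up, and it handles just the m suffix and plain numbers, not the full Kubernetes quantity syntax.

```python
from decimal import Decimal


def quantity_to_decimal(raw: str) -> Decimal:
    """Convert a metric value such as '490400m' into its real value."""
    if raw.endswith("m"):
        # The m suffix means thousandths of a unit.
        return Decimal(raw[:-1]) / Decimal(1000)
    return Decimal(raw)


print(quantity_to_decimal("490400m"))  # 490.4
```

Decimal is used instead of float so that the division stays exact.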
How can we use it in a HPA?
Now we have metrics, let’s use them!
We will use the HPA as defined in the snippet below.
Kubernetes 1.23 introduces the stable API version autoscaling/v2. Please refer to the official documentation at https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2/
The allowed values of metrics.target.type are:
- Value: the target value of the metric (as a quantity).
- AverageValue: the target value of the average of the metric across all relevant pods (as a quantity).
- Utilization: the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for the Resource metric source type.
To obtain more information about the HPA structure and fields, refer to the official documentation at https://v1-20.docs.kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2beta2/
We are going to create an HPA that uses the metric we chose before. In addition, we add a selector for the label that identifies our nginx backend (in case we have different backends served by the same load balancer).
resource.labels.url_map_name: "k8s2-um-4vubmsmu-default-nginx-ingress-91yjparm"
We configured the HPA to scale the Pods if the average number of requests is greater than 50 per second. Let’s apply the HPA and see what happens.
The demand is higher than what a single Pod can handle at that target so, as expected, the Pods are scaled up.
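As a back-of-the-envelope check, and assuming the HPA uses an AverageValue target of 50, the number of replicas the autoscaler aims for is the total metric value divided by the per-Pod target, rounded up. The Python sketch below applies this to the 490400m (490.4 req/s) reading seen earlier; the function name is hypothetical, and the tolerance and the min/max replica bounds of the real controller are left out.

```python
import math


def replicas_for_external_average(metric_total: float, target_average: float) -> int:
    """Replicas needed so that metric_total / replicas <= target_average."""
    return math.ceil(metric_total / target_average)


# 490.4 req/s on the load balancer, 50 req/s allowed per Pod on average.
print(replicas_for_external_average(490.4, 50))  # 10
```

So with roughly 490 requests per second, one Pod is far from enough and the HPA keeps adding replicas until the per-Pod average drops to the target or the maximum replica count is reached.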
Cool, we have achieved our goal: using an external metric to scale the Pods up and down.
It is also possible to aggregate the metric values in the same way the Metrics Explorer console does.
The custom adapter also lets you define an aggregator (called a reducer). This is done by using a specific label in the HPA, which may have one of the following values:
- REDUCE_NONE
- REDUCE_MEAN
- REDUCE_MIN
- REDUCE_MAX
- REDUCE_SUM
- REDUCE_STDDEV
- REDUCE_COUNT
- REDUCE_COUNT_TRUE
- REDUCE_COUNT_FALSE
- REDUCE_FRACTION_TRUE
- REDUCE_PERCENTILE_99
- REDUCE_PERCENTILE_95
- REDUCE_PERCENTILE_50
- REDUCE_PERCENTILE_05
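As a rough illustration of what a reducer does, the Python sketch below combines the latest values of several time series into a single value, which is what the HPA would then compare with its target. It covers only a few of the reducers listed above, and the function name and sample values are made up; the actual aggregation is performed by Cloud Monitoring, not by this code.

```python
from statistics import mean

# Subset of the reducers listed above, mapped to plain Python aggregations.
REDUCERS = {
    "REDUCE_MEAN": mean,
    "REDUCE_MIN": min,
    "REDUCE_MAX": max,
    "REDUCE_SUM": sum,
    "REDUCE_COUNT": len,
}


def reduce_series(reducer: str, values: list):
    """Combine one value per time series into a single number."""
    if reducer not in REDUCERS:
        raise ValueError(f"reducer not covered by this sketch: {reducer}")
    return REDUCERS[reducer](values)


# Made-up request rates reported by three separate time series.
latest_values = [100.0, 200.0, 300.0]
for name in REDUCERS:
    print(name, reduce_series(name, latest_values))
```

Choosing REDUCE_SUM here would make the HPA react to the total traffic, while REDUCE_MEAN or REDUCE_MAX would make it react to the average or the busiest series.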
All example files can be found in my GitHub repository.
If all this is not enough for your application, there are two other solutions:
- External metrics from Log-based Metrics
- Custom Metrics
Part 1: this article
Part 3: coming soon
We will go into more detail in future guides. Stay tuned!