EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Autoscaling in Kubernetes: Options, Features, and Use Cases

Second part of a series exploring application autoscaling in Kubernetes

Sasidhar Sekar
Expedia Group Technology


View of an airplane cockpit, showing lots of gauges
Photo by Leonel Fernandez on Unsplash

If you haven’t read the first post on manual scaling vs autoscaling and autoscaling acceptance criteria, you can find it here. This post will cover the different autoscalers available in Kubernetes, their use cases, and their limitations.

What is the Autoscaler in Kubernetes called?

That was a trick question! There is no single autoscaler in Kubernetes. There are several of them.

Below is a list of the commonly used autoscalers in Kubernetes.

  • Horizontal Pod Autoscaler
  • Vertical Pod Autoscaler
  • Cluster Autoscaler
  • Addon Resizer

But isn’t autoscaling a single concern? You might then wonder why there are so many different autoscalers to solve this one concern. While the concern is the same, the perspectives are different. The figure below illustrates the focus of each of these kinds of autoscalers.

4 autoscalers: VPA & HPA for applications, CA for clusters, Addon resizer for add-ons
Autoscalers in Kubernetes

Horizontal Pod Autoscaler (HPA) / Vertical Pod Autoscaler (VPA)

These autoscalers deal with application autoscaling in Kubernetes, i.e., if you have a microservice deployed in a Kubernetes cluster and you need the service to scale out/in based on the workload, then HPA and VPA are the options most suited for your purpose. HPA changes the number of replicas your deployment is running, while VPA changes the resources requested by each replica in the deployment.

The Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA) deal with application autoscaling

To illustrate the use of HPA/VPA, let’s assume:

  • You have a Kubernetes cluster with 4 worker nodes, each with 3 CPUs available.
  • You have a service with 3 replicas, with each replica requesting 2 CPU cores.

So, the current state of the cluster looks as shown below.

4 nodes each with 3 CPUs; 3 replicas one on each of 3 nodes, each replica using 2 CPUs

Assuming that the workload on this service increases over time to a level that 3 replicas can no longer handle, Kubernetes offers two ways to handle this scalability concern:

  • Increasing the number of replicas (HPA) or
  • Increasing the resource allocated per replica (VPA)

The figure below illustrates how the HPA mitigates this scalability concern.

Shows the HPA steps 1,2,3 from the text, with the resulting state (4) having 4 replicas, 1 on each node, each using 2 CPUs
HPA Operation

As illustrated in the figure above, HPA-based autoscaling involves the following steps:

  1. HPA monitors the metrics across all pods of a deployment
  2. If the metrics indicate that the target threshold is breached, HPA sends a “Scale” request to the Kubernetes Master component
  3. Kubernetes’ Master then scales out (or in) replicas as per the HPA request

Horizontal Pod Autoscaler (HPA) scales (out/in) the deployment by adding /removing replicas
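In practice, an HPA is configured declaratively. Below is a minimal sketch of an HPA manifest using the `autoscaling/v2` API; the Deployment name `my-service`, the replica bounds, and the 70% CPU target are all illustrative values, not prescriptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
spec:
  scaleTargetRef:            # the deployment whose replica count HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: my-service         # hypothetical deployment name
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU across pods exceeds 70%
```

When the observed average CPU utilization across the pods breaches the target, the HPA controller issues the “Scale” request described in step 2 above.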

In contrast, the below figure illustrates the operation of VPA.

Shows the VPA steps 1,2,3 from the text, with the resulting state (4) having 3 replicas, 1 on each node, each using 3 CPUs
VPA Operation

VPA autoscaling involves the following steps:

  1. VPA monitors the metrics across all pods of a deployment.
  2. VPA comes up with a recommendation for pod resource allocation.
  3. If the current resource configuration is not in line with the recommendation, the VPA updates the runtime resource configuration as per the recommendation, to resize the pods. This involves evicting the current pod and scheduling a new pod.

Vertical Pod Autoscaler (VPA) scales (out/in) the deployment by resizing replicas with more/less resources
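The corresponding VPA configuration is also declarative. A minimal sketch, again assuming a hypothetical Deployment named `my-service`; `updateMode: "Auto"` tells the VPA to act on its recommendations by evicting pods and recreating them with the recommended resources:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
spec:
  targetRef:                 # the deployment whose pod resources VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: my-service         # hypothetical deployment name
  updatePolicy:
    updateMode: "Auto"       # evict and recreate pods with recommended resources
```

Setting `updateMode: "Off"` instead would make the VPA recommendation-only, which is a common first step when evaluating it.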

A summary of the key differences between the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler is shown in the table below.

* The Vertical Pod Autoscaler, in its current form, has a number of limitations that make it challenging to use as the preferred autoscaler for application autoscaling.

It’s also worth reviewing the known VPA limitations: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#known-limitations


Cluster Autoscaler (CA)

Unlike the HPA/VPA, this autoscaler does not deal with application autoscaling. Instead, the Cluster Autoscaler deals with autoscaling the number of worker nodes in a Kubernetes cluster as usage fluctuates.

Cluster autoscaler scales (out/in) the Kubernetes cluster by adding/removing worker nodes when required

To illustrate the use of CA, let’s assume:

  • You have a Kubernetes cluster with 3 worker nodes, each with 3 CPUs available.
  • You have a service with 3 replicas, with each replica requesting 2 CPU cores.

So, the current state of the cluster looks as shown below.

3 nodes each with 3 CPUs; 3 replicas, one on each of 3 nodes, each replica using 2 CPUs

If the workload on this service increases over time to a level that 3 replicas can no longer handle, then the HPA will attempt to scale out the deployment by adding an additional replica. But the scale-out operation would fail because there is no worker node in the cluster with 2 CPU cores available where the new replica could be scheduled.

The purpose of the Cluster Autoscaler (CA) is to handle these kinds of situations. If the CA is set up on this cluster, it keeps monitoring for scheduling failures, and when it sees them, it handles them by adding worker nodes to the cluster.

The process is illustrated in the below figure.

Shows the HPA failing to scale, CA scaling, then HPA succeeding, to add a node and replica to get to 4 replicas on 4 nodes
CA Operation
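The CA itself is typically deployed as a pod whose container arguments describe the node groups it is allowed to scale. Below is a hedged sketch of such arguments for AWS; the Auto Scaling group name `my-asg` and the bounds are illustrative, and the exact flags vary by cloud provider:

```yaml
# Fragment of a cluster-autoscaler container spec (AWS example)
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=1:10:my-asg              # min:max:node-group-name (hypothetical ASG)
  - --scale-down-unneeded-time=10m   # how long a node must be unneeded before removal
```

With this configuration, the CA would add nodes to `my-asg` (up to 10) when pods fail to schedule, and remove nodes that stay underutilized.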

Addon Resizer

This autoscaler deals with scaling out/in some of the Kubernetes addons. The addons currently supported include heapster, metrics-server, and kube-state-metrics.

While the HPA/VPA scale based on resource usage and the CA scales based on scheduling failures, the addon resizer vertically scales addon containers based on the number of nodes in the Kubernetes cluster.

The Addon resizer updates (increases/decreases) the container resource configuration of specific singleton Kubernetes addons based on the size of the Kubernetes cluster

Key configuration parameters:

  • --cpu: the base CPU resource requirement
  • --extra-cpu: the amount of CPU to add per node
  • --memory: the base memory resource requirement
  • --extra-memory: the amount of memory to add per node
  • --acceptance-offset: the dependent’s resources are rewritten when they deviate from the expected values by a percentage higher than this threshold

Addon resizer operation involves the following steps:

  1. Monitor the number of nodes (n) in the cluster.
  2. Calculate the recommended container resource requirements for the target addon (heapster, metrics-server, or kube-state-metrics) as the base requirement plus the per-node extra multiplied by n.
  3. If the difference between the recommended container resource requirement and the actual resource configuration is greater than the acceptance-offset (%), then:

  • evict the current pods
  • update their resource configuration in line with the recommendation

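The recommendation and the resize decision above amount to simple linear arithmetic. Here is a minimal sketch in Python; the function names and the sample flag values are illustrative, not part of the addon resizer itself:

```python
def recommended_resources(n_nodes, base_cpu_m, extra_cpu_m, base_mem_mi, extra_mem_mi):
    """Linear per-node scaling: recommended = base + extra * number_of_nodes."""
    cpu_m = base_cpu_m + extra_cpu_m * n_nodes    # millicores
    mem_mi = base_mem_mi + extra_mem_mi * n_nodes  # MiB
    return cpu_m, mem_mi


def needs_resize(current, recommended, acceptance_offset_pct):
    """Resize only when the deviation exceeds the acceptance offset (in %)."""
    deviation_pct = abs(current - recommended) / recommended * 100
    return deviation_pct > acceptance_offset_pct


# Illustrative values: --cpu=80m, --extra-cpu=0.5m, --memory=140Mi, --extra-memory=4Mi
cpu, mem = recommended_resources(100, 80, 0.5, 140, 4)
print(cpu, mem)  # 130.0 540
```

A 100-node cluster would thus get a recommendation of 130 millicores and 540 MiB; the pods are only evicted and resized if the current configuration deviates from that by more than the acceptance offset.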
That was a brief overview of the different autoscalers in Kubernetes and their purposes. The next post in this series will focus solely on application autoscaling with the Horizontal Pod Autoscaler, the most mature autoscaler available in Kubernetes for application autoscaling, including some key considerations to take into account to ensure HPA works for your application.

Learn more about technology at Expedia Group
