What is Autoscaling in Kubernetes?

Eric Muccino · Mindboard · Jan 25, 2023

Autoscaling is a useful feature in Kubernetes that automatically adjusts the number of pods in your deployment, or the resources they request, to meet the changing needs of your application. There are two main types of pod autoscaling in Kubernetes: Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA).

Horizontal Pod Autoscaling

HPA allows you to automatically scale the number of pods in a ReplicationController, Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization, memory usage, or custom metrics. It works by periodically comparing the current resource utilization of the pods in the target resource against the target utilization specified by the user. If the current utilization is higher than the target, HPA increases the number of pods in the target resource; if it is lower than the target, HPA decreases the number of pods.
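Under the hood, the HPA controller uses (roughly) the following calculation from the Kubernetes documentation:

desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)

For example, if a deployment is running 4 replicas at an average of 90% CPU utilization against a 60% target, the controller computes ceil(4 × 90 / 60) = 6 and scales the deployment up to 6 replicas.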

To use HPA, you’ll need the metrics-server running in your cluster. This component collects resource usage data from the kubelet on each node and exposes it through the Kubernetes Metrics API, where the HPA controller can read it. Once the metrics-server is up and running, you can create an HPA resource by specifying the target resource utilization and the deployment that you want to scale. The HPA controller will then use that usage data to decide whether to scale the number of replicas in the deployment up or down.
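As a concrete sketch, here is a minimal HPA manifest using the autoscaling/v2 API; the Deployment name (my-app), the replica bounds, and the 70% CPU target are placeholder values to adapt to your own workload:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # placeholder: the Deployment to scale
  minReplicas: 2              # never run fewer than 2 replicas
  maxReplicas: 10             # never run more than 10 replicas
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # aim for 70% average CPU utilization
```

For plain CPU-based scaling, kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10 creates an equivalent HPA without writing any YAML.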

HPA is a useful tool for automatically scaling the number of pods in your deployment to meet the changing needs of your application. It allows you to ensure that your application has the resources it needs to handle increased load without having to manually adjust the number of replicas in your deployment.

Vertical Pod Autoscaling

VPA scales the CPU and memory requests and limits of pods based on their actual resource utilization. It continuously monitors the resource usage of your pods and compares it to the requests and limits you have configured. If actual usage consistently exceeds what the pods were given, VPA raises the requests (and scales the limits proportionally) so the pods can consume more resources; if actual usage stays well below them, VPA lowers them to reduce unnecessary resource reservation. In its default update modes, VPA applies these new values by evicting and recreating the affected pods, because the resource requests of a running pod generally cannot be changed in place.

To use VPA, you’ll also need the metrics-server, plus the VPA components themselves, which ship as an add-on (in the kubernetes/autoscaler project) rather than as part of core Kubernetes. Once those are running, you can create a VPA resource that points at the deployment you want to scale and optionally bounds the requests and limits it is allowed to set. The VPA controller will then use the resource usage data collected by the metrics-server to decide whether to raise or lower the requests and limits of the pods in the deployment.
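As a sketch, a VPA object might look like the following, using the autoscaling.k8s.io/v1 API provided by the VPA add-on; the Deployment name and the min/max bounds are illustrative assumptions:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # placeholder: the Deployment whose pods VPA should right-size
  updatePolicy:
    updateMode: "Auto"         # apply recommendations automatically (pods may be evicted and recreated)
  resourcePolicy:
    containerPolicies:
      - containerName: "*"     # apply to every container in the pod
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "1"
          memory: 1Gi
```

Setting updateMode to "Off" is a low-risk way to start: VPA then only publishes recommendations (visible with kubectl describe vpa my-app-vpa) without touching any running pods.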

VPA can be used to ensure that your pods have the resources they need to handle increased load without you having to manually tune their requests and limits. It’s especially useful in environments where the workload on your application is highly variable and it’s difficult to predict how much CPU and memory each pod will need at any given time. VPA can also help you optimize resource usage by shrinking requests when the load on your application is low, which saves money on cloud infrastructure and makes better use of your cluster.
