Kubernetes: Vertical Pod Autoscaler Explained

Arie Bregman
6 min read · Jan 8, 2023


This post aims to summarize the topic of the Vertical Pod Autoscaler (VPA) in a simplified way, in the form of questions and answers.

What is Vertical Pod Autoscaler (VPA)?

VPA is a method of scaling Pods vertically: it scales a Pod up or down by increasing or decreasing its CPU and memory resources. In comparison, HPA (the Horizontal Pod Autoscaler) scales by adding or removing Pods instead of changing the resources of the existing Pod(s).

Image by Arie Bregman

What problem(s) does VPA solve?

To best understand why and when to use VPA, we should look at the problems it aims to solve. To do that, we first need to understand requests and limits in Kubernetes (I recommend reading the official documentation on the topic if you are not already familiar with it).

It’s quite common practice to set requests in Kubernetes to define the minimum amount of memory and CPU guaranteed for a workload. In addition, it’s also common to set a limit on the maximum memory a workload can consume (some also set a CPU limit, but that’s less common than memory). The reasons it’s common practice to set both requests and limits are (a minimal example follows the list below):

  • It allows scheduling the Pod on the node that best fits it from a resources perspective, one that has the resources the Pod requires to run properly
  • It guarantees the Pod the resources it requires at run-time, so when other Pods are scheduled, they are not assigned resources already promised to this Pod.
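
For reference, here is a minimal sketch of what requests and limits look like in a container spec (the image and values are placeholders for illustration, not taken from a real workload):

apiVersion: v1
kind: Pod
metadata:
  name: some-app
spec:
  containers:
  - name: some-container
    image: nginx:1.25        # placeholder image
    resources:
      requests:
        cpu: "250m"          # minimum CPU guaranteed to the container
        memory: "256Mi"      # minimum memory guaranteed to the container
      limits:
        memory: "512Mi"      # maximum memory the container may consume

The scheduler uses the requests to pick a suitable node, while the memory limit caps what the container is allowed to consume.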

Over time, the resources a Pod requires can change, which forces us to constantly monitor its usage and make adjustments, or else the Pod will hit issues (like being killed for running out of memory, because it’s unable to go over its limits). In addition, setting the right requests and limits is prone to errors, and we may end up over-allocating resources on nodes that the Pods don’t actually need.

Wouldn’t it be great if there was a solution that did that for us? Well, that solution exists and it’s called VPA (Vertical Pod Autoscaler). VPA tracks the resource usage of a given Pod and adjusts its requests and limits accordingly.

What does VPA do practically?

It depends. VPA has multiple modes, and before we dive into them, you need to understand that VPA can be used in two main ways: one way is to act as a “recommender”, not actually modifying any resource-related values but recommending what these values should be. The second way is to auto-apply the recommended values, so there is no need for human intervention.

Technically, there are four modes to choose from:

  • Off: This is the mode to use if you only want recommendations from VPA, without them actually being applied (see the sketch right after this list)
  • Initial: Changes to the resource requests are done only when the Pod is created
  • Auto: VPA will automatically apply the recommended resource changes by re-creating the Pods with the new requests and limits
  • Recreate: At the time of writing this post, this mode is identical to “Auto”. At some point, “Auto” will support “in-place update”, where re-creating the Pod is not required, and that’s when the two modes will differ.
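
For example, a minimal sketch of a VPA object running in recommendation-only mode (the names are placeholders) could look like this; it produces recommendations but never evicts Pods:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: some-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: some-app
  updatePolicy:
    updateMode: "Off"   # recommendations only, nothing is applied automatically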

Can I use both VPA and HPA on the same Pod?

Yes and no. I’ll explain. While it is technically possible to apply both to the same Pod(s), it’s considered a bad practice in most cases and is one of the known limitations documented officially as part of the project. The main problem is that both HPA and VPA track memory and CPU usage and act upon it.

With that being said, there are cases where HPA is not tracking resource usage but other parameters (KEDA is one example of HPA-based scaling that is event-based rather than CPU and memory based), and that’s where it’s considered safe to mix VPA and HPA.
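
As an illustration, here is a minimal sketch of an HPA that scales on an external metric rather than CPU or memory (the metric name and target value are hypothetical); because it doesn’t act on resource usage, it can coexist with a VPA on the same workload:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: some-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: some-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready   # hypothetical external metric
      target:
        type: AverageValue
        averageValue: "30"           # hypothetical target per replica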

In other cases, you are sometimes forced to choose one over the other. For example, let’s say you are limited by the number of Pods you can run on your node(s) and can’t add more Pods. In that case, your only scaling option would obviously be VPA.

When should I consider HPA over VPA?

A simple rule of thumb is to ask yourself: “do I need to deal with sudden changes or changes over time?” If the answer is sudden increases/decreases, VPA isn’t the best fit. You don’t want VPA to adjust memory and CPU requests and limits based on every sudden change, as every change requires re-creating the Pod, and re-creating the Pod many times in a short period is basically inviting instability.

HPA is a much better choice for sudden spikes in resource usage. If you want to benefit from both VPA and HPA, you might want to start with running VPA to set the requests and limits to the recommended values, and then switch to HPA to handle any usage spikes. This means at some point you might need to adjust the requests and limits again, but it gives you a good starting point that is data-based and less prone to human error.
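
A minimal sketch of that workflow, assuming a VPA named some-app-vpa already exists in recommendation-only (“Off”) mode and that the values below are just placeholders taken from its recommendation:

# Read the recommended requests from the VPA object (look at the "Recommendation" section)
kubectl describe vpa some-app-vpa

# Manually apply the recommended values to the Deployment, for example:
kubectl set resources deployment/some-app --requests=cpu=50m,memory=600Mi

# From this point on, an HPA can handle the usage spikes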

How to apply VPA on an existing workload?

I’m going to cover the practical part of VPA extensively in a separate post (how to install, configure, monitor, etc.), but for now, I’ll include one snippet to quickly show you how to use it:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: some-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: some-app
  updatePolicy:
    updateMode: "Auto"

The snippet above demonstrates how we apply VPA in “Auto” mode (remember? the mode that automatically applies the VPA recommendations) on a Deployment resource called some-app.

You probably know what to do next: apply it with kubectl apply -f <VPA_resource_definition_path> and check that it was actually created with kubectl get vpa some-app-vpa.

Should I apply VPA on every Pod running in the cluster?

Absolutely not. It’s recommended you avoid applying VPA on the following (a sketch for excluding specific containers follows the list):

  • Workloads with sudden spikes in resource usage. VPA is designed to work best with workloads that have consistent resource usage. As mentioned previously, for sudden spikes in resource usage, you may want to consider using HPA.
  • Stateful applications. VPA isn’t well suited for stateful applications, for the simple reason that it usually needs to re-create or evict Pods when adjusting their resources. If your application doesn’t run in an HA manner, that means some downtime for the application and, eventually, for the client/customer using it
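
Related to this, if only a specific container in a workload is a bad fit for VPA, the VPA object lets you exclude it via a resource policy. A minimal sketch, where the sidecar container name is purely illustrative:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: some-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: some-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: some-sidecar   # hypothetical container to leave unmanaged
      mode: "Off"                   # VPA will not manage this container's resources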

How to verify VPA does what it is supposed to do?

Apart from the obvious way of looking at the values of the requests and limits, you can monitor for EvictedByVPA events. These events indicate that a Pod was restarted or moved to another node in order to apply newly recommended requests and/or limits.

EvictedByVPA: Pod was evicted by VPA Updater to apply resource recommendation.
Killing: Stopping container some-container

Do not confuse these events with Kubernetes Pod eviction events, which usually mean the Pod is moved/recreated on another node. EvictedByVPA means the Pod is going to be re-created with new values for requests and limits, and it may end up on the very same node or a different one, depending on where it best fits.
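
If you want to list these events across the cluster, a minimal sketch (assuming the VPA updater emits events with the reason shown above) could be:

kubectl get events --all-namespaces --field-selector reason=EvictedByVPA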

What are some limitations of VPA?

Every technology has limitations and VPA is no exception. The good news is that most of them are not very critical, and the project owners did a really good job of documenting them here. Some of them we mentioned previously, like applying both HPA and VPA on the same Pods.

It’s worth mentioning that some limitations are related to specific Kubernetes flavors. For example, GKE (the Kubernetes engine of Google Cloud Platform) allows applying VPA on up to 1,000 Pods, but not more than that.

There are also what I like to call “soft limitations”, which means they are not actually limitations, but I still feel they limit my experience with VPA. One example is the non-human-readable values VPA uses. In this case, one example is worth 1000 words:

Cpu:     50e-3
Memory: 600e6

The output above is from the kubectl describe vpa command. If anyone knows why these units were chosen, or even better, knows how to modify them to human-readable units, please share :)

Wrapping Up

I hope you enjoyed this article about VPA. If you are interested in any other specific aspect of VPA or just have general feedback for me regarding my writing, let me know in the comments below :)
