Scale and Autoscale explained — Kubernetes

Published in

Cloudnloud Tech Community

7 min read · Feb 22, 2023

To understand scaling and autoscaling better, let us start with two simple and funny examples.

1. HPA with a Funny Example :)

Assume that Chris is hosting a party at his house and has a limited number of chairs. He wants to make sure that all his guests have a seat, but he also doesn't want too many chairs taking up space. He could use an HPA-like approach to handle this by adding or removing chairs based on the number of guests: if more guests arrive, he adds chairs, and if some guests leave, he removes chairs to free up space. Isn't it funny :) This is where the Horizontal Pod Autoscaler comes in.

2. VPA with a Funny Example :)

Imagine Chris is a chef cooking a pot of soup for a large group of people. At the start, Chris puts all of the ingredients into a small pot because he doesn't want to waste any ingredients or make a mess. However, as the soup cooks and the ingredients start to expand, Chris realizes that the pot is too small and the soup is going to overflow. So Chris replaces it with a bigger pot to get the capacity he needs.

In Kubernetes terms, this is similar to what happens when you deploy an application that is too big for the resources (CPU and memory) allocated to it. The application consumes more and more resources as it runs, and if there isn't enough capacity, it will start to experience performance issues or even crash. This is where the Vertical Pod Autoscaler comes in.

While these examples are humorous and not necessarily practical, I have just tried to illustrate the basic principles of HPA and VPA in a relatable way :)

Now let us look at HPA and VPA in Kubernetes terms.

HPA stands for Horizontal Pod Autoscaler, which scales the number of pods in a deployment based on resource utilization. HPA adjusts the number of pod replicas up or down to match the current demand. For example, if the average CPU utilization across the pods exceeds a specified target, HPA automatically creates additional replicas to handle the increased load.
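The scaling decision can be sketched with the replica formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The shell function below is only an illustration. The name desired_replicas and the sample numbers are made up here, and the sketch ignores the tolerance band and the minimum/maximum replica bounds that the real controller also applies.

```shell
# Illustrative only: desired_replicas is a hypothetical helper, not a kubectl command.
# Usage: desired_replicas <current_replicas> <current_cpu_percent> <target_cpu_percent>
desired_replicas() {
  current=$1
  usage=$2
  target=$3
  # Integer ceiling of (current * usage / target)
  echo $(( (current * usage + target - 1) / target ))
}

desired_replicas 1 72 20   # 1 pod at 72% CPU, 20% target: prints 4
desired_replicas 4 10 20   # 4 pods at 10% CPU, 20% target: prints 2
```

The same formula drives both scale-out and scale-in: usage above the target raises the replica count, and usage below it lowers the count.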

VPA stands for Vertical Pod Autoscaler, which adjusts the resource requests and limits of a container based on its historical usage. VPA automatically determines the optimal values for CPU and memory requests and limits by analyzing the usage patterns of a container. VPA helps to optimize the allocation of resources and reduce waste, thereby improving the overall efficiency of the cluster.
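As a rough sketch of what a VPA object might look like, the manifest below asks VPA to produce recommendations for a Deployment. It assumes the VPA components are already installed in the cluster, and the names nginx and nginx-vpa are placeholders chosen for illustration.

```shell
# Assumes the VPA components (CRDs, recommender, updater, admission controller) are installed.
kubectl apply -f - <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx            # hypothetical target deployment
  updatePolicy:
    updateMode: "Off"      # only produce recommendations, do not evict pods
EOF

# Read the recommended CPU and memory requests
kubectl describe vpa nginx-vpa
```

With updateMode set to "Off" nothing is changed automatically, which makes it a low-risk way to see what VPA would recommend.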

In summary, HPA and VPA both help to optimize the allocation of resources and scale applications based on demand. HPA is built into Kubernetes, while VPA is installed separately as an add-on.

Let us see the actual implementation now.

There are two ways to implement HPA and VPA in Kubernetes: the imperative way and the declarative way.

1. Imperative Way: imperative commands directly manipulate the state of the system by specifying exactly what actions should be taken.

Let us start with the imperative way.

NOTE: Before testing HPA on Minikube, we have to enable the metrics-server addon so that Kubernetes can collect resource utilization metrics from nodes and pods.

To enable the metrics server:

# minikube addons enable metrics-server

[Screenshot: metrics-server addon enabled in Minikube]

You can run the command below to check whether the metrics-server addon is enabled.

# minikube addons list

[Screenshot: minikube addons list showing metrics-server enabled]

You can verify that the metrics server is working by running the commands below.

# kubectl top nodes

# kubectl top pods

Now create the sample deployment with NGINX and expose it.

# kubectl create deployment nginx --image=nginx --replicas=1

[Screenshot: create sample deployment]

[Screenshot: expose deployment]

[Screenshot: check the status of the Deployment and Service]
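The commands for exposing and checking the deployment are not shown in the text, so the following is a sketch of what they might look like. The NodePort service type and port 80 are assumptions (NodePort is suggested by the high port number in the URLs used for load testing), and the service name defaults to the deployment name, nginx.

```shell
# Assumption: expose nginx on port 80 through a NodePort service
kubectl expose deployment nginx --port=80 --type=NodePort

# Check that the deployment and the service exist
kubectl get deployment nginx
kubectl get service nginx
```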

Now you have to create an HPA to enable autoscaling.

In this case we configure a minimum of 1 and a maximum of 5 replicas with a CPU target of 20%. If the average CPU utilization goes above 20%, the HPA starts to scale out the nginx deployment.
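A command along these lines creates an HPA with those settings. It is a sketch using the kubectl autoscale flags as they existed around the time of writing; check kubectl autoscale --help on your version.

```shell
# Create an HPA for the nginx deployment: 1 to 5 replicas, 20% CPU target
kubectl autoscale deployment nginx --cpu-percent=20 --min=1 --max=5
```

The HPA created this way takes the name of the deployment, nginx.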

[Screenshot: autoscaler deployed]

After deploying your HPA, you need to wait for some time for metrics to be collected from the metrics server. If you list the HPA at this point, you will see TARGETS as <unknown>.
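The HPA status can be listed with kubectl get hpa. The HPA name nginx is an assumption based on the deployment name used in this walkthrough.

```shell
# List the HPA; the TARGETS column stays <unknown> until CPU metrics are available
kubectl get hpa nginx
```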

[Screenshot: HPA showing <unknown> in TARGETS]

Hold on and sit back. To get HPA working you need to set CPU resources on the containers of your deployment (nginx in this case). HPA computes utilization as a percentage of the container's CPU request; if you set only a limit, Kubernetes uses it as the request too. Without this, HPA cannot calculate utilization from the metrics server data, TARGETS stays <unknown>, and the deployment is never scaled between the minimum and maximum replica counts.

To set CPU and memory resources on your deployment, edit the deployment and add them under the containers section of the pod spec.

# kubectl edit deployment nginx

[Screenshot: resources added under the container spec of the deployment]
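As an alternative to editing the deployment interactively, the same change can be sketched with kubectl set resources. The request and limit values below are example assumptions, not the values used in the original screenshots.

```shell
# Assumption: example values only; tune them for your workload
kubectl set resources deployment nginx \
  --requests=cpu=100m,memory=64Mi \
  --limits=cpu=200m,memory=128Mi

# Confirm the resources on the container spec
kubectl get deployment nginx -o jsonpath='{.spec.template.spec.containers[0].resources}'
```

Changing the pod template triggers a rollout, so the nginx pod is recreated with the new resources.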

Once you are done with the update, wait for some time and you will see TARGETS as 0%/20%, where 0% is the current CPU utilization and 20% is the target CPU utilization.

Now we are ready to generate load on the deployment in two simple ways.

To get your service endpoint URL on Minikube, run the command below with your service name.

# minikube service nginx

1. kubectl command (scripted way)

# kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://192.168.94.2:32198; done"

Here’s a breakdown of the different options and parameters used in the kubectl command:

kubectl run: This creates and runs a new pod in the Kubernetes cluster.
-i --tty: These options keep stdin open and allocate an interactive terminal for the container.
load-generator: This is the name of the pod.
--rm: This option tells kubectl to delete the pod after it exits.
--image=busybox:1.28: This specifies the container image to use.
--restart=Never: This specifies that the pod should not be restarted if it exits.
-- /bin/sh -c: The double dash separates the kubectl options from the command to run in the container. In this case, it's a shell command.

"while sleep 0.01; do wget -q -O- http://192.168.94.2:32198; done": This is the actual script that runs in the container. It sends HTTP requests to the specified IP address and port using the "wget" command, with a delay of 0.01 seconds between requests. Replace the IP address and port with the ones you got from the minikube service <service-name> command.

2. Siege utility

Siege is a Linux utility that is used to load test web applications. It sends a high volume of HTTP requests to a web server to simulate a large number of users accessing the application at the same time.

# siege -q -c 5 -t 2m http://192.168.94.2:32198

Here is a breakdown of the above command:

-q: This option tells Siege to operate in quiet mode, suppressing most of its output.
-c 5: This specifies that Siege should use 5 concurrent connections to the web server.
-t 2m: This specifies that the test should run for a duration of 2 minutes.
http://192.168.94.2:32198: This is the URL of the web server to test.

Once you run the load test command, it starts generating incoming requests for the nginx web service.

You will see the CPU load on the nginx deployment increase to 72% and REPLICAS scale to 4.

Now stop the load generator by pressing Ctrl+C. The load on your deployment will start decreasing and REPLICAS will return to the minimum.
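One way to observe the scale-down is to watch the HPA from a second terminal. The HPA name nginx is again an assumption based on this walkthrough. Scale-down is not immediate: by default the controller waits for a stabilization window of about 5 minutes before removing replicas.

```shell
# Watch the HPA until REPLICAS returns to the minimum; press Ctrl+C to stop
kubectl get hpa nginx --watch
```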

2. Declarative Way: instead of running commands, we declare the desired configuration of the objects in a YAML or JSON file.

I have created the YAML below to implement this the declarative way.