Autoscaling a Project in Kubernetes

Abdulrhman Almahaini
Insider Engineering
9 min read · Dec 11, 2023
HPA Autoscaling in Kubernetes

Hello again, my dear friends. In the previous blog post, I talked about how to create and deploy a Laravel project using Kubernetes (K8s), and I explained what K8s is and how we can use it simply and easily. Today, I want to share another interesting topic with you: how we can autoscale our project. I will continue from where I left off in my previous blog post, so if you have not read that article, you may not end up with the same configuration I show here. But if you have general knowledge of K8s and already have a deployment for a Laravel application in K8s, you can continue without any issues. I hope you will enjoy this article.

What is Autoscaling?

Autoscaling is a strategy we use to keep our application alive and responsive. For example, our web application sometimes receives too many requests, which can exhaust server resources such as memory or CPU; this, in turn, may cause the application to crash or become practically unresponsive.

In K8s, as I explained in my previous blog post, we use Pods to handle the containers that host our application. K8s gives us the ability to manage those Pods and autoscale them easily.

There are two main strategies for autoscaling in K8s:

  1. Horizontal scaling: K8s will increase or decrease the number of Pods that serve our application and distribute the requests to the new Pods.
  2. Vertical scaling: K8s will automatically adjust the Memory and CPU allocated to the Pods that serve our application, and in our case, more resources will be added to those Pods to help them cope with the increasing load.

Note: In this tutorial, I will talk about Horizontal autoscaling, how to achieve it, and how to write its rules.

Horizontal Autoscaling

As I explained previously, horizontal autoscaling is a set of rules that K8s watches. Whenever any rule is satisfied, K8s increases the number of Pods allocated to our application. When the rule is no longer satisfied (or when the metrics we watch return to normal), K8s automatically downscales and removes the extra Pods it added, restoring the deployment to its normal state.

In my previous blog post, the deployment.yml file we added was like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: learn-dep-k8s-deployment
  namespace: default
  labels:
    app: learn-dep-k8s
spec:
  replicas: 2
  selector:
    matchLabels:
      app: learn-dep-k8s
  template:
    metadata:
      labels:
        app: learn-dep-k8s
    spec:
      containers:
        - name: learn-dep-k8s
          image: abdmahaini/learn-dep-k8s
          ports:
            - containerPort: 80
          env:
            - name: APP_KEY
              value: base64:cUPmwHx4LXa4Z25HhzFiWCf7TlQmSqnt98pnuiHmzgY=
---
apiVersion: v1
kind: Service
metadata:
  name: learn-dep-k8s-service
  namespace: default
spec:
  ports:
    - name: http
      targetPort: 80
      port: 80
  selector:
    app: learn-dep-k8s

For the sake of autoscaling, I will make some changes to this deployment.yml file, adding and modifying a few sections.

The deployment.yml file will be configured like the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: learn-dep-k8s-deployment
  namespace: default
  labels:
    app: learn-dep-k8s
spec:
  replicas: 1
  selector:
    matchLabels:
      app: learn-dep-k8s
  template:
    metadata:
      labels:
        app: learn-dep-k8s
    spec:
      containers:
        - name: learn-dep-k8s
          image: abdmahaini/learn-dep-k8s
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 500m
              memory: 400Mi
            limits:
              cpu: 600m
              memory: 800Mi
          env:
            - name: APP_KEY
              value: base64:cUPmwHx4LXa4Z25HhzFiWCf7TlQmSqnt98pnuiHmzgY=

---

apiVersion: v1
kind: Service
metadata:
  name: learn-dep-k8s-service
  namespace: default
spec:
  ports:
    - name: http
      targetPort: 80
      port: 80
  selector:
    app: learn-dep-k8s

As you noticed, I decreased the number of replicas to 1 for the sake of showing the autoscaling in action, but this number can be whatever you want, since it simply defines how many Pods your application runs by default.

The next thing I did in that file was add the resources section under the container spec. It tells K8s how many resources we expect our app to consume (requests) and what resource ceiling we want for each Pod (limits). K8s will allocate the requested resources for each Pod and make sure it does not cross the limits we defined.

    spec:
      containers:
        - name: learn-dep-k8s
          image: abdmahaini/learn-dep-k8s
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 500m
              memory: 400Mi
            limits:
              cpu: 600m
              memory: 800Mi
          env:
            - name: APP_KEY
              value: base64:cUPmwHx4LXa4Z25HhzFiWCf7TlQmSqnt98pnuiHmzgY=

As you can see in the example above, each Pod requests 500m of CPU, which means 500 millicores, or half a core. For memory, it requests 400Mi, which is 400 mebibytes (MiB); a mebibyte is very close to a megabyte (MB), with only a slight difference in size (1 MB ≈ 0.9537 MiB).

The next thing we need is to define the horizontal autoscaling rules; in K8s, we can achieve that by adding a file called hpa.yml and attaching it to our deployment, so that whenever the deployment is applied, those rules are applied to it as well.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: learn-dep-k8s-deployment-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: learn-dep-k8s-deployment
  maxReplicas: 10
  minReplicas: 1
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Here and as you can notice, there are many parts I need to explain in detail:

1- scaleTargetRef: to attach a deployment to an HPA (Horizontal Pod Autoscaler) rule, we define this property and reference our deployment by name, so that when this HPA file is applied, it is attached to the correct deployment.

2- maxReplicas and minReplicas: these numbers define the maximum and minimum number of replicas K8s is allowed to run when the autoscaling rule is triggered.

3- The metrics property holds the autoscale rule we've defined. In this example, we are telling K8s to autoscale if average CPU utilization exceeds 50% of the requested CPU, but this number can be different depending on your needs.
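As a side note, the same HPA can also be created imperatively with kubectl, which is handy for quick experiments; the declarative hpa.yml above is what we will keep using in this tutorial:

kubectl autoscale deployment learn-dep-k8s-deployment --cpu-percent=50 --min=1 --max=10

This command creates an equivalent HorizontalPodAutoscaler object targeting the same deployment.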

How do we test our HPA (Horizontal Pod Autoscaling) rule?

I will use an HTTP load test tool called Siege to implement this. On macOS, you can install it with Homebrew (brew install siege); on Linux, it is available through most distributions' package managers (for example, apt-get install siege).

Why use Siege?

In our tutorial, to simulate a real-world scenario, I had to create fake requests using an HTTP load test tool. I have used Siege since I first encountered it, but feel free to use any tool you are familiar with for this purpose (for example, Apache JMeter, Gatling, etc.). This tool will send a large number of requests to our application and therefore exhaust the resources we defined in the deployment.yml file.
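To give a rough idea of what a load test looks like, Siege lets you control the number of concurrent simulated users and the test duration; here is a sketch (the URL and the numbers are just examples to adapt to your own setup):

siege -c 50 -t 2M http://laravel.k8s.test/

This runs 50 concurrent users against the application for two minutes.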

How to apply the HPA file

After defining our rules and the deployment to which those rules will be applied, we will need to apply the YAML file; we can achieve that by executing this command:

kubectl apply -f hpa.yml

After executing this command, you can check if the HPA was created as expected by executing this command:

kubectl get hpa learn-dep-k8s-deployment-hpa

You should see the HPA listed with its reference, current metric values, and replica counts.
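The exact values depend on your cluster, but the output will look roughly like this (before the metrics-server described below is running, the TARGETS column may show <unknown> instead of a percentage):

NAME                           REFERENCE                             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
learn-dep-k8s-deployment-hpa   Deployment/learn-dep-k8s-deployment   1%/50%    1         10        1          60s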

How to get the Pod Metrics

To determine the resources our Pod is consuming, we need to add an add-on to Minikube (which is a lightweight Kubernetes implementation that creates a VM on your local machine and deploys a simple cluster containing only one node). This add-on is called metrics-server. Using this add-on, kubectl will be able to fetch the metrics needed for the deployed Pods. To enable this add-on, you need to execute this command.

minikube addons enable metrics-server

After executing the command, we can make sure the metrics-server components are deployed to the cluster; to do this, we need to execute this command:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

The previous command deploys (or updates) the metrics-server components in the cluster. Once the metrics-server is running, we can show the metrics by executing this:

kubectl top pods

This command shows how much CPU and memory each of your Pods is currently consuming, and it will make it much easier to visualize Siege and the HPA at work.
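The output is a simple table along these lines, where the Pod name suffix and the numbers will of course differ in your cluster:

NAME                                       CPU(cores)   MEMORY(bytes)
learn-dep-k8s-deployment-5f7c9d8b6-abcde   15m          90Mi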

How to test the autoscaling

After setting up and configuring all the required services, we first need to run Siege, which will send many requests to our application:

siege http://laravel.k8s.test/

At the same time, we will monitor our Pods' resource consumption:

kubectl top pods

After a while, you will notice that the CPU consumption has increased enough to satisfy the rule we defined in the hpa.yml file.
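A convenient way to follow this in real time is to keep the HPA under observation in a separate terminal while Siege is running; kubectl supports this with the --watch flag:

kubectl get hpa learn-dep-k8s-deployment-hpa --watch

You will see the TARGETS and REPLICAS columns update as the load increases and the autoscaler reacts.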

The surprising increase in the CPU usage

If you execute the command below:

kubectl get hpa learn-dep-k8s-deployment-hpa

You will see the CPU utilization percentage at that moment.

The surprising increase in the CPU usage

As you can see, the usage is 119%, while our rule says we should autoscale once half of the requested CPU is used (50%). So if you check your Pod count using:

kubectl get pods

You will see that new Pods have been created automatically to respond to the sudden increase in CPU usage.

The newly created Pods

In my case, since the increase was so large, five new Pods were created.
After stopping Siege (the load testing tool mentioned above), the usage should return to normal. In that case, K8s will terminate the newly created Pods and keep only the minimum number of replicas defined in the hpa.yml file (this is the downscaling phase).

The usage returned back to normal.
Terminating the newly created Pods (downscaling)
Only the minimum number of Pods remained
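One detail worth knowing is that downscaling does not happen immediately: by default, the HPA waits through a stabilization window of 300 seconds before removing Pods, to avoid flapping when the load fluctuates. If you ever need to tune this, the autoscaling/v2 API exposes a behavior section on the HPA spec; the values below are purely illustrative, not recommendations:

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60

This would allow the autoscaler to remove at most two Pods per minute once the stabilization window has passed.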

Autoscaling by memory usage

The previous example and test explained how to autoscale based on CPU usage. If you want to autoscale by memory usage instead, you need to change the hpa.yml file so that it looks similar to this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: learn-dep-k8s-deployment-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: learn-dep-k8s-deployment
  maxReplicas: 10
  minReplicas: 1
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 60

Note: The CPU usage and Memory usage rules can be written in the same hpa.yml file.
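For example, a metrics list containing both rules would look like the snippet below; when multiple metrics are defined, the HPA evaluates each one and scales to the highest replica count any of them recommends:

  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 60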

To test this, you simply need to repeat the same steps we followed for CPU usage, and you will see that K8s upscales your deployment when the rule is satisfied and downscales it again afterwards.

Conclusion

Today, in this new chapter of K8s and Laravel, we learned and tested “Horizontal Autoscaling” together using the recommended approach (through the hpa.yml file rather than by manually increasing the replicas in the deployment.yml file).

In the next chapter, we will look at how to monitor a Laravel application using Prometheus and how to use that tool to drive autoscaling based on network usage, such as request latency.

If you liked this article, you can also read the first article I wrote about the K8s and Laravel topic, or you can read this one if you are interested in CI/CD in general.

See you all in the next chapter, and till then, Happy coding :D.
