Horizontal Pod Autoscaler in Kubernetes

Kubernetes Advocate
AVM Consulting Blog
3 min read · Jun 6, 2020

When we talk about scaling, we should always ask ourselves the questions below and get clear answers to them.

What to scale?
When to scale?
How to scale?

Autoscaling is one of the key features of a Kubernetes cluster: the cluster increases the number of pods/nodes as the demand on a service grows, and decreases the number of pods/nodes as the demand drops.

Autoscaling can be done in three ways in Kubernetes; here we look at the Horizontal Pod Autoscaler.

Horizontal Pod Autoscaler — HPA

As the name implies, the HPA scales the number of pod replicas. Most DevOps teams use CPU and memory as the triggers to scale to more or fewer pod replicas.

High-level HPA workflow

[Figure: HPA workflow diagram]

Prerequisites

  • Kubernetes cluster (I tested with 1.10.11 via kOps on AWS)
  • Resource requests/limits set in the deployment (check the default YAML file attached for reference; a minimal sketch follows this list)
  • Metrics Server installed
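
For the second prerequisite, here is a minimal sketch of a deployment with CPU requests and limits set. The names and values are illustrative (they mirror the php-apache demo used below), not the exact YAML file referenced above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache              # illustrative name, matches the demo below
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m           # the HPA computes CPU utilization against this request
          limits:
            cpu: 500m           # illustrative limit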

How to Set Up the Metrics Server

Assuming you have the above prerequisites in working order, you can carry out the testing. I used an AWS environment with t2.micro instances and also configured a separate instance group (db-pool) for testing.
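
One common way to install the Metrics Server (assuming your cluster can reach GitHub; some environments, such as clusters with self-signed kubelet certificates, may need extra flags like --kubelet-insecure-tls):

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
$ kubectl get deployment metrics-server -n kube-system   # verify it is running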

Let's Play with CPU scaling now!

  • Let's deploy a PHP app for testing, which will be scaled based on CPU metrics
$ kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
service/php-apache created
deployment.apps/php-apache created
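
Before creating the autoscaler, you can optionally confirm that the Metrics Server is reporting pod metrics (it can take a minute or two for numbers to show up):

$ kubectl top pods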

Create Horizontal Pod Autoscaler

The following command creates a Horizontal Pod Autoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment, targeting 50% average CPU utilization.

$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled
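
If you prefer a declarative manifest, a roughly equivalent HPA object (shown here with the autoscaling/v1 API; newer clusters also support autoscaling/v2) would look something like this, applied with kubectl apply -f:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50   # same 50% CPU target as the command above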

We can check the current status of the autoscaler by running:

$ kubectl get hpa
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          18s

Let's Add the Load

We will start a container and send an infinite loop of queries to the php-apache service (run it in a different terminal):

$ kubectl run -i --tty load-generator --image=busybox /bin/sh
Hit enter for the command prompt, then run:
$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

Within a minute or so, we should see higher CPU load by executing:

$ kubectl get hpa
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        1          3m

Here, CPU consumption has increased to 305% of the request. As a result, the deployment was resized to 7 replicas:

$ kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   7         7         7            7           19m
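
To watch the autoscaler react in real time instead of polling it, you can also stream updates with the --watch flag:

$ kubectl get hpa php-apache --watch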

Stop the load!

In the terminal where we created the container with the busybox image, terminate the load generation by typing Ctrl + C. Wait for a minute or so, then verify the status again:

$ kubectl get hpa
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          11m
$ kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   1         1         1            1           27m

Cool! You are done; you can now scale your pods according to your workload.
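
If you want to clean up the test resources afterwards (these are the objects created above; on older kubectl versions the load generator may have been created as a deployment rather than a pod):

$ kubectl delete hpa php-apache
$ kubectl delete deployment php-apache
$ kubectl delete service php-apache
$ kubectl delete pod load-generator   # or: kubectl delete deployment load-generator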

👋 Join us today!!

Follow us on LinkedIn, Twitter, Facebook, and Instagram

If this post was helpful, please click the clap 👏 button below a few times to show your support! ⬇
