Horizontal Pod Autoscaler in Kubernetes

Kubernetes Advocate
AVM Consulting Blog
3 min read · Jun 6, 2020

When we talk about scaling, we should always ask ourselves the questions below and get clear answers to them.

What to scale?
When to scale?
How to scale?

Autoscaling is one of the key features of a Kubernetes cluster: the cluster increases the number of pods/nodes as the demand on a service grows, and decreases the number of pods/nodes as the demand drops.

Autoscaling can be done in three ways in Kubernetes; here we look at the Horizontal Pod Autoscaler.

Horizontal Pod Autoscaler — HPA

As the name implies, the HPA scales the number of pod replicas. Most DevOps teams use CPU and memory as the triggers to scale to more or fewer pod replicas.

High-level HPA workflow

[Figure: HPA workflow diagram]

Prerequisites

  • Kubernetes cluster (I tested with 1.10.11 via kOps on AWS)
  • Resource requests/limits set in the deployment (check the default YAML file attached for reference; a minimal sketch follows this list)
  • Metrics Server installed
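
For the second prerequisite, here is a minimal sketch of a deployment with CPU requests and limits set. The names and values are illustrative (they mirror the php-apache demo used below), not the exact YAML file referenced above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache              # illustrative name, matches the demo below
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m           # the HPA computes CPU utilization against this request
          limits:
            cpu: 500m           # illustrative limit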

How to Set Up the Metrics Server

Assuming you have the above prerequisites in working order, you can carry out the testing. I used an AWS environment with t2.micro instances and also configured a separate instance group (db-pool) for testing.
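
One common way to install the Metrics Server (assuming your cluster can reach GitHub; some environments, such as clusters with self-signed kubelet certificates, may need extra flags like --kubelet-insecure-tls):

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
$ kubectl get deployment metrics-server -n kube-system   # verify it is running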

Let's Play with CPU scaling now!

  • Let's deploy a PHP app for testing, which will be scaled based on CPU metrics
$ kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
service/php-apache created
deployment.apps/php-apache created
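
Before creating the autoscaler, you can optionally confirm that the Metrics Server is reporting pod metrics (it can take a minute or two for numbers to show up):

$ kubectl top pods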

Create Horizontal Pod Autoscaler

The following command creates a Horizontal Pod Autoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment, targeting 50% average CPU utilization.

$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled
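
If you prefer a declarative manifest, a roughly equivalent HPA object (shown here with the autoscaling/v1 API; newer clusters also support autoscaling/v2) would look something like this, applied with kubectl apply -f:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50   # same 50% CPU target as the command above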

We can check the current status of the autoscaler by running:

$ kubectl get hpa
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          18s

Let's Add the Load

We will start a container and send an infinite loop of queries to the php-apache service (run it in a different terminal):

$ kubectl run -i --tty load-generator --image=busybox /bin/sh
Hit enter for the command prompt, then run:
$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

Within a minute or so, we should see higher CPU load by executing:

$ kubectl get hpa
NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        1          3m

Here, CPU consumption has increased to 305% of the request. As a result, the deployment was resized to 7 replicas:

$ kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   7         7         7            7           19m
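
To watch the autoscaler react in real time instead of polling it, you can also stream updates with the --watch flag:

$ kubectl get hpa php-apache --watch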

Stop the load!

In the terminal where we created the container with the busybox image, terminate the load generation by typing Ctrl + C. Wait for a minute or so, then verify the status again:

$ kubectl get hpa
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          11m
$ kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   1         1         1            1           27m

Cool! You are done; you can now scale your pods according to your workload.
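
If you want to clean up the test resources afterwards (these are the objects created above; on older kubectl versions the load generator may have been created as a deployment rather than a pod):

$ kubectl delete hpa php-apache
$ kubectl delete deployment php-apache
$ kubectl delete service php-apache
$ kubectl delete pod load-generator   # or: kubectl delete deployment load-generator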

👋 Join us today!!

Follow us on LinkedIn, Twitter, Facebook, and Instagram

If this post was helpful, please click the clap 👏 button below a few times to show your support! ⬇
