Horizontal Pod Autoscaler in Kubernetes
When we talk about scaling, we should always ask ourselves three questions and be able to answer them:
- What to scale?
- When to scale?
- How to scale?
Autoscaling is one of the key features of a Kubernetes cluster. It allows the cluster to increase the number of pods/nodes as demand for the service grows, and to decrease the number of pods/nodes as demand falls.
Let's cover the three ways autoscaling can be done in Kubernetes.
Horizontal Pod Autoscaler — HPA
As the name implies, HPA scales the number of pod replicas. Most DevOps teams use CPU and memory as the triggers for scaling to more or fewer pod replicas.
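Under the hood, the HPA controller periodically compares the observed metric against the target and computes the desired replica count from their ratio. A quick sanity check of that formula with shell arithmetic (the numbers here are illustrative):

```shell
# The HPA controller computes the desired replica count as:
#   desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
# Quick check with shell integer arithmetic (adding divisor-1 emulates ceil):
# 1 replica running at 305% CPU against a 50% target -> 7 replicas
echo $(( (1 * 305 + 49) / 50 ))
```

This matches the scale-up we will observe later in the walkthrough, where 305% observed CPU against a 50% target resizes the deployment to 7 replicas.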
High-level HPA workflow
Prerequisites
- Kubernetes Cluster (I tested with 1.10.11 via KOPS on AWS)
- Resource limit set in deployment (check the default YAML file attached for reference)
- Metric Server installed
I assume you have the above setup in working order to carry out the testing. I used an AWS environment with t2.micro instances, and also configured a separate instance group (db-pool) for testing.
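For reference, here is a minimal sketch of what the deployment's resource section might look like (the names and values are illustrative; your attached YAML may differ). The key point is that containers must declare a CPU request, because the HPA's percentage target is measured relative to it:

```yaml
# Hypothetical deployment snippet: the CPU request is what
# the HPA's --cpu-percent target is calculated against.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m   # HPA utilization percentages are relative to this
          limits:
            cpu: 500m
```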
Let's Play with CPU scaling now!
- Let's deploy a PHP app for testing, which will be scaled based on CPU metrics:
$ kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
service/php-apache created
deployment.apps/php-apache created
Create Horizontal Pod Autoscaler
The following command will create a Horizontal Pod Autoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment.
$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled
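The imperative `kubectl autoscale` command above is equivalent to applying a manifest like the following (a sketch using the `autoscaling/v1` API; file name is up to you):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```

Apply it with `kubectl apply -f <file>.yaml`. The declarative form is usually preferable in practice because it can be version-controlled alongside the deployment.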
We can check the current status of the autoscaler by running:
$ kubectl get hpa
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          18s
Let's Add the Load
We will start a container and send an infinite loop of queries to the php-apache service (please run this in a different terminal):
$ kubectl run -i --tty load-generator --image=busybox /bin/sh

Hit enter for the command prompt, then run:

$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
Within a minute or so, we should see the higher CPU load by executing:
$ kubectl get hpa
NAME         REFERENCE                     TARGET       CURRENT   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   305%      1         10        1          3m
Here, CPU consumption has increased to 305% of the request. As a result, the deployment was resized to 7 replicas:
$ kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   7         7         7            7            19m
Stop the load!
In the terminal where we created the container with the busybox image, terminate the load generation by typing Ctrl + C. Wait for a minute or so, then verify the status again:
$ kubectl get hpa
NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          11m

$ kubectl get deployment php-apache
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
php-apache   1         1         1            1            27m
Cool! You are done: you can now scale your pods according to your workload.