Horizontal auto-scaling in Kubernetes

A Beginner’s Guide

A-Dem
4 min read · Apr 1, 2023

Are you struggling to handle traffic spikes in your Kubernetes applications? Do you want to minimize resource waste during periods of low usage? If so, you should consider autoscaling your Kubernetes applications using the Horizontal Pod Autoscaler (HPA).

The HPA is a Kubernetes resource that automatically scales the number of replicas in a deployment based on CPU utilization or custom metrics. In this article, we’ll walk through how to use the HPA to autoscale your Kubernetes applications and ensure they can handle increases in traffic without manual intervention.
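
Under the hood, the HPA controller periodically compares the observed metric against the target and computes the desired replica count with a simple ratio (this is the algorithm documented for the HPA):

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, if two replicas are averaging 90% CPU utilization against a 50% target, the HPA scales to ceil(2 * 90 / 50) = 4 replicas.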

Prerequisites

To follow along with this tutorial, you should have a basic understanding of Kubernetes deployments, services, and pods. You should also have access to a Kubernetes cluster with the kubectl command-line tool installed.
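
Before starting, you can quickly confirm that kubectl can reach your cluster:

kubectl cluster-info
kubectl get nodes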

Create a Deployment

First, let’s create a sample deployment for our autoscaling demo: a simple NGINX deployment with one replica. Note the CPU request on the container: the HPA computes utilization as a percentage of each container’s requested CPU, so CPU-based scaling will not work without it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m

We’ll save this YAML file as nginx-deployment.yaml and create the deployment with the following command:

kubectl apply -f nginx-deployment.yaml

Verify that the deployment was created successfully by running:

kubectl get deployments

Create a Service

Next, let’s expose our NGINX deployment as a service so that other pods in the cluster (including the load generator we’ll run later) can reach it. We will create a ClusterIP service on port 80:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

We’ll save this YAML file as nginx-service.yaml and create the service with the following command:

kubectl apply -f nginx-service.yaml

Verify that the service was created successfully by running:

kubectl get services

Enable Metrics Server

For the HPA to work, metrics must be available for Kubernetes to consume. The Metrics Server is a Kubernetes addon that collects resource utilization metrics from each node and pod in the cluster and exposes them through the Metrics API, which the HPA queries.

To install the Metrics Server from its official release manifest, run the following command (on minikube, you can instead run minikube addons enable metrics-server):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Verify that the Metrics Server pod is running:

kubectl get pods -n kube-system
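
Once the Metrics Server pod reports Running (it can take a minute or two before it starts serving metrics), confirm that the Metrics API is actually responding:

kubectl top nodes
kubectl top pods

If these commands return CPU and memory figures rather than an error, the HPA will be able to read metrics.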

Create an HPA

Now that our deployment and service are up and running and we have the Metrics Server installed, we can create an HPA to automatically scale our deployment based on CPU utilization.

Let’s create an HPA that scales our NGINX deployment between 1 and 10 replicas, targeting 50% average CPU utilization. We use the stable autoscaling/v2 API; the older autoscaling/v2beta2 API was removed in Kubernetes 1.26:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

We’ll save this YAML file as nginx-hpa.yaml and create the HPA with the following command:

kubectl apply -f nginx-hpa.yaml

You can verify that the HPA was created successfully by running:

kubectl get hpa
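
If you prefer not to write a manifest by hand, kubectl can create an equivalent HPA imperatively:

kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10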

Test Autoscaling

Now that our HPA is up and running, let’s test it out to see if it scales our deployment up and down based on CPU utilization.

You can generate CPU load on your NGINX deployment by starting an interactive BusyBox pod:

kubectl run -it --rm load-generator --image=busybox --restart=Never -- /bin/sh

Then, at the shell prompt inside the pod, start an endless request loop:

while true; do wget -q -O- http://nginx-service; done

This loop generates load by continuously requesting our NGINX service; the temporary pod is removed automatically when you exit the shell, thanks to --rm.

After a few minutes, you should see the HPA scale up our deployment to handle the increased load. You can monitor the number of replicas by running:

kubectl get deployment nginx-deployment
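
It’s also useful to watch the HPA itself, which shows the current CPU utilization against the target alongside the replica count:

kubectl get hpa nginx-hpa --watch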

Once you stop the load generator (exit the pod’s shell), the HPA should scale the deployment back down to one replica after a few minutes.
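
That delay is intentional: by default, the HPA waits out a five-minute scale-down stabilization window so that brief dips in load don’t cause replica churn. If you want faster scale-down for a demo, the autoscaling/v2 API lets you tune this by adding a behavior section under the HPA’s spec, for example:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 60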

In this tutorial, we’ve covered how to autoscale your Kubernetes applications using the Horizontal Pod Autoscaler (HPA). By creating an HPA for your deployments, you can ensure that your applications are able to handle traffic spikes without manual intervention and minimize resource waste during periods of low usage.

To summarize, here are the steps we covered:

Create a deployment for your application

Expose your deployment as a service

Enable the Metrics Server in your cluster

Create an HPA to scale your deployment based on CPU utilization or custom metrics

Test your HPA by generating load on your application

By following these steps, you’ll be well on your way to autoscaling your Kubernetes applications and improving their resiliency and scalability.

Until next time, happy Kubernetes-ing! 😊
