Setting up a Horizontal Pod Autoscaler (HPA) on a Kubernetes cluster

Merrygold Odey
HostSpace Cloud Solutions
Apr 20, 2023
Figure: HPA Architecture

Introduction

The Horizontal Pod Autoscaler changes the shape of your Kubernetes workload by automatically increasing or decreasing the number of Pods in response to the workload’s CPU or memory consumption, or in response to custom metrics reported from within Kubernetes or external metrics from sources outside of your cluster.

Installation

Prerequisites

  • A running Kubernetes cluster, with kubectl configured to talk to it
  • The Kubernetes Metrics Server installed, since the HPA reads CPU and memory metrics from it (you can check with kubectl get deployment metrics-server -n kube-system)

Steps

Step 1: Create a Deployment and Service

test-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
        - name: test-container
          image: gcr.io/cloudrun/hello
          resources:
            limits:
              cpu: "0.1"
              memory: "1Gi"
            requests:
              cpu: "0.05"
              memory: "512Mi"
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  selector:
    app: test-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
  type: LoadBalancer

These deployment and service manifests deploy a containerized application (running in a container named test-container) as a Kubernetes Deployment and expose it as a Service accessible via a load balancer. Note that the Service's selector must match the pod labels (app: test-app), not the Deployment's name.

The deployment has the following attributes:

  • replicas: 3: specifies that the deployment should create 3 replicas of the test-container pod
  • selector: specifies the labels that are used to identify which pods should be managed by this deployment
  • template: specifies the pod template for the pods that should be created by this deployment
  • containers: specifies the container(s) to run in the pod
  • image: specifies the Docker image to use for the container
  • resources: specifies the resource limits and requests for the container, which are used to allocate CPU and memory resources to the container. In this case, the container is limited to 0.1 CPU cores (100m) and 1Gi of memory, and requests at least 0.05 CPU cores (50m) and 512Mi of memory.
  • ports: specifies the ports that should be exposed by the container, in this case, port 8080.

The service has the following attributes:

  • selector: specifies the labels that are used to identify which pods should be exposed by this service
  • ports: specifies the port mapping for the service, in this case mapping port 80 to port 8080 in the container
  • type: LoadBalancer: specifies that the service should be exposed externally through a load balancer, which provides a single IP address (or a DNS name on some cloud providers, such as AWS) and distributes incoming traffic across the individual pods.
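If you are testing on a cluster without a cloud load balancer (for example, a local cluster), a NodePort service is one possible alternative. The sketch below is illustrative, not part of the original setup; the service name and port are made up, and it assumes the same app: test-app pod label:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: test-service-nodeport   # hypothetical name for this alternative
spec:
  selector:
    app: test-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
      nodePort: 30080           # must fall in the default 30000-32767 range
  type: NodePort
```

With this variant, the load test would target http://<node-IP>:30080 instead of the load balancer's address.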

Apply the configuration

kubectl apply -f test-deployment.yaml

Step 2: Create the HPA

test-hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
  minReplicas: 1
  maxReplicas: 7
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30

This HPA automatically scales the number of replicas of a deployment based on the current CPU utilization of its pods. Here’s a breakdown of the different sections:

  • apiVersion and kind specify the Kubernetes API version and the kind of object being defined, respectively. In this case, it’s a HorizontalPodAutoscaler object using the stable autoscaling/v2 API version (the older autoscaling/v2beta2 version is deprecated and was removed in Kubernetes 1.26).
  • metadata includes the name of the HPA object.
  • spec is the specification section of the HPA object, which includes:
  • scaleTargetRef identifies the deployment that the HPA should scale. In this case, it is a deployment named test-deployment.
  • minReplicas and maxReplicas define the minimum and maximum number of replicas that the HPA may scale to. In this case, between 1 and 7 replicas.
  • metrics specifies how the HPA should measure the utilization of the deployment’s pods. Here it measures CPU utilization against a target average of 30%. If the average CPU utilization across the pods rises above 30%, the HPA increases the number of replicas until utilization falls back toward the target; if it stays below 30%, the HPA decreases the number of replicas.
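The scaling decision follows a documented formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the min/max bounds. Here is a small Python sketch of that arithmetic; the function name and default bounds are illustrative, not part of Kubernetes:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 7) -> int:
    """Approximate the HPA's core scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas averaging 60% CPU against a 30% target -> scale up to 6
print(desired_replicas(3, 60, 30))   # 6
# 6 replicas averaging 10% CPU against a 30% target -> scale down to 2
print(desired_replicas(6, 10, 30))   # 2
```

This also shows why maxReplicas matters: a large utilization spike would otherwise request far more pods than the cluster should run.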

Apply the configuration

kubectl apply -f test-hpa.yaml

Load Testing

To test the installed HPA, you need a load tester: a tool that sends many concurrent requests to the deployment’s service so you can watch whether the HPA scales the deployment up.

An example of a load tester is hey, which can be installed with go install github.com/rakyll/hey@latest.

Steps

Step 1: Watch the HPA

kubectl get hpa -w

Step 2: Hit the load balancer with a lot of requests

hey -z 300s -c 500 http://<loadBalancer-IP>

Note that hey expects a full URL, including the scheme.

The deployment starts autoscaling.

The load test ran for a total of 5 minutes: the -z 300s option keeps the test running for 300 seconds, and -c 500 maintains 500 concurrent workers sending requests for the whole duration.

The time it takes for the HorizontalPodAutoscaler (HPA) to start downscaling depends on several factors, including the HPA’s configuration, the metric being used for scaling, the workload’s response time, and the size of the cluster.

The HPA evaluates the average CPU or memory utilization of the pods at a regular interval. To avoid thrashing, it applies a downscale stabilization window (5 minutes by default): only when utilization has stayed below the target for that window does it start removing replicas.

The HPA also takes into account the minimum and maximum number of replicas defined in its configuration. If the workload requires fewer replicas than the minimum defined, the HPA will not downscale below that number.

In general, it can take a few minutes for the HPA to detect a decrease in traffic and start downscaling. However, this can vary based on the workload and the HPA’s configuration. It’s important to monitor the workload and adjust the HPA’s configuration as needed to ensure that it scales appropriately.
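With the stable autoscaling/v2 API, this downscale delay can be tuned explicitly through the behavior field. The following is a sketch extending the test-hpa spec above; the window and policy values shown are example numbers, not recommendations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
  minReplicas: 1
  maxReplicas: 7
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait out 5 minutes of low utilization
      policies:
        - type: Pods
          value: 1           # remove at most one pod...
          periodSeconds: 60  # ...per minute
```

A longer stabilization window makes scale-down more conservative, which helps with bursty traffic at the cost of running extra pods for longer.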
