Spring Boot + Kubernetes — Scalability with Horizontal Pod Autoscaler (HPA)

Andrea Scanzani
Digital Software Architecture
8 min read · Dec 11, 2020

In this article we are going to see how to deploy a microservice built with Spring Boot on a Kubernetes cluster. We will then configure the mechanism that makes our microservice scale according to demand (in this case, CPU consumption), a technique called Horizontal Pod Autoscaler (HPA).

All Java source and Kubernetes configurations are available at the following Git repos:

Spring Boot Microservice

Spring Boot Kubernetes HPA

First of all, let's create our project with Maven; below is the pom.xml needed to satisfy the Spring Boot dependencies:

pom.xml

Our pom.xml is minimal as we are going to use only two Spring Boot dependencies:

  1. spring-boot-starter-web → For creating a REST Service
  2. spring-boot-starter-actuator → To expose our Health, Liveness and Readiness endpoints
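The full pom.xml is in the repository linked above; a minimal sketch satisfying these two dependencies might look like the following (the groupId, artifactId, and parent version are placeholder assumptions):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Inherit sensible defaults and dependency versions from Spring Boot -->
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.4.0</version>
  </parent>

  <groupId>com.example</groupId>
  <artifactId>hello-app</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <!-- REST service support -->
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Health, liveness and readiness endpoints -->
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Builds the executable "fat" JAR used later in the Docker image -->
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
</project>
```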

To configure the Actuator and tell it that we also want to expose the probe endpoints for Kubernetes, we create the application.properties file and insert the following line:

management.health.probes.enabled=true

Let’s proceed with the writing of a minimal REST Service, with only one GET (HTTP Method) Endpoint:

Java Code

In the controller we have added a logic to increase the CPU usage (to simulate the computational logic of a real application).
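The full controller is in the linked repository. As a rough standalone sketch of that CPU-burning logic (the Spring annotations @RestController and @GetMapping are omitted so the snippet runs on its own, and the class and method names are illustrative):

```java
// Standalone sketch of the CPU-burning logic used inside the controller's
// GET endpoint. In the real controller this would sit behind a
// @GetMapping("/sayHello") method that calls burnCpu(...) and returns a String.
public class CpuBurnSketch {

    // Busy-loop for roughly `millis` milliseconds to simulate computational work.
    static double burnCpu(long millis) {
        long end = System.currentTimeMillis() + millis;
        double acc = 0;
        while (System.currentTimeMillis() < end) {
            acc += Math.sqrt(Math.random()); // arbitrary CPU-bound work
        }
        return acc;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        burnCpu(200);
        System.out.println("Burned CPU for ~" + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Under load, many concurrent requests each spinning this loop will drive the container's CPU usage up, which is exactly what the HPA will react to later on.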

Now our Microservice is ready and we can run it from the command line:

mvn clean spring-boot:run

The Spring Boot application now exposes the service on port 8080.

Our application is now ready to be containerized, so we are going to create our Docker image. To build it we need a Dockerfile describing the image layers.

This is our Dockerfile:

Dockerfile

The FROM instruction indicates which image we use as the runtime of our Spring Boot app; with ARG and COPY we insert the JAR we built, and the ENTRYPOINT instruction tells the container how to start the application.
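Following that structure, the Dockerfile would look roughly like this (the JAR file name is an assumption; adjust it to your actual build output):

```dockerfile
# Runtime image: ARMv7 Alpine with OpenJDK 8 (targets the Raspberry Pi)
FROM balenalib/armv7hf-alpine-openjdk:8

# Path of the JAR produced by `mvn clean package`, overridable at build time
ARG JAR_FILE=target/hello-app-0.0.1-SNAPSHOT.jar

# Copy the application JAR into the image
COPY ${JAR_FILE} /app.jar

# Start the Spring Boot application
ENTRYPOINT ["java", "-jar", "/app.jar"]
```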

Note: We are using the balenalib/armv7hf-alpine-openjdk:8 image because our image will run on a Raspberry Pi, where we already have a Docker + Kubernetes (K3s) environment set up, as described in this post. If your image will run on an architecture other than ARMv7 32-bit (Raspberry Pi), you need to use a different image in the FROM instruction.

Now everything is ready to build our Docker image:

mvn clean package
docker build -t spring-boot/hello-app .

At the end of the process we find our image in the local Docker registry:

docker image ls

And we will have the following output:

Docker images

Kubernetes Deploy

In this example we are using the Kubernetes instance running on K3s, which ships with Traefik (a Cloud Native Edge Router) already integrated and configured; if you use other products such as MicroK8s or Minikube, you will need to make sure Traefik is installed and configured so that our service can be exposed through it. Alternatively, you can use kubectl port-forwarding.

Let’s create our deployment file for Kubernetes:

hello-app-deployment.yaml

Let me highlight some of the configuration included in the file above. We have three objects: a Deployment (which creates our container), a Service (which exposes the Pods created by the Deployment), and an Ingress (which directs traffic to and from Traefik).

We use the following properties:

  • imagePullPolicy: IfNotPresent → Tells Kubernetes that, if it does not find the spring-boot/hello-app image in the remote repositories it normally searches, it should look for it in the local Docker registry. (Note: for this to work, K3s must use Docker, not containerd, as its container runtime.)
  • containerPort: 8080 → Tells Kubernetes which port our microservice listens on.
  • livenessProbe and readinessProbe → Tell Kubernetes which endpoints to use for the two probes that determine the health of our application.
  • resources/limits/cpu → Configures the requested resources and limits (here only at the CPU level, but other criteria such as memory can also be specified). This prevents a single service from taking all the resources and, above all, it is required to enable the autoscaler!
  • pathType: Prefix and path: “/hello-app” → Tell Traefik which URL prefix to redirect to our service.
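Putting those bullets together, the deployment file can be sketched as follows (the object names match the kubectl output shown further down; the CPU request/limit values are illustrative assumptions, and the probe paths are Spring Boot Actuator's standard ones):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
        - name: hello-app
          image: spring-boot/hello-app
          imagePullPolicy: IfNotPresent   # fall back to the local Docker registry
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
          resources:
            requests:
              cpu: 250m      # illustrative value
            limits:
              cpu: 500m      # illustrative value
---
apiVersion: v1
kind: Service
metadata:
  name: hello-app-service
spec:
  selector:
    app: hello-app
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-app-ingress
spec:
  rules:
    - http:
        paths:
          - path: /hello-app
            pathType: Prefix
            backend:
              service:
                name: hello-app-service
                port:
                  number: 8080
```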

Our deployment file is ready, and we can install with the command:

kubectl create -f hello-app-deployment.yaml

At the end, we can check that all 3 of our Deploy objects (Deployment, Service and Ingress) have been installed.

> kubectl get pod
NAME                                    READY   STATUS    RESTARTS   AGE
hello-app-deployment-5468847d57-m4dql   1/1     Running   0          65m
> kubectl get service
NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
kubernetes          ClusterIP   10.43.0.1      <none>        443/TCP    4h31m
hello-app-service   ClusterIP   10.43.84.106   <none>        8080/TCP   4h30m
> kubectl get ingress
NAME                CLASS    HOSTS   ADDRESS         PORTS   AGE
hello-app-ingress   <none>   *       192.168.1.100   80      4h30m
Deployed Service

Our microservice is now exposed through Traefik; we can call the following URL:

http://<HOST>/hello-app/sayHello

Install metric-server on Kubernetes (K3s)

To collect the metrics on which Kubernetes bases its HPA (Horizontal Pod Autoscaler) scaling decisions, we need to install the metric-server component, which simplifies gathering metrics from the various objects deployed on the Kubernetes cluster.

As indicated by the official K3s documentation (link), installing this metric-server is recommended.

For the installation we can follow the steps described in the documentation above, and then adjust the metric-server deployment objects to enable development mode (bypassing TLS verification).

For this purpose, here is a metric-server installation already modified and ready for our use:

metric-server-deployment.yaml

For installation we use the following command:

kubectl create -f metric-server-deployment.yaml

HPA — Horizontal Pod Autoscaler

HPA automatically scales the number of Pods of a Deployment (or other replica controller) based on CPU usage, or on other custom metrics, for example with the help of Prometheus.

The Kubernetes cluster periodically adjusts the number of replicas so that the observed average CPU usage matches the specified target.

Kubernetes HPA

Now let's configure the HPA resource: we create our configuration file, indicating the target threshold:

hello-app-hpa-deployment.yaml

In the file we specify:

  • name: hello-app-deployment → The Deployment to scale (our service)
  • minReplicas: 1 and maxReplicas: 5 → Tell Kubernetes how far our service can scale
  • name: cpu and targetAverageUtilization: 70 → The CPU threshold above which another Pod of our service is created, enabling additional replicas.
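Given those fields, the HPA file could be sketched as follows (the autoscaling/v2beta1 API version is assumed, since that is where targetAverageUtilization lives; the HPA object name is illustrative):

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hello-app-hpa
spec:
  # The Deployment whose replica count the HPA will manage
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-app-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        # Scale up when average CPU across Pods exceeds 70% of the request
        targetAverageUtilization: 70
```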

Let’s install the configuration:

kubectl create -f hello-app-hpa-deployment.yaml

Our configuration is now ready: when our Spring Boot application reaches 70% CPU usage, Kubernetes will automatically replicate, and thus scale, our service.

Naturally, when they are no longer needed, the extra replicas will be terminated by Kubernetes.

Scale with HPA!

Now everything is ready to see our Kubernetes cluster orchestrate our application for us, creating the replicas necessary to meet the workload required of our application.

For testing purposes, let's lower the following values:

  • horizontal-pod-autoscaler-downscale-stabilization → How long the HPA waits before scaling replicas down
  • horizontal-pod-autoscaler-upscale-delay → How long the HPA waits before scaling replicas up

On our K3s server, we can edit the /etc/systemd/system/k3s.service file and add our parameters at the end:

> cat /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
server \
'--docker' \
'--kube-controller-manager-arg=horizontal-pod-autoscaler-downscale-stabilization=30s' \
'--kube-controller-manager-arg=horizontal-pod-autoscaler-upscale-delay=30s'

In this case we have entered 30 seconds for both values, so as to immediately have evidence of scaling.

Let’s proceed to restart the K3s service:

systemctl stop k3s
systemctl start k3s

We can now start our test by launching the hey tool to load-test our Spring Boot application:

hey -c 20 -z 60s http://<HOST>/hello-app/sayHello

Now that our test is underway we can monitor the Kubernetes cluster situation with the following commands:

  • kubectl get hpa → Monitors the HPA; shows CPU usage, the target, and the number of active replicas
  • kubectl top pod → Monitors the resources used by each Pod (similar to the Linux top command, but for Pods)
  • kubectl get pods → Shows how many replicas of our application are present in the cluster

Below is the screenshot of my Monitoring on the K3s cluster during the Load Test:

HPA Monitoring

We can see:

  1. At the beginning of the test the HPA values were within the threshold, so no replicas were needed.
  2. Our service started to “suffer” under the workload and, as we can see, began consuming more CPU than the configured threshold.
  3. After the horizontal-pod-autoscaler-upscale-delay period, Kubernetes automatically created a replica. Doing the math, the total stays at 2 replicas because 133% CPU is below 70% × 2 = 140% (to trigger a third replica we would have had to exceed 140%).
  4. From the get pods command we see our two Pods in the Running state.
  5. Once the load test finished, CPU usage returned to normal while 2 replicas were still active.
  6. After the horizontal-pod-autoscaler-downscale-stabilization period, Kubernetes automatically removed the replica that was no longer necessary, since usage had fallen below the threshold.
  7. From the get pods command we see only one Pod in the Running state.

Finally, we have seen how to deploy a Spring Boot application on Kubernetes and how to enable a mechanism for automatic scaling of our application.


Andrea Scanzani
Digital Software Architecture

IT Solution Architect and Project Leader (PMI-ACP®, PRINCE2®, TOGAF®, PSM®, ITIL®, IBM® ACE).