Kubernetes: Horizontal Pod Scaling

Jonathan Campos
Google Cloud - Community
6 min read · Jun 5, 2018

With Pod Autoscaling, your Kubernetes Cluster can monitor the load on your existing Pods and determine whether more Pods are needed. This is one of the biggest benefits of using Kubernetes: you save yourself from overloading individual Pods, which leads to unexpected code behavior and various faults. There are ways for you to control this Pod Autoscaling, along with best practices around it. That is the purpose of this article.

Replicating Pods!

If you haven't gone through, or even read, the first part of this series, you might be lost, wondering where the code is, or unsure what was done previously. Remember, this assumes you're using GCP and GKE. I will always provide the code and show how to test that the code is working as intended.

Pod Autoscaling

As we've discussed before, Kubernetes runs Docker containers inside a unit called a Pod, the smallest object Kubernetes schedules and manages in the cluster. With Autoscaling, Kubernetes watches the resource metrics of each Pod and determines whether we need more or fewer Pods. I am being vague on purpose by saying “resource metrics” since you can create custom metrics based on your application’s needs. The most common is CPU utilization.

In other words, Kubernetes watches the average CPU utilization over a window of seconds and, based on that utilization, adds or removes Pods. We use the average CPU utilization to reduce the noise from spikes.
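If you want to see the raw numbers the autoscaler works from, you can peek at per-Pod resource usage yourself. A quick sketch, assuming a metrics source (metrics-server or Heapster) is available in the cluster, which GKE clusters provide by default:

# show current CPU and memory usage per Pod
$ kubectl top pods
# and per node, to see cluster-wide headroom
$ kubectl top nodes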

Adding Horizontal Pod Autoscaling To Your Cluster

After a Kubernetes Cluster is ready you can add a Horizontal Pod Autoscaler (also referred to as an HPA) so that your Cluster adds and removes Pods as necessary based on resource metrics. Adding an HPA is really simple with the following snippet.

echo "sets autoscale logic"
kubectl autoscale deployment endpoints --cpu-percent=50 --min=1 --max=10
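If you prefer a declarative setup, the same HPA can be expressed as a manifest and applied with kubectl. Here is a minimal sketch, assuming your Deployment is named endpoints as in the command above; note that CPU-based autoscaling also needs CPU requests set on the Deployment’s containers so the utilization percentage has something to be measured against.

$ cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: endpoints
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: endpoints
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
EOF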

Now that we know what an HPA is, we can focus on making our own and testing our Cluster on GCP.

Don’t Be This Guy

Test With Load

You can be one of two developers right now.

  1. You can push everything out and just trust that it works.
  2. You can verify that things are set up properly before you get in trouble for not verifying your work.

That’s right. We are going to be developer number 2 right now and test that our Kubernetes Cluster scales as it is supposed to when things start to hit the fan. To do this we will create a whole other Kubernetes Cluster that simulates load from a totally different GCP Zone. We are going to test this way because it adequately simulates load from another location while staying focused on code that we can actually affect, without introducing too many test contaminants.

Note: This solution using Locust is detailed in this post. I just put it into a nice script for you.

First, let’s get our environment running. You can either create the cluster with autoscaling enabled by default or run the two alternative commands to add autoscaling after the fact. If you are curious about the Cluster Scaler, I recommend checking out another post I put out on the Kubernetes Cluster Scaler.

$ git clone https://github.com/jonbcampos/kubernetes-series.git
$ cd ~/kubernetes-series/autoscaling/scripts
$ sh startup.sh # with autoscaling
$ # sh startup_wo_autoscaling.sh # without autoscaling
$ # sh add_autoscaling.sh # add autoscaling after creation
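Before moving on, it is worth a quick sanity check that the startup script left you with running Pods and an HPA attached to the Deployment. Something along these lines should do it (the deployment name endpoints matches the autoscale command shown earlier; adjust it if your setup differs):

# confirm the Pods are running
$ kubectl get pods
# confirm the HPA exists and is reporting a CPU target
$ kubectl get hpa
$ kubectl describe hpa endpoints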

Wow, those startup commands took care of a lot, didn’t they!? Now it is time to create our load-testing Kubernetes Cluster, build the Docker image that contains our runner tests, and finally deploy our load-testing code. To make this easy I put all of that into one script that you can dive into to see the magic.

You’ll notice that we added an argument to our script so that the load runner knows what address to test.

$ cd ~/kubernetes-series/autoscaling/scripts # if necessary
$ # the argument tells the load runner what address to load test
$ sh startup_load_runner.sh 100.101.102.103
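Not sure which address to hand the load runner? The external IP of the Service fronting your Pods is what you want. A hedged example, assuming a LoadBalancer Service named endpoints in the autoscaling cluster:

# look up the external IP of the (assumed) endpoints Service
$ kubectl get service endpoints -o jsonpath='{.status.loadBalancer.ingress[0].ip}'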

When this process is done you should see a prompt telling you where you can access your load-testing Cluster. It is time to go to that link and start your load runner.

Available At: [your cluster ip address]:8089

This is the fun part. In the Locust UI you can enter how many users you want to simulate and how quickly they ramp up, and then set one Kubernetes Cluster loose attacking the other.

As you are playing with the load runner you might want to add a watcher to your Pods. This way you can see when the load starts to get too great and the autoscaling begins.

# fetch credentials so kubectl points at the autoscaling cluster
$ gcloud container clusters get-credentials autoscaling-cluster --zone=us-central1-a
# show horizontal pod autoscaling details
$ watch kubectl get hpa # ctrl+c to stop
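Alongside the HPA view, it can be satisfying to watch the Pods themselves come and go. A simple companion command using nothing beyond kubectl:

# watch Pods get added and removed as the HPA reacts to load
$ watch kubectl get pods # ctrl+c to stop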

If you really want to strain your cluster, you might also want to add more workers with the following replication script I’ve set up for you.

$ cd ~/kubernetes-series/autoscaling/scripts # if necessary
$ sh scale_load_runner.sh X # <-- number of replicas to make
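Under the hood, scaling the load runner is just an ordinary kubectl scale on the Locust worker Deployment. A sketch of the equivalent command; the Deployment name locust-worker is an assumption, so check the load runner manifests in the repo for the real one:

# hypothetical equivalent of scale_load_runner.sh with 5 replicas
$ kubectl scale deployment locust-worker --replicas=5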

With this you’re effectively done! You’ve created a cluster, you’ve set the cluster to autoscale, and finally you’ve tested the scaling on your cluster by hitting it with a load runner. Seriously, it is amazing.

Bonus: If you want to enjoy watching the Pods go back down, you can clear out the load runner while still watching the HPA.

$ cd ~/kubernetes-series/autoscaling/scripts # if necessary
$ sh teardown_load_runner.sh

Extra Reading: If you are a bit more curious about Locust, I recommend looking at my other post where I give some more details about what it takes to edit Locust files.

Teardown

Before you leave, make sure to clean up your project so you aren’t charged for the VMs running your cluster. Return to the Cloud Shell and run the teardown script. This will delete your cluster and the containers that we’ve built.

$ cd ~/kubernetes-series/autoscaling/scripts # if necessary
$ sh teardown.sh
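If you want to double-check that nothing billable is left behind, listing the clusters in your project afterwards should come back empty. This is just a quick verification, not part of the teardown script itself:

# confirm no clusters remain in the project
$ gcloud container clusters list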

Closing

This post goes hand-in-hand with another post about the Cluster Autoscaler. If this piques your interest in scaling, I recommend heading that way next.

Other Posts In This Series

Jonathan Campos is an avid developer and fan of learning new things. I believe that we should always keep learning and growing and failing. I am always a supporter of the development community and always willing to help. So if you have questions or comments on this story please add them below. Connect with me on LinkedIn or Twitter and mention this story.
