Autoscaling in IBM Cloud Private
Introduction
In most environments, it is common for service demand to fluctuate. For a period of time, services may run under high demand, requiring increased amounts of compute resources. At other times, services can sit idle while still holding on to unused resources. IBM Cloud Private uses the Kubernetes Horizontal Pod Autoscaling (HPA) feature to dynamically control the number of pod replicas in the environment. The number of pods available to a service can automatically scale up and down in response to the current CPU utilization in the cluster.
Scaling is based on the CPU utilization target value assigned by the user.
- The cluster scales up when Current_CPU_UT > Target_CPU_UT * 110%.
- The cluster scales down when Current_CPU_UT < Target_CPU_UT * 90%.
A time window is also implemented to allow stabilization of the system and to avoid haphazard scaling up or down of the cluster.
- A cluster will only scale up after 3 minutes have passed since the last scale-up action.
- A cluster will only scale down after 5 minutes have passed since the last scale-down action.
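The thresholds and cooldown windows above amount to a simple decision rule. The following is a minimal sketch of that rule only, not the real controller (the actual decision is made by the Kubernetes HPA control loop, which also honors the cooldown windows); the 50% target matches the policy created later in this tutorial:

```shell
# Illustrative sketch of the scale-up/scale-down checks described above.
target_ut=50   # Target_CPU_UT (%) from the scaling policy

decide() {
  if [ "$1" -gt $(( target_ut * 110 / 100 )) ]; then
    echo "scale up"
  elif [ "$1" -lt $(( target_ut * 90 / 100 )) ]; then
    echo "scale down"
  else
    echo "stable"
  fi
}

decide 120   # above 110% of the target (55): scale up
decide 48    # within 90%-110% of the target: stable
decide 30    # below 90% of the target (45): scale down
```

With a 50% target, the stable band is 45% to 55% utilization, which is exactly the range the cluster settles into later in this tutorial.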
Autoscaling is critical for efficient resource utilization in your cluster.
In this tutorial, I will demonstrate how to use the autoscaling feature in an IBM Cloud Private cluster.
Note: This tutorial references the following Kubernetes documents: Autoscaling in Kubernetes and Horizontal Pod Autoscaling.
What you will need
An IBM Cloud Private 2.1.0.3 cluster
Part 1: Install IBM Cloud Private
To set up an IBM Cloud Private cluster, see the IBM Cloud Private Knowledge Center.
- IBM Cloud Private-EE: https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0.3/installing/install_containers.html
- IBM Cloud Private-CE: https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0.3/installing/install_containers_CE.html
Part 2: Create a new service
1. Create a php-apache deployment:
- From the navigation menu, select Workload->Deployment.
- Select Create Deployment.
- On the General tab, provide a Deployment name.
- On the Container Settings tab, provide a container name, image name, container port, and CPU and memory units.
  - Deployment name: php-apache
  - Container name: php-apache
  - Image name: gcr.io/google_containers/hpa-example
  - Container port: 80
- Select Create.
Check the Deployment home page to verify that the php-apache deployment deployed successfully. When a deployment succeeds, it is displayed on the deployment home page, and the Desired Replica, Current Replica, Up-To-date Replica, and Available Replica columns all show the same value. This value equals the number of pod replicas specified during the deployment.
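Alternatively, the same deployment can be created from the command line. This is a sketch, assuming kubectl is configured against your cluster; the CPU request value is illustrative, and it relies on kubectl run creating a Deployment, which is the behavior on the Kubernetes version shipped with IBM Cloud Private 2.1.0.3:

```
$ kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m --port=80
$ kubectl get deployment php-apache
```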
2. Create a service for the php-apache deployment:
- From the navigation menu, select Network->Service.
- Create a new service with the same name as the deployment (php-apache).
- Provide a label and a selector with the value php-apache.
- Provide a port and a targetPort.
- Select Create.
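If you prefer the command line, an equivalent service can be created with kubectl expose (a sketch, assuming kubectl access to the cluster; the port values here are illustrative and should match what you entered in the console):

```
$ kubectl expose deployment php-apache --name=php-apache --port=80 --target-port=80
```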
Part 3: Create a policy for the new service
- From the navigation menu, select Configuration->Scaling Policies.
- Select Create Policy.
- Provide a name for the policy, the target deployment, the minimum and maximum number of replicas, as well as a target CPU utilization.
  - Policy name: php-apache-scaling
  - Scale Target: php-apache
  - Minimum replications: 1
  - Maximum replications: 10
  - Target CPU Utilization: 50
- Select Create.
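The same policy can also be created with kubectl, which builds the equivalent HorizontalPodAutoscaler resource (a sketch, assuming kubectl access to the cluster):

```
$ kubectl autoscale deployment php-apache --min=1 --max=10 --cpu-percent=50
$ kubectl get hpa
```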
Part 4: Deploy a “load-generator” workload
This load-generator
workload is used to rapidly increase the CPU demand in your cluster.
- From the navigation menu, select Workload->Deployments.
- Select Create Deployment.
- Switch to JSON mode.
- Download the load-generator.yaml file. Copy and paste the contents into the editor.
- Select Create.
- Locate the node that this load-generator deployment pod is running on.
Part 5: Scale up the php-apache service by using the load-generator deployment
- From a terminal window, log in to the worker node where the load-generator pod is running.
- Open a second terminal window on the same worker node. Having two terminal windows allows for a faster ramp-up of CPU usage.
- In both terminal windows, get the container ID for the load-generator deployment. The container ID is returned in the first column of your output, represented in bold font below.
$ docker ps | grep busybox
676f6378f0e8 busybox "/bin/sh" 2 minutes ago Up 2 minutes k8s_load-generator.cbf23a7f_load-generator-1487968570-ijlx9_default_019a3ddb-92a0-11e6-8105-32d2bf3e6494_9a92382d
- In both terminal windows, raise the load:
$ docker exec -it 676f6378f0e8 /bin/sh
/ # while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
View the policy status
From the navigation menu, select Scaling Policies.
In this example, the current CPU UT is now 388. Based on the HPA algorithm and the target CPU utilization conditions, this triggers a scale-up of the pod replicas to 8.
Note: Actual numbers will vary in your environment. For more information on the algorithm used for autoscaling in Kubernetes, see https://github.com/kubernetes/kubernetes/blob/master/docs/design/horizontal-pod-autoscaler.md#autoscaling-algorithm.
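The scale-up to 8 replicas is consistent with the HPA sizing rule documented at the link above, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). A quick sketch with this example's numbers (1 starting replica, 388 observed CPU UT, 50 target):

```shell
# Illustrative only: reproduces the HPA sizing rule with the numbers above.
current_replicas=1
current_ut=388   # observed CPU utilization (%)
target_ut=50     # target CPU utilization (%) from the policy

# ceil(current_replicas * current_ut / target_ut) via integer arithmetic
desired=$(( (current_replicas * current_ut + target_ut - 1) / target_ut ))
echo "$desired"   # ceil(388 / 50) = 8
```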
View the deployment status
- From the navigation menu, select Deployment.
- Select the php-apache Deployment.
- Scroll down to the Pods section.
You will now see that the number of pods has scaled up to meet the demand.
Part 6: Scale down the cluster
From one of the terminal windows, press Ctrl+C to stop the load loop.
View the policy status
Wait for a few minutes. This allows the system to stabilize and the time window to expire. After a while, the current CPU UT is 45. Based on the HPA algorithm and the target CPU utilization conditions, this triggers a scale-down of the pod replicas from 8 to 4.
After the number of pods decreases to 4, the Current_CPU_UT stabilizes between 45% and 55%, which is within the range of 90% to 110% of the Target_CPU_UT.
View the Deployment status
From the second terminal window, press Ctrl+C to stop the load loop.
View the policy status
After a few minutes, the current CPU UT is 0. This triggers a scale-down of the pod replicas from 4 to 1; 1 is the minimum number of replicas specified by the policy.
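With no load at all, the raw sizing rule would drive the replica count to zero; the policy's minimum and maximum act as clamps. A sketch with this example's numbers (illustrative only; the real clamping is done by the HPA controller):

```shell
# The HPA controller clamps the computed replica count to the policy's bounds.
min_replicas=1
max_replicas=10
current_replicas=4
current_ut=0     # load generation has stopped
target_ut=50

desired=$(( (current_replicas * current_ut + target_ut - 1) / target_ut ))
if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
echo "$desired"   # clamped to the policy minimum: 1
```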
View the Deployment status