Autoscaling in IBM Cloud Private
Introduction
In most environments, it is common for service demand to fluctuate. For a period of time, services may run under high demand, requiring increased amounts of compute resources. At other times, services can sit idle while still holding on to unused resources. IBM Cloud Private uses the Kubernetes Horizontal Pod Autoscaling (HPA) feature to dynamically control the number of pod replicas in the environment. The number of pods available to a service can automatically scale up and down in response to the current CPU utilization in the cluster.
Scaling is based on the CPU utilization target value assigned by the user.
- The cluster scales up when Current_CPU_UT > Target_CPU_UT * 110%.
- The cluster scales down when Current_CPU_UT < Target_CPU_UT * 90%.
A time window is also implemented to allow stabilization of the system and to avoid haphazard scaling up or down of the cluster.
- A cluster will only scale up after 3 minutes have passed since the last scale-up action.
- A cluster will only scale down after 5 minutes have passed since the last scale-down action.
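The thresholds and cooldown windows above amount to a simple decision rule. The following is a minimal sketch of that rule only, not the real controller (the actual decision is made by the Kubernetes HPA control loop, which also honors the cooldown windows); the 50% target matches the policy created later in this tutorial:

```shell
# Illustrative sketch of the scale-up/scale-down checks described above.
target_ut=50   # Target_CPU_UT (%) from the scaling policy

decide() {
  if [ "$1" -gt $(( target_ut * 110 / 100 )) ]; then
    echo "scale up"
  elif [ "$1" -lt $(( target_ut * 90 / 100 )) ]; then
    echo "scale down"
  else
    echo "stable"
  fi
}

decide 120   # above 110% of the target (55): scale up
decide 48    # within 90%-110% of the target: stable
decide 30    # below 90% of the target (45): scale down
```

With a 50% target, the stable band is 45% to 55% utilization, which is exactly the range the cluster settles into later in this tutorial.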
Autoscaling is critical for efficient resource utilization in your cluster.
In this tutorial, I will demonstrate how to use the autoscaling feature in an IBM Cloud Private cluster.
Note: This tutorial references the following Kubernetes documents: Autoscaling in Kubernetes and Horizontal Pod Autoscaling.
What you will need
An IBM Cloud Private 2.1.0.3 cluster
Part 1: Install IBM Cloud Private
To set up an IBM Cloud Private cluster, see the IBM Cloud Private Knowledge Center.
- IBM Cloud Private-EE: https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0.3/installing/install_containers.html
- IBM Cloud Private-CE: https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0.3/installing/install_containers_CE.html
Part 2: Create a new service
1. Create a php-apache deployment:
- From the navigation menu, select Workload->Deployment.
- Select Create Deployment.
- On the General tab, provide a Deployment name.
- On the Container Settings tab, provide a container name, image name, container port, and CPU and memory units.
  - Deployment name: php-apache
  - Container name: php-apache
  - Image name: gcr.io/google_containers/hpa-example
  - Container port: 80
- Select Create.
Check the Deployment home page to verify that the php-apache deployment deployed successfully. When a deployment succeeds, it is displayed on the deployment home page, and the Desired Replica, Current Replica, Up-To-date Replica, and Available Replica columns all show the same value. This value equals the number of pod replicas specified during the deployment.
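Alternatively, the same deployment can be created from the command line. This is a sketch, assuming kubectl is configured against your cluster; the CPU request value is illustrative, and it relies on kubectl run creating a Deployment, which is the behavior on the Kubernetes version shipped with IBM Cloud Private 2.1.0.3:

```
$ kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m --port=80
$ kubectl get deployment php-apache
```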
2. Create a service for the php-apache deployment:
- From the navigation menu, select Network->Service.
- Create a new service with the same name as the deployment (php-apache).
- Provide a label and a selector with the value php-apache.
- Provide a port and a targetPort.
- Select Create.
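If you prefer the command line, an equivalent service can be created with kubectl expose (a sketch, assuming kubectl access to the cluster; the port values here are illustrative and should match what you entered in the console):

```
$ kubectl expose deployment php-apache --name=php-apache --port=80 --target-port=80
```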
Part 3: Create a policy for the new service
- From the navigation menu, select Configuration->Scaling Policies.
- Select Create Policy.
- Provide a name for the policy, the target deployment, the minimum and maximum number of replicas, as well as a target CPU utilization.
  - Policy name: php-apache-scaling
  - Scale Target: php-apache
  - Minimum replications: 1
  - Maximum replications: 10
  - Target CPU Utilization: 50
- Select Create.
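The same policy can also be created with kubectl, which builds the equivalent HorizontalPodAutoscaler resource (a sketch, assuming kubectl access to the cluster):

```
$ kubectl autoscale deployment php-apache --min=1 --max=10 --cpu-percent=50
$ kubectl get hpa
```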
Part 4: Deploy a “load-generator” workload
This load-generator
workload is used to rapidly increase the CPU demand in your cluster.
- From the navigation menu, select Workload->Deployments.
- Select Create Deployment.
- Switch to JSON mode.
- Download the load-generator.yaml file. Copy and paste the contents into the editor.
- Select Create.
- Locate the node that this load-generator deployment pod is running on.
Part 5: Scale up the php-apache service by using the load-generator deployment
- From a terminal window, log in to the worker node where the load-generator pod is running.
- Open a second terminal window on the same worker node. Having two terminal windows allows for a faster ramp-up of CPU usage.
- In both terminal windows, get the container ID for the load-generator deployment. The container ID is returned in the first column of your output, represented in bold font below.
$ docker ps | grep busybox
676f6378f0e8 busybox "/bin/sh" 2 minutes ago Up 2 minutes k8s_load-generator.cbf23a7f_load-generator-1487968570-ijlx9_default_019a3ddb-92a0-11e6-8105-32d2bf3e6494_9a92382d
- In both terminal windows, raise the load:
$ docker exec -it 676f6378f0e8 /bin/sh
/ # while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
View the policy status
From the navigation menu, select Scaling Policies.
In this example, the current CPU UT is now 388. Based on the HPA algorithm and the target CPU utilization conditions, this triggers a scale-up of the pod replicas to 8.
Note: Actual numbers will vary in your environment. For more information on the algorithm used for autoscaling in Kubernetes, see https://github.com/kubernetes/kubernetes/blob/master/docs/design/horizontal-pod-autoscaler.md#autoscaling-algorithm.
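The scale-up to 8 replicas is consistent with the HPA sizing rule documented at the link above, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). A quick sketch with this example's numbers (1 starting replica, 388 observed CPU UT, 50 target):

```shell
# Illustrative only: reproduces the HPA sizing rule with the numbers above.
current_replicas=1
current_ut=388   # observed CPU utilization (%)
target_ut=50     # target CPU utilization (%) from the policy

# ceil(current_replicas * current_ut / target_ut) via integer arithmetic
desired=$(( (current_replicas * current_ut + target_ut - 1) / target_ut ))
echo "$desired"   # ceil(388 / 50) = 8
```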
View the deployment status
- From the navigation menu, select Deployment.
- Select the php-apache Deployment.
- Scroll down to the Pods section.
You will now see that the number of pods has scaled up to meet the demand.
Part 6: Scale down the cluster
From one of the terminal windows, press Ctrl+C to stop the load loop.
View the policy status
Wait for a few minutes. This allows the system to stabilize and the time window to expire. After a while, the current CPU UT is 45. Based on the HPA algorithm and the target CPU utilization conditions, this triggers a scale-down of the pod replicas from 8 to 4.
After the number of pods decreases to 4, the Current_CPU_UT stabilizes between 45% and 55%, which is within the range of 90% to 110% of the Target_CPU_UT.
View the Deployment status
From the second terminal window, press Ctrl+C to stop the load loop.
View the policy status
After a few minutes, the current CPU UT is 0. This triggers a scale-down of the pod replicas from 4 to 1; 1 is the minimum number of replicas specified by the policy.
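With no load at all, the raw sizing rule would drive the replica count to zero; the policy's minimum and maximum act as clamps. A sketch with this example's numbers (illustrative only; the real clamping is done by the HPA controller):

```shell
# The HPA controller clamps the computed replica count to the policy's bounds.
min_replicas=1
max_replicas=10
current_replicas=4
current_ut=0     # load generation has stopped
target_ut=50

desired=$(( (current_replicas * current_ut + target_ut - 1) / target_ut ))
if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
echo "$desired"   # clamped to the policy minimum: 1
```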
View the Deployment status