Scaling workloads in a Kubernetes-based hybrid cloud environment

Authors: Sundaragopal Venkatraman, Krishnan Venkitasubramanian, Ramakrishna Alavala, Anandakumar Mohan, Anitha Reena.

Introduction

Enterprise applications need to support dynamic, on-demand business requirements. One key requirement is workload scaling to handle traffic bursts or spikes in the production environment. For scalable workloads, a spike that overflows current capacity brings additional capacity-scaling requirements.

These additional capacity requirements are satisfied through resource overcommit on the Kubernetes platform, horizontal node scaling, and vertical node scaling. In this article we explore ways to scale capacity to meet on-demand requirements, including certain undocumented approaches to vertically scaling clusters.

Resource Overcommit

Overcommitment is a state in which the sum of the container compute resource requests and limits is higher than the resources available on the system. Overcommitment might be desirable in certain environments, such as development environments, where trading guaranteed performance for capacity is acceptable.
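The mechanism behind overcommit is that scheduling is driven by requests, while limits may be much higher. A minimal sketch of a pod manifest that contributes to an overcommitted state (the image and names are hypothetical):

```yaml
# Hypothetical pod whose limits exceed its requests. The scheduler places
# pods by their *requests*, so a node can host many such pods whose
# combined *limits* exceed its physical capacity -- an overcommitted state.
apiVersion: v1
kind: Pod
metadata:
  name: burstable-app
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    resources:
      requests:
        cpu: "250m"      # what the scheduler reserves
        memory: "256Mi"
      limits:
        cpu: "2"         # what the container may burst to
        memory: "2Gi"
```

Because requests are low and limits are high, this pod gets the Burstable QoS class; under contention it is throttled or evicted before Guaranteed pods.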

Best Practice: To provide more reliable scheduling and minimize node resource overcommitment, it is recommended that administrators have each node reserve a portion of its resources for the Kubernetes node components (kube-reserved, such as kubelet and kube-proxy) and for the remaining system components on the host (system-reserved, such as sshd and NetworkManager). Otherwise, resource overflow can create a cascading effect of node draining across the cluster.
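The reservations recommended above can be expressed in the kubelet configuration file. A sketch, assuming the standard KubeletConfiguration API (the sizes shown are illustrative, not a recommendation):

```yaml
# Hypothetical kubelet configuration fragment
# (typically /var/lib/kubelet/config.yaml on a kubeadm-provisioned node)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:           # reserved for Kubernetes node components
  cpu: "500m"
  memory: "1Gi"
systemReserved:         # reserved for OS daemons (sshd, NetworkManager, ...)
  cpu: "500m"
  memory: "1Gi"
evictionHard:           # eviction threshold protecting the reservations
  memory.available: "500Mi"
```

Allocatable capacity then becomes node capacity minus kube-reserved, system-reserved, and the eviction threshold, which is what the scheduler actually uses.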

Horizontal Node Scaling

Horizontal node scaling adds new nodes to the cluster to meet dynamic resource demands. It automatically resizes the cluster based on the demands of your workloads. The ClusterAutoscaler increases the size of the cluster when there are pods that fail to schedule on any of the current nodes due to insufficient resources, or when another node is necessary to meet deployment needs.

Note: The ClusterAutoscaler does not increase the cluster resources beyond the limits that you specify.
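On OpenShift, those limits are declared on the ClusterAutoscaler resource itself. A sketch, assuming the autoscaling.openshift.io/v1 API (the numeric limits are illustrative):

```yaml
# Hypothetical ClusterAutoscaler resource: the autoscaler never grows the
# cluster beyond these bounds, regardless of pending pods.
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  resourceLimits:
    maxNodesTotal: 10   # hard cap on node count
    cores:
      min: 8            # cluster-wide CPU core bounds
      max: 64
    memory:
      min: 32           # cluster-wide memory bounds, in GiB
      max: 256
```

Pods that still cannot be scheduled once a limit is reached remain Pending until capacity is freed or the limits are raised.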

You may already be familiar with the Horizontal Pod Autoscaler (HPA); the HPA and the ClusterAutoscaler modify cluster resources in different ways. The HPA changes the number of replicas based on the current load. If the load increases, the HPA creates new replicas, regardless of the amount of resources available to the cluster. If there are not enough resources, the ClusterAutoscaler adds resources so that the HPA-created pods can run.
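The interplay described above can be seen in a typical HPA manifest. A sketch, assuming a deployment named "app" (the name and thresholds are illustrative):

```yaml
# Hypothetical HPA: scales the "app" deployment between 2 and 10 replicas
# to keep average CPU utilization near 70% of the pods' CPU requests.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Note that the HPA creates replicas whether or not nodes have room for them; if they go Pending for lack of resources, that is precisely the signal the ClusterAutoscaler reacts to by adding a node.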

Vertical Node Scaling

Though horizontal scaling is the advised best practice for scaling a cluster, it requires considerable time to add new nodes to your cluster.

In this section we explore ways to vertically scale clusters, including certain undocumented approaches.

More often than not we are faced with two important scenarios as far as node resources are concerned:

1) Some node compute resources become unavailable (for example, a CPU goes offline)

2) Customers want to vertically scale a node to quickly make additional resources available for pod scheduling

Let us examine each of these scenarios in detail.

Node Resources Go Offline

Let us take the example of a virtual CPU attached to a node becoming unavailable. In this case, the Kubernetes node still thinks that it has the full set of allocatable CPUs and continues to schedule pods against that allocatable capacity. This creates performance issues at both the node and pod levels.

The scenario can be validated by taking one of the CPUs offline in a running cluster. Let us take the example of an OpenShift cluster with RHEL 7 nodes.

You can verify the current available CPUs by running the below command:

grep "processor" /proc/cpuinfo

Before taking a CPU offline, check the current CPU capacity in the node by running the below command:

oc describe node ose-ose-3|grep -A11 Capacity

To take one of the processors offline, run the below command on the node:

echo 0 > /sys/devices/system/cpu/cpu3/online

Check the node CPU capacity in the node description after taking the CPU offline:

oc describe node ose-ose-3|grep -A11 Capacity

Even after the CPU goes offline, the allocatable capacity reported by the node does not change.
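The check above can also be scripted on the node itself. A minimal sketch, assuming a standard Linux sysfs layout: sysfs exposes one cpuN directory per CPU present in the system, while /proc/cpuinfo lists only the CPUs currently online, so a mismatch reveals offline CPUs that the node's reported capacity does not account for.

```shell
#!/bin/sh
# Detect offline CPUs on a node (assumes Linux sysfs layout).
# /sys/devices/system/cpu/cpuN exists for every CPU *present* on the host;
# /proc/cpuinfo lists only CPUs that are currently *online*.

present=$(ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l)
online=$(grep -c '^processor' /proc/cpuinfo)

echo "present CPUs: $present, online CPUs: $online"
if [ "$present" -ne "$online" ]; then
    echo "WARNING: $((present - online)) CPU(s) offline; node capacity may be stale"
fi
```

Running this on the node after `echo 0 > /sys/devices/system/cpu/cpu3/online` would print the warning, while `oc describe node` still reports the original capacity.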

Vertical Scaling

On a running cluster, horizontal scaling is ideal for meeting resource demands. However, it requires new nodes to be created and pods to move to them. This is a time-consuming activity and at times does not cater to a spike in demand that must be addressed immediately. In these situations, customers can explore vertical scaling of resources to quickly address these demands, provided the resources are available on the node server.

Let us explore how to have the node capacity correctly reflected once a CPU goes offline, or when additional capacity is added.

Kubernetes Clusters:

The kubelet is a daemon that acts as a local agent and watches for pod specifications on a node. It is responsible for registering the node with the Kubernetes cluster, sending events and pod status, and reporting resource utilization. It is observed that resource changes (such as CPU or memory being added or removed) do not take effect until the kubelet process is restarted. Hence, whenever a resource is added, we can restart the kubelet manually to have the change reflected in the node configuration. To restart the kubelet, run the below command:

  • systemctl restart kubelet

OpenShift Clusters:

For OpenShift 4.X

The kubelet runs as a system service, and hence the restart process is the same as for the Kubernetes cluster above.

To restart the kubelet, run the below command:

  • systemctl restart kubelet

For OpenShift 3.X

In an OpenShift 3.X environment, all hosts have the atomic-openshift-node service active and enabled. To apply configuration changes, you must restart the OpenShift Container Platform node service using the below command:

  • systemctl restart atomic-openshift-node.service

Once the service is restarted, check the CPU capacity in the node description again (oc describe node ose-ose-3|grep -A11 Capacity). The CPU capacity now shows the correct value.

Platform Overcommit

Another way to achieve scaling is a platform overcommit. This can help absorb an immediate spike in resource demand as a temporary measure. In this scenario, the physical servers should be scale-up servers hosting multiple VMs that share resources optimally.

Perform a platform overcommit on the servers such that the system can redistribute resources based on a defined VM resource-sharing priority. The sharing priority should be defined based on workload patterns. This approach also assumes that the VMs co-existing on a particular server have mutually exclusive peaks in resource requirements, so they can effectively share resources among themselves. Resource overcommit can help sustain load for a limited period before new nodes are added, providing temporary relief from resource starvation in the environment on a first-come, first-served basis.

For example, the picture below depicts resource overcommit as an approach using IBM PowerVM virtualization.
