Prediction with Precision Enables the Reservation of Resources for Multi-Cloud Management: Part 1

1. Multi-Cloud Trends

More enterprises are adopting cloud-based IT infrastructures, software, services, and technologies. Migrating to cloud solutions increases agility and flexibility and reduces hardware maintenance costs for businesses. A large percentage of cloud users leverage multiple cloud services, including private, public, and hybrid clouds, to run their applications. To better manage the cost and Service Level Agreement (SLA) requirements of workloads in a multi-cloud environment, users must first consider complex factors such as instance types, pricing, load balancing, and fault tolerance. This article shows that, with precise workload prediction, the cost of operation can be greatly reduced and SLAs can be better maintained.

Most cloud providers offer a Containers as a Service (CaaS) solution via Kubernetes (K8s). Kubernetes is a portable, extensible open-source platform for managing containers. It can run on bare metal (such as Ubuntu, RHEL, and CoreOS-installed commodity servers) or on any cloud provider infrastructure (like Google GKE, AWS EKS, and Azure AKS). Migrating applications in Kubernetes containers is more convenient than migrating virtual machines (VMs) in a multi-cloud environment.

In Kubernetes, a pod is a cohesive unit of service that includes one or more application containers and supports deployment, vertical scaling, and horizontal scaling (replication). The pod, as the basic unit of work, can be handled by multiple schedulers for workload-balancing purposes. For example, users can utilize node affinity to assign pods to the proper nodes at a scheduled time. When users need to migrate or deploy services on a remote cloud, the factors to consider include how to choose instance types, when to deploy services, and where to allocate them. Kubernetes can dynamically create or delete nodes and storage on demand. If a less expensive instance type is used to run applications during off-peak hours and nodes are added dynamically when needed, thanks to these characteristics of Kubernetes, costs are further reduced. However, a typical cluster often includes hundreds or thousands of pods, and an enterprise may need to manage several Kubernetes clusters. Managing large Kubernetes clusters becomes a challenge, especially in multi-cloud environments.

In this Part 1 article, we describe a machine-learning approach to predicting users' future workloads and propose a dynamic resource allocation strategy over on-demand instances, called Just-in-Time Fitted (JITF) Resource Allocation, which can lead to significant cost savings compared with a static over-provisioning policy. Using an example based on workloads from Alibaba, we illustrate that the cost savings can range from 49% to 80%, depending on the on-demand instance type selected by the over-provisioning policy.

The rest of the article is organized as follows. Section 2 explains the resource allocation challenges of Kubernetes in public clouds. Section 3 describes the intelligent policy, JITF. Section 4 describes how to optimize resource planning with a long-term policy. Section 5 illustrates our intelligent resource planning solution. The simulation results are shown in Section 6. Section 7 concludes this article.

2. Resource Allocation Challenges for Multi-Cloud Environments

Many recently proposed hybrid cloud management products allow administrators to monitor and manage all applications running on private and public clouds through a single user interface (UI). These products consider only heuristic, intuitive management strategies, applied from private cloud to public cloud or between public clouds. They may set an upper bound on monitored resource usage and rely on short-term-trend configuration automation to manage resources across private and public cloud environments. These strategies are not applicable to container-based cluster management, especially for Kubernetes in multi-cloud environments.

Case in point: in July 2018, Auto Trader employee Karl Stoney migrated their workload to Kubernetes on Google Kubernetes Engine. He deployed kube-state-metrics to gather resource usage metrics and stored them in Prometheus, then used Grafana dashboards to provide a high-level overview of costs and resource utilization. Manually monitoring and managing resource usage reduced redundancy and in turn saved costs. However, more complicated multi-cloud environments need a more advanced solution, such as an AI tool. Stoney concluded that the next step was to use historical data to generate predictive trends and provide cost analysis.1 AI services now provide such predictive trends for cost analysis.

3. Just-in-Time Fitted Resource Allocation Saves Costs

When a user wants to migrate services from a private cloud to a public cloud, the user should know which instance types best suit the cost and SLA requirements. In the private cloud, users usually apply over-provisioning policies to allocate resources. However, cloud computing is pay-as-you-go: charges are based on the resources provisioned under contract rather than on actual usage. Over-provisioning policies therefore result in wasted resources and unnecessary costs. Users may also be unable to switch instance types easily due to the limits of cloud vendors. It is thus important for users to obtain enough information to evaluate costs before migrating a workload to a public cloud. However, most users do not have sufficient information and knowledge to set workload requests to cloud service providers that save costs while still achieving the SLA requirements.

Figure 1. The CPU and memory wastage comparison of over-provisioning and JITF strategies.

Figure 1 shows the difference between the CPU and memory usage of a workload under an over-provisioning strategy and a Just-in-Time Fitted (JITF) strategy in a Kubernetes cluster on AWS in the Oregon region during a two-day period. The node's instance type is i3.4xlarge with 16 vCPUs and 122 GB of memory, and the measurement interval is 30 minutes. An i3.4xlarge instance costs $1.248 per hour. As shown in Figures 1(a) and 1(b), an over-provisioning strategy typically sets a CPU upper bound at 70% and a memory upper bound at 30% to find an instance type that satisfies the workload requirements on the cloud. On AWS, the proper instance type is c5.4xlarge at $0.68 per hour. The total instance cost for the two days is then $32.64, and 45.51% of the cost can be saved simply by choosing this more cost-effective instance type. The gray area in Figure 1 shows the resource utilization wasted during off-peak hours.
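The arithmetic behind the 45.51% figure can be checked directly from the on-demand prices quoted above:

```python
# Cost comparison for the two-day window in Figure 1, using the quoted
# on-demand prices (i3.4xlarge: $1.248/hr, c5.4xlarge: $0.68/hr).
HOURS = 48  # two days

i3_cost = 1.248 * HOURS   # original over-provisioned node
c5_cost = 0.680 * HOURS   # right-sized, cheaper instance type

savings_pct = (i3_cost - c5_cost) / i3_cost * 100

print(f"c5.4xlarge total: ${c5_cost:.2f}")           # $32.64
print(f"savings vs i3.4xlarge: {savings_pct:.2f}%")  # 45.51%
```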

Figures 1(c) and 1(d) illustrate the operation of a JITF strategy. The JITF strategy recommends using a node of instance type c5.2xlarge during off-peak hours and then adding nodes at peak hours, based on Figure 1(b). A c5.2xlarge node costs $0.34 per hour, and the total cost over the two-day period is $18.36. This example illustrates that, armed with precise workload prediction, costs can be significantly reduced compared with the existing static workload settings of Kubernetes in multi-cloud environments.
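The JITF idea above can be sketched as a simple per-interval selection: run just enough nodes of a cheaper instance type to cover each interval's predicted demand. This is our minimal illustration, not the exact algorithm; the instance size and price are the ones quoted above (c5.2xlarge: 8 vCPUs, $0.34/hr), while the demand series is illustrative.

```python
# Minimal JITF sketch: cover each interval's predicted vCPU demand with just
# enough c5.2xlarge nodes instead of provisioning one large node for the peak.
import math

C5_2XLARGE_VCPU = 8
C5_2XLARGE_PRICE = 0.34  # USD per hour, as quoted above

def jitf_cost(predicted_vcpu_demand, interval_hours=0.5):
    """Total cost of running just enough nodes in each interval."""
    total = 0.0
    for demand in predicted_vcpu_demand:
        nodes = max(1, math.ceil(demand / C5_2XLARGE_VCPU))
        total += nodes * C5_2XLARGE_PRICE * interval_hours
    return total

# Illustrative predicted demand per 30-minute interval: the off-peak
# intervals need one node, the three peak intervals need two.
demand = [4, 5, 6, 14, 15, 13, 6, 5]
print(round(jitf_cost(demand), 2))  # cost over these eight intervals
```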

4. Optimizing Resource Planning with Long-Term Trends Enhances Stability

Figure 2. Policies based on the short-term and long-term trends.

Many companies also try to use a dynamic resource policy based on a short-term trend for cost saving. Figure 2(a) describes how the policy works. As shown in Figure 2(a), the system may collect the resource utilization (i.e., CPU usage, memory usage, and so on) per time interval to determine the next resource allocation. If the resource usage at t₂ is smaller than that at t₁, the system may remove one idle node to save cost. If the resource usage at t₅ is larger than that at t₄, the system automatically creates one node to provide more resource capacity. However, such on-demand policies handle varying traffic poorly, since they cannot meet the resource requirements in time. As shown in Figure 2(a), the system creates four VMs to support services at t₂ and removes one idle VM at t₃ to save cost. The workload then spikes at t₃ and causes a resource shortage. Insufficient resources may result in pending or CrashLoopBackOff pods in a Kubernetes cluster and degrade system performance. Kubernetes can set resource quotas for specific applications, but the requests and limits of applications must be updated frequently as workloads vary.
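The reactive behavior in Figure 2(a) can be sketched as follows: scaling decisions are made only after observing each interval's usage, so a sudden spike lands before capacity is added. The thresholds and the usage series below are illustrative, not taken from the figure.

```python
# Reactive (short-term) scaling sketch: scaling decisions are based only on
# the last observed usage, so a spike at time t is served by capacity that
# was sized for time t-1.
NODE_CAPACITY = 100  # arbitrary capacity units per node

def reactive_nodes(usage_series, start_nodes=2):
    """Return the intervals where demand exceeded the reactive capacity."""
    nodes = start_nodes
    shortfalls = []
    for t, usage in enumerate(usage_series):
        if usage > nodes * NODE_CAPACITY:  # spike exceeds current capacity
            shortfalls.append(t)
        # Scale only AFTER observing usage, one node at a time.
        if usage > nodes * NODE_CAPACITY * 0.8:
            nodes += 1
        elif nodes > 1 and usage < (nodes - 1) * NODE_CAPACITY * 0.5:
            nodes -= 1
    return shortfalls

usage = [120, 130, 90, 350, 200, 150]  # illustrative usage per interval
print(reactive_nodes(usage))  # the spike interval is under-provisioned
```

The spike at the fourth interval is reported as a shortfall: capacity is only added after the spike has already degraded service, which is exactly the weakness a long-term predictive policy avoids.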

Additionally, some resources, such as storage, may not be easily expanded or deleted, and it is difficult to meet applications' requirements without a long-term trend evaluation. Besides, service vendors still need to consider the cloud service's stability and the proximity of the data center to customers. They also need to determine whether replicating more services would give frequent users greater benefit, as replication may add redundant storage costs. A longer-term evaluation, as in Figure 2(b), is needed to determine how to satisfy the SLA requirements. A long-term evaluation shows the entire system blueprint, so service vendors can make better decisions about when and how to maintain a Kubernetes cluster, whether to create or delete nodes manually or automatically, and so on.

5. An AI-Driven Resource Orchestration Intelligent Solution for Kubernetes

Our solution is an AI-driven resource orchestration platform that collects historical data and predicts long-term trends, especially for complex environments such as Kubernetes.

Figure 3. The resource reservation process of our solution.

Figure 3 illustrates the resource reservation process. First, Downstream collects data about the Kubernetes cluster workloads and predicts the future workload distribution. Then, Upstream uses an AI-based strategy to find a proper resource plan and performs a long-term evaluation. Finally, it recommends a resource reservation policy with minimum cost.

When applied to Kubernetes resource management for a container, the predicted resource reservation can be dynamically configured in the “requests” and “limits” sections of the container specification (see Figure 4). The CPU/memory specified in “requests” is the amount that Kubernetes guarantees to the container, and the CPU/memory specified in “limits” is the maximum amount that Kubernetes allows the container to use.

Figure 4. A pod object example.
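As a sketch of the kind of pod object Figure 4 refers to, the “requests” and “limits” fields sit under each container's resources section; the names and values below are placeholders, not taken from the figure:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app             # hypothetical pod name
spec:
  containers:
  - name: demo-container
    image: nginx:1.21        # placeholder image
    resources:
      requests:              # guaranteed by Kubernetes at scheduling time
        cpu: "500m"
        memory: "256Mi"
      limits:                # hard ceiling the container may use
        cpu: "1"
        memory: "512Mi"
```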

When a pod is created, the Kubernetes scheduler selects a node for the pod to run on and ensures that the requested CPU/memory is available on the selected node. After that, if the “requests” and “limits” are re-specified, the pod is restarted and rescheduled automatically.

6. Performance Simulation for Upstream

A real workload published by Alibaba was selected as the data set for the Upstream simulation. The trace data from Alibaba2, released in August 2017, contains information about a production cluster of about 1.3 thousand machines that run both online services and batch jobs. According to its service instance events, these nodes had 64 CPU cores and, presumably, 256 GB of memory and 1 TB of disk space (the trace data includes only normalized memory and disk request ratios).

CloudSim is a widely used framework for modeling and simulating cloud computing infrastructures and services. ProphetStor data scientists designed the simulated environment of a Kubernetes cluster based on the cloud architecture of CloudSim.

Figure 5. (a) Used CPU cores, (b) used memory size, and (c) active VMs of a K8s cluster during a 168-hour period.

The simulation used the used percentages of requested CPU (cpu_util), memory (mem_util), and disk space (disk_util) from the service instance usages of Alibaba, together with the requested utilization of CPU (plan_cpu), memory (plan_mem), and disk space (plan_disk), to calculate the total used amounts of CPU, memory, and disk space of a Kubernetes cluster during a 168-hour period. The used amount of each resource is utilᵢ × planᵢ × sizeᵢ, where i ∈ {cpu, memory, disk} and sizeᵢ is the capacity of the actual VM type. Figures 5(a) and 5(b) show the sums of used CPU cores and memory sizes over all containers of the Kubernetes cluster during the 168-hour period.
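The per-resource formula can be sketched directly; the fractional values below are illustrative, while the node capacities (64 cores, 256 GB of memory, 1 TB of disk) are those stated for the trace:

```python
# used_i = util_i * plan_i * size_i for i in {cpu, memory, disk}.
# size_i is the node capacity; util_i and plan_i are fractions from the trace.
NODE_SIZE = {"cpu": 64, "memory": 256, "disk": 1024}  # cores, GB, GB

def used_amount(util, plan, size=NODE_SIZE):
    """Per-resource used amount for one instance: util_i * plan_i * size_i."""
    return {i: util[i] * plan[i] * size[i] for i in size}

# Illustrative fractions: an instance requested 50% of the node's CPU
# (plan_cpu = 0.5) and used 40% of that request (cpu_util = 0.4), etc.
util = {"cpu": 0.4, "memory": 0.6, "disk": 0.2}
plan = {"cpu": 0.5, "memory": 0.5, "disk": 0.3}
print(used_amount(util, plan))
```

Summing these per-instance amounts over all service instances yields the cluster-wide usage curves shown in Figures 5(a) and 5(b).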

When a user wants to deploy a Kubernetes cluster on AWS, he or she needs to evaluate the cost on AWS first. The simulation uses the price of the most similar instance type (m4.16xlarge) and the storage type (EBS in the Seoul region of AWS) as the instance cost of each node. The on-demand instance (m4.16xlarge) costs $3.936 per hour. AWS configures the root file system of each node as EBS storage, and EBS General Purpose SSD (gp2) volumes cost $0.114 per GB per month. As for network traffic, uploading data and sending requests from the Internet to EC2 is free, and transfers from EC2 to the Internet are also free when they are smaller than one GB per month. The simulation therefore considers only the instance cost and the storage cost, since these account for most of the wasted spending in the public cloud.

Upstream is compared with the original Alibaba configuration of the Kubernetes cluster. Downstream collected the cluster workloads periodically, as seen in Figures 5(a) and 5(b), and predicted the workload trends. Based on the off-peak workloads and the system requirements, Upstream found that r5.4xlarge, with 16 cores and 128 GB of memory, is the best instance type for this workload. It then recommended that users change the instance type and allocate resources according to the predicted workloads and the JITF concepts. Figure 5(c) shows the recommended resource plan. The over-provisioning policy sets an upper bound for resource allocation and uses the same instance type as Upstream for the Kubernetes cluster.

Figure 6. (a) The cost (including VM and storage), (b) the VM cost, (c) the storage cost, (d) the accumulated cost, (e) the accumulated VM cost, and (f) the accumulated storage cost during a 168-hour period.

Figures 6(a), 6(b), and 6(c) show the total cost (including VM and storage costs), the instance cost, and the storage cost during a 168-hour period. The total cost is the sum of the VM cost and the storage cost. As Figures 6(a) and 6(b) show, the total cost is dominated by the VM cost. Figure 6(c) shows that the storage cost scales with the number of VMs, since the root file system and the persistent volumes (PVs) are assumed to share the 1 TB capacity of each node. Figures 6(d), 6(e), and 6(f) show the accumulated total cost, the accumulated VM cost, and the accumulated storage cost, respectively.

The default configuration of Alibaba costs $687,848 during the 168-hour period, since m4.16xlarge costs $3.936 per hour. Compared with Alibaba's default allocation, it takes only $136,924.652 over the same 168-hour period when Upstream chooses a proper instance type (r5.4xlarge) and runs the JITF strategy to allocate resources, achieving 80.08% in cost savings. An over-provisioning policy tries to save cost by using an upper bound for resources while choosing the same instance type as Upstream. Compared with this over-provisioning policy, Upstream with the JITF policy saves a further 49.18% in cost.
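The roughly 80% figure follows directly from the two simulated totals quoted above:

```python
# Savings of Upstream + JITF over Alibaba's default configuration, computed
# from the simulated 168-hour totals quoted in the text.
default_cost = 687_848.0      # default configuration (m4.16xlarge nodes)
upstream_cost = 136_924.652   # Upstream with the JITF strategy (r5.4xlarge)

savings_vs_default = (1 - upstream_cost / default_cost) * 100
print(f"savings vs. default: {savings_vs_default:.2f}%")
```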

In summary, users often deploy applications on a Kubernetes cluster with an improper instance type, as in the default configuration of Alibaba. The over-provisioning policy in Figure 6 assumes that a fixed, lowest-cost instance type can be selected that meets the workload requirements at all times. However, it is difficult to find a suitable instance type without an intelligent, prediction-driven policy.

7. Conclusion

Our solution delivers AI-driven resource orchestration intelligence for Kubernetes. Downstream provides insights into the infrastructure and applications, together with foresight from workload prediction. Upstream provides an understanding of cloud provider costs and recommendations for multi-cloud environments.

From the simulation results, Upstream recommended the proper instance types based on the long-term trends of workload predictions without compromising SLA support. The total cost of the workload can be further reduced with predictive workloads and dynamic configuration using the Just-in-Time Fitted policy.

In Part 2 of this paper series, we will extend this study to take into consideration the different price discounts offered by cloud providers to users who subscribe to long-term contracts. We will show how Continuous Integration and Continuous Delivery (CI/CD) and DevOps workloads can be deployed on the most cost-effective cloud platform, and we will continue elaborating on how predicting workloads with precision can help manage cloud resources more efficiently. One can think of our solution as a “Google Maps” for Kubernetes, analogous to how Google Maps provides navigation intelligence for users on the road (with road traffic being the workload and the roads being the resource instances).

To learn more about ProphetStor's multi-cloud management solution, visit us here.



