Implementing Kubernetes: The Hidden Part of the Iceberg — Part 2
Kubernetes Scheduling and Resource Management.
In the first part of this K8S series, we covered a lot of ground related to building, maintaining, and operating an extensive fleet of K8S clusters. This second part is more condensed and focuses on Kubernetes resource management and scheduling.
If we look up the official definition of Kubernetes at kubernetes.io, it says, “Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.” Two words in this definition capture what Kubernetes does for containerized applications: deployment and scaling. Although deployment and scaling are K8S concepts that involve a lot of documentation, fine-grained technical configuration, and scores of hours to understand and implement correctly, I will try to keep it simple and stick to our experiences at GumGum.
The Kubernetes Scheduler
Roughly speaking, the Kubernetes scheduler is in charge of watching for new Pods and deciding the best/optimal Nodes to run them on, based on a myriad of criteria and real-time metrics collected from the cluster nodes and the applications. This scheduling challenge will be as straightforward or as complicated as your microservices and workloads are. Typically, on a production-grade K8S cluster, you will have many different microservices/applications, each with different CPU, memory, networking, and storage needs, plus other, less obvious constraints and requirements that we will mention later. The K8S scheduler must weigh those requirements and restrictions to decide the optimal nodes to run your applications/Pods.
As with any other algorithm, the scheduler’s decisions will be as accurate and optimal as the quality of the provided application’s details. Therefore, resource requests and limits are the most critical information you should provide about your applications.
Before we dive into the details of the K8S scheduler, resource management, and the related auto-scaler, I’d like to borrow a great analogy that I found when reading articles and documentation on this topic and continue from now on using that analogy so that the reader can get more familiar with the concepts.
The Kubernetes scheduling process is like the game of Tetris: slightly different, but keeping the basics of the game:
- In Tetris, you have a board/field with a fixed dimension of N rows and M columns; in Kubernetes land, you have Nodes, each with a limited amount of N CPU and M memory.
- In Tetris, you have incoming geometric shapes, or tetrominoes, that the player has to accommodate on the board while trying to optimize the available space; in K8S, you have incoming Pods that the scheduler has to accommodate in the existing Nodes, each of which has some free amount of memory and CPU — same as Tetris, trying to optimize the available resources.
Following the analogy, the Kubernetes scheduler is the Tetris player who has to make quick and optimal decisions about the best way to accommodate the incoming requests for new Pods that need to be executed in the cluster. To refresh your Kubernetes knowledge, remember that the Pod is the smallest schedulable unit in a K8S cluster. Still, a Pod is made of one or more containers, each with its own CPU and memory requirements.
Let’s take a look at the following Java application, which has been containerized to take full advantage of all the bells and whistles that come from running on a platform like K8S. According to figure 3, the application requires 8 GiB of memory and 1.5 vCores of CPU to work properly; “properly” meaning it sustains an approximate number of RPS, concurrent users, TPS, or whatever workload measure you want to use.
To run this Java application on a K8S cluster, you will create a Pod resource definition that will look like this:
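The original manifest is not reproduced here; a minimal sketch of such a Pod definition (the image name and port are illustrative placeholders) might look like this:

```yaml
# Sketch of a bare Pod definition for the Java application.
# Note the complete absence of a resources section.
apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
    - name: java-app
      image: example.com/java-app:1.0.0   # illustrative image
      ports:
        - containerPort: 8080
```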
If you create this Pod on any Kubernetes cluster, it will be scheduled and possibly executed if at least one node is available. One problem with this Pod definition is that it lacks any CPU and memory specifications, so it’s as if the Tetris player, a.k.a. the K8S scheduler, receives a piece that needs to be accommodated without any information about its dimensions, a.k.a. CPU and memory. The Pod will be scheduled for sure, but we don’t know whether it will be able to run, and even if it runs, we don’t know whether it will handle the expected workload. That’s why we need to properly define both requests and limits for CPU and memory.
Resource Limits and Requests
Following the Java application example above, if we want K8S to execute the application correctly, we need to tell the scheduler how much CPU and memory it requires for a given workload. The container’s CPU and memory instructions for Kubernetes are provided through the resources dictionary, as shown below:
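A sketch of that resources section for the Java application — the requests match the figures mentioned earlier (8 GiB of memory, 1.5 vCores), while the limit values here are illustrative assumptions:

```yaml
resources:
  requests:
    memory: "8Gi"     # from the observed requirement
    cpu: "1500m"      # 1.5 vCores
  limits:
    memory: "8Gi"     # JVM-style apps: request == limit (see guidelines below)
    cpu: "2000m"      # illustrative headroom above the request
```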
The resources dictionary has two parts: requests and limits. To summarize, the cpu and memory requests are used by the Kubernetes scheduler to decide the best node to accommodate the Pod. In contrast, the cpu and memory limits are used by the kubelet to enforce the maximum cpu and memory the container can use at any time. The kubelet also reserves at least the requested amount of cpu and memory specifically for that container to use. If the Pod has more than one container, all container requests are added up for scheduling purposes, while limits are enforced individually per container.
Let’s consider the following example of a Deployment with a multi-container Pod:
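The original manifest is not reproduced here; a Deployment consistent with the numbers analyzed below would look roughly like this (image tags and names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:5.0.3-alpine
          resources:
            requests:
              memory: "300Mi"
              cpu: "500m"
            limits:
              memory: "600Mi"
              cpu: "1000m"
        - name: busybox
          image: busybox:1.28
          resources:
            requests:
              memory: "100Mi"
              cpu: "100m"
            limits:
              memory: "200Mi"
              cpu: "300m"
```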
Let’s say we are running a single-node cluster with, for example, four vCPUs and 16 GB of RAM. From this Deployment definition we can extract a lot of information:
- Pod effective request is 400 MiB of memory and 600 millicores of CPU. Therefore, you need a node with enough free allocatable space to schedule the pod.
- CPU shares will be 512 for the redis container and 102 for the busybox container. Kubernetes always assigns 1024 shares per core, so:
redis: 1024 * 0.5 cores ≅ 512
busybox: 1024 * 0.1 cores ≅ 102
- The redis container will be OOM-killed if it tries to allocate more than 600 MiB of RAM, most likely making the Pod fail.
- Redis will suffer CPU throttling if it tries to use more than 100ms of CPU time in every 100ms period (since we have four cores, the total available time would be 400ms every 100ms), causing performance degradation.
- The busybox container will be OOM-killed if it tries to allocate more than 200 MiB of RAM, resulting in a failed Pod.
- Busybox will suffer CPU throttling if it uses more than 30ms of CPU every 100ms, causing performance degradation.
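The arithmetic behind those shares and throttling figures can be sketched as a couple of small helpers (this is not any Kubernetes API — it just reproduces the numbers above):

```python
# CPU shares and CFS quota arithmetic behind the numbers above.
CPU_SHARES_PER_CORE = 1024   # cpu.shares granted per full core
CFS_PERIOD_MS = 100          # default CFS scheduling period

def cpu_shares(request_millicores: int) -> int:
    """cpu.shares derived from the container's CPU request."""
    return int(CPU_SHARES_PER_CORE * request_millicores / 1000)

def cfs_quota_ms(limit_millicores: int) -> float:
    """CPU time (ms) the container may use per 100ms period."""
    return CFS_PERIOD_MS * limit_millicores / 1000

print(cpu_shares(500))     # redis request 500m    -> 512 shares
print(cpu_shares(100))     # busybox request 100m  -> 102 shares
print(cfs_quota_ms(1000))  # redis limit 1000m     -> 100.0 ms per 100ms
print(cfs_quota_ms(300))   # busybox limit 300m    -> 30.0 ms per 100ms
```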
There are probably hundreds of blog posts and documents about Kubernetes container resource limits and requests. Still, it is impossible to overstate how critical it is to ALWAYS set both resource requests and limits for any application deployed to Kubernetes — you will understand why sooner or later.
How to properly set up resource requests and limits?
Defining the right values for the CPU and memory requests and limits can be challenging, depending on the kind of application/container you are dealing with. Certain technologies, like Java, let the user restrict the maximum memory usage regardless of the workload, but the CPU cannot be capped that way — it will instead be throttled by the kubelet and/or the container runtime engine.
Running a load test in your local or dev/test environment is a good way to better understand the real CPU and memory requirements of your application. I personally recommend Locust, an amazing open-source, distributed load-testing tool that will help you quickly set up and run a load test of any size and complexity. After running the load test for a considerable time, the following guidelines are a general approach for setting limits and requests for your containers/Pods.
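For reference, a minimal Locust test file is just a few lines of Python — the endpoint path, pacing, and class name below are assumptions to adapt to your application:

```python
# locustfile.py -- minimal Locust load-test sketch.
# The target endpoint (/api/health) and pacing are assumptions.
from locust import HttpUser, task, between

class JavaAppUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests
    wait_time = between(1, 3)

    @task
    def hit_endpoint(self):
        self.client.get("/api/health")
```

You can then drive it with something like `locust -f locustfile.py --host http://localhost:8080 --users 200 --spawn-rate 20` and watch CPU and memory usage while the load ramps up.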
- For applications where the memory reserved/used remains the same regardless of the workload (like most JVM-based applications), set both the memory request and the memory limit to the same value.
- For applications where the memory reserved/used varies with the workload, set the memory request to the value observed under a “normal” workload of users and/or RPS, and set the memory limit 20–30% above that value.
- For applications where CPU utilization tracks the workload, set the CPU request to the value observed under a “normal” workload of users and/or RPS, and set the CPU limit 40–50% above that value. CPU is a special kind of resource: it can be compressed or throttled by the container runtime (via the kubelet) and can, to some extent, be shared with other containers.
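As a quick sanity check, the headroom rules above can be turned into a small helper. The percentages mirror the guidelines (25% and 45% sit in the middle of the suggested ranges); the function names are mine, not any Kubernetes API:

```python
# Sketch: derive requests/limits from load-test observations,
# following the headroom guidelines above.

def memory_settings(observed_mib: int, steady: bool) -> dict:
    """Steady-memory apps (e.g. JVM): request == limit.
    Variable-memory apps: limit ~25% above observed 'normal' usage."""
    if steady:
        return {"request": observed_mib, "limit": observed_mib}
    return {"request": observed_mib, "limit": int(observed_mib * 1.25)}

def cpu_settings(observed_millicores: int) -> dict:
    """CPU limit ~45% above observed 'normal' usage; CPU is
    compressible, so it gets throttled rather than OOM-killed."""
    return {"request": observed_millicores,
            "limit": int(observed_millicores * 1.45)}

print(memory_settings(8192, steady=True))  # JVM app: request == limit == 8192
print(cpu_settings(1500))                  # request 1500m, limit 2175m
```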
Figures 9 and 10 briefly describe the possible consequences of setting values that are too low or too high for CPU and memory requests and limits.
- Defining requests and limits in your application’s containers is hard, but necessary.
- Getting those values right can be a daunting task unless you rely on a proven method, like load testing.
- The Kubernetes Scheduler is the component in charge of determining which node is most suitable for running pods, based on the requests defined.
- If you set values that are too low or too high for the requests and limits of your Pod’s container(s), you will face issues like resource starvation or resource waste.
While your Kubernetes cluster and workloads might work fine without resource requests and limits, you will start facing stability issues as your teams and projects grow. Adding requests and limits to your Pods and Namespaces takes only a little extra effort and can save you from being paged at 3:00 AM!