Planning a Production-Ready Kubernetes Cluster with Fundamental Controllers & Operators — Part 5: Scheduling Workloads

TL;DR A brief overview of scheduling in Kubernetes, “the almighty” controller loop, and additional “influencers” that enhance workload management.

Haggai Philip Zagury
Israeli Tech Radar
6 min read · Jul 8, 2024

--

Introduction

In the realm of distributed systems, running additional applications should ideally be a straightforward task. If you can run a specific process on a Kubernetes cluster, theoretically, you can run any process.
However, the differentiation between systems lies in their behavior and the architectural choices made by their managers. To make any system cost-effective, it’s crucial to address the question of resource consumption, particularly the “four primitives” — CPU, RAM, network, and disk.

kube-scheduler | a dedicated service for scheduling

For stateless applications, focus primarily on CPU and memory requests. For stateful applications, additional constraints like disk speed, network bandwidth, or GPU capabilities may come into play.

Let’s begin with resource requests and limits, followed by node selection based on taints and tolerations.

Resource Management in Kubernetes

Effective resource management in Kubernetes is vital, particularly for CPU and memory. Setting appropriate resource requests and limits during pod creation ensures that applications have the necessary resources to operate smoothly.

DALL-E | “kube-scheduler …” — this one took a while …

Pod Resource Requests and Limits

Here’s an example of a simple pod configuration for a web application, illustrating resource requests and limits in the pod’s manifest file:

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: web-container
    image: nginx:latest
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  • Requests: The pod requests 64Mi of memory and 250m (0.25 cores) of CPU; the scheduler guarantees these resources for the pod.
  • Limits: The pod sets a limit of 128Mi of memory and 500m (0.5 cores) of CPU, representing the maximum resources the container may use. Exceeding the CPU limit results in throttling, while exceeding the memory limit gets the container OOM-killed.

Scheduler Considerations

When the kube-scheduler attempts to place this pod, it looks for a node that can accommodate at least the requested resources. The scheduler ensures that the pod has the necessary resources to start and run effectively, considering existing resource usage on nodes to avoid overcommitting resources.
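To see what the scheduler is working with, you can inspect a node’s allocatable capacity and the requests already placed on it. A minimal sketch, assuming a node named worker-1 exists in your cluster:

# Shows capacity, allocatable resources, and the "Allocated resources" summary
# (the sum of requests/limits of pods already running on the node)
kubectl describe node worker-1

# Or list allocatable CPU/memory per node across the cluster
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory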

Avoiding Resource Limits

Many community leaders advise against setting resource limits, CPU limits in particular, because CPU throttling can hurt latency even when the node has spare capacity. For further insights, refer to Robusta.dev’s blog and Spot.io’s guide on CPU limits.
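A common middle ground is to always set requests, keep a memory limit (often equal to the request), and drop the CPU limit. A minimal sketch of that pattern applied to the web-app container above (not an exact recipe from the linked posts):

resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "64Mi"   # memory limit equal to the request; no CPU limit set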

Vertical Scaling

At this point we either have nodes or node-pools, which means we have a predefined set of resources to use in our cluster.

Unless you have a restricting policy in place (another option to keep your cluster tidy, worth a post of its own), your users may not set requests/limits on their workloads, and someone needs to provide some kind of “guess-timation”. This is where the VPA comes in.

Vertical Pod Autoscaler (VPA)

Vertical Pod Autoscaler (VPA) frees users from the necessity of setting up-to-date resource limits and requests for the containers in their pods.

When configured, it sets the requests automatically based on usage, allowing proper scheduling onto nodes so that the appropriate amount of resources is available for each pod.
It also maintains the ratio between limits and requests that was specified in the initial container configuration.

It can both scale down pods that are over-requesting resources and scale up pods that are under-requesting them, based on their usage over time.

It is configured with a Custom Resource Definition (CRD) called VerticalPodAutoscaler, which lets you specify which pods should be vertically autoscaled as well as whether and how the resource recommendations are applied.

Vertical Pod Autoscaler (VPA) for web-app

The Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests for your pods based on their actual usage.

  • VPA Configuration:
    First, ensure that the VPA components are installed in your cluster. Then, create a VPA resource for the web-app:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "v1"
    kind: Pod
    name: web-app
  updatePolicy:
    updateMode: "Auto"
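Once the VPA object exists, you can inspect its recommendations before (or instead of) letting it apply them; note that in most real-world setups the targetRef points at a controller such as a Deployment rather than a bare Pod. A minimal sketch of the usual checks:

kubectl get vpa web-app-vpa
# Look at the "Recommendation" section: target, lower bound, upper bound per container
kubectl describe vpa web-app-vpa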

This controller is a good starting point for optimizing the workload on your cluster and is also used under the hood by a tool called Goldilocks, which can serve as a “requests recommendation engine”.

When you install Goldilocks, it creates the VPA objects automatically and starts monitoring the current workload to estimate its usage; if you preserve history, it will also show the application’s trends over time and help you set the correct requests.
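A minimal sketch of getting Goldilocks running, assuming Helm and the Fairwinds chart repository; the namespace label below is how Goldilocks is told which namespaces to watch (check the current Goldilocks docs for the exact chart and label names):

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace
# Opt a namespace in so Goldilocks creates VPAs for its workloads
kubectl label namespace default goldilocks.fairwinds.com/enabled=true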

DALL-E | “kube-scheduler …” continue with a production grade cluster needs

Once we’ve covered resource allocation on a single node, we can apply the same logic to a node-pool. As our cluster scales with our product needs, we need to start labelling nodes and segregating certain systems based on criteria such as budget per hour.

This is where other supporting mechanisms play a big role in both vertical and horizontal scaling, which we will dive into in the next part of the series.

Supporting Mechanisms in Scheduling

DALL-E | continue in prompt as above -> describe taints and tolerations for control-plane worker-plane system-workload and general purpose worker plane

While the kube-scheduler is the primary controller for scheduling, other mechanisms influence pod placement:

  • Node Controllers:
    Ensure nodes are registered with the API server and have necessary labels/taints for scheduling.
  • Pod Affinity and Anti-affinity Rules:
    Specify pod placement rules to enhance workload distribution and fault tolerance (a short sketch follows this list).
  • Node Selectors:
    Use labels to control pod placement based on node attributes.
  • Custom Scheduler Plugins:
    Extend kube-scheduler functionality with specific scheduling criteria through custom plugins.

These mechanisms work alongside kube-scheduler to achieve desired pod placement strategies within your cluster.
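To make the node-selector and affinity mechanisms above concrete, here is a minimal, hypothetical snippet of a pod spec; the disktype=ssd label, the app=web-app selector, and the zone topology key are assumptions for illustration, not values used elsewhere in this series:

spec:
  nodeSelector:
    disktype: ssd                     # only nodes labeled disktype=ssd are considered
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web-app            # prefer spreading replicas of the same app...
          topologyKey: topology.kubernetes.io/zone   # ...across availability zones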

When scheduling a workload, the scheduler takes the following into account:

  • Resource Limits and Requests:
    Ensure pods are scheduled on nodes with available required resources.
  • Resource Labels:
    Organize and select resources efficiently using labels.
  • Resource Tolerations:
    Allow pods to be scheduled on nodes with specific taints.
  • Affinity and Anti-Affinity:
    Ensure certain pods are scheduled together or apart based on application needs.

Let’s see an example.

Using Labels, Taints, and Tolerations

Consider a scenario with nodes designated for specific workloads. Label these nodes and use taints to ensure only appropriate pods are scheduled on them; this starts with tainting the node and then adding a matching toleration to the pod.

  • Node Configuration
# our "tainted-node"
apiVersion: v1
kind: Node
metadata:
name: special-node
labels:
dedicated: special-workload
spec:
taints:
- key: "dedicated"
value: "special-workload"
effect: "NoSchedule"
  • Pod Configuration with Tolerations
# our "special-pod"
apiVersion: v1
kind: Pod
metadata:
name: special-pod
spec:
containers:
- name: special-container
image: nginx
tolerations:
- key: "dedicated"
operator: "Equal"
value: "special-workload"
effect: "NoSchedule"

In this example:

The node is labeled with dedicated=special-workload and tainted with the same key-value pair and the NoSchedule effect, which means only pods that tolerate that taint can be scheduled on it. That is exactly what the pod configuration provides: a toleration matching the node’s taint, allowing (but not forcing) the pod to be scheduled on the designated node.
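Because a toleration only permits placement on the tainted node, you would typically pair it with a nodeSelector on the node’s label to actually pin the pod there. A minimal sketch of the additional fields in the pod spec above:

spec:
  nodeSelector:
    dedicated: special-workload   # match the node's label, so the pod lands only on special-node
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "special-workload"
    effect: "NoSchedule"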

To conclude the topic of scheduling, let’s look at our cluster as one single computer whose boundaries are the accumulated sum of the four primitives available on the underlying operating systems. As we plan our clusters, we may consider providing label and taint combinations that represent our business needs and budget. When planning our environment landscape, we should categorize services as common/shared; these may require their own node-pool to provide services, as we discussed in previous chapters on configuration management and ingress routing.

Managing clusters involves grouping node-pools or node-groups, indicating specific workload requirements, such as disk speed or GPU capabilities. For instance, Kubernetes often has a control plane node group with labels like role=master or role=worker. Adopting this pattern throughout the system, with the help of additional controllers, optimizes cluster management.
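In practice, labels and taints for such node groups are usually applied with kubectl (or through the cloud provider’s node-pool settings) rather than by editing Node manifests. A minimal sketch, assuming a node named special-node as in the example above:

kubectl label nodes special-node dedicated=special-workload
kubectl taint nodes special-node dedicated=special-workload:NoSchedule
# removing the taint later: note the trailing "-"
kubectl taint nodes special-node dedicated=special-workload:NoSchedule-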

DALL-E | continue with the theme emphasizing — separation of concerns

Understanding Kubernetes scheduling intricacies is just the beginning. The next logical step is exploring scaling and auto-scaling mechanisms. These concepts will help efficiently manage resources and maintain optimal performance as workloads grow.

Stay tuned for the next chapter, where we’ll delve into strategies for scaling and auto-scaling in Kubernetes, ensuring robust and flexible clusters capable of handling dynamic demands.

Yours sincerely, HP
