Managing Resource Request Failures in Kubernetes
In the realm of large enterprises, container orchestration environments are commonly centrally managed. To facilitate the execution of containerized workloads, enterprise users are often allocated constrained namespaces with limited access.
These clusters and namespaces are typically configured with restricted capacities and resource quotas to ensure efficient resource utilization and maintain control over resource allocation.
What are quotas and why is resource management important?
Quotas serve as a mechanism that imposes constraints on aggregate resource consumption per namespace. These limits help maintain resource fairness and prevent excessive usage within individual namespaces, ensuring a balanced and predictable environment within the Kubernetes cluster.
Miscalculated sizing can have many implications, some of them hidden in most cases, and it can even become a limiting factor when planning for a resilient architecture and a fast recovery.
Quotas are key to protecting tenant workloads, managing load growth, and ensuring continuity of service. It stands to reason that the quota calculation should follow a rigorous process that accounts for both steady-state operation and various contingencies.
Have you carefully assessed all aspects when determining capacity needs?
It’s a common mistake to oversimplify calculations for the required capacity or resource quotas to accommodate what we perceive as our needs. However, various complex factors must be considered, including required replicas, the cluster topology, and the number of available nodes, all of which significantly influence capacity requirements.
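As a minimal, hypothetical illustration (the replica count, requests, and surge value below are assumptions, not recommendations): a Deployment with 3 replicas, each requesting 500m CPU and 512Mi of memory, that uses a rolling update allowing one surge Pod needs quota headroom beyond its steady state:
Steady state:       3 x 500m = 1500m CPU    3 x 512Mi = 1536Mi memory
During a rollout:   4 x 500m = 2000m CPU    4 x 512Mi = 2048Mi memory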
To get a summary of current resource usage for all nodes in the cluster:
kubectl top nodes
To get a summary of current resource usage for all pods in the current namespace:
kubectl top pods
To list all resource quotas in the current namespace:
kubectl get resourcequota
To get the full definition of a specific resource quota in YAML:
kubectl get resourcequota <resource_quota_name> -o yaml
Is defining quotas and assessing capacity complex?
It can be, and it depends on several factors: the availability of resources, the project budget, high-availability and resilience requirements, and recovery targets such as RTO, RPO, and the SLA.
In the specific case of setting quotas, it’s important to note that you can define them not only in a single manifest file but also as several different quotas, segregated by resource type. Below is an illustrative single-file example of a manifest configuring a ResourceQuota:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: default
spec:
  hard:
    pods: "10"                    # Maximum number of pods allowed
    requests.cpu: "4"             # Maximum total CPU request in CPU units
    requests.memory: "8Gi"        # Maximum total memory
    limits.cpu: "8"               # Maximum total CPU limit in CPU units
    limits.memory: "16Gi"         # Maximum total memory limit
    configmaps: "5"               # Maximum number of ConfigMaps
    persistentvolumeclaims: "3"   # Maximum number of PersistentVolumeClaims
    services: "5"                 # Maximum number of Services
    secrets: "10"                 # Maximum number of Secrets
At any time you can describe your quotas to understand the current usage:
kubectl describe resourcequota example-quota
Name:                   example-quota
Namespace:              default
Resource                Used  Hard
--------                ----  ----
configmaps              1     5
limits.cpu              3     8
limits.memory           10Gi  16Gi
persistentvolumeclaims  1     3
pods                    3     10
requests.cpu            2     4
requests.memory         4Gi   8Gi
secrets                 7     10
services                3     5
Consider a scenario where high availability (HA) takes precedence in decision-making, exerting significant influence on architectural considerations. In this use case, we are tasked with operating a distributed system to manage our workloads, comprising deployments, statefulsets, pods, and other components. Underlying this setup is a topology spanning three data centres, each housing a single node.
Let’s explore a few illustrative scenarios to understand the potential challenges. For simplicity, we’ll assume uniform capacity across all nodes within each data centre.
Scenario 1: Rolling restarts
Picture a scenario where you’ve allocated capacity without leaving room for additional workloads.
If one or more workloads (e.g., Deployments, StatefulSets) undergo a Rolling Restart or a simple Restart, their old Pods may remain Terminating for a few seconds even as their controllers have already started creating the replacement Pods. During this period, both the Terminating and the Starting Pods count against the available capacity, which can leave insufficient resources for all of them to start simultaneously.
Consequently, workloads must wait for resource availability before proceeding.
The impact would likely be negligible for pods requesting only a few CPU millicores. However, for workloads with high CPU and RAM demands, the impact would certainly be noticeable.
If high availability is a critical requirement, it’s advisable to ensure that each data centre/node has sufficient resources allocated, so that workloads can restart promptly without waiting for resource availability.
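One mitigation, shown here as a hedged sketch (the name, image, and resource figures are placeholders), is to tune the Deployment’s rolling update strategy so that no extra Pod is created during a rollout, at the cost of temporarily reduced capacity:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0                  # never create extra Pods beyond the desired count
      maxUnavailable: 1            # instead, replace Pods one at a time
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: nginx:1.25        # placeholder image
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
With maxSurge: 0 the rollout never needs headroom beyond the steady-state quota, whereas the default (maxSurge: 25%) temporarily does.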
For instance, if you deploy a workload that requests more resources than are available, you will see something like:
kubectl get deployment
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
huge-load   0/1     1            0           3m52s

kubectl get events
LAST SEEN   TYPE      REASON              OBJECT                            MESSAGE
2s          Warning   FailedScheduling    pod/huge-load-556679f48f-kjghj    0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
0s          Normal    NotTriggerScaleUp   pod/huge-load-556679f48f-kjghj    pod didn't trigger scale-up:
3s          Normal    SuccessfulCreate    replicaset/huge-load-556679f48f   Created pod: huge-load-556679f48f-kjghj
3s          Normal    ScalingReplicaSet   deployment/huge-load              Scaled up replica set huge-load-556679f48f to 1
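For reference, a minimal sketch of a Deployment that could produce events like the above, assuming its requests exceed what any single node can allocate (the image and figures are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: huge-load
spec:
  replicas: 1
  selector:
    matchLabels:
      app: huge-load
  template:
    metadata:
      labels:
        app: huge-load
    spec:
      containers:
        - name: huge-load
          image: nginx:1.25        # placeholder image
          resources:
            requests:
              cpu: "64"            # assumed to exceed any node's allocatable CPU
              memory: "256Gi"      # assumed to exceed any node's allocatable memory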
Scenario 2: Failures and Scheduling to another Node
In the event of a node or an entire data centre failure, certain workloads may be configured to be scheduled to another node automatically, especially if no affinity or topology-related resources are specified. However, if there is insufficient quota or capacity available in the remaining data centres, these workloads will remain Pending until resources become available.
Furthermore, the loss of a data centre or node also entails the loss of the capacity provided by the failed nodes. In this scenario, for instance, the failure takes away the capacity that was backing three workloads.
Depending on your use case, you should plan for enough capacity in each data centre, or for some kind of elasticity automation, so that the remaining data centres can accommodate the displaced workloads when necessary.
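When this happens, you can quickly confirm it by listing the Pending Pods and the related scheduling events, for example:
kubectl get pods --field-selector=status.phase=Pending
kubectl get events --field-selector=reason=FailedScheduling --sort-by=.lastTimestamp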
Scenario 3: Total vs Available Capacity
Consider a scenario where, on paper, you possess adequate quotas or capacity to initiate a workload deployment. However, despite meeting the overall resource thresholds, the deployment enters a Pending state because no placement can actually satisfy it.
Upon inspecting the associated events, you may discover that no individual node meets the configured constraints for required CPU, memory, or other resources.
Even though the aggregate quotas seem adequate, the absence of a single node possessing the exact resource combination aligned with your topology configurations impedes the deployment process.
This scenario is frequently encountered in the event of a node or data centre failure. The initially allocated quotas may appear adequate, but when such failures occur the capacity behind them shrinks, revealing their insufficiency.
Always bear in mind that the total available capacity is the aggregate of the capacity of each data centre/node. Workloads demanding substantial resources might exceed the capacity of any single node, making the total available quota deceptive.
Depending on the scenario, ensure that in the event of a data centre failure you have sufficient capacity in the remaining data centres to accommodate the required workloads.
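Because the aggregate can be misleading, it helps to look at per-node allocatable resources rather than the total, for example:
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
kubectl describe node <node_name>
The Allocated resources section of the describe output shows how much of each node is already committed to requests and limits.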
Scenario 4: Sequence of deployment
The sequence of deployments significantly impacts resource availability.
When examining related events, it’s crucial to consider not only quotas but also factors like labels, topology spread constraints, node selectors, affinity, and anti-affinity. You may find that no single node has sufficient CPU, RAM, or other resources.
By strategically deploying resource-intensive workloads first, followed by less demanding ones, you can optimize resource utilization and mitigate deployment challenges.
It’s crucial to remember that large workloads can introduce unforeseen challenges, so deployment sequences need ongoing evaluation. When capacity is limited, large workloads left toward the end of the deployment stages may encounter placement constraints, whereas smaller workloads left for the end can readily fit within whatever node still has room.
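As a small, hypothetical illustration (the node and Pod sizes are assumptions): imagine three nodes with 4 CPU allocatable each. If six 1-CPU Pods are deployed first and spread evenly, a later 3-CPU Pod has nowhere to go, even though 6 CPU remain free in total:
Nodes (4 CPU allocatable each):   node-a   node-b   node-c
Small Pods placed first:          2 CPU    2 CPU    2 CPU
Free per node:                    2 CPU    2 CPU    2 CPU    (6 CPU free in total)
Later 3-CPU Pod:                  does not fit on any single node
Deploying the 3-CPU workload first, and then letting the smaller Pods fill the remaining gaps, avoids the problem.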
More scenarios?
Consider working with multiple low-capacity nodes, or nodes of varying capacities, which collectively contribute a significant available Resource Quota or capacity. When faced with heavy workloads, however, the scheduler may struggle to find a suitable node or to achieve a balanced deployment topology.
As the number of nodes and large workloads increases, planning efforts may become more challenging due to the complexity of resource allocation and workload distribution.
Conversely, in scenarios with numerous small-footprint workloads, irrespective of the node count and their capacities, the scheduler can efficiently assign them to available nodes without encountering significant deployment hurdles.
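To see how the scheduler actually distributed the Pods across the nodes, the NODE column of a wide listing is usually enough:
kubectl get pods -o wide --sort-by=.spec.nodeName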
Remember and Plan
As we have seen, there are many occasions when we might have enough capacity to run our workloads, but then something changes (planned or not) and we can no longer run them.
Remember that these scenarios can occur in combination with one another, and alongside several other factors, for example:
· Resource configuration (requests and limits).
· Affinity and anti-affinity configuration (see the sketch after this list).
· Topology constraints.
· Node capacities.
· Worker node and data centre failures.
· Planned maintenance events.
· Scheduling policies.
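As a hedged sketch of how some of these factors interact in a single Pod spec (the labels, names, and image are assumptions), combining a topology spread constraint across zones with anti-affinity between replicas on the same node:
apiVersion: v1
kind: Pod
metadata:
  name: spread-example                          # hypothetical name
  labels:
    app: spread-example
spec:
  topologySpreadConstraints:
    - maxSkew: 1                                # keep Pod counts per zone within 1 of each other
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: spread-example
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname   # never co-locate two replicas on one node
          labelSelector:
            matchLabels:
              app: spread-example
  containers:
    - name: app
      image: nginx:1.25                         # placeholder image
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
Every additional constraint like these reduces the set of nodes the scheduler can choose from, which is exactly why quota and capacity planning cannot look at totals alone.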
There are too many variables that can impact the scheduling of the workloads.
Setting or requesting the right capacity would depend on your specific use cases, your HA requirements, budget, resource availability, recovery requirements, topology and so on.
The key takeaway is to remain aware of these scenarios and understand the circumstances in which they may occur. These scenarios should not be viewed as isolated examples; they are bound to happen, often more frequently than anticipated. Instead, they should serve as a guide for future capacity and resiliency planning, helping to prevent or, in the worst-case scenario, troubleshoot issues related to quota and capacity.
And I cannot stress this enough: look at the events.
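Two standard commands for doing exactly that, listing events with the most recent last and drilling into a single suspect Pod:
kubectl get events --sort-by=.lastTimestamp
kubectl describe pod <pod_name>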