Optimising Pod Startup Time in Kubernetes: Eliminating Provisioning Delays with Extra Node Capacity [Part II]
Welcome back to the second part of our series on speeding up Kubernetes pod startup times. In the first part, we delved into the challenges of pod startup performance and some creative strategies for improving it. If you missed it, you can catch up on it here.
Today, we’ll tackle the issue of pod startup delays caused by node provisioning. In fact, we’ll show you how to eliminate these delays entirely! The key benefit is the immediate scheduling of pods, ensuring your applications run smoothly and efficiently.
The Challenge
In high-traffic scenarios, the following sequence of events occurs, highlighting the challenge we’re addressing:
- Kubernetes Horizontal Pod Autoscaler (HPA) detects increased load and triggers the creation of additional pods to handle the surge in traffic.
- The scheduler then attempts to place these new pods onto existing nodes within the cluster.
- If the existing nodes lack spare capacity, the scheduler is unable to fit the new pods onto them.
- This triggers the Cluster Autoscaler to provision additional nodes to accommodate the new pods.
The provisioning of new nodes introduces a delay in the scheduling of pods. This delay can lead to performance bottlenecks and service disruptions during high traffic periods, impacting the overall user experience.
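In practice, the symptom is new pods sitting in a Pending state while a node is being provisioned. You can spot it with standard kubectl queries (no assumptions beyond a default setup):
kubectl get pods --field-selector=status.phase=Pending
kubectl get events --field-selector reason=FailedScheduling --sort-by=.lastTimestamp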
Understanding PriorityClass
Before diving into our solution, it’s essential to understand pod priority and the PriorityClass resource, a crucial Kubernetes feature that will play a vital role in our approach.
PriorityClass is a mechanism that allows you to assign priority levels to different pods in your cluster. This is particularly useful when you need to ensure that more critical workloads receive resources and scheduling precedence over less critical ones. The Kubernetes scheduler uses these priority levels during the scheduling process. Here’s a quick overview of how it works:
- Priority Value: Each PriorityClass has an integer value that determines its priority level. Higher values indicate higher priority.
- Preemption: Higher-priority pods can preempt lower-priority pods, meaning they can force lower-priority pods to be evicted to free up resources.
- Usage: You define a PriorityClass resource and then reference it in your pod specifications.
PriorityClass Example
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100
globalDefault: false
description: "This priority class should be used for P0 service pods only."
Once the `PriorityClass` is defined, you can assign it to a pod like this:
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: high-priority # <-- reference the PriorityClass by name
  containers:
  - name: my-container
    image: my-image
    # other container specs
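When the pod is admitted, Kubernetes resolves the class name into the pod’s integer priority. A quick way to confirm, using the example pod above:
kubectl get pod critical-app -o jsonpath='{.spec.priority}'
# prints: 100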
Here’s the Solution: Using Extra Pods and Priority Settings
To tackle these delays during traffic spikes, we overprovision nodes using extra pods with a lower priority. This keeps spare capacity warm at all times, so real workload pods can be scheduled immediately. Here’s how it works:
- Lower Priority Dummy Pods: We create extra dummy pods with a lower priority.
- Right-sized Requests: By using nodeSelector and resource requests, we make sure each extra pod claims a whole node, i.e. the pod’s requests are equal (or nearly equal) to the node’s allocatable capacity. (See the snippet after this list for checking allocatable capacity.)
- Quick Eviction for Real Pods: If a real pod needs to be scheduled and no node has room, the higher-priority real pod preempts a lower-priority dummy pod. This frees up a node, and the real pod gets scheduled right away.
- Automatic Node Provisioning: When a dummy pod gets evicted, it goes into a Pending state, triggering the cluster autoscaler (Karpenter, in our setup) to add another node. This keeps our extra capacity in place.
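To right-size the dummy pod requests, you can read the allocatable capacity straight off an existing node of the target instance type (a quick check; this assumes at least one such node is already in the cluster):
kubectl get nodes -l node.kubernetes.io/instance-type=c6a.4xlarge \
  -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory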
Enough theory — Time to Deploy!
1. Create a low-priority class
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: -100
globalDefault: false
description: "This priority class should be used for dummy pods only."
2. Create a dummy deployment with the low-priority class attached
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-overprovisioning-amd
  namespace: microservices
spec:
  replicas: 2
  selector:
    matchLabels:
      app: overprovisioning-amd
  template:
    metadata:
      labels:
        app: overprovisioning-amd
    spec:
      containers:
      - name: pause
        # registry.k8s.io replaces the deprecated k8s.gcr.io; the pause
        # container idles, simply holding the requested capacity
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: '13'
      nodeSelector:
        node.kubernetes.io/instance-type: c6a.4xlarge
      priorityClassName: low-priority
Here we have created a deployment of 2 replicas, each requesting 13 CPUs, so that one replica effectively occupies a whole c6a.4xlarge machine (16 vCPUs; the remainder is left for DaemonSets and system-reserved resources).
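Once the dummy pods are Running, each should be sitting alone on its own node (namespace and labels match the manifest above):
kubectl get pods -n microservices -l app=overprovisioning-amd -o wide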
3. Create the Deployment for your critical service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: '2'
      nodeSelector:
        node.kubernetes.io/instance-type: c6a.4xlarge
As nginx-deployment is scaled out, two additional pods are created and initially enter a Pending state. Because the new nginx pods carry a higher priority (the default priority is 0, which is higher than our low-priority class at -100), the scheduler preempts the lower-priority dummy pods to make room, and the new nginx pods are scheduled onto the nodes the dummy pods had been holding.
With the dummy pods now Pending and no node available for them, Karpenter (the cluster autoscaler in our setup) provisions additional nodes. Once a new node is ready, the dummy pod is scheduled onto it, restoring our spare capacity.
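You can watch the hand-off as it happens; the scheduler normally records a Preempted event on each evicted dummy pod:
kubectl get events -n microservices --field-selector reason=Preempted --sort-by=.lastTimestamp
kubectl get pods -n microservices -l app=overprovisioning-amd -o wide -w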
Summary
Wrapping up, adding extra node capacity and prioritizing tasks in Kubernetes tackles the issue of pod startup delays effectively. This forward-thinking approach makes sure that high-priority pods are scheduled quickly, reducing service interruptions and keeping performance at its best. By using lower-priority dummy pods, we keep a reserve of resources ready to handle real workloads right away. This strategy not only simplifies operations but also sets your applications up for success in a fast-paced and demanding environment.