Optimising Pod Startup Time in Kubernetes: Unveiling Challenges and Solutions [Part I]

Abhinav
4 min read · Jul 22, 2024


Have you ever been frustrated by the slow startup times of your Kubernetes pods? You’re not alone. In the first part of our series, we will explore this common issue, focusing on how prolonged pod startup times can impact deployment speed and scalability. We will also look at various strategies to reduce these delays. In later parts, we will examine each strategy in detail. Let’s begin this journey together, step by step.

Causes of High Startup Time

Deploying applications in Kubernetes involves multiple stages before a pod is ready to handle traffic. The scheduler must identify an appropriate node, the kubelet communicates with the container runtime (CRI), the image is pulled from its repository, and the application itself takes time to start and pass its readiness probe. From a DevOps perspective, image pulling and node provisioning are the stages most worth optimizing, since application startup itself is in the hands of developers.
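To illustrate the final stage, here is what a readiness probe might look like in a pod spec. This is a minimal sketch; the pod name, image, health endpoint, and port are all placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                               # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0 # placeholder image
      readinessProbe:
        httpGet:
          path: /healthz                     # assumed health endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
```

The pod only starts receiving traffic once this probe succeeds, so every second the earlier stages consume is added on top of the application's own startup time.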

Impact of Node Provisioning

In some instances, the scheduler may need to provision a new node if existing ones lack the necessary resources for the pod to be scheduled. This provisioning process can add about 2 minutes of overhead, further delaying pod startup.

Options to Reduce Node Provisioning Time

  1. Switch to Karpenter: If you’re using an ASG in EKS to autoscale your node pool, consider switching to Karpenter; it can cut node provisioning time from roughly 2 minutes to around 30 seconds.
  2. Provision Extra Node Capacity: You can also eliminate provisioning time entirely by maintaining extra node capacity. While this approach incurs a slight cost overhead, it allows for instant pod scheduling. This can be achieved using PriorityClasses and placeholder (dummy) pods. We will explore this technique in detail in the next part of this series.
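The overprovisioning technique in option 2 can be sketched as follows. This is a minimal sketch to convey the idea; the names, replica count, and resource sizes are placeholders, and we will cover the details in the next part:

```yaml
# A low-priority class so placeholder pods are evicted first
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning            # hypothetical name
value: -10                          # lower than any real workload
globalDefault: false
description: "Placeholder pods that reserve spare node capacity"
---
# Dummy pods that hold resources; real pods preempt them instantly
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-reservation        # hypothetical name
spec:
  replicas: 2                       # how much headroom to keep
  selector:
    matchLabels:
      app: capacity-reservation
  template:
    metadata:
      labels:
        app: capacity-reservation
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # does nothing, just holds resources
          resources:
            requests:
              cpu: "1"              # illustrative sizing
              memory: 1Gi
```

When a real pod arrives and no free node exists, the scheduler evicts a placeholder pod to make room immediately, and the autoscaler then provisions a replacement node in the background.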

Slow Image Pull Time
Once the pod is scheduled, the next step is to pull the image from its repository. This is generally the most time-consuming task in the pod startup journey, and the time grows with image size.

Options to Speed Up Image Pull Time

  1. Image Pull Policy: This dictates when the container image is pulled during container creation. Setting it to IfNotPresent is usually the best option, as it avoids unnecessary pulls and reuses the node’s local image cache whenever the image is already present.
  2. Slimmer Images: The larger the image, the longer it takes to download and extract all of its layers. We can shrink our images by choosing a slim base image (such as alpine or a -slim variant) and by using multi-stage builds wherever possible.
  3. Enable Parallel Image Pulls: When a node is provisioned, the kubelet needs to download images for many scheduled pods, and by default it pulls them one after another. You can change this behavior by setting serializeImagePulls to false in the kubelet configuration, allowing parallel downloads; since Kubernetes 1.27 you can also cap the concurrency with maxParallelImagePulls.
  4. Pull-through Cache: These caches can significantly speed up image pull times by storing images close to the cluster. When an image is requested, the cache checks its local storage first; if the image is there, it’s served directly, saving the round trip to the remote repository. Harbor is a popular registry that can act as a pull-through cache.

  5. P2P-Based Caching: Peer-to-peer (P2P) image registries, like Kraken and Dragonfly, improve image pull times by distributing image layers across multiple nodes. When an image is needed, its layers are downloaded concurrently from several peers instead of a single central server, speeding up the process.
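Options 1 and 3 above are each a one-line setting. A sketch of both, with illustrative values (the pod name and image are placeholders):

```yaml
# Option 1: in the pod spec, reuse the node's image cache when possible.
# Prefer immutable tags; with a mutable tag like :latest, IfNotPresent
# may keep running a stale cached image.
apiVersion: v1
kind: Pod
metadata:
  name: cached-app                            # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.2.0   # placeholder; use an immutable tag
      imagePullPolicy: IfNotPresent
---
# Option 3: in the kubelet configuration file on each node
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serializeImagePulls: false                    # pull images in parallel
maxParallelImagePulls: 5                      # cap concurrency (Kubernetes 1.27+)
```

Parallel pulls trade network and disk I/O pressure for lower startup latency, which is why capping the concurrency is worthwhile on nodes with slower disks.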

Summary
In the first part of our series on speeding up Kubernetes pod startup times, we explore the common problem of slow pod launches and their impact on how quickly we can deploy and scale applications. We identify key issues like scheduling, setting up nodes, and fetching images. Strategies to fix these delays include switching to Karpenter, adding more node capacity, improving how we pull images, using smaller images, fetching images in parallel, using pull-through caches, and adopting P2P caching. Future articles will go into more detail on these strategies.
