Cut Container Startup Time for Better Performance and Costs — Part 1

Federico Iezzi
Google Cloud - Community
11 min read · Feb 7, 2024

In today’s world, containerized applications are the norm, and Kubernetes orchestrates them at scale — a fact well captured by the CNCF Annual Report. To maintain smooth user experiences during demand surges, containers must spin up quickly. That’s why optimizing startup time is crucial. Slow startups hurt user experience and can drive up cloud costs, sometimes even significantly.

Horizontal Pod Autoscaling, Vertical Pod Autoscaling, GKE Cluster Autoscaler, and GKE Node Auto-Provisioning

While zero startup time may be a dream, a focus on speed offers significant gains. In this two-part series, we’ll dive into optimization tactics across infrastructure, container design, and even code-level tweaks. Get ready!

· GKE Cluster Optimization
  ∘ GKE Node Pool Machine Type
  ∘ GKE Storage Subsystem
  ∘ GKE VPC-native Cluster
  ∘ Avoid Spot VMs
  ∘ GKE Cluster Autoscaling Profile
  ∘ Image Streaming: Shaving Seconds Off Startup Time
· GKE NodePool Considerations
  ∘ Let GKE Handle the Scaling for You
  ∘ COS: Leaner Images, Faster Startups
  ∘ Turbo Boost for Pods
· Docker Image Optimization
  ∘ Image Size and Layers
  ∘ Choosing the Right Compression: Speed vs. Size
· Code-Level Optimization
  ∘ Health Check Readiness
  ∘ Application Efficiency
  ∘ Startup Dependencies
· Next Steps

GKE Cluster Optimization

This first section covers the specific optimizations for the GKE Cluster.

GKE Node Pool Machine Type

Choosing the right machine type for your GKE cluster directly impacts performance and cost. It’s more than just CPU cores and memory; newer CPU architectures from Google Cloud (like C3 and C3D with integrated IPUs, respectively based on Intel Sapphire Rapids and AMD Genoa) can significantly boost the speed* of specific workloads. These cutting-edge architectures typically perform better, but may come with a higher price tag.

Deciphering the perfect machine type is complex — I cover this in depth in my SPEC CPU series. For now, remember: choosing the right architecture requires balancing performance needs with budget constraints.

Google Cloud provides a one-pager reporting the CPU platforms used by every machine type, allowing you to make an informed decision.

*I know that speed is a reductive, abused, and generic term, but this is not a SPEC nor a CPU deep-dive 😫 please bear with me 😬

GKE Storage Subsystem

GCE offers several types of persistent and non-persistent storage devices. Local SSD is an ephemeral storage medium, priced similarly to Balanced PD, that delivers NVMe-like throughput and latency. Consider the following table:

Balanced PD vs. SSD PD vs. Local SSD

The table highlights the stark performance advantage of Local SSDs over Balanced PDs. They offer 3x the throughput and a staggering 35x increase in IOPS, all with vastly reduced latency. This translates into smoother operations and faster Pod startup times.

Important Note: Local SSDs provide ephemeral storage. To leverage them in GKE, the syntax differs slightly between 1st/2nd generation and 3rd generation GCE machine series. Either option ensures temporary items like emptyDir volumes, container-writable layers, and images reside on the Local SSD.
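
For reference, a minimal sketch of the two flavors (flag spellings taken from the GKE docs at the time of writing; double-check them against your gcloud release and machine series, and treat the cluster, pool, and machine-type names below as placeholders):

# 3rd gen machine series (e.g. C3/C3D): ephemeral storage backed by NVMe Local SSD
gcloud container node-pools create lssd-pool \
--cluster=example-cluster \
--machine-type=c3-standard-8-lssd \
--ephemeral-storage-local-ssd count=2

# 1st/2nd gen machine series (e.g. N1/N2) use the older flag spelling
gcloud container node-pools create lssd-pool-legacy \
--cluster=example-cluster \
--machine-type=n2-standard-8 \
--ephemeral-storage local-ssd-count=2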

COS Filesystem layout with ephemeral NVMe used

GKE VPC-native Cluster

In a VPC-native cluster, Pods receive IP addresses directly from your VPC’s subnet range. This architecture eliminates the need for the custom routes and NAT overhead found in traditional routes-based GKE clusters. The result? A quicker, more efficient process for adding new nodes to the cluster, which directly contributes to faster Pod startup times.
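
VPC-native is already the default for newly created clusters on recent GKE versions; if you want to be explicit about it, a minimal sketch looks like this:

gcloud container clusters create example-cluster \
--zone=us-central1-a \
--enable-ip-alias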

Avoid Spot VMs

Spot VMs (formerly Preemptible VMs) offer deep discounts of up to 91% on Compute Engine instances. However, remember they come with no availability guarantees and can be terminated at any time. This makes them suitable for fault-tolerant workloads, but expect potential delays during Pod startups due to restarts. The unpredictability of node reliability, especially during rapid scaling, further compounds the situation.

Picture a conveyor belt: new Pods are added due to increasing demand, but unreliable Spot VMs simultaneously fall off the other end. This dynamic ebb and flow is the reality of working with Spot VMs.

When the user experience is at the core, I do not recommend them.

GKE Cluster Autoscaling Profile

Okay, the rush is over, time to save money by removing unused nodes… right? Not so fast, cowboy! Deciding when to scale down your cluster involves choosing between optimal resource usage and lightning-fast responsiveness for new workloads. GKE Autoscaling Profiles come to the rescue.

  • balanced: Prioritizes maintaining a buffer of resources for quick Pod startup. This is the default profile.
  • optimize-utilization: Aggressively maximizes resource usage by removing nodes. This saves costs, but new Pods might face delays waiting for resources to be spun up again.

The balanced profile generally strikes a good compromise, ensuring those sudden traffic spikes won’t leave your users waiting. However, if cost optimization is your primary concern, optimize-utilization might be the way to go.
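
Switching profiles is a one-liner; a quick sketch (on older gcloud releases this flag may still live in the beta track):

gcloud container clusters update example-cluster \
--zone=us-central1-a \
--autoscaling-profile=optimize-utilization

# and back to the default when responsiveness matters more
gcloud container clusters update example-cluster \
--zone=us-central1-a \
--autoscaling-profile=balanced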

Note: Don’t forget that Pod scheduling is complex! Even with aggressive node removal, GKE still tries to evenly distribute your workloads across available nodes.

How to tune cluster autoscaler in GKE by Anthony Bushong

Image Streaming: Shaving Seconds Off Startup Time

We’ll dive into optimizing Docker images shortly, but sometimes even our best efforts don’t make them as lean as possible. That’s where image streaming comes in! GKE can selectively pull only the data needed to immediately start your container from eligible images. This means your applications don’t have to wait for the entire image download, dramatically decreasing startup times.
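
Enabling it is straightforward; a sketch, assuming your images live in Artifact Registry (one of the documented requirements):

# new cluster with Image Streaming enabled
gcloud container clusters create example-cluster \
--zone=us-central1-a \
--enable-image-streaming

# or flip it on for an existing cluster
gcloud container clusters update example-cluster \
--zone=us-central1-a \
--enable-image-streaming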

When Image Streaming was introduced, Ihor Leshko, Senior Engineering Manager at Databricks, reported: “With image streaming on GKE, we saw a 3X improvement in application startup time, which translates into faster execution of customer workloads, and lower costs for our customers by reducing standby capacity.”

All the details, requirements, and the few limitations around Image Streaming are available in the official GKE documentation.

GKE NodePool Considerations

This second section covers the specific optimizations for the GKE NodePool.

Let GKE Handle the Scaling for You

GKE’s cluster autoscaler is your backstage crew, resizing your node pool based on your application’s needs. No more manual adjustments or wasted resources from over-provisioning! Simply set your minimums and maximums, and the autoscaler handles the rest. It’s automagic 🤩

Let’s See it in Action:

gcloud container clusters create example-cluster \
--num-nodes=2 \
--zone=us-central1-a \
--node-locations=us-central1-a,us-central1-b,us-central1-f \
--enable-autoscaling --min-nodes=1 --max-nodes=4

This creates a multi-zonal cluster starting with six nodes (two per zone). Your cluster can shrink to three nodes (one per zone) or expand to twelve, intelligently handling demand surges. Even if a zone fails, the cluster gracefully scales within two to eight nodes. Auto-scaling ensures resources are always there to meet demand without overspending!

COS: Leaner Images, Faster Startups

Employ an OS like COS (Container-Optimized OS): it has a smaller footprint and is specifically designed to run your containers efficiently, ruthlessly cutting unnecessary clutter for enhanced performance. Here’s why it matters:

  • Built to run only containers: COS instances ship with the containerd runtime pre-installed and pre-tuned.
  • Smaller attack surface: COS has a smaller footprint than a general-purpose OS like Ubuntu, reducing the instance’s potential attack surface; it is optimized to run only on GCE and only containers, period. The OS is also immutable, preventing an attacker from making permanent local changes.
  • Locked-down by default: Given its narrow focus, COS has many security features turned on by default, such as IMA, KPTI, some LSMs from Chromium OS, plus seccomp and AppArmor. The config is also stateless and generally features security-minded defaults.
  • Automatic Updates allow for timely delivery of security patches.
  • GKE and COS go hand-in-hand: Google tests a specific COS version against GKE to guarantee compatibility and maximum stability.

As a former Red Hatter, I witnessed the power of minimalist OSes firsthand with the acquisition of CoreOS. COS brings this KISS (Keep It Small) philosophy to GKE.

COS stays actively maintained by Google — check out their changelog for proof! This focus on constant updates, along with its tiny footprint (under 1GB), ensures lightning-fast node spin-up.
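
On current GKE versions, COS with containerd is already the default image type, so in practice this mostly matters when a node pool was created with something else; a quick sketch:

gcloud container node-pools create cos-pool \
--cluster=example-cluster \
--zone=us-central1-a \
--image-type=COS_CONTAINERD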

Turbo Boost for Pods

Traditionally, changing the CPU or memory resources assigned to a Kubernetes Pod required a restart, potentially disrupting running workloads. Kubernetes v1.27 introduces a powerful new feature allowing in-place resizing of CPU and memory allocations. In-place Resource Resize unlocks valuable optimization opportunities.

Many applications experience higher resource demands during startup due to loading dependencies, initializing connections, or warming up caches. Imagine applying In-place Resource Resize during a Pod’s startup, mirroring concepts like Cloud Run CPU boost or Turbo Boost in modern processors. Introducing Kube Startup CPU Boost 🤯

With this new feature, you can now seamlessly scale up the Pod’s resources during the initial critical phase. The result? Faster startup times and improved user experience.
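
To make this concrete, here is a minimal, hypothetical sketch of the underlying Kubernetes primitive (In-place Resource Resize, alpha in v1.27 behind the InPlacePodVerticalResize feature gate, so you need a cluster where that gate is on); Kube Startup CPU Boost essentially automates this pattern for you, so treat the manual patch below purely as an illustration. Pod name, image, and resource values are placeholders.

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: boosted-app
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired   # CPU can be resized without restarting the container
    resources:
      requests:
        cpu: "2"          # generous CPU for the startup burst
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 512Mi
EOF

# once the app is warmed up, shrink the CPU in place (no restart)
kubectl patch pod boosted-app -p \
'{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"500m"}}}]}}'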

Docker Image Optimization

This third section covers the specific optimizations for the Docker image.

Image Size and Layers

When building container images, it’s easy to accidentally include unnecessary files (like build-time dependencies and intermediate files) that bloat the image size. This increases transfer times and costs. The most common techniques include:

  • Using multi-stage builds (see the Dockerfile sketch below).
  • Selecting a pre-optimized base image like Alpine, weighing only a few MB.
  • Consolidating Docker layers.
  • Cleaning up any temporary artifacts.
  • Getting rid of any unnecessary files.
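
To make the multi-stage point concrete, here is a minimal, hypothetical sketch for a small Go service: the first stage carries the compiler and build-time dependencies, the final stage ships only the binary on a tiny Alpine base (names, tags, and paths are purely illustrative):

cat > Dockerfile <<'EOF'
# build stage: compiler and build-time dependencies live (and die) here
FROM golang:1.21-alpine AS builder
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# final stage: only the compiled binary ships in the image
FROM alpine:3.19
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
EOF

docker build -t app:multistage .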

Check out Kartheek Gottipati’s detailed Medium post.

Choosing the Right Compression: Speed vs. Size

When optimizing your Docker images, consider the compression algorithm carefully. ZStandard offers an excellent balance of strong compression (smaller image size) with remarkably fast compression and decompression speeds. For a pure focus on speed, lz4 remains the fastest option, but expect slightly larger file sizes.
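
If you build with BuildKit/buildx (which defaults to gzip), layer compression is controlled at export time; a sketch, with the registry and image name as placeholders and exact output options possibly varying with your BuildKit version (your registry and runtime must also understand zstd layers; recent containerd releases, as used by GKE, do):

docker buildx build \
--output type=image,name=registry.example.com/app:latest,push=true,oci-mediatypes=true,compression=zstd,force-compression=true \
.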

Smaller and faster data compression with Zstandard

Fun fact: both lz4 and zstd come from the same person, Yann Collet, who is the guest of honor in CoRecursive’s interview about data compression.

Code-Level Optimization

This fourth and last section covers the specific optimizations at code level.

Health Check Readiness

The readiness probe determines when traffic is routed to a Pod. Avoid checking non-critical dependencies in it; focus on whether the app can serve its core functionality. Overly strict probes unnecessarily extend the time before a Pod turns “ready”. The liveness probe, on the other hand, tells Kubernetes when to restart a Pod. Configure it to accurately reflect whether your application is truly alive and able to handle traffic; an overly sensitive liveness probe causes needless restarts and a suboptimal UX.
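
As an illustration, a minimal, hypothetical Pod where the readiness probe checks only a lightweight endpoint with no external dependencies, while the liveness probe is deliberately more forgiving (paths, port, and image are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    ports:
    - containerPort: 8080
    readinessProbe:               # gates traffic: test core functionality only
      httpGet:
        path: /healthz/ready      # hypothetical endpoint, no external dependencies
        port: 8080
      periodSeconds: 5
      failureThreshold: 2
    livenessProbe:                # triggers restarts: keep it conservative
      httpGet:
        path: /healthz/live
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 6
EOF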

Application Efficiency

My code-slinging days might be a distant memory, but some optimization wisdom simply never gets old:

  • Profiling: visibility through profiling is the vital first step towards meaningful system optimization. Check out the masterful work of Brendan Gregg. This principle has guided my work since my Red Hat days, where I focused on zero packet loss and real-time tuning to ensure maximum system performance. The profiling Swiss Army knife offers a variety of tools (e.g. eBPF, perf, ftrace, pprof for Go, line profilers, performance monitors) to pinpoint the most time-consuming functions within your application’s initialization phase; nowadays some of these are built directly into the IDE. Target these bottlenecks for optimization. On GCP, Cloud Trace and Cloud Profiler can and will truly make a difference.
  • Algorithmic Optimization: Review the algorithms used in your initialization logic. Opt for more efficient data structures and algorithms whenever possible to reduce processing time.
  • Caching: Introduce caching mechanisms (in-memory or via a fast store like Redis or Memcached) for frequently accessed data or results of compute-intensive operations during startup. This avoids recalculations.
  • Asynchronous Initialization: Break down your startup process. Identify tasks that can be initialized asynchronously after the essential functionality of the Pod is available. This allows the Pod to become “ready” faster.

Startup Dependencies

Several optimizations can be applied when coding your application to make sure it is up and running as quickly as possible:

  • Dependency Analysis: Thoroughly analyze all dependencies (external libraries, services) pulled in during startup. Identify those critical for initial functionality vs. those needed later.
  • Lazy Loading: For any non-critical dependencies, defer their loading or initialization to a later stage after the Pod signals readiness. This streamlines the initial startup process.
  • External Service Readiness: If reliant on external services (databases, APIs), don’t simply retry connections in a tight loop. Implement exponential backoff and reasonable timeouts to avoid extended initial delays (see the shell sketch after this list).
  • Dependency Optimization: For in-house code and services, use the same optimization principles (profiling, caching, async) within those services to prevent cascading delays.
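
For the external-service point, here is a minimal shell sketch of what backoff-instead-of-hammering can look like in an entrypoint or initContainer (host, port, and timings are placeholders, and it assumes a netcat binary is available in the image):

#!/bin/sh
HOST="db.example.internal"   # placeholder dependency
PORT=5432
delay=1
max_delay=30
deadline=$(( $(date +%s) + 120 ))   # give up after ~2 minutes

# probe the dependency, doubling the wait between attempts up to a cap
until nc -z "$HOST" "$PORT"; do
  if [ "$(date +%s)" -ge "$deadline" ]; then
    echo "dependency $HOST:$PORT not reachable in time" >&2
    exit 1
  fi
  echo "waiting ${delay}s for $HOST:$PORT..."
  sleep "$delay"
  delay=$(( delay * 2 ))
  [ "$delay" -gt "$max_delay" ] && delay="$max_delay"
done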

Next Steps

Fast container startup speeds are essential for smooth user experiences and cost control in Kubernetes environments. This post dove into a spectrum of optimizations available. We focused on GKE cluster tuning, image design, and lightly touched on code-level efficiency. Stay tuned for the next post where we’ll delve into real-world performance gains of applying these techniques!
