Speed up pod startup by re-using image layers from other nodes with Spegel

Navratan Lal Gupta
Published in Linux Shots
5 min read · Aug 21, 2023

Suppose a Kubernetes cluster has a StatefulSet with 20 pod replicas. If the image used by the pods' containers is significantly large, it takes time to pull it onto each worker node, so it takes a long time for all of the StatefulSet's pods to come up one by one.

In another scenario, suppose one of the worker nodes becomes unhealthy and the pods running on it are evicted. The pods get re-scheduled onto another node, which again has to pull the images from the container registry over the internet, adding to the outage time.

This is where Spegel helps. Spegel is a stateless, cluster-local OCI registry mirror. It enables each node in a Kubernetes cluster to act as a local registry mirror, allowing nodes to share images among themselves. Any image already pulled by one node becomes available for any other node in the cluster to pull.

This reduces pod startup time and egress traffic to the remote container registry, since image layers are re-used and pulled from nodes within the cluster itself.

How does this work, and how can it be helpful? Let's find out in the demo below.

Source: https://github.com/XenitAB/spegel

Why it works on my machine

For this demo, I am using:

  • Kubernetes cluster v1.27.4
  • Helm v3.10.1
  • Containerd runtime v1.6.22 (Spegel currently supports only the Containerd runtime)
  • Ubuntu 22.04 LTS for control plane and worker nodes
  • Spegel v0.0.11
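
If you want to verify the equivalent versions on your own cluster, a quick check looks something like the sketch below (kubectl and helm run from your workstation, containerd directly on a node):

# Versions used in this demo; adjust for your own environment
kubectl version            # client and server versions (v1.27.x here)
helm version --short       # v3.10.x here
containerd --version       # run this on a cluster node; v1.6.x here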

Contents

  1. Pre-requisite configurations
  2. Install Spegel on cluster
  3. Test the setup

Pre-requisite configurations

Spegel is currently supported only with the Containerd runtime, as it relies on Containerd's registry mirroring feature. We need to set config_path in the Containerd mirror configuration. Follow the steps below on every cluster node:

  1. Log in to each node (including the control plane node).
  2. Update the Containerd configuration file with the value below.
sudo vi /etc/containerd/config.toml # This is the default containerd configuration file

Update the config_path value as shown below. By default, its value is config_path = "".

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"

3. Restart containerd.

sudo systemctl restart containerd
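
To confirm the change has been picked up, you can dump containerd's effective configuration. This is just a quick sanity check; the grep is illustrative.

sudo containerd config dump | grep config_path
# Expected output: config_path = "/etc/containerd/certs.d"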

Install Spegel

Once Containerd has config_path set for its registry, follow the steps below to install Spegel on the Kubernetes cluster.

  1. Create a spegel.yaml file like the example below and add the container registries you want to mirror through Spegel.
spegel:
  registries:
    - "https://docker.io"
    - "https://ghcr.io"
    - "https://quay.io"
    - "https://mcr.microsoft.com"
    - "https://public.ecr.aws"
    - "https://gcr.io"
    - "https://registry.k8s.io"
    - "https://k8s.gcr.io"
    - "https://lscr.io"

2. Use the helm command below to install Spegel.

helm upgrade --create-namespace --namespace spegel --install --version v0.0.11 -f spegel.yaml spegel oci://ghcr.io/xenitab/helm-charts/spegel

3. It deploys a DaemonSet resource in the spegel namespace. Check the pods.

kubectl -n spegel get pods

It should create a pod on each node. Wait for the pods to come up.
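
As a sanity check, you can compare the number of nodes with the DaemonSet's pod count, and confirm that Spegel has written containerd mirror configuration under the config_path we set earlier. The commands below are a sketch: the DaemonSet name and the registry directories follow from this demo's release name and registry list, and the exact contents of the generated files may differ between Spegel versions.

# One Spegel pod should be scheduled per node
kubectl get nodes --no-headers | wc -l
kubectl -n spegel get daemonset

# On any node: Spegel generates a mirror configuration for each registry
# listed in spegel.yaml, so directories like docker.io and lscr.io should
# now exist under the config_path we configured earlier
ls /etc/containerd/certs.d/
cat /etc/containerd/certs.d/docker.io/hosts.toml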

Test the setup

Once Spegel is deployed and its pods are up and running, let's test the setup and check that it works as expected.

I have a pod manifest, radarr-pod.yaml, to deploy to the cluster. It has nodeName set so that the pod is scheduled on a specific node. You can create a demo file with the contents below.

apiVersion: v1
kind: Pod
metadata:
  name: radarr
spec:
  nodeName: worker-node01
  containers:
    - name: radarr
      image: lscr.io/linuxserver/radarr:4.7.5
      ports:
        - containerPort: 7878

The YAML file above will create a pod on worker-node01. Let's apply it to the cluster.

kubectl apply -f radarr-pod.yaml

The pod will start coming up on worker-node01. Note the time it takes to pull the image. Since this is the first time we are deploying the pod on the cluster, it will pull the image from its container registry.
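
If you want an exact figure rather than a stopwatch, the kubelet records the pull duration in the pod's events. The exact wording of the event message can vary slightly between Kubernetes versions.

# The "Pulled" event shows how long containerd spent pulling the image
kubectl describe pod radarr | grep -i pulled
# e.g. Successfully pulled image "lscr.io/linuxserver/radarr:4.7.5" in 1m33s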

In my demo, it took around 93 seconds to pull the image onto worker-node01 and 95 seconds to start the pod.

Now we will delete this pod and update our YAML file to deploy it on worker-node02. Below is the updated file content:

apiVersion: v1
kind: Pod
metadata:
  name: radarr
spec:
  nodeName: worker-node02
  containers:
    - name: radarr
      image: lscr.io/linuxserver/radarr:4.7.5
      ports:
        - containerPort: 7878

Notice that only the nodeName value has changed, so this time the pod will be created on worker-node02. Let's deploy it now, but don't forget to delete the pod we created before.

kubectl delete pod radarr
kubectl apply -f radarr-pod.yaml

Once it is deployed on the other node, check the time it takes to pull the image.

You will notice the pod comes up much faster. This is because, instead of pulling the image from the container registry over the internet, it now pulls the image from another node on which the image is already present.

In my demo, it took around 2 seconds to pull the image and 6 seconds in total to start the pod. That is roughly 45 times less than when the pod was first created on worker-node01.

This is because the https://lscr.io registry was added to our configuration when we deployed the Helm chart. So when the image was pulled from this registry for the first time, it was cached on worker-node01 and can be served to other nodes.

When we deployed the same pod on worker-node02, it did not need to pull the image from https://lscr.io again; it pulled it from the local cache on the other node instead.
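
If you want to confirm the image really was served from inside the cluster, you can check that it is still present in worker-node01's containerd store and skim the Spegel logs for the mirrored request. This is only a sketch: it assumes crictl is installed on the node, and the label selector below is an assumption based on common Helm chart conventions.

# On worker-node01: the image pulled earlier is still in the local store
sudo crictl images | grep radarr

# Spegel pod logs should show requests being resolved from a peer
# instead of the upstream registry (label selector is an assumption)
kubectl -n spegel logs -l app.kubernetes.io/name=spegel --tail=20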

The tool is still maturing and development is ongoing, but it can already be very useful.

I hope you learned something from this article. For more details on Spegel, visit its GitHub page: https://github.com/XenitAB/spegel

You can support my work by buying me a cup of coffee at https://www.buymeacoffee.com/linuxshots

Thanks

Navratan Lal Gupta

Linux Shots
