Properly Running Kubernetes Jobs with Sidecars in 2024 (K8s 1.28+)
Kubernetes has been a great orchestrator of Jobs and CronJobs for over half a decade now, but if you needed to run proxy containers or other secondary containers alongside a job, handling everything gracefully took a bit of extra work and decision-making.
This article introduces the easiest way to run Jobs with sidecars using the latest Kubernetes features, and it has a complementary repository with complete example manifests you can try in your own cluster. The repository also contains the examples for earlier versions of K8s, so make sure to focus on the cronjob.sidecar.*.yaml examples.
Table of Contents
- What are Sidecars and Why is This a Challenge?
- Past Solutions and Caveats
- Native Sidecar Containers in Kubernetes v1.28+
- Using Native Sidecar Containers
- Do they Work?
- Conclusion
What are Sidecars and Why is This a Challenge?
Sidecars are containers running in the same pod as an application container or some other primary container that generally play a supporting role like proxying or helping with observability.
Even though sidecars have been a common and accepted concept used by the Kubernetes community for quite some time now, Kubernetes itself has never been aware of which containers you consider sidecars, or which container you consider to be your primary container. That lack of awareness has meant that you need to manage part of the sidecar life cycle yourself.
The consequences of not handling sidecar shutdown properly include Jobs stalling indefinitely after completing without ever being marked as successful (generally leading to a timeout and the appearance of failed runs), or CronJobs creating an ever-growing number of pods that cost a fortune or eat up all your cluster's resources.
Past Solutions and Caveats
Two common solutions, both sketched below, are to use pkill to kill the sidecar container's process, or to mount a shared volume in each of the containers and wrap the sidecar process in a script that watches for "success" or "failure" to be written to a file, using that as the signal to shut the container down with a matching exit code.
The pkill strategy can work with any sidecar, but it can require running the primary container as root or overriding the user that the sidecar runs as, and it requires pkill to be installed as a dependency in the primary container.
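Here's a minimal sketch of what the pkill strategy looks like in practice; the primary image and its workload command are hypothetical, with haproxy standing in as the sidecar:

```yaml
# A sketch of the pkill strategy, not a drop-in manifest. shareProcessNamespace
# lets the primary container see and signal the sidecar's processes.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-pkill-sidecar
spec:
  template:
    spec:
      shareProcessNamespace: true
      restartPolicy: Never
      containers:
        - name: primary
          image: my-batch-image:latest  # hypothetical; needs procps (pkill) installed
          securityContext:
            runAsUser: 0                # needs permission to signal haproxy's process
          command: ["/bin/bash", "-c"]
          args:
            - |
              run-my-batch-work         # hypothetical workload
              pkill haproxy             # shut the sidecar down once the work is done
        - name: haproxy
          image: haproxy:2.6
```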
The shared volume mount strategy can work with any container that has a shell available, but it requires overriding the sidecar’s entry point and adding relatively complex logic to both containers’ entry points to handle shutting things down properly.
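And a rough sketch of the shared-volume strategy, assuming an illustrative /lifecycle path and a hypothetical workload command:

```yaml
# A sketch of the shared-volume strategy; the signal-file path, primary image,
# and workload command are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-signal-file-sidecar
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: lifecycle
          emptyDir: {}
      containers:
        - name: primary
          image: my-batch-image:latest  # hypothetical
          volumeMounts:
            - name: lifecycle
              mountPath: /lifecycle
          command: ["/bin/bash", "-c"]
          args:
            - |
              run-my-batch-work                 # hypothetical workload
              echo success > /lifecycle/status  # signal the sidecar to exit
        - name: haproxy
          image: haproxy:2.6
          volumeMounts:
            - name: lifecycle
              mountPath: /lifecycle
          # override the entrypoint so a script wraps the sidecar process
          command: ["/bin/sh", "-c"]
          args:
            - |
              haproxy -f /usr/local/etc/haproxy/haproxy.cfg &
              until [ -f /lifecycle/status ]; do sleep 1; done
              kill "$!"
              grep -q success /lifecycle/status  # exit 0 on success, 1 otherwise
```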
Regardless of the strategy, managing the containers like this adds complexity and room for error, and it still lacks the ability to control the startup order of the containers for common cases like ensuring the database proxy container has started up before running the application container.
Native Sidecar Containers in Kubernetes v1.28+
Kubernetes describes the previous types of solutions in their release article and lumps the ones I’m talking about here into a common category…
Lifetime of sidecar equal to Pod lifetime: Use a regular container that runs alongside your workload containers in the Pod. This method doesn't give you control over startup order, and lets the sidecar container potentially block Pod termination after the workload containers exit.
…and then goes on to brag about the new feature.
The built-in sidecar feature solves for the use case of having a lifetime equal to the Pod lifetime and has the following additional benefits:
- Provides control over startup order
- Doesn't block Pod termination
Here's a diagram of how things have worked up until now: we want sidecar behavior, and we talk about one of the containers as a sidecar, but there was no way to indicate to Kubernetes which container in the pod was a sidecar and which was the primary container; they're both just regular containers in the pod.
In Kubernetes v1.28, the new Sidecar Containers alpha feature (behind the SidecarContainers feature gate) lets you declare init containers in a special way so that they're treated as actual sidecars. Kubernetes v1.29, which is available in GKE's regular channel but not quite yet in the stable channel, promotes the sidecar feature to beta, at which point it's enabled by default in any GKE cluster. Based on Google's docs for the release channels, we should expect this feature to be available in stable clusters any day now.
This new sidecar feature lets us control the startup order of the containers, as well as avoid all of the caveats we had to choose between historically.
It's not particularly intuitive to configure, and the behavior isn't obvious unless you've been told about it, but just by moving our supporting containers from the containers stanza to the initContainers stanza, and then ensuring restartPolicy: Always is set for those containers (that's the real secret), we unlock native sidecars. This is a one-size-fits-all solution and results in simpler and more secure Kubernetes manifests and Dockerfiles (depending on the strategy you chose previously), since you no longer have to handle the concern yourself.
Using Native Sidecar Containers
Here’s an example from my original article, updated to use the new pattern. haproxy is configured in the initContainers section and given a restartPolicy of Always. That’s all you have to do for things to work properly with the new feature.
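A minimal sketch of that pattern looks like this (reconstructed to match the kubectl describe output later in the article; the schedule and the omitted haproxy configuration are assumptions, and the repo has the complete manifests):

```yaml
# A sketch of the native sidecar pattern for a CronJob.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: successful-cron-sidecar
spec:
  schedule: "* * * * *"  # assumed schedule
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          initContainers:
            - name: haproxy
              image: haproxy:2.6
              # restartPolicy: Always on an init container is what marks it
              # as a native sidecar: it starts before the main containers,
              # keeps running alongside them, and is stopped when they finish
              restartPolicy: Always
          containers:
            - name: primary
              image: ubuntu:22.04
              command: ["/bin/bash", "-c"]
              args:
                - echo "Pretend this is real logic"
```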
To appreciate the simplicity of this more, we can compare an example of an older solution with the native sidecar solution. Here’s a handful of steps you’d need to take to migrate from the old strategy to native containers:
- remove the securityContext runAs config
- remove the shareProcessNamespace config
- remove the crazy shell script logic that overrides and wraps the original command
- remove the procps (pkill) install from the Dockerfile
Do they Work?
Here's the output of kubectl describe on a job pod that's using the sidecar feature. Note that the sidecar, haproxy, started at 19:55:00, then the primary container started at 19:55:01 and finished within the same second (still at 19:55:01), before the haproxy container finally finished at 19:55:02. This demonstrates ordered startup, and that the sidecar was available throughout the primary container's lifetime.
```
Name:          successful-cron-sidecar-28471855-29gr9
Start Time:    Sun, 18 Feb 2024 19:55:00 -0700
Status:        Succeeded
Init Containers:
  haproxy:
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 18 Feb 2024 19:55:00 -0700
      Finished:     Sun, 18 Feb 2024 19:55:02 -0700
    Ready:          False
    Restart Count:  0
Containers:
  primary:
    Command:
      /bin/bash
      -c
    Args:
      echo "Pretend this is real logic"
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 18 Feb 2024 19:55:01 -0700
      Finished:     Sun, 18 Feb 2024 19:55:01 -0700
    Ready:          False
    Restart Count:  0
```

Looking at an intentionally failing job is also interesting. This job was configured with a backoff limit of 2, and you can see the Restart Count of the primary container reaching 2 in the describe output below as it sits in Terminated state. The pod Status is Terminating already, even though the haproxy container is still Running. The Events at the very bottom of the snippet show that haproxy is about to be stopped now that the primary container has reached its backoff limit and the job is considered a failure.
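For reference, here's a sketch of the spec fields that drive this behavior, reconstructed from the describe output below; restartPolicy: OnFailure is an assumption that matches the climbing Restart Count, and the full manifest is in the repo:

```yaml
# Reconstructed sketch of the failing example's relevant fields.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: failing-cron-sidecar
spec:
  schedule: "* * * * *"  # assumed schedule
  jobTemplate:
    spec:
      backoffLimit: 2               # give up after two retries
      template:
        spec:
          restartPolicy: OnFailure  # assumed: the primary restarts in place
          initContainers:
            - name: haproxy
              image: haproxy:2.6
              restartPolicy: Always # native sidecar
          containers:
            - name: primary
              image: ubuntu:22.04
              command: ["/bin/bash", "-c"]
              args:
                - cat /not/a/real/file/so/this/will/error
```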
Native sidecars work!
```
Name:          failing-cron-sidecar-28471854-tgl9m
Start Time:    Sun, 18 Feb 2024 19:54:00 -0700
Status:        Terminating (lasts <invalid>)
Init Containers:
  haproxy:
    State:          Running
      Started:      Sun, 18 Feb 2024 19:54:01 -0700
    Ready:          True
    Restart Count:  0
Containers:
  primary:
    Command:
      /bin/bash
      -c
    Args:
      cat /not/a/real/file/so/this/will/error
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 18 Feb 2024 19:54:22 -0700
      Finished:     Sun, 18 Feb 2024 19:54:22 -0700
    Ready:          False
    Restart Count:  2
Events:
  Type     Reason   Age               From     Message
  ----     ------   ----              ----     -------
  Normal   Pulled   24s               kubelet  Container image "haproxy:2.6" already present on machine
  Normal   Created  24s               kubelet  Created container haproxy
  Normal   Started  23s               kubelet  Started container haproxy
  Normal   Pulled   2s (x3 over 23s)  kubelet  Container image "ubuntu:22.04" already present on machine
  Normal   Created  2s (x3 over 23s)  kubelet  Created container primary
  Normal   Started  2s (x3 over 23s)  kubelet  Started container primary
  Warning  BackOff  1s (x4 over 21s)  kubelet  Back-off restarting failed container primary in pod failing-cron-sidecar-28471854-tgl9m_default(3242ba44-2153-4818-863b-b8117941b0bf)
  Normal   Killing  0s                kubelet  Stopping container haproxy
```

Conclusion
The new sidecar feature in Kubernetes makes it painless to run Jobs and CronJobs with sidecars. The caveats and complexities we used to have to choose between can all be deleted and replaced with this feature. That said, it's not recommended to use it in its alpha state, so make sure you're on Kubernetes 1.29+ before you switch over (and even then, be aware it's still a beta feature).
You can use the examples in the complementary repo to test failure and success conditions using this new native sidecar strategy.
