Properly Running Kubernetes Jobs with Sidecars

Steven Aldinger
TeamSnap Engineering
6 min read · Nov 5, 2022

2024 update: If you’re running Kubernetes v1.28+, there’s Kubernetes-native sidecar support to leverage that makes this much easier. Check out the new article here.

Kubernetes Jobs and CronJobs are a great way to run some arbitrary code in a K8s environment.

They’re configured very similarly to Deployments, but instead of running indefinitely like a web server would, they’re intended to execute some business logic and then shut down. Example use cases include running database migration code on demand (Job) or billing processing code at midnight on Fridays (CronJob).
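For reference, a minimal CronJob for the Friday-billing example might look something like this; the image and command are placeholders, not a real workload:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: billing-processor
spec:
  schedule: "0 0 * * 5"        # midnight every Friday
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: billing
              image: example.com/billing-processor:latest   # placeholder image
              command: ["/bin/run-billing"]                 # placeholder command
```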

Jobs and CronJobs are straightforward and convenient for single-container jobs, but they quickly become complicated when you add indefinitely-running sidecars, such as Google’s Cloud SQL Proxy for database connectivity.

If you try this naively, you’ll end up with your primary container completing but the sidecar container (and thus the Job Pod as a whole) running forever, never properly marked as a success or failure. This becomes especially problematic with CronJobs, because Kubernetes will keep adding new Pods on the given schedule without ever removing the old ones, which will eventually push your cluster to its autoscaling limits and cost you a fortune.

CronJob pods accumulating due to a sidecar running forever

Prerequisite Knowledge

This article focuses on CronJobs. Since CronJobs just create normal Jobs on a set schedule, everything here applies to plain Jobs too, but you’re more likely to blow up your infrastructure with CronJobs, so it seems reasonable to build confidence there from the start.

Before we can write a solution, it’s important to understand the landscape we’re working in. The examples all assume a CronJob that creates a Job running a 2-container Pod: one container running our business logic (the “primary” container), and the other a sidecar handling something important like database connectivity.

Kubernetes CronJob Overview

It’s simple enough to observe that a CronJob is responsible for creating a Job on a set schedule, and that a Job is responsible for creating a special Pod that’s intended to execute code and complete. The Job spec also allows special controls such as optionally retrying your business logic a given number of times upon failure.

It’s less clear what a Job “completing” actually means, under what circumstances a retry is attempted, and how retries are really handled.

  1. Job creates a Pod
  2. If all containers in the Pod exit with code zero, the Job is considered successful
  3. If any container in the Pod exits with a non-zero code, that individual container is restarted in place (assuming the Pod’s restartPolicy is OnFailure) up to the configured backoff limit, at which point the Job is considered a failure and the Pod is deleted

This behavior (restarting containers in place rather than creating a brand-new Pod for each retry) leads to an interesting problem: your containers have no awareness of how many times they or the other containers in the Pod have been restarted, yet they all need to coordinate exiting with either a zero or non-zero exit code for the Job to ever complete while still allowing retries to work properly.
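To make those retry controls concrete, here’s a minimal sketch of the relevant Job fields; the names and values are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3              # how many failures to tolerate before marking the Job failed
  template:
    spec:
      restartPolicy: OnFailure # restart failed containers in place instead of creating new Pods
      containers:
        - name: migrate
          image: example.com/db-migrate:latest   # placeholder image
```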

Since Kubernetes has no awareness of which container you consider a sidecar, it’s up to you to somehow uniformly shut down your sidecars after your primary container completes.

Following along in your own Kubernetes cluster

If you’re a hands-on learner and want to experiment in your own Kubernetes cluster, there are a handful of ready-to-apply manifests on GitHub here for you to play with, covering all the scenarios discussed in this article.

Visualizing the Problem

This first snippet demonstrates what a Pod will look like if no attempt is made to shut down the sidecar after the primary container successfully exits. Since the sidecar runs forever, the Pod stays running forever and the Job never completes. This is the scenario that can lead to the Pod-accumulation symptom I mentioned in this article’s intro, which can push your cluster’s autoscaling to its limits and eventually prevent scheduling of additional Pods cluster-wide.

Primary container completing, while the sidecar runs forever
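For context, the naive setup behind this scenario is just a Job (or CronJob) whose Pod pairs the primary container with a long-running sidecar and no shutdown coordination at all. A minimal sketch, with placeholder images and a sleep loop standing in for the “runs forever” sidecar:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: naive-job-with-sidecar
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: primary
          image: example.com/business-logic:latest   # placeholder image
          command: ["/bin/run-business-logic"]        # placeholder command
        - name: sidecar
          image: busybox:1.36                         # stand-in for any long-running sidecar
          command: ["sh", "-c", "while true; do sleep 3600; done"]
          # nothing ever tells this container to exit, so the Pod never completes
```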

However, even if you successfully shut down the sidecar, you can run into unexpected behavior if you don’t pay attention to exit codes.

This second snippet demonstrates what happens if you always shut down the sidecar gracefully (exit code 0). Upon failure of the primary container, the sidecar will be shut down in a way K8s thinks is successful, so it never restarts, while the primary container will continue retrying. If the sidecar is responsible for something critical like database connections, this guarantees that your primary container will eventually reach its failure backoff limit and the Job as a whole will be marked a failure.

Take note of the primary container’s Started timestamp compared to the sidecar’s Finished timestamp and Ready status. This shows the case where our primary container is still retrying but the sidecar will no longer restart.

Primary container restarting multiple times while the sidecar is in completed state

We want to kill all the sidecars in a Pod with an exit code that matches the primary container’s (zero or non-zero), letting all the containers either restart simultaneously or complete successfully, as if they’re a discrete unit.

The following snippet demonstrates what that looks like. Take note of the Started and Finished timestamps for both containers, as well as the coupled restart counts and State.

Both containers restarting in tandem

Solutions

I’m going to cover two specific ways to solve the sidecar shutdown problem, as well as some caveats associated with each strategy.

Regardless of implementation details, both share the same overall goal, which is to:

  1. Execute the primary container’s business logic
  2. Capture the exit code of the business logic
  3. Make all sidecars conditionally exit with either a zero or non-zero exit code right after the primary business logic completes

Pkill

My preferred strategy is to use a Pod setting called shareProcessNamespace that makes each container’s processes visible to all other containers in the Pod. That, paired with some securityContext overrides, lets us explicitly kill off the sidecar processes from the primary container once it has finished executing its business logic.

The main advantage to using this technique is that it should be compatible with any sidecar without the sidecar having any awareness that it’s being controlled.

The main disadvantage is that it might require running the primary container as root, or overriding the user IDs of all the other containers to match, so that the primary container has the permissions it needs to kill processes in other containers (which often need to run as a specific user or as root). It also requires a process-management tool (pkill) that may not already be available in your container without altering its Docker image.

Here’s a snippet demonstrating how to use pkill in a privileged primary container to kill off an haproxy sidecar. Depending on your primary container’s base image, you might need to install pkill by running apt-get install -y procps.
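Something along these lines; the business-logic image and command are placeholders, the haproxy configuration is omitted for brevity, and the assumption is that haproxy exits cleanly when asked to terminate:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pkill-example
spec:
  schedule: "0 0 * * 5"
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          shareProcessNamespace: true    # every container's processes are visible Pod-wide
          containers:
            - name: primary
              image: example.com/business-logic:latest   # placeholder image (needs pkill/procps)
              securityContext:
                privileged: true         # lets the primary signal the sidecar's processes
              command:
                - /bin/sh
                - -c
                - |
                  # run the business logic and capture its exit code
                  /bin/run-business-logic
                  EXIT_CODE=$?
                  if [ "$EXIT_CODE" -eq 0 ]; then
                    # graceful stop; assumes haproxy exits 0 on SIGTERM so the Job completes
                    pkill -TERM haproxy
                  else
                    # force a non-zero exit (137) so the sidecar restarts along with the primary
                    pkill -KILL haproxy
                  fi
                  exit $EXIT_CODE
            - name: haproxy
              image: haproxy:2.8   # haproxy config omitted; mount one via ConfigMap in practice
```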

To avoid running as root, you can match the user ID every container runs as, so that the primary container has permission to kill sidecar processes without special privileges. The following snippet shows a 3-container Pod with every container’s securityContext configured to run as user 65532. The caveat with this strategy is that 3rd party images sometimes need to run as a specific user, and you won’t be able to override the user ID to match without side effects.
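A sketch of what that might look like; the images are placeholders, and the primary’s command would carry the same capture-the-exit-code-then-pkill logic as above:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pkill-nonroot-example
spec:
  template:
    spec:
      restartPolicy: OnFailure
      shareProcessNamespace: true
      containers:
        - name: primary
          image: example.com/business-logic:latest   # placeholder image
          securityContext:
            runAsUser: 65532   # same UID as the sidecars, so pkill works without root
          # command: run the business logic, capture $?, pkill the sidecar processes, exit $?
        - name: proxy-one
          image: example.com/proxy-one:latest        # placeholder sidecar
          securityContext:
            runAsUser: 65532
        - name: proxy-two
          image: example.com/proxy-two:latest        # placeholder sidecar
          securityContext:
            runAsUser: 65532
```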

Shared Volume Mounts

One way to share information between containers in a Pod is by using volume mounts. If both containers in the Pod have access to a common directory, they can read/write files to exchange information.

The main advantage to using this technique is that it should be compatible with any container that has a shell available.

The main (huge) disadvantage is that it requires overriding the sidecar’s entrypoint/command so that we can inject our own communication and shutdown logic. That shutdown logic is also very ugly, as you can see below. The sidecar will often be a 3rd party image, so you’ll need to dig up its original entrypoint and command and wrap them in the custom shutdown code.
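Roughly, the pattern is: mount a shared emptyDir into both containers, have the primary write its exit code to a file in that volume when it finishes, and wrap the sidecar’s real entrypoint in a shell loop that runs it in the background, polls for that file, and then exits with the recorded code. A sketch, with placeholder images and a placeholder path for the sidecar’s original entrypoint:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: shared-volume-example
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: OnFailure
      volumes:
        - name: shutdown-signal
          emptyDir: {}
      containers:
        - name: primary
          image: example.com/business-logic:latest   # placeholder image
          volumeMounts:
            - name: shutdown-signal
              mountPath: /signal
          command:
            - /bin/sh
            - -c
            - |
              # run the business logic, then record its exit code for the sidecar to read
              /bin/run-business-logic
              EXIT_CODE=$?
              echo "$EXIT_CODE" > /signal/exit-code
              exit $EXIT_CODE
        - name: sidecar
          image: example.com/some-proxy:latest       # placeholder 3rd party sidecar
          volumeMounts:
            - name: shutdown-signal
              mountPath: /signal
          command:
            - /bin/sh
            - -c
            - |
              # start the sidecar's original entrypoint in the background
              /usr/local/bin/original-entrypoint &   # placeholder for the real entrypoint + args
              # wait for the primary to finish, then mirror its exit code
              while [ ! -f /signal/exit-code ]; do
                sleep 1
              done
              exit "$(cat /signal/exit-code)"
```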

Recommended Path Forward

Using a shared volume requires overriding the entrypoint/command for each container, involves a lot of complex logic, can cause unexpected behavior with 2+ sidecar setups, and depends on every container having a shell available to script in.

While the pkill strategy might require some additional privileges or dependencies to be installed, it does not share any of the pitfalls of the shared volume approach, and seems like a far superior solution. If you’d like to view or try out a complete, ready-to-apply solution, check out the pkill success and failure examples in the accompanying GitHub repo.
