Running Kubernetes jobs with sidecar containers
Recently, I came across an interesting use case: running a sidecar container inside a Kubernetes Job. There was a challenge, however: a Job isn’t complete until all of its containers exit, so a long-running sidecar keeps the Job running even after the primary container has finished. We realized we’d need an external mechanism to coordinate this, because the attempt to support it natively in Kubernetes (KEP-753) was dropped. In this post, we’ll look at how an open-source project, kubexit, can be used to solve this.
Requirements
We have two main requirements that need to be met:
- If the primary container exits with status code 0, the sidecar container(s) should be terminated so that the Job succeeds.
- If the primary container exits with a non-zero status code, the Job should fail. Incorrectly marking a Job as successful could lead to loss of customer data.
Solution
The kubexit binary records metadata about a container (a tombstone) on a shared volume (the graveyard): start time, end time, and exit code. The easiest way to provide this volume is as an in-memory emptyDir volume. Each container is identified by its KUBEXIT_NAME. The binary watches the graveyard for tombstone changes and coordinates the exit of the sidecar container(s): when a container whose name is listed in KUBEXIT_DEATH_DEPS dies, kubexit terminates the container it is running in.
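To make the tombstone and graveyard idea concrete, here is a minimal sketch in Python. It is not kubexit's implementation: the `Tombstone` fields, the JSON file layout, and the helper names `write_tombstone`, `read_tombstone`, and `should_exit` are all assumptions made up for illustration, and kubexit's real on-disk format may differ.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Tombstone:
    """Hypothetical record; field names are illustrative, not kubexit's."""
    born: Optional[float] = None      # start time (epoch seconds)
    died: Optional[float] = None      # end time, None while still running
    exit_code: Optional[int] = None   # None while still running


def write_tombstone(graveyard: str, name: str, tomb: Tombstone) -> str:
    """Write the tombstone for container `name` into the graveyard directory."""
    path = os.path.join(graveyard, name)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(asdict(tomb), f)
    os.replace(tmp, path)  # rename into place so readers never see a partial file
    return path


def read_tombstone(graveyard: str, name: str) -> Optional[Tombstone]:
    """Return the tombstone for `name`, or None if none has been written."""
    try:
        with open(os.path.join(graveyard, name)) as f:
            return Tombstone(**json.load(f))
    except FileNotFoundError:
        return None


def should_exit(graveyard: str, death_deps: List[str]) -> bool:
    """True once any container listed in death_deps has recorded a death."""
    for dep in death_deps:
        tomb = read_tombstone(graveyard, dep)
        if tomb is not None and tomb.died is not None:
            return True
    return False


if __name__ == "__main__":
    graveyard = tempfile.mkdtemp()  # stands in for the shared emptyDir mount
    write_tombstone(graveyard, "Forwarder", Tombstone(born=1.0, died=2.0, exit_code=0))
    print(should_exit(graveyard, ["Forwarder"]))
```

The point of the sketch is only that the shared volume is the single communication channel between containers: one side writes a small record, the other side reads it and decides whether to shut down.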
For our example, we set KUBEXIT_DEATH_DEPS=Forwarder on the Queue sidecar, which means that when Forwarder exits, Queue is gracefully terminated by sending it a TERM signal. kubexit also preserves exit codes, so if Forwarder exits due to an error, the Job is marked as Failed or retried, depending on the Job’s retry policy.
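The two behaviours described here, passing the child's exit code through and stopping a process with TERM, can be sketched with Python's standard `subprocess` module. This is only an illustration of the idea, not kubexit's code: the function names are hypothetical, and the fallback to a hard kill after a grace period is an assumption I added to keep the sketch from hanging.

```python
import signal
import subprocess
import sys


def run_and_preserve_exit_code(cmd):
    """Run cmd as a child process and return its exit code unchanged."""
    return subprocess.Popen(cmd).wait()


def terminate_gracefully(child, grace_seconds=5.0):
    """Send TERM, wait up to grace_seconds, then force-kill if still alive."""
    child.send_signal(signal.SIGTERM)
    try:
        return child.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        child.kill()  # assumed escalation, not taken from the post
        return child.wait()


if __name__ == "__main__":
    # A stand-in primary that fails: a wrapper that exits with the same
    # code lets Kubernetes see the container, and so the Job, as failed.
    failing_primary = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_and_preserve_exit_code(failing_primary))
```

A wrapper built this way would finish with `sys.exit(code)` using the value returned by `run_and_preserve_exit_code`, which is what allows the Job's retry policy to react to a failing primary container.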
Problems
We ran into a major bug while testing this setup: if the main container exits before the sidecar starts, kubexit is unable to terminate the sidecar container.
Unfortunately, the original repository, https://github.com/karlkfi/kubexit, is no longer maintained. There is an open PR that addresses this bug, but it doesn’t look like it will be merged anytime soon. Fortunately, CortexLabs has a fork with the fixes at https://github.com/cortexlabs/kubexit, and I highly recommend using that instead.
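For intuition about this class of startup race, here is a small Python model. It is an assumption-laden illustration, not a description of kubexit's actual code or of how the fork fixes it: the JSON tombstone layout and the function names are hypothetical. The idea is that a watcher which only reacts to change events misses a death recorded before it started, whereas scanning the graveyard once at startup catches it.

```python
import json
import os
import tempfile


def has_died(graveyard, name):
    """True if a tombstone for `name` exists and records a death (assumed JSON layout)."""
    try:
        with open(os.path.join(graveyard, name)) as f:
            return json.load(f).get("died") is not None
    except FileNotFoundError:
        return False


def watch_only(events, death_deps):
    """Naive model: react only to change events observed after startup."""
    return any(name in death_deps for name in events)


def scan_then_watch(graveyard, events, death_deps):
    """Check tombstones that already exist first, then fall back to events."""
    if any(has_died(graveyard, dep) for dep in death_deps):
        return True
    return watch_only(events, death_deps)


if __name__ == "__main__":
    graveyard = tempfile.mkdtemp()  # stands in for the shared emptyDir mount
    # The main container finished before the sidecar's watcher started.
    with open(os.path.join(graveyard, "Forwarder"), "w") as f:
        json.dump({"born": 1.0, "died": 2.0, "exit_code": 0}, f)
    late_events = []  # the late-starting sidecar never sees a change event
    print(watch_only(late_events, ["Forwarder"]))
    print(scan_then_watch(graveyard, late_events, ["Forwarder"]))
```

In this model the naive watcher never learns that Forwarder is gone, which matches the symptom we saw: the sidecar keeps running and the Job never completes.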
To sum up, kubexit is not an ideal solution but one that just works. If you’ve come across a more elegant solution, please share it in the comments :)
