How to prevent Kubernetes cron jobs with sidecar containers from getting stuck

Alfred Yang
Published in finnovate.io
Apr 13, 2021 · 3 min read

The following article is written with the help of Stephen Sparling from the finnovate.io team.


If you are deploying a Kubernetes cron job, there is a good chance it will need to access a database in Google Cloud SQL. For heightened security, Google recommends accessing the database through a Cloud SQL proxy deployed in a sidecar container.

The configuration YAML file may look something like this:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cron-job
spec:
  schedule: "0 9 * * *" # job runs every day at 9 am
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: main-job
              # additional code hidden
            - name: cloudsql-proxy
              image: gcr.io/cloudsql-docker/gce-proxy:1.11
              command: ["/cloud_sql_proxy",
                        "-instances=xxxx=tcp:0.0.0.0:3306",
                        "-credential_file=secrets.json"]
              # additional code hidden
          restartPolicy: OnFailure

Doing so allows your code, in this case the code inside the "main-job" container, to reach the SQL proxy via "localhost".
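For instance, the main-job container could read its connection settings from environment variables that point at the proxy's local address. A minimal sketch (the variable names are illustrative, not part of the original manifest):

containers:
  - name: main-job
    # the job's database client connects to 127.0.0.1:3306, which the
    # cloudsql-proxy sidecar in the same pod forwards to Cloud SQL
    env:
      - name: DB_HOST
        value: "127.0.0.1"
      - name: DB_PORT
        value: "3306"
    # additional code hidden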

We found that the sidecar pattern above creates a major problem: long after the main job finishes, the cloudsql-proxy container remains active, leaving the job stuck in a running state. The pod with the two containers is never deleted without human intervention. This defeats the purpose of the cron job, as the undying pod prevents a new pod from starting at the next scheduled time.

Solution #1: use the Replace concurrencyPolicy

The simplest solution we found is to set the concurrencyPolicy to "Replace" instead of "Forbid" or the default "Allow":

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cron-job
spec:
  schedule: "0 9 * * *" # job runs every day at 9 am
  concurrencyPolicy: Replace

This forces the stuck pod to be replaced by a new one at the next scheduled time. However, the stuck pod still takes up precious cluster resources long after the job has served its purpose.

If you are not overly concerned about idle pods taking up resources in your cluster, this may be an acceptable workaround.

Solution #2: don’t use a sidecar DB proxy

Another simple solution is to avoid the use of a sidecar container to proxy your database calls. Have your code directly establish a connection to the remote database. However, this comes at the risk of exposing your SQL traffic beyond the boundaries of your pod.

If you are not dealing with highly sensitive data, this may be an acceptable alternative.
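As a rough sketch of this alternative, the sidecar is removed entirely and the job connects straight to the Cloud SQL instance, for example over its private IP (the address and variable names below are placeholders):

spec:
  containers:
    - name: main-job
      # no cloudsql-proxy sidecar: the pod can finish as soon as this container exits
      env:
        - name: DB_HOST
          value: "10.0.0.5" # placeholder: the instance's private IP or hostname
        - name: DB_PORT
          value: "3306"
      # additional code hidden
  restartPolicy: OnFailure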

Solution #3: have the main job signal when it is done

This is the solution that requires the most work, but it allows the sidecar to be terminated gracefully. It doesn’t have the disadvantages described in the first two solutions.

This approach involves a shared volume between the two containers in your pod. In short, when your main job finishes, it writes a file to the shared volume, and when the sidecar detects that file, it terminates itself. You can certainly write your own code and/or bash commands to perform such trickery, but you may want to review kubexit, which was developed specifically to coordinate container termination.
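If you would rather hand-roll it than adopt kubexit, a rough sketch of the pattern looks like this. The volume name, signal file path, and shell wrapper are illustrative assumptions, and it presumes the proxy image ships a shell:

spec:
  volumes:
    - name: job-signal
      emptyDir: {} # shared scratch space for the "done" file
  containers:
    - name: main-job
      # run the real task, then drop the signal file as the very last step
      # ("run-my-task" is a placeholder for your actual command)
      command: ["/bin/sh", "-c", "run-my-task && touch /signal/done"]
      volumeMounts:
        - name: job-signal
          mountPath: /signal
    - name: cloudsql-proxy
      image: gcr.io/cloudsql-docker/gce-proxy:1.11
      command: ["/bin/sh", "-c"]
      args:
        - |
          # start the proxy in the background, wait for the main job's signal,
          # then stop the proxy so the pod can complete
          /cloud_sql_proxy -instances=xxxx=tcp:0.0.0.0:3306 -credential_file=secrets.json &
          PROXY_PID=$!
          while [ ! -f /signal/done ]; do sleep 5; done
          kill $PROXY_PID
      volumeMounts:
        - name: job-signal
          mountPath: /signal
  restartPolicy: OnFailure

Once both containers exit, the pod reaches a Completed state, the job is marked finished, and the original "Forbid" concurrencyPolicy behaves as intended.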

Finnovate.io is a technology company focused on helping organizations build unique digital experiences on web, mobile and blockchain. Finnovate.io offers development services, training and consulting, as well as a platform that rapidly turns paper-based content into interactive experiences.

Alfred is the founder of https://finnovate.io, a company that focuses on helping organizations build unique digital experiences on web, mobile and blockchain.