Working with Kubernetes Jobs

Cory O'Daniel
Sep 21, 2018

Kubernetes defines a job as:

A job creates one or more pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the job tracks the successful completions. When a specified number of successful completions is reached, the job itself is complete. Deleting a Job will clean up the pods it created.

This is important because, by default, Kubernetes will try to get a success out of a failing job: if a pod fails, it reruns it. If your code isn’t idempotent, that could be trouble. Don’t fret, there are configuration options to tune this behavior.

If your “jobs” are created in response to events, queue processing, or a workflow service, you may be looking for a Kubernetes Deployment, or a job with parallelism, rather than a series of one-off jobs.¹

Anywho, let’s get to work. A job can be created in one of two ways:

  1. “manually”
  2. via a CronJob

Creating a job manually²

Under the hood a pod does the work of your job. Jobs are an abstraction around a pod.

Open up your editor and create a file named job.yaml:

This is an amazing example
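The embedded job.yaml isn’t reproduced here, so here is a minimal sketch of what it could look like. The name pi matches the jobs/pi used in the commands that follow. The image, the command, and the backoffLimit value are assumptions borrowed from the familiar "compute pi" example, not the original file.

```yaml
# job.yaml: a minimal sketch, not the author's original file.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi                    # matches jobs/pi in the kubectl commands
spec:
  backoffLimit: 4             # assumed value: retries before the job is marked failed
  template:
    spec:
      restartPolicy: Never    # pods owned by a Job must use Never or OnFailure
      containers:
        - name: pi
          image: perl         # assumed image
          # Assumed workload: print pi to 2000 digits, then exit 0
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```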

Now create the job:

kubectl create -f ./job.yaml

You can inspect details about it:

kubectl describe jobs/pi

You can also inspect the pod that the job creates to actually run the workload:

kubectl get pods

You should see a pod whose name starts with pi- and, once the work is done, a status of Completed.

Let’s run it again:

kubectl create -f ./job.yaml

You probably got an error message:

Error from server (AlreadyExists): error when creating "./job.yaml": jobs.batch "pi" already exists

This is because job names must be unique within a namespace, so only one job named pi can exist at a time. To get around this, either give each job a unique name or use a CronJob, which manages naming for you.
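If you go the unique-name route, one option is to let the API server pick the name for you with metadata.generateName. This is a sketch under the same assumptions as before (the perl image and command are placeholders). Note that generateName works with kubectl create but not with kubectl apply.

```yaml
# job-generated-name.yaml: a sketch; the file name is hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: pi-           # the API server appends a random suffix to this prefix
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pi
          image: perl         # assumed image
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```

Each kubectl create -f of this file produces a new job with a different name, so the AlreadyExists error goes away.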

Before we continue, let’s delete that job.

kubectl delete jobs/pi

The kubectl command can tell you a lot about Kubernetes resources, including jobs.

Use kubectl explain to get an overview of a resource’s YAML configuration:

kubectl explain jobs

You can nest your explanation requests with any field Kubernetes responds with. This is a great way to look at documentation while you are working on defining a resource.

This will show you some advanced options to set on your job:

kubectl explain jobs.spec

And you can keep nesting calls:

kubectl explain jobs.spec.template

Creating a job via CronJob

A CronJob is a higher-level abstraction in Kubernetes that creates jobs on a cron schedule.

CronJobs create Jobs, Jobs create Pods.

You’ll notice the job spec from above is placed below the jobTemplate here.

Open up your editor and create a file named cronjob.yaml:

I can feel the $ rolling in. Damn, we’re going to be rich.
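The embedded cronjob.yaml isn’t reproduced here either, so here is a minimal sketch. The name pi-cronjob matches the one-off command used later in this post, and the every-minute schedule is assumed from the "wait one minute" step. The image and command are placeholders.

```yaml
# cronjob.yaml: a minimal sketch, not the author's original file.
apiVersion: batch/v1          # use batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: pi-cronjob
spec:
  schedule: "*/1 * * * *"     # assumed schedule: every minute
  jobTemplate:
    spec:                     # this is the same shape as a Job's spec
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pi
              image: perl     # assumed image
              command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```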

kubectl create -f ./cronjob.yaml

Let’s list the running cronjobs:

kubectl get cronjob

You should see pi-cronjob listed along with its schedule.

CronJobs create Jobs. Wait one minute and then try:

kubectl get jobs

You should see one or more jobs whose names start with pi-cronjob-.

Jobs create pods under the hood to do the work. Try:

kubectl get pods

You should see pods whose names start with the names of those jobs.

By default Kubernetes keeps the last 3 successful jobs and the last failed job, along with their pods, so that you can inspect their logs and exit statuses. This is customizable; see the advanced CronJob configuration below.

Inevitably you’ll need to run a one-off of a CronJob. Kubernetes added support for this in 1.10.1:

kubectl create job --from=cronjob/pi-cronjob a-unique-name-for-your-job

Clean up:

kubectl delete -f ./cronjob.yaml

Below is a fully featured configuration for a CronJob:
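The original embedded file isn’t reproduced here, so treat this as a sketch that sets every option discussed in this post. All of the values are illustrative assumptions, as are the image and command.

```yaml
# cronjob-full.yaml: a sketch; the file name and all values are illustrative.
apiVersion: batch/v1              # use batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: pi-cronjob
spec:
  schedule: "*/1 * * * *"         # every minute
  concurrencyPolicy: Forbid       # Allow (default), Forbid, or Replace
  startingDeadlineSeconds: 60     # how late a missed run may still start
  successfulJobsHistoryLimit: 3   # the default
  failedJobsHistoryLimit: 1       # the default
  suspend: false                  # set to true to pause scheduling
  jobTemplate:
    spec:
      parallelism: 1              # pods running at the same time
      completions: 1              # pods that must exit 0
      backoffLimit: 6             # the default number of retries
      activeDeadlineSeconds: 300  # terminate the job after 5 minutes
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: pi
              image: perl         # assumed image
              command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```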

Configuration Options

Configuration happens at two different levels: the CronJob spec and the Job spec (spec.jobTemplate.spec).

CronJob configuration options:

  • spec.concurrencyPolicy Controls whether the jobs a CronJob creates can overlap. Options are “Allow” (default), “Forbid”, and “Replace”. Forbid skips the new run if the previous one is still going. Replace cancels the currently running job and replaces it with the newly scheduled one.
  • spec.failedJobsHistoryLimit Controls the number of failed jobs (and their pods) to keep around for inspection. Defaults to 1.
  • spec.schedule The cron schedule. Woo hoo!
  • spec.startingDeadlineSeconds If Kubernetes can’t start the job on time, this controls how many seconds after missing the schedule the job can still be started. Missed schedules are considered failures.
  • spec.successfulJobsHistoryLimit Controls the number of successful jobs (and their pods) to keep around for inspection. Defaults to 3.
  • spec.suspend Allows you to disable a CronJob from being scheduled. I frequently use this in test/dev environments to disable the cron but keep the same configuration options as prod for parity. Then I’ll use the one-off method mentioned above to test execution.

Job configuration options:

  • spec.jobTemplate.spec.parallelism Controls how many pods can run at the same time to do the job. Unset by default, which means it will only run 1 at a time.
  • spec.jobTemplate.spec.completions Controls how many pods must exit 0 before the job is considered a success. Unset by default, which means any pod that exits 0 makes the whole job a success.
  • spec.jobTemplate.spec.backoffLimit Controls the number of times Kubernetes should retry the job in case it fails (exits non-zero) before marking it as failed. Defaults to 6.
  • spec.jobTemplate.spec.activeDeadlineSeconds Controls how long the job can run before Kubernetes will terminate it. Defaults to forever and ever.

Configuration Gotchas:

Two common configuration options that I see people miss that may bite you are:

  • spec.concurrencyPolicy defaults to “Allow”, which lets CronJob runs overlap. If you’ve ever bent over backwards trying to keep crons from overlapping, this is going to slap you in the face.
  • spec.jobTemplate.spec.backoffLimit defaults to 6, which means if your code isn’t idempotent and it fails partway through, you could end up with some weird state… like Portland weird.
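If either gotcha applies to you, a fragment along these lines is one way to opt out of both defaults. It is a sketch to merge into an existing CronJob manifest rather than a complete file, and the values are assumptions about what a non-idempotent workload would want.

```yaml
# Fragment of a CronJob manifest: merge into your own spec.
spec:
  concurrencyPolicy: Forbid     # skip a run while the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 0           # no retries: a failed pod fails the job
      template:
        spec:
          restartPolicy: Never  # don't restart the failed container in place either
```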

Footnotes

  1. If you’re interested in seeing a write-up on this, send me a message on Twitter @coryodaniel
  2. You can also create jobs via the Kubernetes API from a StatefulSet or a Deployment. This is useful if you need to dispatch jobs from some sort of queue.
