Working with Kubernetes Jobs
Kubernetes defines a job as:
A job creates one or more pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the job tracks the successful completions. When a specified number of successful completions is reached, the job itself is complete. Deleting a Job will cleanup the pods it created.
This is important because Kubernetes by default will try to get a success out of a failing job: if the job fails, Kubernetes will rerun it. If your code isn’t idempotent, that could be trouble. Don’t fret, there are configuration options to tune this behavior.
If your “jobs” are created in response to events, queue processing, or a workflow service, you may be looking for a Kubernetes Deployment or parallelism instead of a job.¹
Anywho, let’s get to work. A job can be created in one of two ways:
- “manually”
- via a CronJob
Creating a job manually²
Under the hood a pod does the work of your job. Jobs are an abstraction around a pod.
Open up your editor and create a file named job.yaml
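The manifest itself isn’t reproduced in this text, so here is a minimal sketch of what job.yaml could contain. The name pi matches the commands used in this walkthrough; the image and command are assumptions borrowed from the pi example in the Kubernetes documentation, so swap in your own workload.

```yaml
# job.yaml (sketch; image and command are assumptions, not the original file)
apiVersion: batch/v1
kind: Job
metadata:
  name: pi                      # the name used by "kubectl describe jobs/pi"
spec:
  template:
    spec:
      restartPolicy: Never      # Jobs require Never or OnFailure
      containers:
      - name: pi
        image: perl:5.34.0
        # Prints pi to 2000 digits, then exits 0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```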
Now create the job:
kubectl create -f ./job.yaml
You can inspect details about it:
kubectl describe jobs/pi
You can also inspect the pod that the job creates to actually run the workload:
kubectl get pods
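You should see a pod whose name starts with pi- followed by a generated suffix, with a status of Running or, once it finishes, Completed.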
Let’s run it again:
kubectl create -f ./job.yaml
You probably got an error message:
Error from server (AlreadyExists): error when creating "./job.yaml": jobs.batch "pi" already exists
This is because a job’s name must be unique within its namespace, and the pi job from the first run still exists. To get around this, you either uniquely name your jobs or use a CronJob, which manages naming for you.
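If unique names are all you need, one option is to let the API server generate them. This is a minimal sketch, assuming the same pi workload as before (the image and command are illustrative): metadata.generateName takes the place of metadata.name, and Kubernetes appends a random suffix each time the manifest is created.

```yaml
# Sketch: a Job that gets a fresh name on every "kubectl create"
apiVersion: batch/v1
kind: Job
metadata:
  generateName: pi-             # results in names like pi-<random suffix>
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl:5.34.0      # assumed image, as in the Kubernetes docs example
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```

This only works with kubectl create; kubectl apply needs a fixed name.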
Before we continue, let’s delete that job.
kubectl delete jobs/pi
The kubectl command can tell you a lot about Kubernetes resources, including jobs. To get an overview of the YAML configuration, run:
kubectl explain jobs
You can nest your explanation requests with any field Kubernetes responds with. This is a great way to look at documentation while you are working on defining a resource.
This will show you some advanced options to set on your job:
kubectl explain jobs.spec
And you can keep nesting calls:
kubectl explain jobs.spec.template
Creating a job via CronJob
A CronJob is a higher-level abstraction in Kubernetes that creates jobs on a cron schedule.
CronJobs create Jobs, Jobs create Pods.
You’ll notice that in a CronJob manifest the job spec from above is placed under jobTemplate.
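The CronJob manifest isn’t reproduced in this text either, so here is a minimal sketch of what cronjob.yaml could look like. The name pi-cronjob matches the one-off command used later in this section; the every-minute schedule is an assumption based on the “wait one minute” step, and the container is the same assumed pi workload.

```yaml
# cronjob.yaml (sketch; schedule, image, and command are assumptions)
apiVersion: batch/v1            # clusters older than 1.21 used batch/v1beta1
kind: CronJob
metadata:
  name: pi-cronjob
spec:
  schedule: "*/1 * * * *"       # every minute
  jobTemplate:
    spec:                       # this is the Job spec from job.yaml
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: pi
            image: perl:5.34.0
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```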
Open up your editor and create a file named cronjob.yaml, then create the CronJob:
kubectl create -f ./cronjob.yaml
Let’s list the CronJobs:
kubectl get cronjob
You should see pi-cronjob listed along with its schedule.
CronJobs create Jobs. Wait one minute and then try:
kubectl get jobs
You should see a job named pi-cronjob- followed by a numeric suffix derived from its scheduled time.
Jobs create pods under the hood to do the work. Try:
kubectl get pods
You should see a pod named after that job, with an additional generated suffix.
By default Kubernetes will keep the last 3 successful jobs and 1 failed job, along with their pods, so that you can inspect their logs and exit statuses. This is customizable; see the advanced CronJob configuration below.
Sooner or later you will need to run a one-off of a CronJob. Kubernetes added support for this in 1.10.1:
kubectl create job --from=cronjob/pi-cronjob a-unique-name-for-your-job
Clean up:
kubectl delete -f ./cronjob.yaml
Below is a fully featured configuration for a CronJob:
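The original configuration isn’t reproduced in this text, so treat the following as a sketch rather than the author’s file. It sets every option covered in the next section. Where an option has a documented default, the default is written out; the schedule, the two deadline values, and the container are assumptions chosen for illustration.

```yaml
# Sketch of a CronJob with every option from this post set explicitly
apiVersion: batch/v1                  # clusters older than 1.21 used batch/v1beta1
kind: CronJob
metadata:
  name: pi-cronjob
spec:
  schedule: "*/1 * * * *"             # assumed: every minute
  concurrencyPolicy: Allow            # default; Forbid or Replace prevent overlap
  startingDeadlineSeconds: 120        # illustrative; unset by default
  successfulJobsHistoryLimit: 3       # default
  failedJobsHistoryLimit: 1           # default
  suspend: false                      # default; true pauses scheduling
  jobTemplate:
    spec:
      parallelism: 1                  # pods running at the same time
      completions: 1                  # successful pods required
      backoffLimit: 6                 # default number of retries
      activeDeadlineSeconds: 300      # illustrative; no limit by default
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: pi
            image: perl:5.34.0        # assumed image
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```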
Configuration Options
Configuration happens at two different levels: the CronJob spec and the Job spec.jobTemplate.spec.
CronJob configuration options:
spec.concurrencyPolicy
Controls whether jobs created by the CronJob can overlap. Options are “Allow” (default), “Forbid”, and “Replace”. Replace cancels the currently running job and replaces it with the newly scheduled one.
spec.failedJobsHistoryLimit
Controls the number of failed jobs to keep around for inspection. Defaults to 1.
spec.schedule
The cron schedule. Woo hoo!
spec.startingDeadlineSeconds
If Kubernetes can’t start the job on time, this controls how many seconds after the missed schedule the job can still be started. Missed schedules are considered failures.
spec.successfulJobsHistoryLimit
Controls the number of successful jobs to keep around for inspection. Defaults to 3.
spec.suspend
Allows you to disable a CronJob from being scheduled. I frequently use this in test/dev environments to disable the cron but keep the same configuration options as prod for parity. Then I’ll use the one-off method mentioned above to test execution.
Job configuration options:
spec.jobTemplate.spec.parallelism
Controls how many pods run at the same time to do the job. Defaults to nil, which means it will only schedule 1.
spec.jobTemplate.spec.completions
Controls how many pods must exit 0 before the job is considered a success. Defaults to nil, which means that any one pod exiting 0 marks the whole job a success.
spec.jobTemplate.spec.backoffLimit
Controls the number of times Kubernetes should retry the job in case it fails (exits non-zero). Defaults to 6.
spec.jobTemplate.spec.activeDeadlineSeconds
Controls how long the job can run before Kubernetes will terminate it. Defaults to forever and ever.
Configuration Gotchas:
Two common configuration options that I see people miss, and that may bite you, are:
- spec.concurrencyPolicy defaults to “Allow”, which lets CronJobs overlap. If you’ve ever bent over backwards trying to not allow crons to overlap, this is going to slap you in the face.
- spec.jobTemplate.spec.backoffLimit defaults to 6, which means if your code isn’t idempotent and it fails partway through, you could end up with some weird state… like Portland weird.
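If both gotchas apply to you, one way to defuse them is to override the two defaults. This is a sketch under assumptions: the name pi-cronjob-no-overlap is hypothetical, and the schedule and container are the same illustrative pi workload. Whether zero retries is right depends on how your code handles partial failure.

```yaml
# Sketch: no overlapping runs and no automatic retries
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pi-cronjob-no-overlap         # hypothetical name
spec:
  schedule: "*/1 * * * *"             # assumed: every minute
  concurrencyPolicy: Forbid           # skip a run while the previous one is still active
  jobTemplate:
    spec:
      backoffLimit: 0                 # mark the job failed after the first pod failure
      template:
        spec:
          restartPolicy: Never        # do not restart the container in place either
          containers:
          - name: pi
            image: perl:5.34.0        # assumed image
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```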
Footnotes
- If you’re interested in seeing a write-up on this, send me a message on Twitter @coryodaniel
- You can also create jobs via the Kubernetes API from a StatefulSet or a Deployment. This is useful if you need to dispatch jobs from some sort of queue.