Kubernetes: Running Background Tasks With Batch-Jobs

Jonathan Campos
Google Cloud - Community
10 min read · Aug 7, 2018

When building amazing applications, there are times when you might want to handle an action outside of a user’s request/response lifecycle. If you want to respond to time-based events, then you want to look at cron jobs. If you want to kick off a process outside of the user’s request/response lifecycle, but not based on time, then you are looking at batch-jobs. Batch-jobs can be kicked off by any number of triggers and can run complex tasks without affecting your users’ response times.


In this article we are going to look at how to run a batch-job in a few different ways: one-time, sequential, and parallel.

If you haven’t gone through, or even read, the first part of this series, you might be lost, wondering where the code is, or wondering what was done previously. Remember, this series assumes you’re using Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE). I will always provide the code and show how to test that it is working as intended.

Setup Your Kubernetes Cluster

Unlike other articles in this series, this article has us run several scripts to see our batch-jobs in action. As such, we will set up our Kubernetes Cluster once and then keep running jobs on it. Here is the script to set up a Kubernetes Cluster in Google Cloud using Google Kubernetes Engine.

$ git clone https://github.com/jonbcampos/kubernetes-series.git
$ cd ~/kubernetes-series/batch-job/scripts
$ sh startup.sh
$ sh deploy.sh
$ sh check-endpoint.sh endpoints

With these scripts run you will have a Kubernetes Cluster up and running in your Google Cloud Project. Furthermore, take note of the IP Address that is revealed by the check-endpoint.sh command. This will be the IP Address that we use later to see outputs from our batch-jobs.

Note: You can view the results of your batch files at http://[IP_Address]/data. As we go through the various ways to run your batch-jobs you will see data logged out to this location.
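For example, once check-endpoint.sh has printed your IP Address, you can pull the current output straight from Cloud Shell (substituting your address for the placeholder):

$ curl http://[IP_Address]/data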

Creating A One-Time Kubernetes Job

There may be a time when you have some script that you need to run only once. Maybe a migration script. Maybe a script to fix some missing data. Whatever the reason, you need it to run exactly once. For this example we will assume you already have the code you want to run, represented by a simple container/script that I’ve set up for you.

With the container that you want to run ready, we just need to create our yaml file.

apiVersion: batch/v1
kind: Job
metadata:
  name: single-job
spec:
  backoffLimit: 6 # number of retries before throwing error
  activeDeadlineSeconds: 10 # time to allow job to run
  template:
    metadata:
      labels:
        app: kubernetes-series
        tier: job
    spec:
      restartPolicy: OnFailure
      containers:
      - name: job
        image: gcr.io/PROJECT_NAME/batchjob-container-job:latest
        # environment variables for the Pod
        env:
        - name: GCLOUD_PROJECT
          value: PROJECT_NAME
        - name: MESSAGE
          value: I am a single run job
        - name: FOREIGN_SERVICE
          value: http://endpoints.default.svc.cluster.local/single
        - name: NODE_ENV
          value: production
        ports:
        - containerPort: 80

With your container ready and your Kubernetes Cluster running, the only thing left to do is apply the yaml file with the following command.

kubectl apply -f ../k8s/single-job.yaml
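If you prefer to confirm the Job from the command line rather than the console, a quick check looks something like this (assuming the manifest above has been applied):

$ kubectl get job single-job # COMPLETIONS moves from 0/1 to 1/1
$ kubectl logs job/single-job # output from the job's Pod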

See the next section to run the script and view the output.

One-Time Kubernetes Job In Action

With your Kubernetes Cluster running we just need to go into the Google Cloud Shell and run the following script to run a one-time batch-job.

$ cd ~/kubernetes-series/batch-job/scripts
$ sh run_single_job.sh

A lot is going to happen immediately when you hit enter. If you head over to your Kubernetes Engine > Workloads view you will see the new single-job in your Workloads with 0/1 Pods active. That is because the single-job already ran and terminated its Pod when complete.

single-job Pod Complete

If you check out the single-job Events details you can also see the SuccessfulCreate event that was logged after the Pod was created.

single-job Pod Events
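The same events are also available from the command line if you prefer; kubectl describe prints them at the bottom of its output:

$ kubectl describe job single-job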

And finally, to see the result of the single-job Pod you can return to http://[IP_Address]/data and view the result from the batch-job.

single-job batch-job result

Wow! So much stuff happened so quickly! This is all well and good for a simple job. But what if you need to move a lot of data all at once? Running one job may not be enough. In the next example we will look at how to run batch-jobs sequentially.

Creating A Sequential Kubernetes Job

There may be cases where you want to run batches of jobs in sequence rather than one very long job. This is especially good if you are worried that jobs may fail and need to restart, if a job consumes a lot of memory/resources and needs to be limited, or if you just don’t want background jobs running too long.

Example: Assume you have 100 resources to consume and transform. You could have 1 job that takes an hour to run and possibly fails and needs to be started multiple times to fully complete. Or you could have 10 batch-jobs that run through 10 resources at a time. Even if a batch-job fails you end up just restarting that specific group of 10 and not the full group of 100.
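As a sketch, the 10-batch version of that example would only change the Job spec shown below; splitting the 100 resources into groups of 10 is up to your container’s code, not Kubernetes:

spec:
  completions: 10 # 10 batch-jobs, each handling 10 of the 100 resources
  backoffLimit: 6 # a failed batch retries without rerunning the other 9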

To see this in a yaml file that Kubernetes can understand, look at the following code. Most of the yaml file is extremely similar to the single run batch-job except for the one .spec.completions addition. This sets the batch-job to run 3 times, one after the other.

apiVersion: batch/v1
kind: Job
metadata:
  name: sequential-job
spec:
  completions: 3 # number of times to run
  backoffLimit: 6 # number of retries before throwing error
  activeDeadlineSeconds: 10 # time to allow job to run
  template:
    metadata:
      labels:
        app: kubernetes-series
        tier: job
    spec:
      restartPolicy: OnFailure
      containers:
      - name: job
        image: gcr.io/PROJECT_NAME/batchjob-container-job:latest
        # environment variables for the Pod
        env:
        - name: GCLOUD_PROJECT
          value: PROJECT_NAME
        - name: MESSAGE
          value: I am a sequential run job
        - name: FOREIGN_SERVICE
          value: http://endpoints.default.svc.cluster.local/sequential
        - name: NODE_ENV
          value: production
        ports:
        - containerPort: 80
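No separate apply step is shown here because the run script in the next section handles it; following the single-job pattern, it presumably boils down to something like:

kubectl apply -f ../k8s/sequential-job.yaml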

The change is super minor and super effective. Next let’s see this in action.

Sequential Kubernetes Job In Action

With your Kubernetes Cluster running we just need to go into the Google Cloud Shell and run the following script to run a sequential batch-job.

$ cd ~/kubernetes-series/batch-job/scripts
$ sh run-sequential-job.sh
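While the job runs, you can watch the Pods complete one after another using the tier: job label from the yaml above (you may also see the completed Pod from the earlier single-job, which shares the same label):

$ kubectl get pods -l tier=job --watch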

A lot is going to happen immediately when you hit enter. If you head over to your Kubernetes Engine > Workloads view you will see the new sequential-job in your Workloads with 0/3 Pods active. That is because the sequential-job ran and terminated each Pod as it completed.

sequential-job complete

If you check out the sequential-job Events details you can also see the SuccessfulCreate events that were logged as each Pod was created.

sequential-job events view

And finally, to see the result of the sequential-job Pod you can return to http://[IP_Address]/data and view the result from the batch-job.

sequential-job batch-job result adding to output

Just like the single-job, the sequential-job did so much so quickly! If you need to break down a single run job into multiple runs, this is a very effective solution. What happens if you want to go further, though? What if you want to break down the job and run multiple jobs all at the same time? Time to talk about parallel jobs.

Creating A Parallel Kubernetes Job

Now you can run your jobs sequentially to break down long-running jobs into bite-sized chunks. What if you want to use the power of Google Cloud and your Kubernetes Cluster and run these jobs in parallel rather than sequentially — effectively turning hours of processing into minutes of processing on more Nodes? Parallel jobs will do this for you.

If you have set up your jobs to run idempotently then this won’t be a problem. Adding one new line to your yaml file will give you this ability.

Make sure to check out my quick explanation of Idempotent Jobs if you are confused by the term.
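As a minimal sketch of the idea (the resource name, work function, and shared log file here are all hypothetical): each step checks whether its piece of work was already done before doing it, so a restarted Pod never processes the same resource twice.

# hypothetical idempotent step inside the job's container
if ! grep -q "resource-42" /data/processed.log; then
  process_resource "resource-42" # hypothetical work function
  echo "resource-42" >> /data/processed.log # record completion
fi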

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 6 # number of times to run
  parallelism: 2 # number of pods that can run in parallel
  backoffLimit: 6 # number of retries before throwing error
  activeDeadlineSeconds: 10 # time to allow job to run
  template:
    metadata:
      labels:
        app: kubernetes-series
        tier: job
    spec:
      restartPolicy: OnFailure
      containers:
      - name: job
        image: gcr.io/PROJECT_NAME/batchjob-container-job:latest
        # environment variables for the Pod
        env:
        - name: GCLOUD_PROJECT
          value: PROJECT_NAME
        - name: MESSAGE
          value: I am a parallel run job
        - name: FOREIGN_SERVICE
          value: http://endpoints.default.svc.cluster.local/parallel
        - name: NODE_ENV
          value: production
        ports:
        - containerPort: 80

The .spec.parallelism line tells Kubernetes that it is okay to run that many Pods at the same time. If I set completions to 6 and parallelism to 6 then all of my batch-jobs would run at the same time. For this example I wanted things to run in 3 batches of 2 Kubernetes Pods, so I set parallelism to 2 and completions to 6.

Parallel Kubernetes Job In Action

With your Kubernetes Cluster running we just need to go into the Google Cloud Shell and run the following script to run a parallel batch-job.

$ cd ~/kubernetes-series/batch-job/scripts
$ sh run-parallel-job.sh
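While the parallel-job runs you can verify that no more than 2 Pods are ever in the Running state at once, matching the .spec.parallelism setting:

$ kubectl get pods -l tier=job --field-selector=status.phase=Running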

A lot is going to happen immediately when you hit enter. If you head over to your Kubernetes Engine > Workloads view you will see the new parallel-job in your Workloads with 0/6 Pods active. That is because the parallel-job ran and terminated the Pods when complete. Note: In the screenshot I caught one Pod still spinning up.

parallel-job pods spinning up

If you check out the parallel-job Events details you can also see the SuccessfulCreate events that were logged as the Pods were created.

parallel-job events view

And finally, to see the result of the parallel-job Pod you can return to http://[IP_Address]/data and view the result from the batch-job.

parallel-job batch-job result adding to output

BOOM! Batch-jobs, running sequentially and in parallel. Speeding up your processing without making you wait any longer. And it was so easy that you almost feel like you’re missing something.

Conclusion

If you made it this far and ran all the code, you must be the happiest coder right now — I know I was. It is amazing how easy it is to run batch-jobs and have them run just the way you want. Kubernetes is a breath of fresh air in a job that is often filled with cringe-worthy config files and scripts.

Teardown

Before you leave make sure to cleanup your project so you aren’t charged for the VMs that you’re using to run your cluster. Return to the Cloud Shell and run the teardown script to cleanup your project. This will delete your cluster and the containers that we’ve built.

$ cd ~/kubernetes-series/batch-job/scripts
$ sh teardown.sh

Jonathan Campos is an avid developer and fan of learning new things. I believe that we should always keep learning and growing and failing. I am always a supporter of the development community and always willing to help. So if you have questions or comments on this story please add them below. Connect with me on LinkedIn or Twitter and mention this story.
