Kubernetes: Running Background Tasks With Batch-Jobs

Jonathan Campos
Google Cloud - Community
10 min read · Aug 7, 2018

When building amazing applications, there are times when you might want to handle an action outside of a user’s request/response lifecycle. If you want to respond to time-based events, then you want to look at cron jobs. If you want to kick off a process outside of the user’s request/response lifecycle, but not based on time, then you are looking at batch-jobs. Batch-jobs can be kicked off by any number of triggers and can run complex tasks without affecting your users’ response times.


In this article we are going to look at how to run a batch-job in a few different ways: one-time, sequential, and parallel.

If you haven’t gone through, or even read, the first part of this series, you might be lost, wondering where the code is, or wondering what was done previously. Remember, this series assumes you’re using Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE). I will always provide the code and show how to test that it is working as intended.

Setup Your Kubernetes Cluster

Unlike other articles in this series, this article has us run several scripts to see our batch-jobs in action. As such, we will set up our Kubernetes Cluster once and then keep running jobs on it. Here is the script to set up a Kubernetes Cluster in Google Cloud using Google Kubernetes Engine.

$ git clone https://github.com/jonbcampos/kubernetes-series.git
$ cd ~/kubernetes-series/batch-job/scripts
$ sh startup.sh
$ sh deploy.sh
$ sh check-endpoint.sh endpoints

With these scripts run you will have a Kubernetes Cluster up and running in your Google Cloud Project. Furthermore, take note of the IP Address that is revealed by the check-endpoint.sh command. This will be the IP Address that we use later to see outputs from our batch-jobs.

Note: You can view the results of your batch files at http://[IP_Address]/data. As we go through the various ways to run your batch-jobs you will see data logged out to this location.
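For example, once check-endpoint.sh has printed your IP Address, you can pull the current output straight from Cloud Shell (substituting your address for the placeholder):

$ curl http://[IP_Address]/data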

Creating A One-Time Kubernetes Job

There may be a time when you have some script that you need to run only once. Maybe a migration script. Maybe a script to fix some missing data. Whatever the reason, you need it to run exactly once. For this example we will assume you already have the code you want to run, represented by a simple container/script that I’ve set up for you.

With the container that you want to run ready, we just need to create our yaml file.

apiVersion: batch/v1
kind: Job
metadata:
  name: single-job
spec:
  backoffLimit: 6 # number of retries before throwing error
  activeDeadlineSeconds: 10 # time to allow job to run
  template:
    metadata:
      labels:
        app: kubernetes-series
        tier: job
    spec:
      restartPolicy: OnFailure
      containers:
      - name: job
        image: gcr.io/PROJECT_NAME/batchjob-container-job:latest
        # environment variables for the Pod
        env:
        - name: GCLOUD_PROJECT
          value: PROJECT_NAME
        - name: MESSAGE
          value: I am a single run job
        - name: FOREIGN_SERVICE
          value: http://endpoints.default.svc.cluster.local/single
        - name: NODE_ENV
          value: production
        ports:
        - containerPort: 80

With your container ready and your Kubernetes Cluster running, the only thing left to do is apply the yaml file with the following command.

kubectl apply -f ../k8s/single-job.yaml
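If you prefer to confirm the Job from the command line rather than the console, a quick check looks something like this (assuming the manifest above has been applied):

$ kubectl get job single-job # COMPLETIONS moves from 0/1 to 1/1
$ kubectl logs job/single-job # output from the job's Pod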

See the next section to run the script and view the output.

One-Time Kubernetes Job In Action

With your Kubernetes Cluster running we just need to go into the Google Cloud Shell and run the following script to run a one-time batch-job.

$ cd ~/kubernetes-series/batch-job/scripts
$ sh run_single_job.sh

A lot is going to happen immediately when you hit enter. If you head over to your Kubernetes Engine > Workloads view you will see the new single-job in your Workloads with 0/1 Pods active. That is because the single-job already ran and terminated its Pod when complete.

single-job Pod Complete

If you check out the single-job Events details you can also see the SuccessfulCreate event that was logged after the Pod was created.

single-job Pod Events
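The same events are also available from the command line if you prefer; kubectl describe prints them at the bottom of its output:

$ kubectl describe job single-job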

And finally, to see the result of the single-job Pod you can return to http://[IP_Address]/data and view the result from the batch-job.

single-job batch-job result

Wow! So much stuff happened so quickly! This is all well and good for a simple job. But what if you need to move a lot of data all at once? Running one job may not be enough. In the next example we will look at how to run batch-jobs sequentially.

Creating A Sequential Kubernetes Job

There may be cases where you want to run batches of jobs in sequence rather than one very long job. This is especially good if you are worried that jobs may fail and need to restart, if a job consumes a lot of memory/resources and needs to be limited, or if you just don’t want background jobs running too long.

Example: Assume you have 100 resources to consume and transform. You could have 1 job that takes an hour to run and possibly fails and needs to be started multiple times to fully complete. Or you could have 10 batch-jobs that run through 10 resources at a time. Even if a batch-job fails you end up just restarting that specific group of 10 and not the full group of 100.
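As a sketch, the 10-batch version of that example would only change the Job spec shown below; splitting the 100 resources into groups of 10 is up to your container’s code, not Kubernetes:

spec:
  completions: 10 # 10 batch-jobs, each handling 10 of the 100 resources
  backoffLimit: 6 # a failed batch retries without rerunning the other 9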

To see this in a yaml file that Kubernetes can understand, look at the following code. Most of the yaml file is extremely similar to the single run batch-job except for the one .spec.completions addition. This sets the batch-job to run 3 times, one after the other.

apiVersion: batch/v1
kind: Job
metadata:
  name: sequential-job
spec:
  completions: 3 # number of times to run
  backoffLimit: 6 # number of retries before throwing error
  activeDeadlineSeconds: 10 # time to allow job to run
  template:
    metadata:
      labels:
        app: kubernetes-series
        tier: job
    spec:
      restartPolicy: OnFailure
      containers:
      - name: job
        image: gcr.io/PROJECT_NAME/batchjob-container-job:latest
        # environment variables for the Pod
        env:
        - name: GCLOUD_PROJECT
          value: PROJECT_NAME
        - name: MESSAGE
          value: I am a sequential run job
        - name: FOREIGN_SERVICE
          value: http://endpoints.default.svc.cluster.local/sequential
        - name: NODE_ENV
          value: production
        ports:
        - containerPort: 80
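No separate apply step is shown here because the run script in the next section handles it; following the single-job pattern, it presumably boils down to something like:

kubectl apply -f ../k8s/sequential-job.yaml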

The change is super minor and super effective. Next let’s see this in action.

Sequential Kubernetes Job In Action

With your Kubernetes Cluster running we just need to go into the Google Cloud Shell and run the following script to run a sequential batch-job.

$ cd ~/kubernetes-series/batch-job/scripts
$ sh run-sequential-job.sh
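While the job runs, you can watch the Pods complete one after another using the tier: job label from the yaml above (you may also see the completed Pod from the earlier single-job, which shares the same label):

$ kubectl get pods -l tier=job --watch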

A lot is going to happen immediately when you hit enter. If you head over to your Kubernetes Engine > Workloads view you will see the new sequential-job in your Workloads with 0/3 Pods active. That is because the sequential-job ran and terminated each Pod as it completed.

sequential-job complete

If you check out the sequential-job Events details you can also see the SuccessfulCreate events that were logged as each Pod was created.

sequential-job events view

And finally, to see the result of the sequential-job Pod you can return to http://[IP_Address]/data and view the result from the batch-job.

sequential-job batch-job result adding to output

Just like the single-job, the sequential-job did so much so quickly! If you need to break down a single run job into multiple runs, this is a very effective solution. What happens if you want to go further, though? What if you want to break down the job and run multiple jobs all at the same time? Time to talk about parallel jobs.

Creating A Parallel Kubernetes Job

Now you can run your jobs sequentially to break down long-running jobs into bite-sized chunks. What if you want to use the power of Google Cloud and your Kubernetes Cluster and run these jobs in parallel rather than sequentially — effectively turning hours of processing into minutes of processing on more Nodes? Parallel jobs will do this for you.

If you have set up your jobs to run idempotently then this won’t be a problem. Adding one new line to your yaml file will give you this ability.

Make sure to check out my quick explanation of Idempotent Jobs if you are confused by the term.
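As a minimal sketch of the idea (the resource name, work function, and shared log file here are all hypothetical): each step checks whether its piece of work was already done before doing it, so a restarted Pod never processes the same resource twice.

# hypothetical idempotent step inside the job's container
if ! grep -q "resource-42" /data/processed.log; then
  process_resource "resource-42" # hypothetical work function
  echo "resource-42" >> /data/processed.log # record completion
fi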

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 6 # number of times to run
  parallelism: 2 # number of pods that can run in parallel
  backoffLimit: 6 # number of retries before throwing error
  activeDeadlineSeconds: 10 # time to allow job to run
  template:
    metadata:
      labels:
        app: kubernetes-series
        tier: job
    spec:
      restartPolicy: OnFailure
      containers:
      - name: job
        image: gcr.io/PROJECT_NAME/batchjob-container-job:latest
        # environment variables for the Pod
        env:
        - name: GCLOUD_PROJECT
          value: PROJECT_NAME
        - name: MESSAGE
          value: I am a parallel run job
        - name: FOREIGN_SERVICE
          value: http://endpoints.default.svc.cluster.local/parallel
        - name: NODE_ENV
          value: production
        ports:
        - containerPort: 80

The .spec.parallelism line tells Kubernetes that it is okay to run that many Pods at the same time. If I set completions to 6 and parallelism to 6 then all of my batch-jobs would run at the same time. For this example I wanted things to run in 3 batches of 2 Kubernetes Pods, so I set parallelism to 2 and completions to 6.

Parallel Kubernetes Job In Action

With your Kubernetes Cluster running we just need to go into the Google Cloud Shell and run the following script to run a parallel batch-job.

$ cd ~/kubernetes-series/batch-job/scripts
$ sh run-parallel-job.sh
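While the parallel-job runs you can verify that no more than 2 Pods are ever in the Running state at once, matching the .spec.parallelism setting:

$ kubectl get pods -l tier=job --field-selector=status.phase=Running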

A lot is going to happen immediately when you hit enter. If you head over to your Kubernetes Engine > Workloads view you will see the new parallel-job in your Workloads with 0/6 Pods active. That is because the parallel-job ran and terminated the Pods when complete. Note: In the screenshot I caught one Pod still spinning up.

parallel-job pods spinning up

If you check out the parallel-job Events details you can also see the SuccessfulCreate events that were logged as the Pods were created.

parallel-job events view

And finally, to see the result of the parallel-job Pod you can return to http://[IP_Address]/data and view the result from the batch-job.

parallel-job batch-job result adding to output

BOOM! Batch-jobs, running sequentially and in parallel. Speeding up your processing without making you wait any longer. And it was so easy that you almost feel like you’re missing something.

Conclusion

If you made it this far and ran all the code, you must be the happiest coder right now — I know I was. It is amazing how easy it is to run batch-jobs and have them run just the way you want. Kubernetes is a breath of fresh air in a job that is often filled with cringe-worthy config files and scripts.

Teardown

Before you leave make sure to cleanup your project so you aren’t charged for the VMs that you’re using to run your cluster. Return to the Cloud Shell and run the teardown script to cleanup your project. This will delete your cluster and the containers that we’ve built.

$ cd ~/kubernetes-series/batch-job/scripts
$ sh teardown.sh

Jonathan Campos is an avid developer and fan of learning new things. I believe that we should always keep learning and growing and failing. I am always a supporter of the development community and always willing to help. So if you have questions or comments on this story please add them below. Connect with me on LinkedIn or Twitter and mention this story.
