Creating a long running job in GCP’s new Batch service

Datamarinier

TLDR

You can run long-running jobs on a fully configurable virtual machine using Google Cloud Platform’s (GCP) new Batch service. You can find a link to the GitHub repository with our working sample here.

For whom is this blog?

If you are looking for a serverless way to run long-running jobs, look no further. GCP has recently launched Batch, a managed service that enables you to do exactly that. With Batch you can train machine learning models, convert video files or process large amounts of data. Although this is a fully managed service, you pay only for the virtual machine you use. Since the virtual machine shuts down automatically upon completion, this serverless deployment can save you a lot of costs compared to a machine that runs constantly.

Prerequisites

To follow along with this blog you will need the following:

  • Basic knowledge of Docker
  • Basic knowledge of Python
  • A GCP project with billing enabled
  • Docker and the gcloud CLI installed

Batch

One of our clients needed to convert large .mxf video files (up to 30 gigabytes) to a more manageable, compressed .mp4 format on a daily basis. Depending on a variety of factors, such a conversion can run from 30 minutes to several hours. One possibility is Cloud Run; however, that solution will not work for jobs exceeding 60 minutes.

Still dedicated to finding a serverless solution, we proposed GCP’s new Batch service. With Batch you can start a job on a virtual machine with a simple HTTP request. The rest, namely provisioning, executing and shutting down the virtual machine, is taken care of automatically. Only minimal setup is required, and you pay only for what you use. Since you are using virtual machines, you can install any dependency or software needed to execute your workload.

Let us try this new service by running a ‘long-running’ Python script. We will create a custom Dockerfile and Python script, build the image, and push it to Google Container Registry (GCR). The Batch service will pull the container image from GCR and run the container on an automatically provisioned virtual machine, which is shut down automatically once the job completes.

Need no further explanation? The working code sample can be found here.

The first step is to create a folder in which you place, at the root level: (1) a Dockerfile and (2) a Python script called ‘main.py’. The script below simulates a long-running workload.

import sys
import time

def long_running_function(n):
    # Simulate a long-running workload: one second of "work" per item.
    for i in range(n):
        time.sleep(1)
        print(f"Processed {i+1} out of {n} items...")

if __name__ == "__main__":
    try:
        long_running_function(60)
    except Exception as err:
        print("Error occurred:", err)
        sys.exit(1)

This function takes a single argument, n, which specifies the number of items to process. It then runs a loop that sleeps for one second and prints a progress message for each item processed, so with n = 60 the script runs for roughly a minute.

The Dockerfile is pretty straightforward. You build upon a base image (like python:3), place the main.py file in the working directory and set the entrypoint so that Docker knows what to execute when the container starts. In this file you can also add other dependencies your workload needs, such as gcsfuse or the Cloud SQL Auth Proxy.

Below is what your Dockerfile might look like.

FROM python:3

WORKDIR /usr/src/app

#COPY requirements.txt ./
#RUN pip install --no-cache-dir -r requirements.txt

COPY main.py ./main.py

ENTRYPOINT ["python", "/usr/src/app/main.py"]

Now to build and push this image run the following:

docker build -t eu.gcr.io/<your project name>/blog_test .

docker push eu.gcr.io/<your project name>/blog_test

The first Docker command builds a new Docker image from the Dockerfile in the current directory and tags it with the name eu.gcr.io/<your project name>/blog_test. Adapt the hostname to your preferred location (for instance us.gcr.io instead of eu.gcr.io). The second command pushes the image to Google Container Registry. If this is your first push to GCR, you may first need to let Docker authenticate with your Google credentials, for example by running gcloud auth configure-docker.

That’s it!

You can now run the batch job. You have several options here: directly with gcloud, via a client library (available in Python, Java and other languages), or with an HTTP request. In the example below we use gcloud. To do so, create a config.json with the following content (make sure to replace the name of your project):

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "container": {
              "imageUri": "eu.gcr.io/<your project name here>/blog_test"
            }
          }
        ],
        "computeResource": {
          "cpuMilli": 1000,
          "memoryMib": 2000
        }
      }
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}

  • imageUri: the name and location of the image (as defined by the Docker tag)
  • cpuMilli: the CPU per task in milli-units; 1000 milliCPU corresponds to 1 vCPU, so this task requests a single vCPU
  • memoryMib: the memory per task in mebibytes (MiB)
  • destination: send the job’s logs to Cloud Logging so you can view them in the console

Open a terminal, go to your working directory and run:

gcloud batch jobs submit testjob  --location europe-north1 --config config.json

Once the command is executed, the job will be submitted to Google Cloud Batch in the specified region, and the service will begin provisioning the necessary resources and running the job according to the configuration specified in config.json.
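If you prefer the Python client library mentioned above, a minimal sketch of the same submission might look like the following. This is our illustration rather than part of the original sample: it assumes the google-cloud-batch package is installed and uses PROJECT_ID as a placeholder for your project.

from google.cloud import batch_v1

PROJECT_ID = "<your project name here>"  # placeholder

client = batch_v1.BatchServiceClient()

# The container to run, mirroring the "container" block in config.json.
runnable = batch_v1.Runnable()
runnable.container = batch_v1.Runnable.Container()
runnable.container.image_uri = f"eu.gcr.io/{PROJECT_ID}/blog_test"

# Resources per task, mirroring "computeResource" in config.json.
resources = batch_v1.ComputeResource()
resources.cpu_milli = 1000
resources.memory_mib = 2000

task = batch_v1.TaskSpec()
task.runnables = [runnable]
task.compute_resource = resources

group = batch_v1.TaskGroup()
group.task_spec = task

job = batch_v1.Job()
job.task_groups = [group]
job.logs_policy = batch_v1.LogsPolicy()
job.logs_policy.destination = batch_v1.LogsPolicy.Destination.CLOUD_LOGGING

# Submit the job to the europe-north1 region, like the gcloud command above.
request = batch_v1.CreateJobRequest()
request.parent = f"projects/{PROJECT_ID}/locations/europe-north1"
request.job_id = "testjob"
request.job = job

response = client.create_job(request)
print(response.name)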

Visit: https://console.cloud.google.com/batch/jobs?project=<your project name here>

There you can watch your test job move from Scheduled to Running to Succeeded. On the job details page, the link to the logs (bottom left) takes you to Cloud Logging, where you can follow the progress messages printed by the script.
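You can also check the job’s state from code. The snippet below is a small sketch with the same assumptions as before (google-cloud-batch installed, PROJECT_ID as a placeholder); it simply fetches the job and prints its current state.

from google.cloud import batch_v1

PROJECT_ID = "<your project name here>"  # placeholder

client = batch_v1.BatchServiceClient()

# Look up the job submitted earlier and print its state,
# e.g. QUEUED, SCHEDULED, RUNNING, SUCCEEDED or FAILED.
job = client.get_job(
    name=f"projects/{PROJECT_ID}/locations/europe-north1/jobs/testjob"
)
print(job.status.state)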

References:

https://cloud.google.com/batch/docs/get-started
