Running a serverless batch workload on GCP with Cloud Scheduler, Cloud Functions, and Compute Engine
This quick-start guide is part of a series that shows how to leverage Google Cloud Platform components to run batch workloads in a simpler way. If you are familiar with AWS, you may know a great tool called AWS Batch. Looking at GCP products, how can we run a batch job in a similar manner?
Let’s dig into the GCP documentation: aws-comparison
Going deeper into the documentation… One approach that GCP recommends for this use case is: reliable-task-scheduling-compute-engine
A quick summary: Cloud Scheduler -> Pub/Sub -> Start VM
An example of starting and stopping VMs, quoted in the documentation: start-and-stop-compute-engine-instances-on-a-schedule
But there’s a flaw in this approach: the VM is not deleted after the batch workload is completed, so we keep paying for its storage. How can we make it more efficient and turn it into a more serverless solution?
Wait a minute, how can you talk about VMs and serverless in the same sentence? Something smells fishy…
It sure does, but those questions will be answered shortly, so stay with me…
To begin with, let me introduce the solution we are going to use to make the batch execution more serverless, using the GCP components:
Btw, all GCP components that will be used in this solution have a free tier =)
1 — Cloud Scheduler
This is the starting point of our batch process:
“Cloud Scheduler is a fully managed enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch, big data jobs, cloud infrastructure operations, and more. You can automate everything, including retries in case of failure to reduce manual toil and intervention. Cloud Scheduler even acts as a single pane of glass, allowing you to manage all your automation tasks from one place.”
Go to this page to start your Cloud Scheduler configuration.
To configure our starting point, we begin by creating a Cloud Scheduler Job; it’s really straightforward:
The important fields here are Name, Frequency, Timezone, Target, and Topic. For now, we don’t need to worry about the payload, just leave an empty JSON there because the field is required.
Pay attention to the Timezone and the cron expression in Frequency: together they determine when your Target is going to be executed.
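For readers who prefer to see the form as data, here is a minimal sketch of the same job written as a Cloud Scheduler REST Job resource. The project ID, region, job name, cron expression, and time zone are placeholders made up for illustration; only the execute-batch-process topic name comes from this post.

```javascript
// Hypothetical placeholders: replace with your own values.
const projectId = 'my-project';
const region = 'us-central1';

// Shape of a Cloud Scheduler job that publishes to a Pub/Sub topic.
const schedulerJob = {
  name: `projects/${projectId}/locations/${region}/jobs/execute-batch-job`,
  schedule: '0 3 * * *', // Frequency: every day at 03:00...
  timeZone: 'Etc/UTC',   // ...interpreted in this Timezone
  pubsubTarget: {
    topicName: `projects/${projectId}/topics/execute-batch-process`,
    // The payload is required; the REST API expects it base64-encoded.
    data: Buffer.from('{}').toString('base64'),
  },
};

console.log(JSON.stringify(schedulerJob, null, 2));
```

The same cron expression fires at a different absolute time if you change the time zone, which is why both fields deserve a second look.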
As a Target, we are going to use Pub/Sub, and when we press Create, we will be presented with our Scheduler Job information:
Cloud Scheduler even lets you trigger the Target manually with the “Run now” option, but running it now will return an error: the execute-batch-process topic doesn’t exist yet. Let’s fix that…
2 — Pub/Sub
We will fix that error by creating our Pub/Sub topic, which is going to be our mediator:
“Cloud Pub/Sub is a fully-managed real-time messaging service that allows you to send and receive messages between independent applications.”
Go to this page to start your Pub/Sub configuration.
Creating a Pub/Sub topic is as simple as filling in the topic name!
When we press Create Topic our Pub/Sub mediator is ready to extend the hand to our next component.
3 — Execution
Cloud Functions will be the one reaching for Pub/Sub’s hand!
“Google Cloud Functions is a lightweight compute solution for developers to create single-purpose, stand-alone functions that respond to Cloud events without the need to manage a server or runtime environment.”
So, let’s recap… once the Cloud Scheduler job is triggered, it will publish a message to a Pub/Sub topic, and that message will start the Cloud Function.
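To make the hand-off concrete, here is a minimal sketch of what a Pub/Sub-triggered function receives. It assumes the Node.js background function signature (message, context), where message.data carries the payload base64-encoded. The names decodePayload and onSchedulerTick are hypothetical and are not the function used later in this post.

```javascript
// Hypothetical helper: decode the Pub/Sub payload published by Cloud Scheduler.
function decodePayload(message) {
  if (!message || !message.data) return '';
  return Buffer.from(message.data, 'base64').toString('utf8');
}

// Background function entry point (the name is illustrative).
function onSchedulerTick(message, context) {
  const payload = decodePayload(message);
  console.log(`Triggered by Pub/Sub with payload: ${payload}`);
  return payload;
}

exports.onSchedulerTick = onSchedulerTick;
```

With the empty JSON payload configured in the Scheduler job, the decoded payload is simply {}.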
And finally, our Cloud Function is going to create a Compute Engine VM that will be responsible for running our batch workload.
This is the code we will use to run the Cloud Function.
Notice that we are using the @google-cloud/compute client library; this is the core of our execution engine, and the method createInstance is the one responsible for spinning up the machine that will run our batch process. The vmConfig attribute will contain all the instructions for our VM workload, so let’s dig into it…
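As a rough sketch of the idea (not necessarily the exact code used in this post), the helper below creates a uniquely named VM through a zone client. It assumes a version of @google-cloud/compute older than 3.0, where compute.zone(name) returns an object with a createVM(name, config) method. The names buildVmName and createBatchVm, and the hyphen between prefix and timestamp, are hypothetical. The client is passed in as a parameter so the logic can be exercised without the library installed.

```javascript
// Builds a unique VM name: the batch-job-executor prefix plus a timestamp.
function buildVmName(now = Date.now()) {
  return `batch-job-executor-${now}`;
}

// Creates the batch VM. `zoneClient` is anything exposing createVM(name, config),
// for example `new Compute().zone('us-central1-a')` (assumption: library < 3.0).
async function createBatchVm(zoneClient, vmConfig, now = Date.now()) {
  const vmName = buildVmName(now);
  // createVM is assumed to resolve to [vm, operation, apiResponse].
  const [vm, operation] = await zoneClient.createVM(vmName, vmConfig);
  console.log(`VM creation requested: ${vmName}`);
  return { vmName, vm, operation };
}
```

Inside the Cloud Function, the exported handler would call createBatchVm with the real zone client and the vmConfig object.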
For our VM workload, we have chosen a really simple use case, which is sending a “Hello World” message to Stackdriver Logging.
“Stackdriver Logging allows you to store, search, analyze, monitor, and alert on log data and events from Google Cloud Platform and Amazon Web Services (AWS). Our API also allows ingestion of any custom log data from any source. Stackdriver Logging is a fully managed service that performs at scale and can ingest application and system log data from thousands of VMs. Even better, you can analyze all that log data in real time.”
Let’s see the code that the VM is going to execute:
Basically, we are doing 3 things here…
Line 1: sending our “Hello World” message to Stackdriver Logging and telling it to write to the batch-execution key, so we can track it later.
Line 2: retrieving the gcp_zone from the internal metadata endpoint; it will be needed on the next line.
To learn more about retrieving instance metadata, see the official documentation.
Who’s the best actor to know that the VM finished executing? That would be the VM itself!
So here we are deleting the VM after it’s done with the batch workload. This is how we make it serverless, or as close as you can get.
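The three steps above can be sketched as a three-line shell script, held here in a JavaScript string so it can later be placed in the VM metadata. The exact commands are my reconstruction, not necessarily the script used in this post: they assume the gcloud CLI is available on the VM image and that hostname returns the instance name.

```javascript
// Assumed reconstruction of the workload script, one array entry per script line.
const startupScript = [
  // Line 1: write "Hello World" to the batch-execution log.
  'gcloud logging write batch-execution "Hello World"',
  // Line 2: the metadata server answers projects/<number>/zones/<zone>;
  // the 4th "/"-separated field is the zone name.
  'gcp_zone=$(curl -s -H "Metadata-Flavor: Google" ' +
    'http://metadata.google.internal/computeMetadata/v1/instance/zone | cut -d/ -f4)',
  // Line 3: the VM deletes itself once the work is done.
  'gcloud compute instances delete "$(hostname)" --zone "$gcp_zone" --quiet',
].join('\n');

console.log(startupScript);
```

The --quiet flag matters here: without it, the delete command waits for a confirmation that nobody is there to give.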
Sounds good, but how are we going to put that script inside the VM and then configure it on our Cloud Function?
Go to this page to start your Compute Engine configuration.
In the UI, we are going to fill in the following fields:
Note: Remember to use an f1-micro instance, because it has a free tier.
In the image above, everything was left with the default value; we are just going to select a pre-configured service account named compute-execute-batch-job. This is important because the default service account doesn’t have the right permissions to delete Compute Engine VMs.
Looking at how the service account was configured, best practices were followed by adding only the necessary roles for this kind of workload:
This is how the Service Account was created using the gcloud command line; you can also do this in the Cloud Console.
To learn more about Service Accounts and Roles, see the official documentation.
After the service account, scrolling down in the Compute Engine UI, we have a Startup Script field where we are going to paste our workload script:
Remember one thing: it’s the Cloud Function that’s going to create our VM, so we are not going to click the Create button. Instead, we will click the Equivalent REST option, highlighted in the image above.
Doing that presents us with the full JSON that can be used in an API call:
Now that we have what we need to wrap up our Cloud Function with the VM configuration, let’s go back and update it, with the JSON from the REST call:
Note: Some fields from the JSON can be removed to fall back to the default values, which makes it smaller, but I want to show how simple it is: you can just copy and paste.
You can remove the double quotes from the keys of the JSON attributes (JavaScript accepts them either way), but make sure that the zone value used in the vmConfig attributes is the same as the const zone in line 5. Also, replace the const projectId to use your project.
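Here is a trimmed-down sketch of what such a vmConfig can look like, together with a small check for the zone mismatch mentioned above. It follows the Compute Engine REST Instance resource; the project ID, image family, and the helper names buildVmConfig and findZoneMismatches are assumptions for illustration, and the JSON copied from the console will contain more fields.

```javascript
// Builds a minimal instance config (hypothetical helper, not the console output).
function buildVmConfig({ projectId, zone, serviceAccountEmail, startupScript }) {
  return {
    machineType: `projects/${projectId}/zones/${zone}/machineTypes/f1-micro`,
    disks: [{
      boot: true,
      autoDelete: true, // remove the disk together with the VM
      initializeParams: {
        // Assumed image; use whatever the console generated for you.
        sourceImage: 'projects/debian-cloud/global/images/family/debian-11',
      },
    }],
    networkInterfaces: [{
      network: `projects/${projectId}/global/networks/default`,
      accessConfigs: [{ name: 'External NAT', type: 'ONE_TO_ONE_NAT' }],
    }],
    serviceAccounts: [{
      email: serviceAccountEmail,
      scopes: ['https://www.googleapis.com/auth/cloud-platform'],
    }],
    // The Startup Script field ends up as a metadata item with this key.
    metadata: { items: [{ key: 'startup-script', value: startupScript }] },
  };
}

// Returns every zone referenced in the config that differs from expectedZone.
function findZoneMismatches(vmConfig, expectedZone) {
  const found = JSON.stringify(vmConfig).match(/zones\/[a-z0-9-]+/g) || [];
  return found.map((m) => m.split('/')[1]).filter((z) => z !== expectedZone);
}

const zone = 'us-central1-a';
const vmConfig = buildVmConfig({
  projectId: 'my-project', // hypothetical project ID
  zone,
  serviceAccountEmail: 'compute-execute-batch-job@my-project.iam.gserviceaccount.com',
  startupScript: 'echo "workload goes here"',
});
console.log(findZoneMismatches(vmConfig, zone)); // empty array when zones agree
```

Running the check before deploying catches the common copy-and-paste slip where the JSON still points at the zone selected in the console.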
We are almost there! Now we just need to deploy our Cloud Function and connect it to the Pub/Sub topic; let’s do it using the UI.
Go to this page to start your Cloud Function configuration.
Looking at the UI:
Mainly we have 3 important fields: Trigger, Index.js, and Function to Execute.
Trigger: here we select how the Cloud Function is called; we will choose our Pub/Sub topic.
Index.js tab: this field contains the code that will be executed when the Cloud Function is called, so paste the create_instance_function code here.
Function to execute: the name of the method that will be called when the Cloud Function runs.
One small thing that is sometimes forgotten: since Google Cloud manages everything for us with Cloud Functions, we also need to provide the dependencies that our code uses.
Here’s the code:
We are adding it to the package.json tab:
To be good citizens, we will follow the best practices again and use a service account with only the permissions that are needed:
The Service Account function-create-vm was configured with a Service Account User Role and a Custom Role containing the following permissions:
Ok now we can press Create 😄, we are ready to test everything together!
We will see something like this:
I love it when I see a ✔️ icon!
If we now go back to our Cloud Scheduler Job and trigger it manually, we can see everything working together.
Go to the Compute Engine page after a few seconds and you will see a new VM running, with the prefix batch-job-executor followed by the execution time. It’s a little trick so we always have a unique name, in case we need to track problems later.
After a few more seconds you will see that the icon before the VM name has changed; that’s because the VM is being deleted. Once the deletion is done, the VM will be gone from the instances page.
Finally, to make sure it actually did something, we go to the Stackdriver Logging page, and when we filter by the batch-execution key we can see our Hello World message! 👌🏻
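If you prefer typing a filter over clicking through the UI, the sketch below builds one. It assumes the message was written with gcloud logging write batch-execution, which stores entries under the log name projects/PROJECT_ID/logs/batch-execution. The helper name batchLogFilter and the project ID are hypothetical.

```javascript
// Builds a Logging filter that matches entries written to the given log.
function batchLogFilter(projectId, logId = 'batch-execution') {
  return `logName="projects/${projectId}/logs/${logId}"`;
}

console.log(batchLogFilter('my-project'));
// prints: logName="projects/my-project/logs/batch-execution"
```

The resulting string can be pasted into the advanced filter box of the logs viewer.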
Remember that this batch workload will run on a schedule, according to the cron expression set in the Cloud Scheduler Frequency field.
And That’s It for today!
This is the first post of a series showing how to run batch workloads in a simpler way using Google Cloud Platform. It’s important to point out that here we used Compute Engine, but you can also run batch processes using other components, like App Engine with task queues, GKE, and the newest member of GCP’s Compute family, Cloud Run.
In this post, we showed a really simple batch workload to help you get started, and to make it serverless we made sure that we deleted the VM after it was done executing. Thinking about it, we also used other serverless components such as Cloud Scheduler, Pub/Sub, and Cloud Functions; the only difference was that GCP was managing the resources, creating and deleting them for us… Using a few words from a friend of mine: “A serverless solution is never serverless. There is always a server behind the scenes…”
Thank you for your time! And stay tuned for the next post, where we will show a more complex workload adding Docker and Container Registry to this solution. Cheers!
[Update] Part 2 has been posted:
[Update] Google announced a cloud-native batch manager
- Google Cloud Platform for AWS Professionals: https://cloud.google.com/docs/compare/aws/
- Reliable task scheduling on Compute Engine with Cloud Scheduler: https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine
- Google Cloud Platform Free Tier: https://cloud.google.com/free/
- Cloud Scheduler quickstart: https://cloud.google.com/scheduler/docs/quickstart
- Pub/Sub documentation: https://cloud.google.com/pubsub/docs/
- Cloud Functions documentation: https://cloud.google.com/functions/
- Compute Engine documentation: https://cloud.google.com/compute/
- Stackdriver Logging documentation: https://cloud.google.com/logging/