Running a serverless batch workload on GCP with Cloud Scheduler — Adding Docker and Container Registry to the mix

Marcelo Costa
Google Cloud - Community
Sep 11, 2019

This quick-start guide is part of a series that shows how to leverage Google Cloud Platform components to run batch workloads in a simpler way.

If you need a bit of context before getting started, please take a look at the first part of the series, where I describe the architecture used to get the batch workload running.

To begin, let me introduce the solution we are going to use to run the batch workload this time, along with the GCP components involved:

Image 1. Scheduled Batch Process architecture

If you read the first part, you will notice two new players in the mix: Container Registry and Docker.

The stars of this article are Container Registry and Docker, which will enable us to run a much more complex batch job; we will talk more about that later on…

Image 2. Spongebob staring at a container

There are many articles out there about working with Container Registry… so why should you read this one?
Instead of just showing command lines and comparing approaches, I want to show you a working example of how to build Continuous Integration around your batch workload and automate everything!

Image 3. Buzz Lightyear telling Woody to automate everything

The topics covered in this post are:

  1. The Batch workload
  2. Connect to Source Repository
  3. Setting up the batch workload entrypoint
  4. Batch workload execution explained

Without further ado, let’s go!

You will be provided with a GitHub repository containing the working example.

1 — The Batch workload

As the candidate for the more complex batch workload, I’ve chosen a combination of behave, a Python library for applying BDD techniques, and Alpha Vantage, a set of free APIs for realtime and historical data on stocks, forex (FX), and digital/cryptocurrencies.

The GitHub repository with the code is: alpha_vantage_bdd
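If you are new to behave, each Gherkin scenario in those feature files is backed by Python step implementations. The snippet below is a minimal, hypothetical sketch of what such a step file could look like; the step wording, the GLOBAL_QUOTE function and the environment variable are illustrative and are not the exact code from the repository:

# features/steps/quote_steps.py: hypothetical sketch, not the repository's actual steps
import os

import requests
from behave import given, when, then

ALPHA_VANTAGE_URL = "https://www.alphavantage.co/query"


@given('the stock symbol "{symbol}"')
def step_given_symbol(context, symbol):
    context.symbol = symbol


@when('I request the global quote')
def step_request_quote(context):
    # The API key would normally come from an environment variable.
    context.response = requests.get(
        ALPHA_VANTAGE_URL,
        params={
            "function": "GLOBAL_QUOTE",
            "symbol": context.symbol,
            "apikey": os.environ.get("ALPHA_VANTAGE_API_KEY", "demo"),
        },
        timeout=30,
    )


@then('the response contains a quote for that symbol')
def step_check_quote(context):
    assert context.response.status_code == 200
    body = context.response.json()
    assert "Global Quote" in body, "unexpected payload: %s" % body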

So, let’s take a look at the scenarios that will be executed:

Image 4. Feature files with scenarios that will be executed

By running the following command at the root of the repository:

behave features/ --tags=-wip

We should see the following output:

Image 5. Terminal showing results from behave execution

That’s really cool, but it’s a local execution. How do we send this code to our Google Cloud project and run the batch workload there?

Image 6. A person on the defensive, saying the code works on their machine

2 — Connect to Source Repository

We are going to use Cloud Source Repositories to extend our Git workflow to GCP.

“Cloud Source Repositories are fully featured, private Git repositories hosted on Google Cloud Platform. Extend your Git workflow by connecting to other GCP tools, including Cloud Build, App Engine, Stackdriver, and Cloud Pub/Sub.”

Go to this page to start your Source Repository configuration. Once you have chosen your repository name, which is alpha_vantage_bdd in this case, select Push code from a local Git repository. You will then see three options for pushing your code; I chose Google Cloud SDK.
Follow the instructions presented:

Image 7. Source Repo UI — Add code to your repository

What we did was push our GitHub repository, alpha_vantage_bdd, to a Cloud Source Repository that lives inside our Google Cloud project. The Cloud Source Repository works as a remote for our origin repository.

The commands you typed were similar to the following:
git remote add google https://source.developers.google.com/p/my_project/r/alpha_vantage_bdd
git push google master

After pushing to the Source Repository, you will be able to see this:

Image 8. Source Repo UI — Showing the created Repo

You can also mirror your GitHub repository directly, instead of using it as a remote repository, by following this guide, but given the open issues 73122477 and 133100479 I still prefer the remote-repository approach.

3 — Setting up the batch workload entrypoint

Now that we have the code inside GCP, we are going to build a Docker image for it from the repository’s Dockerfile.

The important piece is the script that is executed as the Docker image’s entrypoint.

Every time the Docker container is run (which, in our case, happens when the Compute Engine VM is created), this entrypoint executes the behave command and sends the output to Stackdriver Logging. After it’s done, the VM is deleted.
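As a rough sketch in Python, the idea of that entrypoint could look like the following. The file name, the metadata-server lookups and the use of the Compute API client to delete the VM are assumptions for illustration; this is not the repository’s exact script:

# entrypoint.py: rough sketch of the container entrypoint (illustrative only)
import subprocess

import requests
import googleapiclient.discovery

METADATA = "http://metadata.google.internal/computeMetadata/v1"
HEADERS = {"Metadata-Flavor": "Google"}


def metadata(path):
    # Query the GCE metadata server for information about this VM.
    return requests.get("%s/%s" % (METADATA, path), headers=HEADERS, timeout=10).text


def main():
    # Run the behave suite; output printed by the container is picked up
    # by Stackdriver Logging when the VM runs a container image.
    result = subprocess.run(
        ["behave", "features/", "--tags=-wip"],
        capture_output=True, text=True,
    )
    print(result.stdout)
    print(result.stderr)

    # Delete the VM this container is running on, so we only pay while the job runs.
    project = metadata("project/project-id")
    zone = metadata("instance/zone").split("/")[-1]
    name = metadata("instance/name")
    compute = googleapiclient.discovery.build("compute", "v1")
    compute.instances().delete(project=project, zone=zone, instance=name).execute()


if __name__ == "__main__":
    main()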

Interesting and all… but what about the whole “automate everything” talk? How do we automatically build this Docker image?

We are going to use Cloud Build for that, and that’s where Container Registry will join the game!

“Cloud Build lets you build software quickly across all languages. Get complete control over defining custom workflows for building, testing, and deploying across multiple environments such as VMs, serverless, Kubernetes, or Firebase.”

Let’s see the build flow with Cloud Build:

Image 9. Update Batch Process code architecture

1 — Code is pushed to the Cloud Source Repository. This happens whenever a git push google master is executed on the origin repo that we configured previously.

2 — Cloud Build is triggered by the commit on the master branch.

3 — Cloud Build packages the Docker image and stores it in Cloud Storage.

4 — The image is marked in the Container Registry as the latest version.

To achieve that, go to this page to start the Cloud Build configuration:

Image 10. Cloud Build UI, showing Image name

Choose the Source Repository we created; from there it’s really simple. We are leaving all fields with their default values except the image name. We are using the Dockerfile as the build configuration and tagging the image with :latest, so we always get an image that is up to date with the code.

Don’t worry, the previous images are kept for you in case you need a quick rollback.

Once we press Create trigger, Cloud Build will be connected to our Source Repository:

Image 11. Build Triggers UI, showing successfully created trigger

We can test the trigger by pressing Run trigger, and once it’s done, our Docker image will show up on Container Registry:

Image 12. Container Registry UI, showing the created Docker image

As you can see, it’s tagged with latest; this is what guarantees we always have a fresh Docker image on our Compute Engine VM!

4 — Batch workload execution explained

Now that we have the Docker image inside Container Registry, the rest is a piece of cake. Remember the Cloud Function we dove into in the first post? We are going to use it again! We will just change the Compute Engine configuration, but let me show you the execution flow first:

Image 13. Execution Detailed

1 — Cloud Function is triggered by Pub/Sub and calls the Compute Engine API to create a VM

2 — Compute Engine retrieves the latest image from Container Registry and starts the VM

3 — The VM entrypoint starts the batch process that runs the automated tests against the Alpha Vantage APIs

Once the automated tests are done, the output is sent to Stackdriver Logging.

To update the Cloud Function from the last post, we need to change the Compute Engine configuration; go to this page.

Image 14. Compute Engine UI, showing the configuration

Select the “Deploy a container image to this VM instance” checkbox; if you are curious, you can click the “Learn more” link.
For the container image, we are going to use the one created in the previous steps, which is stored in Container Registry. This is important: remember to use the :latest tag, so we always retrieve the up to date image.

At the bottom of the UI, click on the Equivalent REST link, so we can get the configuration that will be used in our Cloud Function.

Image 15. Equivalent REST request popup, showing the VM configuration

Once we have it, go to the Cloud Functions UI and update the vmConfig variable, replacing the startup script with the new gce-container-declaration configuration.
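As a rough sketch of that change, assuming the Cloud Function builds the instance body as a Python dictionary, the startup-script metadata item gives way to a gce-container-declaration item. The project, zone, machine type and container name below are placeholders, not the exact values from the repository:

# Sketch of the updated vmConfig (illustrative; adapt it to your own Cloud Function)
PROJECT = "my_project"  # same placeholder used in the git remote above
ZONE = "us-central1-a"  # assumption: use the zone you deploy to
IMAGE = "gcr.io/%s/alpha_vantage_bdd:latest" % PROJECT

# Container declaration consumed by the Container-Optimized OS agent on the VM.
CONTAINER_SPEC = """spec:
  containers:
    - name: batch-job
      image: %s
      stdin: false
      tty: false
  restartPolicy: Never
""" % IMAGE

vmConfig = {
    "name": "batch-job-executor",  # a unique suffix is appended, as explained in section 4
    "machineType": "zones/%s/machineTypes/n1-standard-1" % ZONE,
    "disks": [{
        "boot": True,
        "autoDelete": True,
        "initializeParams": {
            # Container VMs boot from a Container-Optimized OS image.
            "sourceImage": "projects/cos-cloud/global/images/family/cos-stable",
        },
    }],
    "networkInterfaces": [{
        "network": "global/networks/default",
        "accessConfigs": [{"type": "ONE_TO_ONE_NAT"}],
    }],
    "metadata": {
        "items": [
            # This item replaces the startup-script used in the first post.
            {"key": "gce-container-declaration", "value": CONTAINER_SPEC},
        ]
    },
    "serviceAccounts": [{
        "email": "default",
        "scopes": ["https://www.googleapis.com/auth/cloud-platform"],
    }],
}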

After that, hit the deploy button and we are ready!

If we now go back to our Cloud Scheduler job and trigger it manually, we can see everything working together.

As a reminder, doing that publishes a message to our Pub/Sub topic, which starts our execution flow.

Image 16. Engines running

Press the Run Now button:

Image 17. Cloud Scheduler UI, showing the Run Now button

Go to the Compute Engine page after a few seconds and you will see a new VM running, named with the prefix batch-job-executor followed by the execution time. It’s a little trick so we always have a unique name, in case we need to track down problems later.
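As a purely hypothetical illustration of that trick, the Cloud Function can append the current epoch time when it fills in the instance name:

# Hypothetical illustration of the unique-name trick for the VM
import time

vm_name = "batch-job-executor-" + str(int(time.time()))
print(vm_name)  # e.g. batch-job-executor-1568203200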

Image 18. Compute Engine UI. Created VM on the left. VM fading away on the right

After a few more seconds you will see that the icon next to the VM name has changed; that’s because the VM is being deleted. Once the deletion is done, the VM will be gone from the instances page.

Finally, to make sure it actually did something, we go to the Stackdriver Logging page, and when we filter by the VM name we can see the results for the VM running the Container Registry image! 👌🏻

Image 19. Stackdriver Logging UI, showing the execution results
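Filtering in the UI is enough here, but the same filter can also be applied programmatically. Here is a rough sketch using the google-cloud-logging Python client, where the filter string and VM name are illustrative:

# Rough sketch: read the batch job's output back from Stackdriver Logging
from google.cloud import logging as gcp_logging

client = gcp_logging.Client()
# A plain text term in the filter matches entries mentioning the VM name.
log_filter = 'resource.type="gce_instance" AND "batch-job-executor-1568203200"'
for entry in client.list_entries(filter_=log_filter):
    print(entry.payload)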

One last thing! To show our Continuous Integration working, whenever we do a git push google master, Cloud Build runs and creates a new Container Registry image for us, tagging it with latest. In the image below, you can see that only the most recent image is tagged with latest, which means that the next time Cloud Scheduler runs, it will pick up the new version!

Image 20. Container Registry UI, showing 3 images, and the recent one tagged with latest

And that’s it for today!

This is the second post of a series showing how to run batch workloads in a simpler way, using Google Cloud Platform.

In this post, we showed a more complex batch workload to help you get started, and to make it easy to update that workload we used a combination of Cloud Source Repositories, Cloud Build, Container Registry, and Docker.

Thank you for your time! And stay tuned for the next post, where we will use Pub/Sub once again to decouple our batch workload results and show you how to send notifications to Google Chat! Cheers!

