How to run serverless batch jobs on Google Cloud

Use AI Platform for functions that take longer than a couple of minutes

Lak Lakshmanan
Sep 30, 2019 · 2 min read

Cloud Functions, Cloud Run, App Engine, and similar services are not a good choice for long-lived functions, that is, anything that takes more than a couple of minutes to run. The services themselves impose a limit of 10 or 15 minutes, but that budget also has to cover errors and retries, so your goal should be 2–3 minutes at most. If you want to run a function that takes longer than this, what are your options?


What if you want to run a long-running batch job in a serverless way?

Put your code in a Docker container. Run it using AI Platform. Schedule it using Cloud Scheduler.
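For the first step, the container only needs a program to run. Below is a minimal sketch, assuming a Python script named `batch_job.py` that the image's Dockerfile launches with something like `ENTRYPOINT ["python", "batch_job.py"]`. The script name, the `--input`/`--output` flags, and the line-counting workload are hypothetical placeholders for your real batch logic.

```python
"""Hypothetical batch-job entry point for a custom container.

The flag names and the line-counting "work" are illustrative placeholders.
"""
import argparse
import json
import sys
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Example long-running batch job")
    parser.add_argument("--input", required=True, help="path of a text file to process")
    parser.add_argument("--output", required=True, help="path to write a JSON summary to")
    return parser.parse_args(argv)


def run(input_path: str, output_path: str) -> dict:
    # Stand-in for real work that might take far longer than 2-3 minutes.
    line_count = 0
    word_count = 0
    with open(input_path, encoding="utf-8") as f:
        for line in f:
            line_count += 1
            word_count += len(line.split())
    summary = {"lines": line_count, "words": word_count}
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(summary, f)
    return summary


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    summary = run(args.input, args.output)
    print(f"processed {summary['lines']} lines", file=sys.stderr)
    return 0  # exit code 0 signals success to whatever runs the container


if __name__ == "__main__":
    sys.exit(main())
```

Arguments supplied to the container at submission time arrive in `sys.argv`, so the same script can be tried locally with `python batch_job.py --input data.txt --output summary.json` before it is built into an image.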

Custom containers in AI Platform Training

You can use AI Platform Training to run any arbitrary Docker container; it doesn’t have to be a machine learning job. To have an arbitrary container executed on a GPU, submit a training job that specifies the container’s image URI along with a GPU-enabled scale tier.
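As a hedged sketch of what such a submission contains, the Python below builds the JSON body for the AI Platform Training `projects.jobs.create` REST method, pointing `masterConfig.imageUri` at a custom container and requesting the `BASIC_GPU` scale tier. The `make_job_id` helper, the project and image names, and the region are illustrative assumptions. The timestamped job ID is there because job IDs must be unique within a project, which matters once the job is triggered repeatedly on a schedule.

```python
"""Sketch: request body for an AI Platform Training job that runs a custom
container on a GPU. Project, image, and region values are placeholders.
"""
import datetime
import re
from typing import Dict, List, Optional


def make_job_id(prefix: str, now: Optional[datetime.datetime] = None) -> str:
    # Job IDs may contain only letters, digits, and underscores (start the
    # prefix with a letter); a timestamp suffix keeps repeated runs unique.
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", prefix)
    return f"{cleaned}_{now:%Y%m%d_%H%M%S}"


def container_job_body(job_id: str, image_uri: str, region: str,
                       args: Optional[List[str]] = None) -> Dict:
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC_GPU",                # a single worker with one GPU
            "region": region,
            "masterConfig": {"imageUri": image_uri},  # e.g. gcr.io/PROJECT/IMAGE
            "args": list(args or []),                # passed to the entry point
        },
    }
```

For example, `container_job_body(make_job_id("nightly_batch"), "gcr.io/my-project/batch:latest", "us-central1")` yields a body that can be POSTed to the jobs endpoint of your project.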

Job submission is just a REST API call, so you can invoke it from client libraries in a variety of programming languages. The container has only two requirements: it must have an entry point, and it must be published to Container Registry. You can also use custom machine types; see the documentation for details.
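To illustrate the client-library route, here is a sketch using the Google API Python client (`google-api-python-client`) against the `ml` v1 API, combined with a custom machine type. It assumes application-default credentials are configured; the machine type, GPU type, project ID, and image URI are placeholders, and the helper names are hypothetical. With `scaleTier` set to `CUSTOM`, the `masterType` and `masterConfig.acceleratorConfig` fields select the VM and its GPUs.

```python
"""Sketch: submitting a custom-container job with a custom machine type
through the AI Platform Training REST API (ml v1). Values are placeholders.
"""
from typing import Dict, List, Optional


def custom_machine_job_body(job_id: str, image_uri: str, region: str,
                            machine_type: str = "n1-standard-8",
                            gpu_type: str = "NVIDIA_TESLA_T4",
                            gpu_count: int = 1,
                            args: Optional[List[str]] = None) -> Dict:
    # scaleTier CUSTOM lets you choose the master VM and attach accelerators.
    master_config: Dict = {"imageUri": image_uri}
    if gpu_count > 0:
        master_config["acceleratorConfig"] = {"count": gpu_count, "type": gpu_type}
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "CUSTOM",
            "masterType": machine_type,
            "masterConfig": master_config,
            "region": region,
            "args": list(args or []),
        },
    }


def submit_job(project_id: str, body: Dict) -> Dict:
    # Imported lazily so the body builder above works without the library.
    from googleapiclient import discovery

    ml = discovery.build("ml", "v1")
    request = ml.projects().jobs().create(parent=f"projects/{project_id}", body=body)
    return request.execute()  # the created Job resource, as a dict
```

A call such as `submit_job("my-project", custom_machine_job_body("nightly_batch_1", "gcr.io/my-project/batch:latest", "us-central1"))` would create the job; which machine and GPU types are accepted depends on the region, so check the documentation first.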

Concurrent autoscaling?

Being able to launch a custom container on a job-specific cluster satisfies many use cases for serverless functions, but not all of them. One use case it does not cover is concurrent autoscaling: receiving multiple requests, routing them to the same machine, and adding more machines once that machine starts to get overwhelmed. If you need concurrent autoscaling and your tasks last longer than 2–3 minutes, the AI Platform Training solution will not work. You’ll need Kubernetes in that case, and it won’t be serverless.
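The reasoning in this article reduces to a small decision rule. The sketch below encodes it; the 3-minute threshold is the upper end of the 2–3 minute guideline rather than a hard service limit, and the enum and function names are illustrative.

```python
"""Sketch of the article's platform-choice rule. Names are illustrative."""
from enum import Enum


class Platform(Enum):
    SERVERLESS_FUNCTION = "Cloud Functions / Cloud Run / App Engine"
    AI_PLATFORM_TRAINING = "AI Platform Training custom container"
    KUBERNETES = "Kubernetes (not serverless)"


SHORT_TASK_MINUTES = 3.0  # upper end of the 2-3 minute guideline


def choose_platform(expected_minutes: float, needs_concurrent_autoscaling: bool) -> Platform:
    # Short tasks fit comfortably inside the serverless services' limits.
    if expected_minutes <= SHORT_TASK_MINUTES:
        return Platform.SERVERLESS_FUNCTION
    # Long tasks that must share machines and scale out need a cluster.
    if needs_concurrent_autoscaling:
        return Platform.KUBERNETES
    # Long, independent batch jobs: one container per job-specific cluster.
    return Platform.AI_PLATFORM_TRAINING
```

For instance, a nightly 45-minute export with no incoming request traffic maps to `Platform.AI_PLATFORM_TRAINING`, while a stream of 10-minute requests that should share machines maps to `Platform.KUBERNETES`.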

Google Cloud - Community

Google Cloud community articles and blogs

Written by Lak Lakshmanan, Data Analytics & AI @ Google Cloud

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.
