Automate the execution of BigQuery queries with Cloud Workflows

Márton Kodok
May 18 · 4 min read

Resolve repetitive queries with scheduled invocations, and combine simple queries into complex DML statements by using Cloud Workflows instead of BigQuery Scripting.


SQL pipelines! In this article, we will explore how to use Workflows connectors to run BigQuery query pipelines: execute a query, retrieve the query results, loop through the result set, and repeatedly launch subsequent queries.

You can also reuse these techniques for various business logic to build a SQL pipeline process that combines several consecutive steps.

Introduction

We are going to define a problem that is currently not supported out of the box by BigQuery and is frequently asked about on Stack Overflow.

Is there a way to run scheduled DML dynamic queries on wildcard tables such as Firebase analytics dataset?

People often want to clean up unused events from their dataset to get smaller tables and lower storage costs.

By trying to execute something like the following (a hypothetical cleanup query; the project, dataset, and event name are illustrative placeholders):
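```sql
-- Hypothetical cleanup query: delete an unused event type across all
-- tables of the dataset. Project, dataset, and event name are placeholders.
DELETE FROM `my-project.analytics_1234567890.*`
WHERE event_name = 'session_start'
```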

We get:
DML over table my-project.analytics_1234567890.* is not supported

As you can see, there is a * (star) in the table definition. DML statements over wildcard tables are not supported, but the same query runs fine on a single table. Let’s automate this.

To turn this into a working solution we will use Cloud Workflows.

What is Cloud Workflows?

In a nutshell, Workflows allows you to connect services together: anything that has a public API.

  • Workflow orchestration (step engine as a service)

Workflow Steps for our problem

  1. Run a query to get the table names
  2. Loop through the result set to extract each table name
  3. Run the DML statement on each individual table

Main Workflow definition

Run a dynamic query and subsequent queries for each result.
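The complete definition is in the gist linked at the end of the article; below is a minimal sketch of what the main workflow might look like. The INFORMATION_SCHEMA query and the my-project / analytics_1234567890 identifiers are assumptions for illustration:

```yaml
main:
  steps:
    # Run a dynamic query that lists the tables we want to process.
    # INFORMATION_SCHEMA and the identifiers below are illustrative.
    - get_tables:
        call: BQ_Query
        args:
          query: >-
            SELECT table_name
            FROM `my-project.analytics_1234567890.INFORMATION_SCHEMA.TABLES`
            WHERE table_name LIKE 'events_%'
        result: queryResult
    # Loop over the returned rows and run a DML task for each table.
    - process_results:
        call: BQ_Results_LoopItems
        args:
          items: ${queryResult.rows}
    - finish:
        return: "SUCCESS"
```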

Our main workflow definition calls the BQ_Query and BQ_Results_LoopItems subworkflows. Let’s see how they are defined.

Workflow to execute a query and return results
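A sketch of the query subworkflow, assuming the BigQuery connector and the built-in project environment variable (treat it as an approximation of the definition in the gist):

```yaml
BQ_Query:
  params: [query]
  steps:
    # Call the BigQuery connector to run the query and wait for completion.
    - run_query:
        call: googleapis.bigquery.v2.jobs.query
        args:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          body:
            useLegacySql: false
            query: ${query}
        result: queryResult
    - return_result:
        return: ${queryResult}
```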

As you may notice, the call action is a Workflows connector; more about the parameters can be found in the Cloud Workflows connector syntax and in the BigQuery API reference.

Workflow to Loop a resultset and run a task
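A sketch of the loop subworkflow, an index-based foreach built from the standard assign / switch / next pattern (step names are illustrative):

```yaml
BQ_Results_LoopItems:
  params: [items]
  steps:
    - init:
        assign:
          - i: 0
    - check_condition:
        switch:
          # Keep looping while there are unprocessed rows.
          - condition: ${len(items) > i}
            next: process_item
        next: exit_loop
    - process_item:
        # f[0].v is the first column of the current row (the table name).
        call: BQ_Task
        args:
          table_id: ${items[i].f[0].v}
    - iterate:
        assign:
          - i: ${i + 1}
        next: check_condition
    - exit_loop:
        return: ${i}
```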

This is a standard foreach loop template that you may have seen in Cloud Workflows. In the process_item step, we call BQ_Task, a subworkflow that gets executed for each row of the result set.

BQ_Task requires the table_id parameter, which is the table name from the result set.

The previous step’s output looks like this JSON structure:
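(The shape below follows BigQuery’s row format of f / v fields; the actual values are illustrative.)

```json
{
  "totalRows": "2",
  "rows": [
    { "f": [ { "v": "events_intraday_20210511" } ] },
    { "f": [ { "v": "events_20210510" } ] }
  ]
}
```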

To extract the table names, we use the ${items[i].f[0].v} JSON path, which returns a table name such as events_intraday_20210511.

If the query returns multiple columns, you can address the additional columns by their zero-based index. So to get the second column from the result set, you use ${items[i].f[1].v}, where “1” is the column index.

We are almost done; we only need BQ_Task, the subworkflow that is specific to our problem:
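A sketch, assuming the same hypothetical dataset and event name as before (the DELETE statement is a placeholder for whatever per-table DML you need):

```yaml
BQ_Task:
  params: [table_id]
  steps:
    # Run the single-table DML statement; the dataset and event name are
    # illustrative placeholders.
    - delete_rows:
        call: BQ_Query
        args:
          query: ${"DELETE FROM `my-project.analytics_1234567890." + table_id + "` WHERE event_name = 'session_start'"}
        result: queryResult
    - return_result:
        return: ${queryResult}
```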

As you can see, this receives table_id as a parameter and uses the DML statement on a single table, which runs successfully on BigQuery. You may also notice that it reuses the BQ_Query subworkflow.

Deploying and executing Cloud Workflows

To deploy your workflow, you need the YAML file, which you will find at the end of the article. You can deploy using the Cloud Console, the API, or the gcloud command-line utility.
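For example, with gcloud (the workflow name and region below are placeholders):

```sh
# Deploy the workflow from the YAML definition.
gcloud workflows deploy bq-dml-pipeline \
  --source=workflow.yaml \
  --location=us-central1

# Execute it and wait for the result.
gcloud workflows run bq-dml-pipeline --location=us-central1
```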

We recommend using VSCode, where you can set up the GCP Project Switcher extension and define IDE tasks to automate deploying, executing, and describing executions.

Once your workflow is deployed, you need to trigger an execution whenever you want to run the process. This could be a Cloud Scheduler job, a manual execution triggered by your script, or an event observer set up with Eventarc or a Cloud Functions trigger.
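As a sketch, a nightly Cloud Scheduler job could call the Workflow Executions API directly (the job name, schedule, project, region, and service account are placeholders):

```sh
# Hypothetical nightly trigger: POST to the Workflow Executions API.
gcloud scheduler jobs create http bq-dml-nightly \
  --schedule="0 2 * * *" \
  --uri="https://workflowexecutions.googleapis.com/v1/projects/my-project/locations/us-central1/workflows/bq-dml-pipeline/executions" \
  --message-body="{}" \
  --oauth-service-account-email="workflow-invoker@my-project.iam.gserviceaccount.com"
```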

When you set up the invoker service account, you need to grant it permissions to run BigQuery jobs.
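For our DML use case, that means roles along these lines (the service account and project are placeholders):

```sh
# Allow the service account to run BigQuery jobs.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:workflow-invoker@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Allow it to modify table data (needed for DML statements).
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:workflow-invoker@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```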

Wrap Up

If you want to check it out, here are some links:

Feel free to reach out to me on Twitter @martonkodok or read my previous posts on medium/@martonkodok

Complete YAML workflow definition:

https://gist.github.com/pentium10/e9ce523bd0be0c83e3d3cdb12026ace3
