Using Cloud Workflows to load Cloud Storage files into BigQuery
In this article, we will orchestrate and automate Google Cloud with serverless workflows.
We will create a Cloud Workflow to load data from Cloud Storage into BigQuery. This is a complete guide on working with workflows: connecting Google Cloud APIs, using subworkflows and arrays, extracting segments, and calling BigQuery load jobs.
There are various ways to load Cloud Storage files into BigQuery: a Cloud Function, Eventarc triggers to Cloud Run services, the relatively new BigQuery CREATE EXTERNAL TABLE statement, or the good old bq CLI tool.
All of these require you to keep a function, a container, a library, or an SDK up to date; in other words, they need ongoing maintenance.
We are going to use Cloud Workflows to connect the Cloud Storage API with the BigQuery Jobs API to load files into tables. With the techniques we'll cover in this article, you will have a foundation to build any kind of serverless automation in Cloud Workflows, in YAML syntax, without maintenance.
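To give a feel for where we're headed, here is a minimal sketch of such a workflow using the built-in Google Cloud connectors (`googleapis.storage.v1.objects.list` and `googleapis.bigquery.v2.jobs.insert`). The bucket, project, dataset, and table names are placeholders; the full version we'll build later iterates over the listed objects instead of hardcoding a file:

```yaml
main:
  steps:
    # List the objects in the source bucket via the Cloud Storage connector
    - listObjects:
        call: googleapis.storage.v1.objects.list
        args:
          bucket: my-input-bucket
        result: objects
    # Submit a BigQuery load job via the BigQuery Jobs connector
    - loadIntoBigQuery:
        call: googleapis.bigquery.v2.jobs.insert
        args:
          projectId: my-project
          body:
            configuration:
              load:
                sourceUris:
                  - gs://my-input-bucket/my-file.csv
                destinationTable:
                  projectId: my-project
                  datasetId: my_dataset
                  tableId: my_table
                sourceFormat: CSV
        result: loadJob
    - returnResult:
        return: ${loadJob}
```

Note that authentication is handled out of the box: the connectors run with the workflow's service account, so no keys or client libraries are involved.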
Note: To get started with Cloud Workflows, check out my introductory presentation: Serverless orchestration with Cloud Workflows.
What is Cloud Workflows?
In a nutshell, Workflows allows you to connect services together: anything that has a public API.
- Workflow orchestration (step engine as a service)
- Integrate any Google Cloud API, SaaS API, or private APIs
- Out-of-the-box authentication support for Google Cloud products
- Fully managed service — requires no infrastructure or capacity planning
- Serverless, with a pay-per-use pricing model
- Declarative workflow language using YAML syntax
- Developer-friendly built-in error handling with retries
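The last point deserves a quick illustration. Error handling and retries are declared right in the YAML; a sketch of an HTTP call wrapped with the default retry policy (the URL is a placeholder) looks like this:

```yaml
main:
  steps:
    # Call an external API, retrying transient failures automatically
    - readItem:
        try:
          call: http.get
          args:
            url: https://example.com/api/items
          result: response
        # Built-in policy: retries on 429/503 and connection errors
        retry: ${http.default_retry}
    - returnResult:
        return: ${response.body}
```

You can also define a custom retry policy (predicate, max_retries, backoff) when the default one doesn't fit.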