Using BigQuery (and BigQuery ML) from Kubeflow Pipelines

Use the Python function-to-component capability

Lak Lakshmanan
Google Cloud - Community
3 min read · Apr 21, 2020

These days, when someone asks me the best way to set up a workflow that takes machine learning from development to operationalization, I point them to Kubeflow Pipelines (KFP). On Google Cloud, Cloud AI Platform Pipelines provides a managed experience for KFP, so that you don’t have to muck around with Kubernetes clusters.

Python code to invoke BigQuery

We start by writing a function that uses the BigQuery Python client to run a query that creates a table or model; the function returns the name of the table or model it created:
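A minimal sketch of such a function, assuming the `google-cloud-bigquery` client library. The function name `run_bigquery_ddl`, its parameters, and the `DDLOutput` tuple are illustrative choices rather than the article's exact code, and the sketch assumes the finished job reports the object it created in `ddl_target_table`.

```python
from typing import NamedTuple


def run_bigquery_ddl(project_id: str, query_string: str, location: str) -> NamedTuple(
        'DDLOutput', [('created_table', str), ('query', str)]):
    """Run a DDL statement (e.g. CREATE TABLE or CREATE MODEL) and return the created object's name."""
    # Imported inside the function so that the function body is self-contained.
    from collections import namedtuple
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id, location=location)
    job = client.query(query_string)
    job.result()  # Block until the job finishes; raises if the query failed.

    # Assumption: the DDL job exposes the object it created as a table reference.
    target = job.ddl_target_table
    if target is None:
        raise ValueError('The query did not create a table or model')
    created = '{}.{}'.format(target.dataset_id, target.table_id)

    result = namedtuple('DDLOutput', ['created_table', 'query'])
    return result(created, query_string)
```

Returning a named tuple gives each value a name, which lets a later step refer to the created table or model by that name.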

Creating a container using function-to-container

In Kubeflow Pipelines, each step (or “operation”) needs to be a container. Fortunately, taking the Python function above and making it a container is as easy as:
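A minimal sketch of that conversion, assuming the KFP v1 SDK, whose `kfp.components.func_to_container_op` helper wraps a Python function as a pipeline op. The function `run_query` is a hypothetical stand-in for the BigQuery function described above, and `make_op` is a helper introduced here only so that the `kfp` import is deferred until the call.

```python
def run_query(project_id: str, query_string: str) -> str:
    """Hypothetical stand-in for the BigQuery function described above."""
    # Non-standard packages are imported inside the function body.
    from google.cloud import bigquery

    job = bigquery.Client(project=project_id).query(query_string)
    job.result()  # Wait for the query to finish.
    return job.job_id


def make_op(func):
    """Wrap a Python function as a containerized pipeline op (KFP v1 SDK)."""
    import kfp.components as comp

    return comp.func_to_container_op(
        func,
        # Packages to pip-install in the container before the function runs.
        packages_to_install=['google-cloud-bigquery'],
    )


# Usage (requires the KFP v1 SDK to be installed):
# ddlop = make_op(run_query)
```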

Notice that I am specifying the Python packages that the function depends on. Also look carefully at the Python function itself: any non-standard packages are imported inside the function definition, because only the function's own code is packaged into the container.

Using the container

Given the container above (ddlop), we can use it to execute any table or model creation query we want. For example, here’s a query to train a model, invoked as part of a pipeline:
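As a hedged illustration, the helper below builds a BigQuery ML `CREATE OR REPLACE MODEL` statement and hands it to `ddlop`. The dataset, model, table, and column names, and the `(project_id, query, location)` call signature, are assumptions made for this sketch, not the article's actual query.

```python
def train_classification_model(ddlop, project_id, location='US'):
    """Add a model-training step to the pipeline being defined."""
    # Hypothetical dataset, model, table, and column names; replace with your own.
    query = """
        CREATE OR REPLACE MODEL mydataset.trip_classifier
        OPTIONS(model_type='logistic_reg', input_label_cols=['label'])
        AS
        SELECT feature_1, feature_2, label
        FROM mydataset.training_data
    """
    # ddlop is the op factory created from the BigQuery function; calling it
    # inside a pipeline definition adds one containerized step.
    return ddlop(project_id, query, location)
```

Because the query is just a string argument, the same `ddlop` can be reused for any other table or model creation statement.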

Writing a pipeline

We can string together such queries into an ML pipeline:
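A minimal sketch of such a pipeline, assuming the KFP v1 SDK (`kfp.dsl.pipeline` and the `.after()` method on ops). The pipeline name, the two queries, the default project ID, and the `build_pipeline` helper are hypothetical; the helper takes `ddlop` as an argument and defers the `kfp` import so that the snippet stands alone.

```python
# Hypothetical queries: train a model, then write its predictions to a table.
TRAIN_QUERY = """
    CREATE OR REPLACE MODEL mydataset.trip_classifier
    OPTIONS(model_type='logistic_reg', input_label_cols=['label'])
    AS SELECT feature_1, feature_2, label FROM mydataset.training_data
"""

PREDICT_QUERY = """
    CREATE OR REPLACE TABLE mydataset.predictions AS
    SELECT * FROM ML.PREDICT(MODEL mydataset.trip_classifier,
                             (SELECT feature_1, feature_2 FROM mydataset.new_data))
"""


def build_pipeline(ddlop, location='US'):
    """Return a two-step pipeline function: train a model, then score with it."""
    from kfp import dsl

    @dsl.pipeline(name='bqml-two-step', description='Train, then predict')
    def bqml_pipeline(project_id: str = 'my-project'):
        train = ddlop(project_id, TRAIN_QUERY, location)
        predict = ddlop(project_id, PREDICT_QUERY, location)
        # Run the prediction step only after training has finished.
        predict.after(train)

    return bqml_pipeline
```

The explicit `.after()` call is used here because the second query names the model directly instead of consuming an output of the first step.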

Now, you can compile this pipeline into a zip package and submit it to the KFP cluster to start a new run within an experiment.
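A sketch of those two steps, assuming the KFP v1 SDK calls `kfp.compiler.Compiler().compile`, `kfp.Client`, `create_experiment`, and `run_pipeline`. The helper name, the experiment name, and the `host` argument (the hostname of your KFP cluster) are placeholders.

```python
def compile_and_run(pipeline_func, host, experiment_name='bqml_experiment'):
    """Compile a pipeline function to a zip package and start a run of it."""
    import kfp
    import kfp.compiler as compiler

    # Compile the pipeline definition into a zip package on local disk.
    package = pipeline_func.__name__ + '.zip'
    compiler.Compiler().compile(pipeline_func, package)

    # Connect to the KFP cluster and create (or reuse) an experiment.
    client = kfp.Client(host=host)
    experiment = client.create_experiment(experiment_name)

    run_name = pipeline_func.__name__ + ' run'
    # Empty dict: use the pipeline's default parameter values.
    return client.run_pipeline(experiment.id, run_name, package, {})
```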

Try it out!

The full code for this article is on GitHub. To try out the notebook:

  • Create an instance of AI Platform Pipelines by following the Setting up AI Platform Pipelines how-to guide. Make sure to enable access to the https://www.googleapis.com/auth/cloud-platform scope when creating the GKE cluster.
  • Create a Notebook instance (any version) by going to AI Platform / Notebooks in the GCP console.
  • Clone my notebook (linked above).
  • Change the first cell to reflect the hostname of your KFP cluster.

Enjoy!
