Update Google Data Catalog Tags with Cloud Dataprep Metadata

This post accompanies the GitHub repository https://github.com/victorcouste/google-data-catalog-dataprep, which explains how to create or update Google Cloud Data Catalog tags on BigQuery tables with Cloud Dataprep metadata and column profiles via a Python Cloud Function.

The 2 Data Catalog tags created or updated:

  • Dataprep Job Metadata tag, attached to the BigQuery table and containing information from the Dataprep job used to create or update that table: the user, the Dataprep job (id, name, url, timestamp), the Dataprep dataset (id, name, url), the Dataprep flow (id, name, url), the job profile (url and number of valid, invalid and empty values) and the Dataflow job (id, url).
Example of a Cloud Dataprep Metadata Tag in Data Catalog
Example of a Cloud Dataprep Column Profile Tag in Data Catalog

To enable, learn about and use Cloud Data Catalog, go to https://cloud.google.com/data-catalog and https://console.cloud.google.com/datacatalog.

The GitHub repository contains the Python code for the Cloud Function, which is triggered from a Dataprep Webhook to create or update the 2 Data Catalog tags.

This Cloud Function uses:

In your Cloud Function, you need the 5 files:

Before running the Cloud Function (and creating or updating tags), you need to create the 2 Data Catalog Tag Templates for Dataprep (Job Metadata and Job Column Profile).

Cloud Dataprep Metadata Tag Template
Cloud Dataprep Column Profile Tag Template
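To give an idea of what the Job Metadata template could contain, here is a minimal sketch that builds a dictionary shaped like a Data Catalog TagTemplate REST resource. The field IDs and display names are assumptions derived from the tag description earlier in this post, not the ones used in the repository, and build_tag_template is a hypothetical helper.

```python
# Field IDs and labels are illustrative assumptions, not the repository's.
STRING, DOUBLE, TIMESTAMP = "STRING", "DOUBLE", "TIMESTAMP"

JOB_METADATA_FIELDS = [
    ("user", "User", STRING),
    ("job_id", "Dataprep Job ID", STRING),
    ("job_name", "Dataprep Job Name", STRING),
    ("job_url", "Dataprep Job URL", STRING),
    ("job_timestamp", "Dataprep Job Timestamp", TIMESTAMP),
    ("dataset_id", "Dataprep Dataset ID", STRING),
    ("dataset_name", "Dataprep Dataset Name", STRING),
    ("dataset_url", "Dataprep Dataset URL", STRING),
    ("flow_id", "Dataprep Flow ID", STRING),
    ("flow_name", "Dataprep Flow Name", STRING),
    ("flow_url", "Dataprep Flow URL", STRING),
    ("profile_url", "Job Profile URL", STRING),
    ("profile_valid", "Valid Values", DOUBLE),
    ("profile_invalid", "Invalid Values", DOUBLE),
    ("profile_empty", "Empty Values", DOUBLE),
    ("dataflow_job_id", "Dataflow Job ID", STRING),
    ("dataflow_job_url", "Dataflow Job URL", STRING),
]


def build_tag_template(display_name, fields):
    """Return a dict laid out like a TagTemplate: displayName plus a fields map."""
    return {
        "displayName": display_name,
        "fields": {
            field_id: {
                "displayName": label,
                "type": {"primitiveType": primitive},
            }
            for field_id, label, primitive in fields
        },
    }


template = build_tag_template("Dataprep Job Metadata", JOB_METADATA_FIELDS)
```

The resulting dictionary is only a description of the template; it still has to be submitted to Data Catalog with whichever tool you use to create templates.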

For this action, you can use:

Then, once the Cloud Function has been created, you just have to pass it the Dataprep Job ID in JSON format, like {"job_id":"7827359"}.
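As a minimal sketch of such a call, assuming an HTTP-triggered function: the endpoint URL is a placeholder and build_job_request and send are hypothetical helper names, not part of the repository. If your function requires authentication, you would also need to add an identity token header.

```python
import json
import urllib.request

# Placeholder endpoint: replace with your own Cloud Function trigger URL.
FUNCTION_URL = "https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME"


def build_job_request(job_id, url=FUNCTION_URL):
    """Build a POST request carrying {"job_id": "<id>"} as a JSON body."""
    body = json.dumps({"job_id": str(job_id)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request; needs network access and a deployed function."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


request = build_job_request("7827359")
# Call send(request) once FUNCTION_URL points at your own function.
```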

And to trigger it from a Cloud Dataprep flow, you can use a Webhook on the Cloud Function endpoint with {"job_id":"$jobId"} in the POST body.

Cloud Dataprep Webhook to call the Data Catalog Cloud Function
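On the receiving side, the function has to read the job ID out of the POST body. Here is a minimal sketch of that step, assuming the body is the JSON object shown above; extract_job_id is a hypothetical helper, not the repository's code.

```python
import json


def extract_job_id(raw_body):
    """Return the job ID from a webhook body such as b'{"job_id": "7827359"}'."""
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError) as exc:
        raise ValueError("request body is not valid JSON") from exc
    if not isinstance(payload, dict) or "job_id" not in payload:
        raise ValueError('expected a JSON object with a "job_id" key')
    job_id = str(payload["job_id"]).strip()
    # Assumption: a value still starting with "$" (e.g. "$jobId") means the
    # webhook variable was sent as-is instead of being replaced by a real ID.
    if not job_id or job_id.startswith("$"):
        raise ValueError("job_id is empty or was not substituted")
    return job_id
```

Rejecting a bad body early keeps a malformed webhook call from reaching the Dataprep and Data Catalog APIs with a meaningless ID.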

Once the Data Catalog tag templates exist and tags have been created or updated on BigQuery tables, you can find them all in the GCP console at https://console.cloud.google.com/datacatalog.

Finally, you can also search Cloud Data Catalog for BigQuery tables carrying a Dataprep tag from your own application, such as https://github.com/victorcouste/dataprep-datacatalog-explorer.

Happy wrangling and happy tagging!




A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Victor Coustenoble

Data Fan / Starburst Data
