With scripting now available in BigQuery, there are new ways to analyze script-related job events. For example, you can track job metrics, such as the number of rows affected for data manipulation language (DML) queries associated with a script, for more detailed information on how your scripts are performing.
The new version of audit logs for BigQuery, BigQueryAuditMetadata, provides rich insight into the execution of your scripts. By combining Cloud Logging with scripting in BigQuery, you can skip manually logging job event data and go straight to analyzing it. This data gives you insight into your script performance, modifications to your data, and more.
This post is a step-by-step guide on how to monitor your ELT BigQuery jobs via the following steps:
- Set up a Cloud Logging sink to BigQuery that will export BigQuery script events.
- Run a pre-written SQL view that will transform the exported data into queryable fields.
- Run queries on top of the view to analyze the exported job events.
Tracking results of DML statements is commonly achieved by reading system variables after each DML statement finishes, and then writing their values over to a separate logging table. However, this approach isn’t always optimal. Take the script below as an example:
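Here is a sketch of that manual pattern, with hypothetical table and column names: after each DML statement, the script reads the `@@row_count` system variable and writes its value to a separate logging table.

```sql
DECLARE rows_changed INT64;

-- A DML statement whose result we want to track.
UPDATE `my_project.my_dataset.orders`
SET status = 'shipped'
WHERE ship_date IS NOT NULL;

-- Capture the number of rows the statement modified.
SET rows_changed = @@row_count;

-- Manually persist the result to a logging table.
INSERT INTO `my_project.my_dataset.dml_log` (script_name, statement, rows_affected, logged_at)
VALUES ('update_orders', 'UPDATE orders', rows_changed, CURRENT_TIMESTAMP());
```

Every DML statement in the script needs its own `SET` and `INSERT` pair, which is the overhead this post shows you how to avoid.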
You can see that after each DML statement in the script, you have to read system variables and write a logging call to store their state. This quickly becomes tedious if you're running thousands of scripts, if not more, each day.
With BigQuery, you no longer have to log your SQL statement results because Cloud Logging allows you to store, search, analyze, monitor, and alert on all your BigQuery scripting activity.
1. Extract and load: Export Cloud Logging logs
You can also use INFORMATION_SCHEMA to get job metadata, but its retention period is 180 days, and Cloud Logging retains data access logs for only 30 days. The benefit of exporting logs to BigQuery is that they are retained for as long as you keep the exported tables, beyond either retention period.
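For comparison, querying job metadata through INFORMATION_SCHEMA (assuming the `region-us` location for your project) looks like this:

```sql
-- Job metadata from INFORMATION_SCHEMA: convenient, but limited to 180 days.
SELECT
  job_id,
  user_email,
  statement_type,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);
```

This is a good fit for recent jobs; the export pipeline described below is what you want for long-term retention.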
Creating your Cloud Logging filter
First, navigate to Logs Viewer and switch to an advanced filter. Now you can create a filter that only captures BigQuery script events.
Here's the filter we will be using. Paste it into the advanced filter field.
What do all these things mean?!
Here's a quick overview of the components of this filter:
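As an illustrative sketch (the exact fields are assumptions based on the BigQueryAuditMetadata log format), a filter that captures BigQuery query-job events might look like:

```
resource.type="bigquery_resource"
protoPayload.serviceName="bigquery.googleapis.com"
protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"
protoPayload.metadata.jobChange.job.jobConfig.type="QUERY"
```

Reading it line by line: `resource.type` restricts entries to BigQuery resources; `serviceName` keeps only entries emitted by the BigQuery API; the `@type` check keeps only entries in the new BigQueryAuditMetadata format; and `jobConfig.type` narrows the results to query jobs, which is what scripts and their DML statements run as.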
Creating your BigQuery sink
With this filter, you can set up a destination for the matching job events using a BigQuery sink. Follow this guide to set up a sink, and be sure to choose BigQuery as the sink service. It is also strongly recommended to enable partitioning, which partitions your data by event timestamp and improves query performance when you query the exported data.
Note: Whenever a sink filter is created or updated, logs generated by BigQuery events before that point will not be captured by the sink. Only logs generated after the filter is applied will be exported.
Creating your BigQuery script
Now, you can create your BigQuery script. One way to do this is by writing a stored procedure in BigQuery like the example below.
CREATE OR REPLACE PROCEDURE `project_id.dataset_id.script_name`()
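For illustration, a complete version of this procedure might look like the following (the table names are hypothetical):

```sql
CREATE OR REPLACE PROCEDURE `project_id.dataset_id.script_name`()
BEGIN
  -- Each DML statement below produces its own job event in the audit logs,
  -- so there is no need for manual logging calls between statements.
  INSERT INTO `project_id.dataset_id.orders_staging` (order_id, status)
  SELECT order_id, status
  FROM `project_id.dataset_id.orders_raw`;

  UPDATE `project_id.dataset_id.orders_staging`
  SET status = 'processed'
  WHERE status IS NULL;
END;
```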
2. Execute the script to generate job event data
Call the script in Cloud Shell
Once Cloud Shell opens, create a SQL file. Open the file, paste the line below, and replace dataset_name.script_name with the path where your script was created in BigQuery.
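Assuming the stored procedure created earlier, the file needs only a single CALL statement:

```sql
CALL dataset_name.script_name();
```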
Save and exit the file. Now you can execute the script by executing this file. Copy and paste the command below into the Cloud Shell.
bq query --nouse_legacy_sql < YOUR_SQL_FILE --label time:$(date +%s) --label script_name:YOUR_SCRIPT_NAME
Here's what the labels do: time records when the script was run (as a Unix epoch timestamp), and script_name identifies which script produced the job events, so you can filter and group the exported logs by script.
When you run this command, you’ll see that the BigQuery sink is populated with job events from Cloud Logging.
Transform: Using a view to parse the job event data into queryable fields
Use the new view to query your job data
Now that there's log data available in BigQuery, you can query these logs more easily. If you take a closer look at the table schema, you'll see that the logs are represented in JSON format, so querying BigQueryAuditMetadata fields requires unnesting the exported log entries. You can do this with JSON functions in BigQuery, but it can get complex. Instead, the view big_query_elt_script_logging.sql, which acts as an interface between Cloud Logging and BigQuery, transforms your exported log data and lets you write simple queries on top of it. Follow these instructions to use the view.
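To illustrate the complexity the view hides, a direct query against the exported sink table using JSON functions might look like this (the table name and JSON paths are assumptions based on the standard audit-log export schema and the BigQueryAuditMetadata format):

```sql
-- Parsing raw exported log entries by hand: workable, but verbose.
SELECT
  timestamp,
  JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
    '$.jobChange.job.jobName') AS job_name,
  JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
    '$.jobChange.job.jobStats.queryStats.outputRowCount') AS output_row_count
FROM `project_id.dataset_id.cloudaudit_googleapis_com_data_access`
WHERE JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
    '$.jobChange.job.jobConfig.type') = 'QUERY';
```

Every field you want requires another JSON_EXTRACT_SCALAR with a hand-written path, which is why the view is the more maintainable approach.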
With the new version of Cloud Logging available and scripting supported by BigQuery, there is a new opportunity to monitor query performance for scripts. Setting up a pipeline like we did here enables you to view your scripting jobs data in a simpler way.
Now that log fields are queryable, you can write queries on top of the view to get insights on BigQuery script events. To get started, check out sample queries in this README for the view.
You can even take this a step further and use a BI tool such as Data Studio to visualize your jobs' data and performance.
To learn more about BigQuery, visit our site.