Visualising BigQuery Query Plans

Stephan Meyn
Google Cloud - Community
5 min readMay 16, 2019

Tldr: visualise BigQuery execution on https://bqvisualiser.appspot.com

BigQuery is a powerful tool to query very large (tera and peta byte scale) data sets using standard SQL. It does so by parallelizing the query into multi sharded workloads, achieving amazing scale and speed.

Yet sometimes queries take longer than imagined. The BigQuery UI gives some support in understanding how the query was executed but it can still be a challenge understanding what happened

That’s a big query

I had a customer who runs a complex query with multiple subqueries and joins. The query took hours to complete and we needed to understand what happened. So I ended up writing a tool that takes the query plan and displays it as a tree helping us tracking down what was happening. This is how BigQuery Visualiser was born.

How does BQVisualiser work?

BqVisualiser displays the queryplan of a BigQuery query. The queryplan is stored with every query and is accessible from the BigQuery console. You can download queryplans from the command line onto your PC. BqVisualiser gives you the ability to directly retrieve query plans for your projects via the BigQuery REST API.

Access Bq Visualiser on https://bqvisualiser.appspot.com

Some Use Cases where BqVisualiser helped us

Inadvertent Use of WITH clause

The WITH clause is powerful simplifier of the query syntax. However it needs to be understood that it is the equivalent to a macro and each reference to a WITH clause in a subsequent SELECT causes that SQL to be individually executed.

Here is a conceptual structure of the query that was causing a client trouble:

The top WITH clause causes 1.3 billion records being read and passed on. This WITH clause is referenced by two other WITH clauses which in turn are referenced again.

The query analyzer promptly unfolds all the WITH clauses and the actual query looks more like this:

As you can see the 1.3 billion records are read 6 times — not good.

Loading the queryplan into BigQuery Visualiser shows the complexity in all its detail:

You can see how the query graph seems to repeat itself several times, it looks like a fractal, albeit a mean one….

Based on this insight the client decided to materialize the top queries in intermediate tables which improved the results considerably.

Same query suddenly takes longer

A client runs a query every night that takes just under 3 hours to complete. Suddenly that same query started taking over 4 hours.

BqVisualiser provides a Gantt view of the execution. Looking at it, it was obvious that most of the join steps ran 30–40% longer in the new run compared to the old run.

In particular one step, that contained joins and a group by over a total of 16 columns, took 1 hour 20 minutes vs 56 minutes previously. Looking at the the average wait in the slow one was 36 minutes vs 25 seconds

Fast run of join/group by
Slow run of join/group by

That resulted in us looking at the overall slot milliseconds vs elapsedMs in the execution statistics tab of the queryplan:

Optimal query:

Slow Query

The totalSlotMs metric is a measure of resource usage. It provides an insight of how much compute capacity is used for how long.

The difference in estimated slots in the two instances is a serious indicator that there is a contention for available slots, which is also supported by the excessive wait times for workers. This customer has a flat rate subscription of 2000 slots (think of it like a query cluster reserved just for your organisation) and it turned out that another job was running at the same time (around midnight). As the BigQuery service tries to allocate slots fairly to all jobs, it resulted in this query having less slots available and hence it took longer to complete.

This was fixed by reviewing the scheduling of the queries by the customer as this particular query was critical for its daily reporting.

Using BQVisualiser

The easiest way to use BqVisualiser is to access it on https://bqvisualiser.appspot.com

You will be required to log in to Google Cloud.

Open a Query Plan from Google Cloud

  1. Select the tab labelled “Select Job”
  2. Click on the “Get Projects” button
  3. (optionally) type in a filter expression
  4. Select a project in the drop down tab
  5. Click List Jobs

Note: If you don’t want to authenticate here you can also upload a queryplan. QueryPlans can be downloaded using the bq command:

bq - format=prettyjson show -j bquxjob_yourjobid > mybqjob.export.json

You will then see a list of queries below. Click on the Get button for the query you want to investigate.

The Tree tab will open with the query plan

The Tree View

The main view is the tree view:

Tree View

The top node represents a data source. All other nodes are BigQuery processing stages.

Selecting a single node gives the details for that stage:

Stage Details Display

Timing Tab

This tab gives a Gantt chart view of the execution time.

Gantt Chart of Stage execution timings

In this example the Join+ stage runs for 0.7 seconds

Bottom Tabs

At the bottom is a set of tabs displaying details for the overall query:

  1. Overview
  2. Status
  3. SQL used
  4. Progress Timings
  5. Statistics
  6. Query Settings

BqVisualiser is Open Source

BQVisualiser is open sourced. It is an Angular SPA that you can host on any appserver/webserver. It can be found on

https://github.com/GoogleCloudPlatform/professional-services/tree/master/tools/bq-visualizer

I keep a fork on my own github repo: https://github.com/smeyn/professional-services/tree/master/tools/bq-visualizer

(this one tends to have the newer versions with bugfixes)

Find out more about BigQuery

Read this great article by BigQuery’s PM: The 12 Components of Google BigQuery

--

--