How to Monitor BigQuery Usage and Costs on GCP with Interactive Visualizations in Looker Studio

5 min read · Apr 8, 2023

Monitoring BigQuery usage and costs is an essential part of managing data on Google Cloud Platform (GCP). BigQuery is a fully managed, serverless data warehouse that scales to petabytes, which makes it easy to rack up significant costs if you’re not careful. Fortunately, GCP provides several tools and services to help you monitor and optimize your BigQuery usage and costs. In this post, we’ll explore how to monitor BigQuery usage and costs using the INFORMATION_SCHEMA.JOBS view and visualize the results in Looker Studio for ongoing monitoring.

BigQuery Cost Monitoring Dashboard

The INFORMATION_SCHEMA.JOBS view provides metadata about the jobs executed within a BigQuery project. The query below retrieves information about those jobs, including the job type, the data processed and billed, and an estimated cost. With it, you can gain insight into your BigQuery usage and costs, and optimize your queries and workloads to save money on your cloud bill.

Here’s the query:

SELECT
  project_id,
  user_email,
  SPLIT(user_email, '@')[OFFSET(0)] AS Name,
  job_id,
  destination_table.dataset_id AS destination_table_dataset_id,
  destination_table.table_id AS destination_table_id,
  DATETIME(creation_time) AS Date_Time,
  job_type,
  statement_type,
  priority,
  query,
  state,
  total_bytes_processed,
  total_bytes_billed,
  (total_bytes_processed / 1000000000) AS Processed_GB,
  (total_bytes_billed / 1000000000) AS Billed_GB,
  ((total_bytes_billed / 1000000000000) * 5) AS Cost
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  DATE(creation_time) >= '2023-03-01'
ORDER BY creation_time DESC, total_bytes_billed DESC

This SQL query selects information about jobs executed in BigQuery from the INFORMATION_SCHEMA.JOBS table, which provides detailed information about job execution. The query filters the results to include only jobs executed on or after March 1, 2023, and sorts the results by creation time and total bytes billed.

Let’s take a closer look at some of the columns returned by the query and their meanings:

  • project_id: the ID of the project in which the job was executed.
  • user_email: the email address of the user who executed the job.
  • Name: the name of the user who executed the job, extracted from their email address. The SPLIT function splits the email address at the “@” and keeps only the part before it.
  • job_id: the ID of the job.
  • destination_table_dataset_id: the ID of the dataset in which the destination table is located.
  • destination_table_id: the ID of the destination table.
  • Date_Time: the date and time when the job was created.
  • job_type: the type of the job (e.g. QUERY, LOAD, EXTRACT, COPY).
  • statement_type: the type of SQL statement executed by the job (e.g. SELECT, INSERT, UPDATE).
  • priority: the priority level of the job.
  • query: the SQL query executed by the job.
  • state: the current state of the job (e.g. PENDING, RUNNING, DONE).
  • total_bytes_processed: the total amount of data processed by the job, in bytes.
  • total_bytes_billed: the total amount of data billed for the job, in bytes.
  • Processed_GB: the total amount of data processed by the job, in gigabytes (GB).
  • Billed_GB: the total amount of data billed for the job, in gigabytes (GB).
  • Cost: the estimated cost of the job in USD, calculated as (total_bytes_billed / 1000000000000) * 5. The formula converts total_bytes_billed from bytes to terabytes and multiplies by 5, since BigQuery’s on-demand pricing was roughly $5 per terabyte of data processed (see the BigQuery pricing reference). A small worked example follows this list.
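
To sanity-check the formula, here is a small worked example that uses a made-up byte count rather than a real job:

SELECT
  250000000000 AS total_bytes_billed,           -- a hypothetical job that billed 250 GB
  250000000000 / 1000000000 AS Billed_GB,       -- = 250 GB
  (250000000000 / 1000000000000) * 5 AS Cost    -- = 0.25 TB * $5 = $1.25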

Update: Google has since updated its on-demand analysis pricing. The new rate is around $6.25 per TiB and varies depending on the location of the data; previously it was a flat rate of $5 per TB.
Additionally, storage costs are billed separately and are not included in the query.
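
If you want the Cost column to reflect the newer rate, a minimal tweak (assuming roughly $6.25 per TiB; adjust the constant to match your region’s pricing) could look like this:

SELECT
  total_bytes_billed,
  (total_bytes_billed / POW(1024, 4)) * 6.25 AS Cost  -- bytes -> TiB, then * ~$6.25
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE DATE(creation_time) >= '2023-03-01'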

In conclusion, the query provided in this post is a useful tool for monitoring the cost of queries and jobs in GCP BigQuery. By using it, you can identify expensive queries and the users who generate the most cost, optimize your usage, and reduce your bill.

Now let’s visualize BigQuery usage and costs in Looker Studio (formerly Google Data Studio) using the SQL above. This can help you gain better insight into your data and identify areas for cost optimization. Looker Studio is a powerful data visualization and reporting tool that lets you create custom dashboards and reports using data from various sources, including BigQuery.

To get started, create a connection to BigQuery in Looker Studio. Open the “Resource” menu and select “Manage added data sources.” From there, select “Add a data source” and choose the BigQuery connector. Follow the prompts to connect to your BigQuery project.

Next, choose the “Custom Query” option in the data source setup, select your billing project, and paste in the monitoring query from earlier in this post to ensure accurate results.
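
Optionally, if you tick “Enable date range parameters” for the custom query, Looker Studio passes the dashboard’s date-range control into the query as @DS_START_DATE and @DS_END_DATE (YYYYMMDD strings). A trimmed-down sketch of a date filter that uses them, assuming those parameters are enabled:

SELECT
  DATETIME(creation_time) AS Date_Time,
  user_email,
  (total_bytes_billed / 1000000000000) * 5 AS Cost
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE DATE(creation_time) BETWEEN PARSE_DATE('%Y%m%d', @DS_START_DATE)
                              AND PARSE_DATE('%Y%m%d', @DS_END_DATE)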

Once the query is entered, you can start building your visualizations. I won’t go into the details of the dashboard design in this post; let me know in the comments if you are interested.

Here is a sample Dashboard Design that you can follow:

In this dashboard, different tables and charts present different information. Let’s walk through them.

  1. Day on Day cost: shows the total cost of BigQuery usage for each day, making it easy to spot trends and any cost spikes that need attention (a sketch of the underlying aggregation follows this list).
  2. Total jobs and average cost per job for each person: shows the number of jobs run by each person in the project, along with the average cost per job, making it easy to spot outliers or patterns that point to areas for improvement.
  3. Hourly cost distribution: shows how the cost of BigQuery usage is spread across the day, making it easy to identify peak usage times and potential cost-saving opportunities.
  4. Day-on-day query performance per person: shows the daily BigQuery usage details for each person in the project, providing a comprehensive view of usage patterns and costs.
  5. Detailed query table: shows the specific queries that used the most resources and incurred the most cost, along with who ran them, which helps identify inefficiencies and provides valuable input for team management and training.
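
For reference, the “Day on Day cost” chart boils down to a simple day-level aggregation of the cost estimate. In Looker Studio you would typically let the chart do this with Date_Time as the dimension and Cost as the metric, but a standalone sketch of the same aggregation (using the $5-per-TB assumption from the main query) looks like this:

SELECT
  DATE(creation_time) AS usage_date,
  ROUND(SUM((total_bytes_billed / 1000000000000) * 5), 2) AS total_cost  -- estimated $ per day
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE DATE(creation_time) >= '2023-03-01'
GROUP BY usage_date
ORDER BY usage_date DESC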

These five visualizations provide comprehensive insights into BigQuery usage and costs, allowing project managers and administrators to effectively monitor usage patterns and identify areas for optimization. With the ability to visualize daily costs, job details, hourly usage distribution, and specific query details, teams can gain a complete understanding of their BigQuery usage and take action to minimize costs and maximize efficiency. These visualizations help streamline data management and ensure that BigQuery usage is optimized for both cost and performance.

  • To delve into more projects and stories, visit my portfolio.
  • For any inquiries, feel free to reach out to me on LinkedIn.

Thank you.


Written by Niloy Biswas

A Data Science Enthusiast. Working as a Data Analyst at 10 Minute School.
