BigQuery Cost per Query Visibility with Labels
BigQuery ranks at number three in top Google Cloud spending. Given it is a high contributor to cloud costs, users are looking for ways to cut costs by optimizing their expensive queries. However, pinpointing which specific queries are driving the highest costs can be a challenge. This is where labels come into play.
By leveraging labels on query jobs you can filter to see the spending of each query job, thereby gaining insight into which queries may need to be optimized.
BigQuery Pricing
BigQuery pricing is relatively simple, primarily consisting of compute pricing (the cost to process queries) and storage pricing (the cost to store data). It gets a little more complicated if you consider additional features or the free tier. For our purposes, we’re just going to focus on on-demand compute pricing.
BigQuery on-demand compute pricing is $6.25 per TiB with one free TiB per month.
For companies running many queries, these costs can add up and users look to understand ways to cut costs on costly queries. The challenge lies in the fact that BigQuery does not provide a direct way to view the cost associated with each query. While you can see the overall cost incurred during a particular period or try to estimate it yourself, breaking down these costs on a per-query basis is not readily available through the native BigQuery interface.
BigQuery Labels
Labels can be applied to query jobs and provide an excellent way to filter and gain insights into your query costs. The difference between query jobs and queries in BigQuery are as follows:
- Query: The SQL statement or request that you submit to the system to retrieve, manipulate, or analyze data.
- Query Job: The complete process of executing a query, which includes the submission of the query, its processing by BigQuery’s infrastructure, and the retrieval of the results.
So by adding a label to a query job you can track the total cost associated with queries in that query job label. However, be mindful of some limitations when determining if it is the best solution.
- Non-Retroactive: You cannot apply the labels retroactively as they can’t be added or updated for pending, running, or completed jobs.
- Manual Labeling: While adding labels is straightforward, it can be a tedious process for large amounts of query jobs because you will need to add labels to query jobs individually.
Labels are formatted KEY:VALUE
, where keys must be unique. Keys and values “can contain only lowercase letters, numeric characters, underscores, and dashes”.
Add Labels
Labels can be added from the bq command-line tool or API.
Pre-Requisites
Before you can start adding labels, ensure that you have been granted the necessary access permission, the BigQuery User (roles/bigquery.user
) IAM role.
Add Via API
Use the jobs.insert
method and populate the labels
property in the job configuration.
Add Via bq
Add the --label
flag to your bq query` command.
Example — Add Via bq
This query retrieves NYC 311 calls from a public dataset on a given date. By adding the --label
flag, the query will be identified with the “purpose” key and “6-15-311-analysis” value.
bq query --nouse_legacy_sql --label=purpose:6-15-311-analysis \
'SELECT
COUNT(*) as `311_count`
FROM
`bigquery-public-data.new_york_311.311_service_requests`
WHERE
DATE(`created_date`) = DATE(2021, 06, 15)'
View Labels In Vantage
To view queries with the labels use the filter feature to sort by the tag(s) or group by your tag.
After applying you will be able to see the cost for queries with those filters.
Note there may be a two day delay in seeing resources appear.
Conclusion
Optimizing query costs in BigQuery is essential for cost control and efficient resource utilization. The ability to add labels to your query jobs provides a valuable tool for tracking and managing costs. By identifying the most expensive queries, you can proactively optimize them and cut unnecessary expenses.
Originally published at https://www.vantage.sh on November 7, 2023.