Understanding BigQuery billing and optimizing it

David Eller
Indicium Engineering

--

Google BigQuery is a fully managed, serverless data warehouse that enables super-fast SQL queries using the processing power of Google’s infrastructure. It’s a powerful tool for data analytics, but it can also be a significant expense for businesses. If you’re not careful about how you use the service, you could find yourself with a surprisingly high bill at the end of the month.

Understanding how BigQuery billing works can be a bit confusing at first. This post will help you understand the details behind it and give you some extra tips on how to reduce your BigQuery costs and get more value out of the service.

BigQuery Pricing

The two most important cost lines in BigQuery are storage and analysis. Note that BigQuery also charges for other operations, such as BigQuery Omni, BigQuery ML, BI Engine, and streaming reads and writes. In this post, we’ll focus on the main lines: storage and analysis.

An important thing to note is that BigQuery prices (as with most cloud services) depend on the location! In this post, we’ll consider the US region and prices as of January 10, 2023.

1. Storage pricing

Storage pricing is related to how much data you store in BigQuery. The more data you store, the more you’ll pay.

Storage costs are determined by two factors:

1.1 Storage Time:

1.1.1 Active Storage: includes any table or table partition that has been modified in the last 90 days.

1.1.2 Long Term Storage: includes any table or table partition that has not been modified for 90 consecutive days. The price of storage for that table automatically drops by approximately 50%. There is no difference in performance, durability, or availability between active and long-term storage.

Here, “modified” means that the data in the table or partition actually changed, for example through load jobs, streaming inserts, DML statements, or query results written to it. Note that operations such as:

  • Querying a table
  • Creating a view that queries a table
  • Exporting data from a table
  • Copying a table (to another destination table)
  • Patching or updating a table resource

won’t reset the 90-day timer.

1.2 Byte Type:

1.2.1 Logical bytes: logical bytes represent the uncompressed size of your data and are the default unit of consumption for storage. Per GiB, they cost half the price of physical bytes, and under this model you are not charged for time travel bytes.

1.2.2 Physical bytes: physical bytes are compressed. They represent the actual size of your data on disk and cost double the price of logical bytes per GiB, but this model does charge for time travel bytes. You can switch from logical to physical billing, but once you do, you can’t switch back.
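If you’re unsure which billing model fits your data, you can compare the two units directly. Here’s a minimal sketch using the INFORMATION_SCHEMA.TABLE_STORAGE view (the `region-us` qualifier is a placeholder for wherever your data lives):

```sql
-- Compare uncompressed (logical) vs. compressed (physical) size per table.
-- Replace `region-us` with the region that holds your datasets.
SELECT
  table_schema,
  table_name,
  ROUND(total_logical_bytes / POW(1024, 3), 2) AS logical_gib,
  ROUND(total_physical_bytes / POW(1024, 3), 2) AS physical_gib,
  ROUND(SAFE_DIVIDE(total_logical_bytes, total_physical_bytes), 1) AS compression_ratio
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
ORDER BY total_logical_bytes DESC;
```

If the compression ratio is comfortably above 2, physical billing can come out cheaper even at double the per-GiB price.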

Now that you know the two factors that influence BigQuery storage pricing, here is a summary of the prices (US multi-region, January 10, 2023):

  • Active logical storage: $0.02 per GiB per month
  • Long-term logical storage: $0.01 per GiB per month
  • Active physical storage: $0.04 per GiB per month
  • Long-term physical storage: $0.02 per GiB per month

The first 10 GiB of storage each month are free. With storage covered, let’s head to:

2. Analysis pricing

Analysis pricing refers to the cost of processing queries. This mostly means SQL queries (run in the BigQuery console or through a data visualization tool), but it also includes user-defined functions, scripts, and certain data manipulation language (DML) and data definition language (DDL) statements that scan tables.

BigQuery offers two pricing models:

2.1 On-demand pricing: this is a pay-as-you-go model. BigQuery charges for the number of bytes processed by each query, and the first 1 TB per month is free. In this pricing model, you can use up to 2,000 concurrent slots.

2.2 Flat-rate pricing: unlike on-demand pricing, flat-rate pricing charges you for renting computing capacity, measured in what BigQuery calls slots. You can commit to renting capacity over different periods of time, and the longer you commit, the lower the price. The plans are:

2.2.1 Flex slots: You commit to an initial 60 seconds.

2.2.2 Monthly: You commit to an initial 30 days.

2.2.3 Annual: You commit to 365 days.

BigQuery allows you to combine both pricing models, for example, using flat-rate pricing for predictable jobs and on-demand pricing for other use cases.

The prices (US, January 10, 2023) are:

  • On-demand: $5.00 per TB scanned (first 1 TB per month free)
  • Flat-rate flex slots: $4.00 per hour per 100 slots
  • Flat-rate monthly: $2,000 per month per 100 slots
  • Flat-rate annual: $1,700 per month per 100 slots

Now that you know everything about BigQuery pricing, let’s move on to some tips on how to optimize it.

Optimizing BigQuery Billing

Avoid “SELECT *” expressions! A simple way to reduce your costs is to query only the data that you need. If you have a large table with a lot of columns, but you only need to query a few of them, you can save money by selecting only the columns you need in your query. This reduces the amount of data that is scanned, which will in turn reduce your costs. You can also use filters to further reduce the amount of data that is scanned. For example, if you only need to query data for a specific time period or for a specific group of users, you can use a WHERE clause to filter the data accordingly.
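A minimal illustration (the project, dataset, and column names are hypothetical):

```sql
-- Scans every column of every row: expensive on wide tables.
SELECT * FROM `my_project.my_dataset.orders`;

-- Scans only the two columns it needs; the filter can prune
-- data further if the table is partitioned on order_date.
SELECT order_id, gross_value
FROM `my_project.my_dataset.orders`
WHERE order_date >= '2023-01-01';
```

One caveat: on on-demand pricing, a WHERE clause only reduces the bytes scanned when it prunes partitions or clustered blocks; on an unpartitioned table, selecting fewer columns is what reliably cuts the scan.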

Use aggregated tables: Evaluating a table’s granularity is good practice when developing models in a data warehouse. Transactional data is very often grouped into dimensions to be analyzed in dashboards and is rarely consumed in its most granular form. For instance, if you have an orders table where each row is an order, consider grouping by day and calculating the main metrics: counting distinct orders, summing gross value, summing net value, and so on.
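A sketch of such a rollup, assuming a hypothetical orders table:

```sql
-- Daily rollup: dashboards query this small table instead of the raw one.
CREATE OR REPLACE TABLE `my_project.my_dataset.orders_daily` AS
SELECT
  DATE(order_timestamp) AS order_date,
  COUNT(DISTINCT order_id) AS orders,
  SUM(gross_value) AS total_gross_value,
  SUM(net_value) AS total_net_value
FROM `my_project.my_dataset.orders`
GROUP BY order_date;
```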

Use data samples for development: It’s counterintuitive, but using LIMIT when querying in BigQuery won’t reduce the amount of data scanned by a job! If you want to save processing and money, you can use the TABLESAMPLE clause instead. This is very helpful when developing and exploring data! You can also check this blog post if you’re working with dbt and BigQuery.
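A quick comparison, again with a hypothetical table:

```sql
-- LIMIT does NOT reduce bytes billed: the whole table is still scanned.
SELECT * FROM `my_project.my_dataset.orders` LIMIT 1000;

-- TABLESAMPLE reads only a subset of the table's blocks,
-- so roughly 10% of the data is scanned (and billed).
SELECT *
FROM `my_project.my_dataset.orders` TABLESAMPLE SYSTEM (10 PERCENT);
```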

Use the appropriate data types: One of the easiest ways to reduce your BigQuery costs is to use the appropriate data types for your columns. Each data type has a different storage size, so using a data type that requires less storage can significantly reduce the amount of data that is stored and processed in a query, which will in turn reduce your costs. For example, if you have a column that only contains integers, you should use the INTEGER (INT64) data type instead of the STRING data type. Similarly, if you have a column that only contains Boolean values (true/false), you should use the BOOLEAN data type instead of STRING. You can check all the data types available in BigQuery here.
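A sketch of this kind of type cleanup, with hypothetical column names:

```sql
-- Rebuild a raw table that stores everything as STRING into natural types.
CREATE OR REPLACE TABLE `my_project.my_dataset.users_typed` AS
SELECT
  CAST(user_id AS INT64) AS user_id,        -- was STRING
  CAST(is_active AS BOOL) AS is_active,     -- was 'true'/'false' strings
  CAST(signup_date AS DATE) AS signup_date  -- was 'YYYY-MM-DD' strings
FROM `my_project.my_dataset.users_raw`;
```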

Use partitioned and clustered tables: If you have a large table that you query frequently, you can save money by partitioning and clustering it. Partitioning divides the table into smaller pieces, called partitions, based on a specific column or set of columns. When you query a partitioned table and filter on the partitioning column, BigQuery scans only the matching partitions instead of the entire table. Clustering complements this by sorting the data within the table by the values of one or more columns, which lets BigQuery skip blocks that don’t match your filters. For example, if you have a table that contains data for multiple years and you only need data for the current year, filtering on the partition column means only the current year’s rows are scanned.
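A sketch of both techniques, using hypothetical names:

```sql
-- Partition by day and cluster by customer so common filters prune data.
CREATE TABLE `my_project.my_dataset.orders`
PARTITION BY DATE(order_timestamp)
CLUSTER BY customer_id AS
SELECT * FROM `my_project.my_dataset.orders_raw`;

-- Only the 2023 partitions are scanned, thanks to the filter
-- on the partitioning column.
SELECT customer_id, SUM(gross_value) AS gross_value
FROM `my_project.my_dataset.orders`
WHERE DATE(order_timestamp) >= '2023-01-01'
GROUP BY customer_id;
```

Pick clustering columns that match the filters your dashboards use most often.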

Use _TABLE_SUFFIX when querying tables with wildcards: Although partitioning and clustering tables is more efficient than sharding (check this link by BigQuery), if you do use a sharding strategy, make sure you use _TABLE_SUFFIX to limit the amount of data scanned.
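For example, with hypothetical date-sharded event tables:

```sql
-- With shards like events_20230101, events_20230102, ...,
-- _TABLE_SUFFIX restricts the wildcard scan to the shards you need.
SELECT event_name, COUNT(*) AS events
FROM `my_project.my_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY event_name;
```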

Use cached results: If you have a query that you run frequently, you can save money by using the query cache. BigQuery stores query results in a temporary cached-results table, and if you rerun an identical query, it can return the cached result at no charge. This can be particularly useful for queries that take a long time to run or that are resource-intensive. Cached results are enabled by default in BigQuery, but a query must meet a list of conditions for BigQuery to use the cached result. You can check more on these conditions here.
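One way to see how often you’re actually hitting the cache is the cache_hit flag in the INFORMATION_SCHEMA jobs views. A minimal sketch (the `region-us` qualifier is a placeholder, and the view requires the appropriate permissions):

```sql
-- Share of the last week's queries answered from cache (cache hits are free).
SELECT
  COUNTIF(cache_hit) AS cached_jobs,
  COUNT(*) AS total_jobs
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY';
```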

Optimize your queries: Another way to reduce your costs is to optimize the queries themselves. There are several ways you can do this, such as using nested and repeated fields and avoiding self joins and Cartesian joins. Check out the best practices listed by BigQuery.
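As one example of the nested-and-repeated approach, assume a hypothetical table whose items column is a repeated STRUCT of line items; the line items live inside each order row, so no join against a separate items table is needed:

```sql
-- Sum each order's line items without joining a separate items table.
SELECT
  order_id,
  (SELECT SUM(item.price * item.quantity)
   FROM UNNEST(items) AS item) AS order_total
FROM `my_project.my_dataset.orders_nested`;
```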

Consider using alternative storage options: If you have a large amount of data that you need to store, but you don’t need to query it frequently, you may want to consider using alternative storage options such as Cloud Storage or Cloud Bigtable. These options can be less expensive than BigQuery for storing large amounts of data that don’t need to be queried often.

Use flat-rate pricing: If you have a predictable workload and you know that you will be using BigQuery on a regular basis, you may want to consider switching to flat-rate pricing. With flat-rate pricing, you pay a fixed fee for a certain amount of query processing capacity, which can be more cost-effective than paying on-demand rates.
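Before committing, it helps to estimate your current on-demand spend. A rough sketch against the INFORMATION_SCHEMA jobs view, assuming the $5 per TB on-demand rate quoted earlier (the region qualifier is a placeholder):

```sql
-- Approximate on-demand cost per month over the last 90 days.
SELECT
  FORMAT_TIMESTAMP('%Y-%m', creation_time) AS month,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 5, 2) AS approx_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
  AND job_type = 'QUERY'
GROUP BY month
ORDER BY month;
```

If this number is consistently above the cost of a slot commitment, flat-rate pricing is probably worth a look.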

Set expiration times for your tables: It’s possible to set a custom expiration time for tables and datasets in BigQuery. This is useful when only recent data is needed, or for development datasets, for example.
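For instance (dataset and table names are hypothetical):

```sql
-- Expire a scratch table 7 days from now.
ALTER TABLE `my_project.dev_dataset.scratch_table`
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
);

-- Give every new table in a development dataset a default 7-day lifetime.
ALTER SCHEMA `my_project.dev_dataset`
SET OPTIONS (default_table_expiration_days = 7);
```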

Use custom quotas: BigQuery lets you be creative when applying quotas. You can set many custom quotas, such as how many bytes a user can query per day, how many bytes can be scanned in a project per day, or the maximum size of a table, and you can even force users to query a table only when they filter on a partitioned field!
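Most quotas are configured in the Cloud Console rather than in SQL, but the partition-filter requirement can be set directly on a (partitioned) table. For example, with a hypothetical table:

```sql
-- Reject any query on this table that doesn't filter on the partition column.
ALTER TABLE `my_project.my_dataset.orders`
SET OPTIONS (require_partition_filter = TRUE);
```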

By following these tips, you can significantly reduce your BigQuery costs and get more value out of the service. Remember to regularly review your usage and costs to ensure that you are using BigQuery in the most cost-effective way possible.
