BigQuery Storage Billing Models

Ash Broadley
Appsbroker CTS Google Cloud Tech Blog
4 min readJul 7, 2023

Can you save on your BigQuery Storage costs? Let’s see by exploring the different pricing models and how to use the information available.

Introducing Compressed Storage

Announced at the same time as BigQuery Editions, Google now allows customers to opt for Compressed Storage, meaning that the amount they pay is determined by the physical storage space taken up by their data. Previously, customers paid for their data storage at its “logical” size.

To see the differences between the sizes and costs of logical and physical storage in the BigQuery console, simply select any table in a dataset and under the “Details” tab, you can see the “Storage info”.

Comparison of logical vs physical storage costs showing the compression ratio

In the above example, we can see the total logical size is almost 18TB, while the total physical size is only 1.13TB - a compression ratio of 16:1. That’s impressive by any standard! Using this information, we can simply plug those numbers into the Google Cloud Pricing Calculator and see the costs associated — It’s worth noting this example assumes europe-west2 region and no discounts.

How do I get wider visibility?

Being able to inspect a single table is nice and all, but in isolation, it’s useless — especially if you have a large estate.

All the information we need is available in the INFORMATION_SCHEMA.TABLE_STORAGE view. We can create a pretty simple query to total the storage on a per-dataset level.

DECLARE active_logical_gib_price FLOAT64 DEFAULT 0.02;
DECLARE long_term_logical_gib_price FLOAT64 DEFAULT 0.01;
DECLARE active_physical_gib_price FLOAT64 DEFAULT 0.04;
DECLARE long_term_physical_gib_price FLOAT64 DEFAULT 0.02;

WITH
storage_sizes AS (
SELECT
table_schema AS dataset_name,
-- Logical
SUM(active_logical_bytes) / power(1024, 3) AS active_logical_gib,
SUM(long_term_logical_bytes) / power(1024, 3) AS long_term_logical_gib,
-- Physical
SUM(active_physical_bytes) / power(1024, 3) AS active_physical_gib,
SUM(active_physical_bytes - time_travel_physical_bytes - fail_safe_physical_bytes) / power(1024, 3) AS active_no_tt_no_fs_physical_gib,
SUM(long_term_physical_bytes) / power(1024, 3) AS long_term_physical_gib,
-- Restorable previously deleted physical
SUM(time_travel_physical_bytes) / power(1024, 3) AS time_travel_physical_gib,
SUM(fail_safe_physical_bytes) / power(1024, 3) AS fail_safe_physical_gib,
FROM
`region-europe-west2`.INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT
WHERE total_logical_bytes > 0
AND total_physical_bytes > 0
-- Base the forecast on base tables only for highest precision results
AND table_type = 'BASE TABLE'
GROUP BY 1
)
SELECT
dataset_name,
-- Logical
ROUND(
ROUND(active_logical_gib * active_logical_gib_price, 2) +
ROUND(long_term_logical_gib * long_term_logical_gib_price, 2)
, 2) as total_logical_cost,
-- Physical
ROUND(
ROUND(active_physical_gib * active_physical_gib_price, 2) +
ROUND(long_term_physical_gib * long_term_physical_gib_price, 2)
, 2) as total_physical_cost
FROM
storage_sizes

Whilst the query doesn’t look simple, all it’s doing is creating a base table from information in the view, with the data we need. It then aggregates and creates the associated per-dataset cost table.

TABLE_STORAGE query results

To bring this to life, we could take this one step further and set up a Looker Studio dashboard to monitor and view this information at a glance. Maybe something a little like the dashboard below!

Looker Studio dashboard showing BigQuery storage sizes & costs

What affects the compression ratio?

There are a few factors that come into play, including:

  • Cardinality — The number of distinct values in a set. The higher the cardinality, the less compressible the data, as each distinct value has to be stored.
  • Correct data types — Using the right data type for your data. If you’re storing everything as a STRING and not taking advantage of the native data types available in BigQuery, then you’re probably wasting a lot of space.

What’s the catch?

There isn’t one. It’s important to note that the introduction of Compressed Storage enables Google customers to pay for the data they are storing based on its physical (compressed) size, rather than its logical (uncompressed) size. There is no other change or impact on capacity or performance. Data has always been compressed in BigQuery through the use of the Capacitor storage format.

There are some caveats, however…

  • The physical storage billing can only be enabled at a dataset level. If you have a number of tables within a dataset, you need to consider the total storage size of the dataset as a whole. While you may have a number of tables that compress well, there may be others that don’t, which will drag the ratio down. Given that the pricing for physical storage is almost double that of logical storage, it may not be worth the change.
  • If you change your mind, or the physical billing model proves more expensive, then you have to wait 14 days to revert the change (this originally was an irreversible switch, but it has been changed by Google since initial announcement — Thanks @convivial_kobi_bear_638 for the correction!).

I’ve decided to enable it, what do I do next?

Enabling the physical storage billing model can be done by using SQL in the BigQuery console. It’s as easy as running the following:

ALTER SCHEMA MY_DATASET
SET OPTIONS(
storage_billing_model = 'physical'
);

Et Voila!

BigQuery Storage. Done right!

About CTS

CTS is the largest dedicated Google Cloud practice in Europe and one of the world’s leading Google Cloud experts, winning 2020 Google Partner of the Year Awards for both Workspace and GCP.

We offer a unique full stack Google Cloud solution for businesses, encompassing cloud migration and infrastructure modernisation. Our data practice focuses on analysis and visualisation, providing industry specific solutions for; Retail, Financial Services, Media and Entertainment.

We’re building talented teams ready to change the world using Google technologies. So if you’re passionate, curious and keen to get stuck in — take a look at our Careers Page and join us for the ride!

--

--

Ash Broadley
Appsbroker CTS Google Cloud Tech Blog

Interests include, but not limited to: Google Cloud (Certified Arch, Data Eng & DevOps), Programming, Squash, Technology, Space, Food. Views are my own.