How BigQuery Physical Storage Works

Sobhan A
Feb 3, 2023


Google Cloud recently released a new BigQuery storage billing model based on physical bytes, as opposed to the current logical model. Physical storage is the amount of data that actually lands on the hardware after BigQuery compresses it. Under this preview feature, users can now switch the billing model of existing datasets for some potential cost savings.
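
To get a feel for the compression on your own data before running a full audit, you can compare logical and physical bytes for a single dataset via INFORMATION_SCHEMA. A minimal sketch, where my_dataset and the `region-us` qualifier are placeholders to swap for your own dataset and region:

-- Per-table compression ratio for one dataset.
SELECT
  table_name,
  total_logical_bytes,
  total_physical_bytes,
  ROUND(SAFE_DIVIDE(total_logical_bytes, total_physical_bytes), 2) AS compression_ratio
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'my_dataset'
ORDER BY compression_ratio DESC;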

It does, however, come with a few caveats:

  • It’s a one-way operation: once you switch a dataset to the physical billing model, you cannot revert it to logical.
  • Total storage billed will now also include time-travel storage.
  • The per-GB SKU price for physical storage is higher than the logical one (for both active and long-term storage).
  • If your compression ratio is below roughly 2:1, you will actually end up paying more; check the SKU prices for your region, and see the worked example after this list.
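
To see why roughly 2:1 is the break-even point, take the sample US prices used in the audit query below: $0.02/GB for active logical storage versus $0.04/GB for active physical. 100 GB of logical data costs 100 × $0.02 = $2.00 per month on the logical model; compressed exactly 2:1 to 50 GB, it costs 50 × $0.04 = $2.00 on the physical model. Any ratio above 2:1 tips the balance towards physical billing, while anything below it (plus the newly billed time-travel bytes) tips it the other way.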

Caveats aside, the new billing model is still a fantastic cost-saving opportunity for heavy BigQuery storage users with high compression ratios.

So how can you figure out whether it’s going to reduce your storage cost?

First, we need to run an audit to figure out the logical and physical storage values per dataset in our projects.

We can do this by leveraging Google’s public guidance. I decided to run the query against my entire organisation and get the cost difference per dataset across all my projects.

-- Change these to the SKU prices for your region
DECLARE active_logical_gb_price FLOAT64 DEFAULT 0.02;
DECLARE long_term_logical_gb_price FLOAT64 DEFAULT 0.01;
DECLARE active_physical_gb_price FLOAT64 DEFAULT 0.04;
DECLARE long_term_physical_gb_price FLOAT64 DEFAULT 0.02;

-- Results land in bq_stats.bq_comp_data; the bq_stats dataset must already exist.
CREATE OR REPLACE TABLE bq_stats.bq_comp_data AS (
  WITH storage_sizes AS (
    SELECT
      project_id,
      table_schema AS dataset_name,
      CURRENT_DATE() AS the_date,
      SUM(active_logical_bytes) / POWER(1024, 3) AS active_logical_gb,
      SUM(long_term_logical_bytes) / POWER(1024, 3) AS long_term_logical_gb,
      SUM(active_physical_bytes) / POWER(1024, 3) AS active_physical_gb,
      SUM(long_term_physical_bytes) / POWER(1024, 3) AS long_term_physical_gb,
      SUM(total_physical_bytes) / POWER(1024, 3) AS total_physical_gb,
      SUM(total_logical_bytes) / POWER(1024, 3) AS total_logical_gb
    -- Note the backticks: the hyphenated region qualifier must be quoted.
    FROM `region-us-central1`.INFORMATION_SCHEMA.TABLE_STORAGE_BY_ORGANIZATION
    WHERE total_logical_bytes > 0
      AND total_physical_bytes > 0
    GROUP BY 1, 2
  )
  SELECT
    project_id,
    dataset_name,
    ROUND(total_logical_gb / total_physical_gb, 4) AS compression_ratio,
    ((active_logical_gb * active_logical_gb_price) + (long_term_logical_gb * long_term_logical_gb_price))
      - ((active_physical_gb * active_physical_gb_price) + (long_term_physical_gb * long_term_physical_gb_price)) AS total_cost_difference
  FROM storage_sizes
  ORDER BY compression_ratio DESC
);

If you host your data in multiple regions, you will need to run this query once per region, making sure to adjust the SKU prices at the very top of the query. You can find these numbers on the BigQuery pricing page.
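
As a minimal sketch of what changes, here is the core of the audit pointed at the EU multi-region; the full query above applies unchanged apart from the region qualifier and the prices:

-- Same storage audit, against the EU multi-region's storage view.
SELECT
  project_id,
  table_schema AS dataset_name,
  SUM(total_logical_bytes) / POWER(1024, 3) AS total_logical_gb,
  SUM(total_physical_bytes) / POWER(1024, 3) AS total_physical_gb
FROM `region-eu`.INFORMATION_SCHEMA.TABLE_STORAGE_BY_ORGANIZATION
WHERE total_logical_bytes > 0 AND total_physical_bytes > 0
GROUP BY 1, 2;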

As a second step, we can export the results for investigation.
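
Since the audit writes its results to bq_stats.bq_comp_data, a quick query over that table is enough to shortlist candidates; a positive total_cost_difference is a saving, a negative one means physical billing would cost more:

SELECT
  project_id,
  dataset_name,
  compression_ratio,
  total_cost_difference
FROM bq_stats.bq_comp_data
ORDER BY total_cost_difference DESC;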

In my case, I had two datasets (dev & test), each with more than 20 TB of storage and compression ratios of around 4.8, which suggested a potential saving of around $400 by switching the billing model to physical.

You can also see that for certain datasets where the compression ratio isn’t really above 2, you end up paying more if you decide to change the billing model.

The final step is to actually change the billing model for these datasets, which is straightforward:

bq update -d --storage_billing_model=PHYSICAL <PROJECT_ID>:dev
bq update -d --storage_billing_model=PHYSICAL <PROJECT_ID>:test
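
If you prefer to stay in SQL, the same change can be made with a DDL statement. A sketch, assuming the preview is enabled for your project and using the same placeholder project ID:

-- Equivalent to the bq commands above.
ALTER SCHEMA `<PROJECT_ID>.dev`
SET OPTIONS (storage_billing_model = 'PHYSICAL');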
