Exporting BigQuery usage logs to… BigQuery

Daniel Lee
Dataform
Published in
3 min readApr 1, 2020

--

Dan is the Founder of Outlier, your fractional data team | weareoutlier.com

BigQuery makes detailed usage logs available through Cloud Logging exports, but before you can start analysing them you probably want to start exporting them to BigQuery. It’s really simple to turn the export on but unfortunately Google’s documentation is pretty vague, so I thought I’d write this blog post to help.

Setting up the export

Open the Logs Viewer and make sure you’ve chosen the correct GCP project in the drop down at the top left.

Expand the advanced filter box.

Add the following advanced search into the filter box:

resource.type="bigquery_project" OR resource.type="bigquery_dataset"

You should see something like this:

Click Create Sink, enter a "Sink Name", choose BigQuery as the "Sink Service" and select a BigQuery dataset to write the logs into (you can also create a new dataset through the dropdown interface).

That’s it! All future logs relating to BigQuery usage in your project will be written to this dataset. Note, it’s not possible to backfill historical data.

Modelling the data

You now have a table in BigQuery with details of every single activity relating to your BigQuery project. However, the table isn’t particularly user friendly. Answering simple questions like “which user spent the most on BigQuery yesterday” requires complex JSON transformations of difficult to remember fields:

select
protopayload_auditlog.authenticationInfo.principalEmail
sum(5.0 * cast(
json_extract_scalar(
protopayload_auditlog.metadataJson,
"$.jobChange.job.jobStats.queryStats.totalBilledBytes"
) as int64
)) as spend
from
bigquery_logs.cloudaudit_googleapis_com_data_access_*
where
resource.type = 'bigquery_project'
and date(timestamp) = date(timestamp_sub(current_timestamp, interval 1 day))
group by 1
order by 2 desc
limit 1

To help with this, we’ve created a Dataform package (documentation here) that creates a clean and easy-to-use table with all the fields you’d expect. Adding a Dataform package to your project is as simple as adding a few lines of code.

https://www.loom.com/embed/fdfa25dcdc8544e38fe844199b970f87

“How much did we spend yesterday” can then be answered with the much simpler query:

select
principal_email,
sum(cost_usd) as spend
from
dataform.bigquery_compute_logs
where
date(timestamp) = date(timestamp_sub(current_timestamp, interval 1 day))
group by 1
order by 2 desc
limit 1

If you’d like to learn more about Dataform, check out our documentation and sign up for a free account.

Originally published at https://dataform.co.

--

--