Uncover Powerful Insights: Analyzing Your Consumption with BigQuery Audit Log

Ramdhan Nur Cahyadi
Blibli.com Tech Blog
6 min read · Aug 23, 2023

Are you utilizing BigQuery as a comprehensive data tool for tasks like SQL-based data retrieval and dashboard sourcing? Have you ever considered calculating your total consumption in BigQuery for these activities?

In this article, I want to share how to use the Audit Log to uncover your consumption in BigQuery.

What is the Audit Log?

As referenced in this article, audit logs are a collection of logs provided by Google Cloud that give insight into operational concerns related to your use of the Google BigQuery service.

Audit logs help us understand our activities in BigQuery, such as running queries or serving data to Google Data Studio, Tableau, or any other business intelligence tool.

How Does the Audit Log Work?

BigQuery Audit Log is a comprehensive logging system that captures detailed information about activities and events within your BigQuery environment. It records a wide range of data, such as queries executed, table and dataset modifications, access attempts, and administrative actions.

The log data includes details such as the user or service account involved, the timestamp of the event, the executed SQL statement, and the affected resources.

Diagram BigQuery Audit log
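
As a quick illustration, here is a minimal sketch of how these details can be read back once the logs are exported to BigQuery (it assumes the sink dataset and table name used later in this article):

-- Sketch: peek at the key details captured for each logged query job
SELECT
  timestamp,                                                -- when the event happened
  protopayload_auditlog.authenticationInfo.principalEmail,  -- who triggered it
  JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
    "$.jobChange.job.jobConfig.queryConfig.query") AS query_text  -- the executed SQL
FROM
  `audit_logs.cloudaudit_googleapis_com_data_access`
LIMIT 10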

How To Get the Audit Log?

Google Cloud Audit Logs provide a range of capabilities, summarized in the table from this source: https://cloud.google.com/blog/products/data-analytics/bigquery-audit-logs-pipelines-analysis

Audit logs are useful for monitoring specific BigQuery use cases that may not be possible with other options such as INFORMATION_SCHEMA or the BigQuery Admin Panel.
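
For comparison, here is a minimal sketch of a similar consumption summary built on INFORMATION_SCHEMA (the `region-us` location is an assumption; adjust it to where your jobs run). INFORMATION_SCHEMA only retains roughly the last 180 days of jobs, whereas an audit log sink keeps data for as long as you keep the exported tables:

-- Sketch: per-user billed bytes from INFORMATION_SCHEMA (assumed region-us)
SELECT
  user_email,
  DATE(creation_time) AS run_date,
  ROUND(SUM(total_bytes_billed) / POWER(2, 30), 3) AS gb_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_type = 'QUERY'
GROUP BY 1, 2
ORDER BY 3 DESC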

You can set up the audit log by following this article. Here is a summary of what you need to do based on that article:

Enable Audit Logs in your project: Before you can access the BigQuery Audit Log, you need to ensure that audit logging is enabled for your project. You can do this through the Google Cloud Console by navigating to the project settings and enabling the appropriate audit logs.

BigQuery Logs Explorer

Export Audit Logs to a destination: Once audit logging is enabled, you can configure an export of the audit logs to a specific destination. Before exporting, filter the logs with resource = BigQuery and logName = data_access, as shown after the screenshot below.

Filter log Explorer
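
For reference, one way to write that filter in the Logs Explorer is sketched below (replace PROJECT_ID with your own project ID):

resource.type="bigquery_resource"
logName="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access"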

Then create a sink to route the filtered logs to your destination dataset.

Creating a sink to your destination dataset

One important note: the audit logs won't backdate, so you will have to manually export the historical results if you want to join them with the newly collected data. You should also be careful about which audit logs you export, as these datasets can get incredibly large, especially if you are using GCP app features; a simple size check is sketched below.
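
A minimal sketch of that check, assuming the `audit_logs` dataset name used later in this article:

-- Sketch: size of each table in the audit log sink dataset
SELECT
  table_id,
  row_count,
  ROUND(size_bytes / POWER(2, 30), 2) AS size_gb
FROM
  `audit_logs.__TABLES__`
ORDER BY size_bytes DESC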

Audit Log Insights

After a few hours, and as long as logs fitting your criteria have been generated, you will start to see your audit log dataset populate.
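
A quick sanity check, assuming the sink table name used in the queries below, is to confirm that recent entries have landed:

-- Sketch: confirm that recent audit log entries have arrived in the sink
SELECT
  DATE(timestamp) AS log_date,
  COUNT(*) AS entries
FROM
  `audit_logs.cloudaudit_googleapis_com_data_access`
WHERE
  DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY 1
ORDER BY 1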

Query Activity

One useful audit insight we can obtain is a summary of our query activity, which shows our consumption. For those using on-demand pricing, exercising caution is crucial. With this insight, you'll be able to track the cost of each query and identify which jobs are incurring the most cost, then optimize those jobs to save money.


WITH
BASE AS (
  SELECT
    DATE(timestamp) AS date,
    protopayload_auditlog.authenticationInfo.principalEmail AS principalEmail,
    JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobConfig.queryConfig.priority") AS querytype,
    -- bytes divided by 2^30 to convert to GB, matching the column names below
    CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobStats.queryStats.totalBilledBytes") AS INT64)
      / POWER(2, 30) AS totalBilledGB,
    CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobStats.queryStats.totalProcessedBytes") AS INT64)
      / POWER(2, 30) AS totalProcessedGB,
    CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobStats.queryStats.outputRowCount") AS INT64)
      AS totalproducedrows,
    JSON_EXTRACT(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobConfig.queryConfig.query") AS metadataquery,
    CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobStats.endTime") AS TIMESTAMP) AS endtime,
    CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobStats.createTime") AS TIMESTAMP) AS createTime
  FROM
    `audit_logs.cloudaudit_googleapis_com_data_access`
  WHERE
    -- only batch-priority jobs are kept here; use 'QUERY_INTERACTIVE'
    -- (or drop this condition) to include interactive queries as well
    JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobConfig.queryConfig.priority") = 'QUERY_BATCH'
    -- exclude service accounts
    AND protopayload_auditlog.authenticationInfo.principalEmail
      NOT LIKE '%.gserviceaccount.%' )

SELECT
  FORMAT_DATE('%Y-%m', date) AS month,
  principalEmail,
  metadataquery,
  COUNT(DISTINCT date) AS run_count,  -- distinct days on which the query ran
  ROUND(SUM(totalBilledGB), 3) AS gb_billed,
  ROUND(SUM(totalProcessedGB), 3) AS gb_processed,
  -- cost estimate assuming roughly 0.5 cents per GB billed; adjust to your on-demand rate
  ROUND(SUM(totalBilledGB), 3) * 0.5 AS gb_cost_cents,
  MIN(createTime) AS creation_time,
  MAX(endtime) AS end_time
FROM
  BASE
GROUP BY
  1, 2, 3
ORDER BY
  5 DESC

The output will be as follows:

query activity from audit log

You can keep track of how much you're using, what you're being charged for, and the overall cost of your queries. This information can help you make your on-demand queries more efficient and pinpoint where you can make improvements.

Also, to take your analysis further, refer to the other metadata fields available in the BigQuery audit log described in this article.

Dashboard Performance

With the BigQuery Audit Log, you can understand how long each visualization chart takes to appear if your business intelligence tools rely on BigQuery as their source.

So how is this possible? Take a look at the figure below.

Looker dashboard running a job for a visualization

If you look at the red circle, you will see the blue line on it. This line indicates the dashboard's progress as it retrieves data from BigQuery. Because the information is sourced from BigQuery, it is automatically recorded in the audit log data, which can then be examined and analyzed for further insights.

An example is the log entry below, retrieved from a dashboard interaction that pulled data from BigQuery.

sample queries which came from dashboard interaction

We can see from the log that the query originated from a dashboard interaction, which we can analyze to understand how long the dashboard takes to render for our users.

Now that we have the log, let's take action and extract insights to improve our dashboard's performance. This article focuses on job duration as the metric for assessing how fast the dashboard feels to its users.

WITH
BASE AS (
  SELECT
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent
      .job.jobName.jobId AS jobid,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent
      .job.jobStatistics.createTime AS create_time,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent
      .job.jobStatistics.startTime AS starttime,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent
      .job.jobStatistics.endTime AS endtime,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent
      .job.jobConfiguration.query AS query,
    protopayload_auditlog.requestMetadata.callerSuppliedUserAgent AS user_agent,
    protopayload_auditlog.authenticationInfo.principalEmail AS principalEmail
  FROM
    -- the same sink table used in the previous query is assumed here
    `audit_logs.cloudaudit_googleapis_com_data_access`
  WHERE
    -- keep only rows that represent completed jobs
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent
      .job.jobName.jobId IS NOT NULL )

, dataset AS (
  SELECT
    create_time,
    TIMESTAMP_DIFF(endtime, starttime, SECOND) AS duration,
    jobid,
    CASE
      WHEN principalEmail LIKE '%gserviceaccount.com%' THEN 'application'
      -- you can also match a specific BI user agent here,
      -- e.g. user_agent LIKE '%tableau server%'
      WHEN user_agent IS NOT NULL THEN 'application'
      ELSE 'adhoc'
    END AS usage_type,
    CASE
      WHEN REGEXP_CONTAINS(UPPER(query.query), r'FROM\s+`([^`]+)`')
        THEN ARRAY_TO_STRING(
          REGEXP_EXTRACT_ALL(UPPER(query.query), r'FROM\s+`([^`]+)`'), ',')
      WHEN REGEXP_CONTAINS(UPPER(query.query), r'FROM\s+([^\s(]+)')
        THEN ARRAY_TO_STRING(
          REGEXP_EXTRACT_ALL(UPPER(query.query), r'FROM\s+([^\s(]+)'), ',')
    END AS sources
  FROM
    BASE )

SELECT
  DATE(create_time) AS event_date,
  sources,
  AVG(duration) AS average_duration
FROM
  dataset
WHERE
  usage_type = 'application'
GROUP BY
  1, 2


The output will be as follows:

dashboard duration output

With this output, you can keep a close eye on your dashboard deployment and promptly identify any lag or errors experienced by users. However, note that this insight can only extract the sources used by your dashboard, not the specific name of the dashboard. To obtain the dashboard name, first build a mapping (dictionary) of all the sources each dashboard uses, then join it with the audit log output, as sketched below.
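
A minimal sketch of that join, assuming a hypothetical `your_dataset.dashboard_sources` mapping table (with columns dashboard_name and source_table that you maintain yourself) and the output of the previous query saved as `your_dataset.dashboard_duration`:

-- Sketch: attach dashboard names to the duration output via a hand-maintained mapping
SELECT
  m.dashboard_name,
  d.event_date,
  ROUND(AVG(d.average_duration), 1) AS avg_duration_seconds
FROM
  `your_dataset.dashboard_duration` AS d
JOIN
  `your_dataset.dashboard_sources` AS m
  -- sources is an uppercased, comma-separated list of tables, so match on substring
  ON d.sources LIKE CONCAT('%', UPPER(m.source_table), '%')
GROUP BY 1, 2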

Conclusion

Using the data collected from the BigQuery audit log, we can confidently develop a strategy to optimize our usage and cut down on costs. It’s not just about improving our daily job queries, but also keeping a close eye on how other users interact with our dashboard. Armed with this knowledge, we can make informed decisions to enhance the dashboard and queries, resulting in a more cost-effective solution that offers an exceptional user experience.

Reference

https://medium.com/google-cloud/visualize-gcp-billing-using-bigquery-and-data-studio-d3e695f90c08

https://cloud.google.com/blog/products/devops-sre/log-analytics-in-cloud-logging-is-now-ga/
