Google BigQuery: Best Practices for Big Data

Crystalloids
Jul 17, 2023

Businesses rely on enterprise data warehouse platforms like Google BigQuery to store, process, and analyze big data. But if not used correctly, these data analytics tools can become expensive.

Drawing on a client case, we will share how strategic improvements to data retrieval can dramatically decrease your BigQuery and analytics costs.

Costly Data Chaos

Our e-commerce client was experiencing a significant spike in their data analytics costs on BigQuery. Unsure why costs had risen, they turned to our team.

With our expertise in PaaS on Google Cloud, our team at Crystalloids examined how our client was using BigQuery and where their data retrieval pain points were. We found that the e-commerce brand's costs were rising due to the growing size of their data tables and inefficient update requests in BigQuery. They were spending more on:

  • Processes scheduled to run on an hourly basis
  • How their data tables were updated and inserted into
  • Retrieving data from BigQuery and enriching it
  • Transmitting the enriched data back to another location

These Google Analytics (GA) tables, which hold data such as customer transactions, were processed every hour. The issue lay in how the data was updated once the process was triggered at the hour.

“Our client had 5 different processes on 5 different tables…running every hour. Once they were triggered let’s say at the beginning of the hour, they were updating tables for each transaction. Suppose that there were 100 transactions in the last hour. In the current code, it was updating the table 100 times. This was the biggest issue that we needed to change.”

Implementing Cost-Saving Best Practices

Not only did we find a solution for the e-commerce brand, we also implemented three best practices that improved their use of the serverless data warehouse while reducing their overall costs:

1. Updates and Batch Processing

One of the primary contributors to high BigQuery costs is running frequent, individual row updates. We always advise our clients to avoid single-row update queries because BigQuery is designed to analyze and query large datasets, not to perform frequent updates.

In fact, BigQuery's pricing model is primarily based on the amount of data processed (scanned), and each UPDATE statement scans the table it modifies. Performing frequent update queries therefore leads to more data processing and, subsequently, higher costs.
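To make the pricing point concrete, here is a minimal sketch (not from the original case) that converts bytes scanned into an approximate on-demand cost. The $6.25/TiB rate is an assumption based on typical on-demand pricing; check the current BigQuery price list for your region.

```python
def estimate_query_cost_usd(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """Approximate on-demand query cost: bytes scanned, converted to TiB,
    multiplied by an assumed per-TiB rate."""
    return (bytes_scanned / 1024**4) * price_per_tib
```

Under these assumptions, 100 hourly single-row UPDATEs that each scan a 50 GiB table would process about 4.88 TiB, or roughly $30.52 every hour — which is why batching matters.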

In our client's case, we found that the frequent update mechanism they had implemented was inefficient, and we recommended consolidating update queries through a caching mechanism. Caching pending changes enables a batch update approach that reduces the number of individual updates.

In the long run, this tweak to their data retrieval process significantly reduced costs and unnecessary operational expense while improving overall efficiency.
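As a sketch of the batch-update idea, the cached changes can be flushed with a single MERGE instead of one UPDATE per transaction. The table and column names (`id`, `amount`) are illustrative; the article does not disclose the client's actual schema or caching implementation.

```python
def build_batch_merge(table: str, rows: list[dict]) -> str:
    """Render one MERGE statement that applies all cached row changes at once,
    replacing N individual UPDATE statements (and N table scans)."""
    structs = ",\n        ".join(
        f"STRUCT('{r['id']}' AS id, {r['amount']} AS amount)" for r in rows
    )
    return f"""MERGE `{table}` AS t
USING (SELECT * FROM UNNEST([
        {structs}
])) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET amount = s.amount
WHEN NOT MATCHED THEN INSERT (id, amount) VALUES (s.id, s.amount)"""
```

For the 100-transactions-per-hour example above, this turns 100 separate table scans into one.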

2. Selective Field Usage for Query Optimization

Sometimes, a review of the fields being queried can uncover ways to optimize your BigQuery queries while reducing costs. When we looked at the SQL SELECT queries used in our client's scripts, we found that they were retrieving all fields (using SELECT *). Because BigQuery stores data in a columnar format and bills by the columns it scans, this resulted in excessive scanning of unnecessary data.

Our solution was to walk through the processes with our client and identify the specific fields each one actually required, then modify the queries accordingly. Once the selected fields had been narrowed down to only those essential for the processes, the bytes BigQuery needed to scan dropped drastically.

This optimization led to a significant decrease in the amount of data being scanned and processed, resulting in substantial cost savings.
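A minimal illustration of the change: generate the SELECT list explicitly rather than using `*`. The table and field names here are hypothetical, not the client's schema.

```python
def build_select(table: str, columns: list[str]) -> str:
    """Name only the needed columns; with BigQuery's columnar storage,
    every column left out is bytes that are never scanned or billed."""
    return f"SELECT {', '.join(columns)} FROM `{table}`"

# Before: SELECT * FROM `project.dataset.ga_transactions`  (scans every column)
# After: only the fields the process actually uses
query = build_select("project.dataset.ga_transactions",
                     ["transaction_id", "revenue", "event_timestamp"])
```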

3. Harnessing the Power of Partitioned Tables

Finally, we found that converting our client's existing BigQuery tables into partitioned tables decreased their total BigQuery spend.

When you use partitioned tables, the data is grouped based on logical divisions such as time intervals or specific criteria. This drastically reduces the amount of data being scanned for each query.
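As a sketch, an existing table can be rebuilt as a partitioned one with a `CREATE TABLE … PARTITION BY … AS SELECT` statement; the `require_partition_filter` option then rejects any query that would scan every partition. The timestamp column name is an assumption for illustration.

```python
def partitioned_copy_ddl(new_table: str, source_table: str, ts_column: str) -> str:
    """DDL that rebuilds an existing table as a date-partitioned copy and
    forces every query to name the partitions it needs."""
    return (
        f"CREATE TABLE `{new_table}`\n"
        f"PARTITION BY DATE({ts_column})\n"
        f"OPTIONS (require_partition_filter = TRUE)\n"
        f"AS SELECT * FROM `{source_table}`"
    )
```

A query filtered with, say, `WHERE DATE(transaction_ts) = '2023-07-01'` then scans only that day's partition instead of the whole table.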

Once we implemented partitioned tables for our client, they saw a significant reduction in scanning bytes, which translated to substantial cost savings.

Spend Less with Google BigQuery

After implementing these three optimization techniques on BigQuery, our customer saw a remarkable improvement in data and analytics operating costs. Within 20 days, our client experienced a cost reduction of 38.91% in BigQuery expenses (see Figure 1 below).

After one month, a side-by-side comparison with the preceding month was conducted. The results were remarkable: our client saved 77.23% (see Figure 2). By identifying and rectifying inefficiencies, our team delivered substantial cost savings while enhancing the performance of existing processes.

Conclusion

Efficient cost management is essential when working with enterprise data warehouses like BigQuery that are designed for analyzing big data sets. Remember:

“updates are costly in BigQuery. As a data warehouse with distributed architecture and parallel processing, BigQuery handles queries that can scan and process gigabytes, terabytes, or even petabytes of data in seconds. It is not designed for updates but rather to leverage for analytical purposes.”

By following a systematic approach and implementing strategic optimization techniques, you can significantly reduce costs and improve performance for your business. Through the consolidation of update queries, selective field usage, and the utilization of partitioned tables, our team achieved significant cost savings and enhanced overall efficiency for our e-commerce client.

This customer success story is just one example of how we help businesses strengthen their data processing and retrieval while reducing overall spending. If you want to explore data optimization opportunities or assess your current costs on BigQuery or other Google Cloud services, feel free to reach out to us.

Crystalloids offers workshops to discover use cases such as these together. Interested? Reach out now!

Crystalloids

We transform complex data into actionable insights, specializing in GCP & Azure, empowering you to be data-driven.