Published in


BigQuery 101, how to tame the beast?

Google BigQuery often seems to be Google’s magic tool, allowing you to browse huge amounts of data, via SQL, without worrying about the infrastructure. Small flat, magic does not exist! A certain number of levers allow you to avoid some performance and cost disappointments!

This series of articles will present the architecture and the best practices to put in place to be efficient on a daily basis with BigQuery, by explaining how a column-oriented database works or how the parallel processing of your queries is managed.

BigQuery, what’s this?

You may have arrived here luckily and wondering what BigQuery is?

Photo de Marcos Paulo Prado via Unsplash

Google defines BigQuery as:

“A serverless, highly scalable, and cost-effective multicloud data warehouse designed for business agility”

BigQuery is a product launched in 2012 by Google to address processing issues in the Big Data ecosystem. BigQuery is a serverless solution that has the ability to be auto scalable. It is a solution that makes it easy to address multi-cloud use cases, all at a relatively controlled cost.

Once we said that, we did not say much! From my point of view, what is important here is:

  • the fact that the solution is serverless. That doesn’t mean it’s magic (although some layer of magic can appear sometimes and we’ll get to that later) but that we don’t have to worry about the ressources behind this service.
  • Then, it’s important to understand that this is not a database… BigQuery is a Data Warehousing solution. Kesako? Data Warehousing makes it possible to respond to problems such as Analytics or Machine Learning required a large amount of data. Unlike its Relational cousin, we will favor reading over writing.
  • Finally, it is a solution designed for the future with the multi-cloud! Today, BigQuery is a tool for working together with other Public Cloud (hello AWS and Azure).

BigQuery and its ecosystem

Simply put, BigQuery is a central solution in data architectures with Google Cloud. To feed BigQuery, we will be able to use different types of sources. These can be of batch type allowing you to load data from files present on Cloud Storage, or of the stream type to consume data in real time with Kafka or Pub/Sub.

It will sometimes be necessary to add a “transform” step between these sources and BigQuery, such as Apache Spark or Apache Beam.

BigQuery in a data ecosystem

Once the data is persisted in BigQuery, we will be able to interact with it with our dataviz solutions (Data Studio, Looker, Tableau, etc.) but also to train our Machine Learning models from our historical data.

BigQuery will allow you to completely free yourself from software and hardware issues to focus on the data and its use. Thus, building a solution allowing you to have a 360 view of your customers requires almost no skills, just a few SQL queries. Are you a Data Scientist and need data to train your models? Nothing could be easier, explore BigQuery and make your queries directly in your notebooks.

BigQuery in action

I invite you to go to the Github page below to find and perhaps run the demonstration that I have prepared for you.

If you prefer to watch the demo on video, you can see this demo live (but in French 🇫🇷). Don’t spoil yourself, pause the video after the demo and continue reading.

BigQuery 101 demo at Devfest Lille 2021

As you may have seen in the demo, we just played with a 5 TB dataset in SQL, with a more than acceptable response time. If you’re curious, let’s meet again next week to see how BigQuery works.




Learnings and insights from SFEIR community.

Recommended from Medium

Top. 10 Twitter Accounts of Data Science

Pandas Complete guide ( Part 3 )

New consulting service at Kickstarter

Power BI Modelling Error

Practical Humor For The Data Scientist

Logistics Center of Gravity Analysis in Python

Comparing Distributions

Visualizing State Drug Utilization Data Sets

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Aurélien Allienne[Ξ]

Aurélien Allienne[Ξ]

Head of Data @sfeir

More from Medium

New Google Cloud Platform cert for database engineers

Data Masking with Tokenization using Google Cloud DLP and Google Cloud Dataflow

Migrating Workloads to GCP : Part II :Phases of Migration

Pulsar: A tutorial of a reusable native Google Cloud App skeleton for events-based Functions (part…