Read BigQuery data faster using the Storage Read API with Python

Antonio Cachuan
Google Cloud - Community
2 min read · Jun 16, 2022


Today is a special day: I’m returning to write an article after some weeks of busy work, and it’s doubly special because it comes after I had the honor of being interviewed by Google Cloud :) Let’s start!

A few weeks ago I ran into an out-of-memory problem while trying to read a table of more than 100 million rows and 30 columns with Python on a Vertex AI notebook. I figured out that I was using an old API and needed to migrate to the new Google BigQuery Storage Read API.

What is the BigQuery Storage Read API?

It’s one of BigQuery’s five APIs and is BigQuery’s preferred alternative for reading data. When you use the Storage Read API, structured data is sent over the wire in a binary serialization format, which allows additional parallelism among multiple consumers of a result set [GCP Blog].
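For many workloads, the simplest way to benefit is through the pandas integration of the main BigQuery client, which downloads results via the Storage Read API when the storage library is installed. A minimal sketch (the SQL query is a placeholder, and both `google-cloud-bigquery` and `google-cloud-bigquery-storage` are assumed to be installed):

```python
def query_to_dataframe(sql: str):
    # Imported inside the function so the module can be read or tested
    # without the Google Cloud libraries installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    # create_bqstorage_client=True tells to_dataframe() to fetch the
    # results through the Storage Read API instead of the slower
    # tabledata.list-based download.
    return client.query(sql).to_dataframe(create_bqstorage_client=True)


# Example usage (placeholder table name):
# df = query_to_dataframe("SELECT * FROM `my-project.my_dataset.my_table`")
```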

Features

The API offers several features, including multiple read streams for parallel consumption, column projection (reading only the columns you need), predicate-based row filtering, and snapshot consistency, and it ships with client libraries for several languages [GCP Blog].

Example

To run the example, read the requirements carefully first.

1. Install the library

pip install --upgrade google-cloud-bigquery-storage

2. Authentication
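The article breaks off at this step, so here is a minimal sketch of one common approach using Application Default Credentials (the key-file path below is a placeholder):

```shell
# Option A: authenticate interactively with your own user account
gcloud auth application-default login

# Option B: point the client libraries at a service-account key file
# (replace the path with your own key file)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-key.json"
```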
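3. Read the table

The original post ends before showing the read itself, so what follows is a hedged sketch using the Storage Read API client directly; the project, dataset, table, and column names are placeholders, and pandas plus pyarrow are assumed to be installed for the final DataFrame conversion:

```python
def table_path(project: str, dataset: str, table: str) -> str:
    """Build the fully qualified table resource name the API expects."""
    return f"projects/{project}/datasets/{dataset}/tables/{table}"


def read_table(project, dataset, table, columns=None, row_filter=None):
    # Imported inside the function so the module can be read or tested
    # without the Google Cloud libraries installed.
    from google.cloud.bigquery_storage import BigQueryReadClient, types

    client = BigQueryReadClient()

    requested_session = types.ReadSession(
        table=table_path(project, dataset, table),
        data_format=types.DataFormat.ARROW,
    )
    if columns:
        # Column projection: only these columns are sent over the wire.
        requested_session.read_options.selected_fields = columns
    if row_filter:
        # Predicate pushdown: rows are filtered server-side.
        requested_session.read_options.row_restriction = row_filter

    session = client.create_read_session(
        parent=f"projects/{project}",
        read_session=requested_session,
        max_stream_count=1,  # raise this to consume streams in parallel
    )

    reader = client.read_rows(session.streams[0].name)
    return reader.rows(session).to_dataframe()


# Example usage (placeholder names):
# df = read_table("my-project", "my_dataset", "my_table",
#                 columns=["name", "state"], row_filter='state = "WA"')
```

Setting `max_stream_count` higher than 1 and consuming each stream on a separate worker is what enables the parallel reads mentioned above; a single stream keeps the sketch simple.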




Google Cloud Professional Data Engineer (2x GCP). When code meets data, success is assured 🧡. Happy to share code and ideas 💡 linkedin.com/in/antoniocachuan/