Read BigQuery data faster using the Storage Read API with Python
Today is a special day: I'm returning to write an article after some weeks of busy work, and it's doubly special because it comes right after I had the honor of being interviewed by Google Cloud :) Let's start!
A few weeks ago I ran into an out-of-memory error while trying to read a table of more than 100 million rows and 30 columns with Python on a Vertex AI notebook. I figured out that I was using an old API and needed to migrate to the new Google BigQuery Storage Read API.
What is the BigQuery Storage Read API?
It's one of BigQuery's five APIs and its preferred alternative for reading data. When you use the Storage Read API, structured data is sent over the wire in a binary serialization format, which allows additional parallelism among multiple consumers of a result set [GCP Blog].
Features
The API offers several notable features, such as multiple parallel read streams, column projection, server-side row filtering, and results in Avro or Arrow format, along with client libraries for many languages.
Example
To run the example, read the requirements carefully first.
1. Install the library
pip install --upgrade google-cloud-bigquery-storage
2. Authentication