The fastest way to fetch BigQuery tables
A benchmark of the fastest methods used to fetch tables from BigQuery. Also introducing bqfetch: an easy-to-use tool for fast fetching.
As a Data Engineer, I wanted to fetch tables from BigQuery as quickly as possible, and I needed them as pandas DataFrames. I considered many alternatives, then tested and benchmarked several implementations using multiple frameworks. In this article, I will show you a tool I built that gave me the best performance for fetching BQ tables as DataFrames.
All the following recommendations are based on benchmarks run on Google Compute Engine (GCE) instances. Better implementations may exist depending on the machine you use and your Internet bandwidth.
Old method
One well-known method for fetching data from BigQuery works as follows:
- Extract the table to Google Cloud Storage using GZIP compression. This creates multiple CSV files, each containing a subset of the table's rows and compressed in the GZIP format. This action…