Load CSV Data into a Clustered Table in BigQuery

How to use Python to import data into Google BigQuery

Christianlauer
CodeX

Photo by John Fowler on Unsplash

Python and BigQuery together make a very powerful data science toolset. From a Jupyter notebook you can access BigQuery to import, parse, and analyze data and, if necessary, load the results back into BigQuery. Another use case is, of course, integrating data into BigQuery through ETL/ELT jobs, where Python is again a possible solution. If you want to store data so that later processing performs especially well, it is a good idea to store it in a clustered table.

Clustered Tables

In a clustered table, BigQuery automatically organizes the table data based on the contents of one or more columns in the table schema. The columns you specify are used to colocate related data. When you cluster a table on multiple columns, the order in which you specify them is important, because it determines the sort order of the data. Clustering can improve the performance of certain types of queries, such as queries that use filter clauses and queries that aggregate data.
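To see why the column order matters, here is a small toy model in plain Python. It is only an illustration and not how BigQuery stores data internally: the function name `blocks_scanned`, the fixed block size, and the sample columns `country` and `status` are all assumptions made up for this sketch. It sorts rows by the clustering columns, cuts them into blocks, and counts how many blocks a filter has to touch.

```python
def blocks_scanned(rows, cluster_cols, filter_col, value, block_size=4):
    """Count the blocks that contain at least one row matching the filter."""
    # Sort rows by the clustering columns, in the order given.
    ordered = sorted(rows, key=lambda row: tuple(row[col] for col in cluster_cols))
    # Cut the sorted rows into fixed-size blocks (a stand-in for storage blocks).
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    # A block has to be read if any of its rows matches the filter.
    return sum(any(row[filter_col] == value for row in block) for block in blocks)


# Hypothetical data: every combination of 4 countries and 4 statuses, once each.
rows = [
    {"country": country, "status": status}
    for status in ("new", "paid", "sent", "done")
    for country in ("DE", "FR", "JP", "US")
]

# Same filter (country == "DE"), two different clustering column orders.
country_first = blocks_scanned(rows, ["country", "status"], "country", "DE")
status_first = blocks_scanned(rows, ["status", "country"], "country", "DE")
print(country_first, status_first)  # 1 4
```

In this toy setup, filtering on the first clustering column touches one block, while filtering on the second one touches all four. Real block sizes and pruning behaviour are up to BigQuery, but the same reasoning suggests putting the most frequently filtered column first.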

Example Script

With the following script you can create a clustered table for later use. I use the booking_date for the time partitioning while using the id as a normal…
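A minimal sketch of what such a load could look like, assuming the `google-cloud-bigquery` client library and a local CSV file with a header row. This is not necessarily the script described above. The function name `load_csv_into_clustered_table`, the `amount` column, and the choice of `id` as the clustering column are assumptions for illustration; only `booking_date` as the partitioning column comes from the text.

```python
from pathlib import Path

# booking_date as the partitioning column comes from the text;
# clustering on id is an assumption made for this sketch.
PARTITION_FIELD = "booking_date"
CLUSTERING_FIELDS = ["id"]


def load_csv_into_clustered_table(client, csv_path, table_id):
    """Load a local CSV file into a day-partitioned, clustered table."""
    # Imported here so the dependency is only needed when a load runs.
    # Requires: pip install google-cloud-bigquery
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes the CSV file starts with a header row
        schema=[
            bigquery.SchemaField("id", "INTEGER"),
            bigquery.SchemaField("booking_date", "DATE"),
            bigquery.SchemaField("amount", "FLOAT"),  # hypothetical extra column
        ],
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.DAY,
            field=PARTITION_FIELD,
        ),
        clustering_fields=CLUSTERING_FIELDS,
    )

    # Upload the file; the table is created by the load job if it does not exist.
    with Path(csv_path).open("rb") as source_file:
        job = client.load_table_from_file(source_file, table_id, job_config=job_config)

    job.result()  # wait for the load job to finish; raises on failure
    return client.get_table(table_id)
```

With credentials configured, a call could look like `load_csv_into_clustered_table(bigquery.Client(), "bookings.csv", "my-project.my_dataset.bookings")`, where the file name and table ID are placeholders. Passing the client in as an argument keeps the function easy to try out against a test project.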

Christianlauer
CodeX

Big Data Enthusiast based in Hamburg and Kiel. Thankful if you would support my writing via: https://christianlauer90.medium.com/membership