Getting Started With Google BigQuery in Python

Adebowale Saheed Abiodun
Published in FormPlus Blog · Aug 25, 2020

BigQuery is Google’s highly scalable, serverless and cost-effective solution for enterprises that need to collect and store data. You can think of BigQuery as a cloud-based data warehouse with some interesting machine learning and BI Engine features built in.

Formplus is an easy-to-use form builder that lets anyone build forms on top of cloud storage services. At Formplus, we collect the data our users gather through their forms, but we don’t stop there: data is mostly useless in its raw state, so we analyse it to help our users make sense of it.

The Formplus application is powered by Google Cloud Platform (GCP), which we use for hosting and storage. We find BigQuery to be the ideal data warehouse because it is optimised for the challenges that come with real-time data analysis. It is also cost-effective, frees you from worrying about the underlying architecture, and works seamlessly with SQL for real-time data streaming.

As an engineer at Formplus, I want to share some fundamental tips on how to get started with BigQuery using Python.

How To Install and Set Up BigQuery

This tutorial is not for total beginners: I assume you know how to create a GCP project or already have one. If not, you should read this guide on how to get started with GCP.

After creating a GCP project, there are four basic ways to interact with BigQuery, according to the official Google BigQuery documentation. However, this article focuses only on how to interact with BigQuery from your Python code. So let’s get started.

Interacting with BigQuery from Python can be done with either the BigQuery REST API or the client libraries, and I have found myself using both. I prefer the Python client library because it is like the BigQuery REST API on steroids: the REST API makes it a little harder to reach some methods that the Python client exposes easily.

BigQuery can also be used by making plain HTTP requests to the server; I will talk about this later in the article. Google provides an easy way to make these HTTP requests from your favourite language. Many Google APIs do not have client libraries, but luckily for us, Google BigQuery has one. This article is mainly about querying Google BigQuery using the Python client.

In the sections below, I will explain how to install, set up and use the Python client library to manage your data.

How To Install and Connect to BigQuery Client

Install BigQuery API client
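The client library ships on PyPI as google-cloud-bigquery, so a plain pip install is enough:

```
pip install --upgrade google-cloud-bigquery
```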

BigQuery authenticates using Google Application Credentials. You can either let the BigQuery client determine the credentials from the environment, or directly set the path of your credential JSON file in the environment variables. Check out this link for more information on how to get the credential JSON.

set path to credentials
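A minimal sketch of that approach; the key-file path below is a placeholder for wherever you saved your own credential JSON:

```python
import os

# Tell the Google client libraries where to find the service-account key.
# Replace the path with the location of your own JSON key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
```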

After setting the path of your credential JSON in the environment variable, the next thing is to authenticate the BigQuery client.

authenticate BigQuery 1
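A minimal sketch, building the credentials explicitly from the key file (the file path is again a placeholder):

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Load credentials directly from the service-account key file.
credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json"
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
```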

Or, if your application is set up to run on Google Cloud Platform (GCP), you can authenticate the BigQuery client dynamically by letting your application environment determine your project ID and credentials.

authenticate BigQuery 2
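In that case the client needs no arguments at all; a sketch:

```python
from google.cloud import bigquery

# With GOOGLE_APPLICATION_CREDENTIALS set, or when running on GCP itself,
# the client resolves the project ID and credentials automatically.
client = bigquery.Client()
```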

Congratulations 😃 you have successfully authenticated your BigQuery client using the Python client library.

Now that we have our BigQuery client running, let us delve into some of the basic things you can do with BigQuery.

Create A Dataset Using BigQuery Client

If you already have a dataset, you should skip this part.

A dataset is like a folder, and the tables in a dataset are like the files inside that folder; in other words, a dataset is a collection of tables. For the rest of this article, I will authenticate BigQuery by letting the application environment determine the project ID and credentials.

create BigQuery dataset
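A minimal sketch; demo_dataset is a hypothetical name, and the location should match your project’s region:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Datasets are addressed as <project>.<dataset>.
dataset_id = f"{client.project}.demo_dataset"
dataset = bigquery.Dataset(dataset_id)
dataset.location = "US"  # pick the region that suits your project

dataset = client.create_dataset(dataset, timeout=30)
print(f"Created dataset {dataset.project}.{dataset.dataset_id}")
```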

Delete A Dataset Using BigQuery Client

There are two ways to delete a dataset: you can delete the dataset on its own (which only succeeds if it contains no tables), or you can delete the dataset together with all the tables inside it.

delete dataset
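Both variants, sketched against the hypothetical demo_dataset from above:

```python
# Delete the dataset only; this raises an error if it still contains tables.
client.delete_dataset("demo_dataset", not_found_ok=True)

# Delete the dataset together with every table inside it.
client.delete_dataset("demo_dataset", delete_contents=True, not_found_ok=True)
```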

Create a Table Using BigQuery Client

Tables live inside datasets; they are like the files inside folders I mentioned earlier. Here is how to create a table. You have to specify a schema for each column in the table; check out the official BigQuery documentation on the different types of schema fields. You can also specify whether a particular column of your table should be required or not.

In the example below, I create three columns; the extra first column you see in the screenshot just holds the row numbers.
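A sketch of the table creation; the column names and types here are hypothetical placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Three columns: two required, one nullable.
schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
    bigquery.SchemaField("email", "STRING", mode="NULLABLE"),
]

table_id = f"{client.project}.demo_dataset.demo_table"
table = client.create_table(bigquery.Table(table_id, schema=schema))
print(f"Created table {table.full_table_id}")
```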

Here is what my table looks like after creation, just to show that it is empty; we have only created its schema:

Empty table

Insert Rows with Data in Table Using BigQuery Client

insert rows in a table
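A sketch using the streaming-insert helper, continuing from the create-table sketch above; the row values are made up to match that hypothetical schema:

```python
# insert_rows_json streams rows into the table and returns a list of
# per-row errors, which is empty when every insert succeeds.
rows_to_insert = [
    {"full_name": "Ada Lovelace", "age": 36, "email": "ada@example.com"},
    {"full_name": "Alan Turing", "age": 41, "email": "alan@example.com"},
]

errors = client.insert_rows_json(table_id, rows_to_insert)
if errors:
    print(f"Encountered errors while inserting rows: {errors}")
```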

Here is what my table would look like after inserting two rows with data:

Table with 2 rows

Now that you have your data stored in a BigQuery table, this is the right time to talk about how to query your table using SQL.

Query Rows With Data in Table Using BigQuery Client

Here, you will learn to run queries using SQL and how to use the QueryJobConfig configuration class to enable the BigQuery cache when fetching data, which reduces the cost of fetching data from a BigQuery table. The idea is that when the cache is enabled, the query first tries to fetch the cached result of an identical query that has run in the past; only if no cached result is found does it fetch the data from the table.

After fetching the result, you can get the row values either by field name or by index.

SQL query in BigQuery
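A sketch of such a query against the hypothetical table from earlier; note that the query cache is actually on by default, so setting use_query_cache here just makes the intent explicit:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Re-running an identical query returns the cached result (at no cost)
# instead of re-scanning the table.
job_config = bigquery.QueryJobConfig(use_query_cache=True)

query = f"""
    SELECT full_name, age
    FROM `{client.project}.demo_dataset.demo_table`
    WHERE age > 30
"""
query_job = client.query(query, job_config=job_config)

for row in query_job:  # iterating waits for the job to finish
    print(row["full_name"], row[1])  # access by field name or by index
```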

Delete a Table Using BigQuery Client

delete table
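A one-line sketch, again using the placeholder table ID:

```python
# not_found_ok=True suppresses the error if the table is already gone.
client.delete_table(f"{client.project}.demo_dataset.demo_table", not_found_ok=True)
```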

Congratulations 🎊 these are the basic things you need to know to get started with Google BigQuery using the Python client library. For more powerful things you can do with the BigQuery Python client, see the official documentation.

In the next section, I will talk about how to use the BigQuery REST API.

BigQuery REST API

You can also use the BigQuery REST API to perform some operations on datasets and tables. First, let us see how to create a BigQuery service; this is similar to creating a BigQuery client with the Python client library.

Connect to BigQuery client using BigQuery REST API

This way of connecting to a BigQuery project is suitable when you simply want to insert rows or fetch all the data in a table, without the data-manipulation power that SQL provides.

I will get my credentials through the OAuth2 client on App Engine.

BigQuery service using REST API
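A sketch using the google-api-python-client discovery interface; application-default credentials stand in here for the App Engine OAuth2 flow mentioned above:

```python
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

# On App Engine the credentials come from the runtime environment;
# locally this falls back to GOOGLE_APPLICATION_CREDENTIALS.
credentials = GoogleCredentials.get_application_default()

# Build a service object for BigQuery REST API v2.
service = discovery.build("bigquery", "v2", credentials=credentials)
```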

Congratulations 😃 you have successfully connected your client using BigQuery REST API.

Insert row to a table using BigQuery REST API

In this part, we will learn to insert rows into a BigQuery table using the BigQuery REST API v2, referring to the same project, dataset and table used in the sections above.

Insert rows in a table using REST API
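A sketch of the tabledata.insertAll call; the project, dataset and table IDs are the placeholders used earlier, and the row values are made up:

```python
project_id = "my-project"  # placeholder; use your own project ID
dataset_id = "demo_dataset"
table_id = "demo_table"

body = {
    "rows": [
        {"json": {"full_name": "Grace Hopper", "age": 85, "email": "grace@example.com"}},
    ]
}

response = (
    service.tabledata()
    .insertAll(projectId=project_id, datasetId=dataset_id, tableId=table_id, body=body)
    .execute()
)
print(response)  # contains an "insertErrors" key if any row failed
```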

Here is what my table would look like after inserting rows with data:

Table with 3 rows

Disadvantages of BigQuery

Google BigQuery is an excellent tool for people who just want a data warehouse without having to understand the underlying architecture behind it. However, the issue I have with Google BigQuery is that it is effectively append-only: it is not designed for modifying individual rows. Standard SQL does offer UPDATE and DELETE statements, but they are limited and relatively costly, and rows still in the streaming buffer cannot be changed at all, so in practice you append to a table or delete the whole table. This is a result of the technology behind it.

Conclusion

In a nutshell, we have learnt how to use the basics of BigQuery to achieve results, and we have seen how powerful BigQuery is when combined with the Python client library. The advantages of BigQuery far outweigh its disadvantages. Compared with other data warehouses, BigQuery has an edge in the processing speed of complex queries. It is the perfect tool when speed is a factor, since it can process terabytes of data in seconds, and it is also very well suited to continuously streaming data into your tables.
