Connecting to the Google Public Covid-19 Datasets with BigQuery and DBeaver

Peter Allen
7 min read · Jul 10, 2020


Google Public Datasets Program — Free Access to COVID-19 Dataset

Google hosts a large collection of public datasets that are openly available through its BigQuery platform. As part of this program, Google has recently added completely free access to the COVID-19 dataset, including the queries run on it, until September 15, 2020. After this date, queries run through BigQuery will revert to the normal, and still very substantial, free tier of 1 TB of processing per month.

DBeaver — Free Open-Source Universal Database Tool

This article will show you how to gain access to this dataset and run queries on it, both directly in BigQuery and via a BigQuery connection from a local SQL editor, in this case the free and very functional DBeaver. I won’t take you through downloading and installing DBeaver, as it follows a typical installation process; it can be found at https://dbeaver.io/

Gaining Access to the Dataset:

To gain access to the open dataset made available by Google, visit the overview page. It details what the dataset contains: largely a single table with case numbers, recoveries, tests and a few other metrics, broken down by country, by sub-regions such as states, and by date. The sources of this data and the remaining fields are described in detail on the overview page.

The next step in connecting to this dataset is to visit either the US region or the EU region via the associated links on the overview page. Upon visiting these links for the first time, you will be asked to create a new project, as per the following image. Just fill in a new project name of your choosing, which below is “Covid-19 Dataset” (take note of the generated Project ID below the name, as it will be used to connect DBeaver later). A location isn’t required unless you are connecting from a parent organization, so either leave it as “No organization” or change it to your organization’s name, then choose “CREATE”. This will take you through to BigQuery in the Google Cloud Platform.

Running Queries in BigQuery:

At this point you have access to the Google public datasets through BigQuery. The platform is essentially a SQL editor in the cloud: it can be used to query the datasets, save the results, or click straight through to explore them in Google Data Studio (a future blog post will show you how to use this awesome free tool for dashboarding) or GeoViz.

The main Google BigQuery Editor Page

The first thing to note is that we have access to a plethora of datasets within this BigQuery environment. These are listed on the bottom left, where we can see census datasets, mobile coverage datasets and many, many more. The dataset used in this article is in the schema called “covid19_open_data”, which has a table of the same name within it.
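
If you would rather confirm this from SQL than by browsing the tree, BigQuery exposes standard INFORMATION_SCHEMA views. As a minimal sketch, assuming the public project and dataset names shown above, the following lists the tables in the dataset and the columns of the main table:

-- List the tables in the public covid19_open_data dataset
SELECT table_name, table_type
FROM `bigquery-public-data.covid19_open_data`.INFORMATION_SCHEMA.TABLES;

-- Inspect the columns and types of the main table
SELECT column_name, data_type
FROM `bigquery-public-data.covid19_open_data`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'covid19_open_data'
ORDER BY ordinal_position;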

To see what is in this dataset and get an idea of what the data looks like, we run a basic query in the editor which shows us the first 100 rows of data.

SELECT * FROM `bigquery-public-data.covid19_open_data.covid19_open_data` LIMIT 100;

This returns the results in the bottom-right section of the page, which we can then scroll through to see what interesting data we might like to investigate. After looking through it, I settled on a query to retrieve the data for the two largest states in Australia. Only a few of the fields were required in the final output, so the final query was:

SELECT
  date, country_name, subregion1_name, subregion2_name,
  new_confirmed, new_deceased, new_recovered, new_tested,
  cumulative_confirmed, cumulative_deceased,
  cumulative_recovered, cumulative_tested
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = 'Australia'
  AND subregion1_name IN ('Victoria', 'New South Wales');

This returns the case metrics for Victoria and New South Wales in Australia, along with the date of each observation. To export this data for use in another program, click on the “Save Results” button, where you’ll have the option to save it as CSV, JSON, Google Sheets or a BigQuery table, or even just copy it to the clipboard.
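
If you want to summarise the data before exporting it, the same table supports ordinary SQL aggregates. As a rough sketch, assuming the cumulative_* columns are non-decreasing over time (so MAX gives the latest total) and that state-level rows have subregion2_name set to NULL:

-- Latest cumulative totals per state
SELECT subregion1_name,
  MAX(cumulative_confirmed) AS total_confirmed,
  MAX(cumulative_deceased) AS total_deceased,
  MAX(cumulative_tested) AS total_tested
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = 'Australia'
  AND subregion1_name IN ('Victoria', 'New South Wales')
  AND subregion2_name IS NULL -- keep state-level rows, exclude local subregions
GROUP BY subregion1_name;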

So now you can run whatever queries you like directly in BigQuery, but sometimes people prefer to work in their usual editor, so let’s take a look at how to connect this to DBeaver. If that interests you, read on.

Connecting DBeaver to the BigQuery Datasets:

All right, we have our access to the datasets working in BigQuery. Now, to set up DBeaver, we need to take a few additional steps in Google Cloud, using the project we have just set up, to allow it to be used elsewhere.

First of all, click on the menu icon at the top left of the BigQuery screen and select “APIs & Services”, then “Dashboard”, which will show the following screen. Check that BigQuery API appears in the list at the bottom of the screen.

APIs & Services Dashboard

Next, click on the “Credentials” option in the left-hand menu, which will present the following option at the top of the next screen. Click on the “+ Create Credentials” button.

On the next screen, shown below, enter a display name for this service account and a description in the bottom box; the ID will auto-populate. Click on “Create”.

On the next page you can select a role for this service account. I have chosen BigQuery Data Viewer, as we won’t be modifying the data at all, just viewing it. Then click “Continue”. The final step, about granting users access, is optional and not required here, so just click “Finish”.

The final step is to create a key file for the service account. At the bottom of the Credentials screen you should see your newly created service account. Click the edit icon on it, then scroll down and click “Add Key”.

This will give you the option of a JSON or P12 file. In this case, choose JSON and save the file to your hard drive in a location you can find again. For this article I just saved it to the desktop. We need to know where this is for the setup in DBeaver.

OK, now that we have that file, let’s get DBeaver working. Open up DBeaver and add a “New Database Connection” under the “Database” menu. It will ask you to select your database, so type “bigquery” in the search box and the Google BigQuery option will appear. Select it and click “Next”. The following screen will appear:

In the “Project” field you need to enter your Project ID, which we spoke about at the start. Hopefully you remember it, but if not, just go back to BigQuery and select the project name at the top, which will open a list of projects with their associated IDs. Then, in “Service Account”, enter the name of the service account we just made, noting that spaces will have been replaced with hyphens, as above (the full account name also appears in the client_email field of the JSON key file). Then select OAuth type: User-based.

Now select the browse icon and choose the location of the credentials JSON file we saved earlier (on the desktop, in my case), noting that it might have an odd name containing numbers if you didn’t rename it.

Finally, select “Edit Driver Settings”, then “Download/Update” and “Download” to update your BigQuery drivers. Then click “OK” and we are ready to click “Test Connection”.

Assuming you’ve done all of this correctly, the test will be successful, so click “Finish”.

You now have a new connection in DBeaver, shown on the left-hand side. Right-click it to open a new SQL editor and run a query to check that everything is working. I just used the same queries I ran in BigQuery, and they all work well.
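
If you want a quick smoke test that doesn’t pull back much data, a simple count and latest-date check does the job. This is just an illustrative query against the same public table:

-- Connectivity check: row count and most recent date for Australia
SELECT COUNT(*) AS row_count, MAX(date) AS latest_date
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = 'Australia';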

Conclusion:

We have now set up BigQuery to access the public datasets, so we can run queries directly from there and have easy access to the Google suite of tools. We have also set up a DBeaver connection to the same BigQuery datasets, so you have all of your favourite database editor’s built-in functions and the familiarity of that environment outside a browser.

Hopefully this has been useful. In the next article I will show how to connect this data directly to Google Data Studio and build a basic dashboard, so we can monitor Covid-19 cases in an automatically updated, easily accessible and shareable dashboard.


Peter Allen

Data Analyst in Melbourne, Australia. Ex-mechanical engineer who transitioned across due to a love of all things data. Beekeeper. DIY. Tinkerer.