Authenticating Jupyter notebook against BigQuery

There are lots of guides available on how to query data in bigquery from a Jupyter notebook (such as https://cloud.google.com/bigquery/docs/visualize-jupyter). However most of them rely on authenticating against Google before starting the Jupyter server, or running the server as a service account with access to BigQuery.

This is not always possible and so this guide provides an alternative way to authenticate from a notebook. This is managed using the python package pydata-google-auth. This saves an authentication token to the server where Jupyter is running, so it still may not be suitable for all cases if you are really paranoid about security.

Prerequisite install

First install the python packages for authenticating and using BigQuery using from a terminal where Jupyter is running

pip install --upgrade pydata-google-auth google-cloud-bigquery[pandas]

Initial authentication

This is the part that actually authenticates against Google. In a Jupyter notebook, in a new cell enter this and run it

import pydata_google_auth
credentials = pydata_google_auth.get_user_credentials(
['https://www.googleapis.com/auth/bigquery'],
)

This will show a response something like (the URL will be different)

Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=123456789123-1234ab56cde78fghi9jxlmn12o3p4iqr.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery&state=AbcdeFG1H8IjkLMnOPlQrs234tUVWX&prompt=consent&access_type=offlineEnter the authorization code:  

Open the URL specified and authorize access to BigQuery with your Google account. You will get an authorization code on the final screen. Copy this and paste it into the authorization code box in the Jupyter notebook and press return.

The variable credentials now contains credentials that can be used when submitting a query.

To make the rest of the examples a bit easier to follow we are going to create some variables which will be used in subsequent steps. In a new cell enter the code below. For the project id, put the name of a Google project you have access to and have permission to run BigQuery queries in

query = """SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15"""
project_id = '<insert name of gcp project you have access to>'

Using pandas DataFrame

To use pandas DataFrame, you can put something similar to the one below in a cell

import pandas as pd
df = pd.read_gbq(query, project_id=project_id, dialect='standard', credentials=credentials)
df

Alternative method using pandas DataFrames

There is an alternative way to use pandas DataFrames as shown in the code below

from google.cloud import bigquery
client = bigquery.Client(project=project_id, credentials=credentials)
df = client.query(query).to_dataframe()
df

Google cloud bigquery magics

The BigQuery python library provides a magic command to submit a query. To use this, put this in a cell and run it

%load_ext google.cloud.bigquery 
from google.cloud.bigquery import magics magics.context.credentials = credentials

Then to use it, put something similar to the one below in a cell

%%bigquery --project <gcp project>
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15

--

--