Authenticating Jupyter notebook against BigQuery
There are lots of guides available on how to query data in bigquery from a Jupyter notebook (such as https://cloud.google.com/bigquery/docs/visualize-jupyter). However most of them rely on authenticating against Google before starting the Jupyter server, or running the server as a service account with access to BigQuery.
This is not always possible and so this guide provides an alternative way to authenticate from a notebook. This is managed using the python package pydata-google-auth. This saves an authentication token to the server where Jupyter is running, so it still may not be suitable for all cases if you are really paranoid about security.
Prerequisite install
First install the python packages for authenticating and using BigQuery using from a terminal where Jupyter is running
pip install --upgrade pydata-google-auth google-cloud-bigquery[pandas]
Initial authentication
This is the part that actually authenticates against Google. In a Jupyter notebook, in a new cell enter this and run it
import pydata_google_auth
credentials = pydata_google_auth.get_user_credentials(
['https://www.googleapis.com/auth/bigquery'],
)
This will show a response something like (the URL will be different)
Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=123456789123-1234ab56cde78fghi9jxlmn12o3p4iqr.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery&state=AbcdeFG1H8IjkLMnOPlQrs234tUVWX&prompt=consent&access_type=offlineEnter the authorization code:
Open the URL specified and authorize access to BigQuery with your Google account. You will get an authorization code on the final screen. Copy this and paste it into the authorization code box in the Jupyter notebook and press return.
The variable credentials now contains credentials that can be used when submitting a query.
To make the rest of the examples a bit easier to follow we are going to create some variables which will be used in subsequent steps. In a new cell enter the code below. For the project id, put the name of a Google project you have access to and have permission to run BigQuery queries in
query = """SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15"""
project_id = '<insert name of gcp project you have access to>'
Using pandas DataFrame
To use pandas DataFrame, you can put something similar to the one below in a cell
import pandas as pd
df = pd.read_gbq(query, project_id=project_id, dialect='standard', credentials=credentials)
df
Alternative method using pandas DataFrames
There is an alternative way to use pandas DataFrames as shown in the code below
from google.cloud import bigquery
client = bigquery.Client(project=project_id, credentials=credentials)df = client.query(query).to_dataframe()
df
Google cloud bigquery magics
The BigQuery python library provides a magic command to submit a query. To use this, put this in a cell and run it
%load_ext google.cloud.bigquery
from google.cloud.bigquery import magics magics.context.credentials = credentials
Then to use it, put something similar to the one below in a cell
%%bigquery --project <gcp project>
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15