Photo by Carlos Muza on Unsplash

How to Get Google Analytics Report Data as Pandas DataFrame?

Getting Google Analytics report data with python through Google Analytics Reporting API V4

Dayal Chand Aichara
Analytics Vidhya
Published in
4 min readNov 14, 2019

--

Google Analytics tracks and reports website traffic which helps to understand customers as well as website’s performance.

Google Analytics report data can be fetched through Google Analytics Reporting API V4 using python with few lines of code. I will explain whole process in 4 steps.

1. Set up the project in Google Developer Console and get client token and secret

In order to access Google APIs, user need a project on Google Cloud Platform and a service account. Follow the steps written below to set up project and service account in Google Developer Console.

I. Create a project here https://console.cloud.google.com/

II. Create a service account and save credentials (.json file). Read more about service account here.

III. Enable Google Analytics Reporting API here https://console.developers.google.com/apis/dashboard?

If you face any issue with this step , watch Google Cloud Platform’s video guide on their YouTube Channel.

Note: This is one time setup.

2. Authentication

Once you have clients secrets (credentials file), you can access Google Analytics Reporting API by authentication using OAuth 2.0. We will use oauth2clientand httplib2 python libraries for authentication.

Google Analytics Authentication code
ga_auth = authenticate_ga_api() 

Note: When you run authenticate_ga_api() for the first time , it will redirect to authentication link in default browser. Choose/enter gmail id registered with Google project and click on allow button. This will create and save authentication credential file “ analyticsreporting.dat” for Google Analytics Reporting API. See sample authentication credentials file below.

Authentication credentials sample

Note: The authentication access token expires in 1 hour, we have to rerun authenticat_ga_api function to refresh access_token .

3. Request submission

In this step we define requests and execute them using reports.batchget
method in the Google API Client.

Request must be a dictionary object. A request usually includes viewId , dateRanges , dimensions , and metrics .

View Id : View id can be found in Google Analytics under view tab.

Date Ranges: start date and end date in YYYY-MM-DD format.

Dimensions: Variables which are used to group metrics (secondary variables). example: user type, data type : string

Metrics : Secondary variables, example: users, datatype: numeric

Explore all dimensions and metrics combinations at https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/ .

Report request sample

Execute requests to get data.

Request execution code

I will provide code for few samples in this step and their processing in step 4.

Sample 1: Dimensions and Metrics

Let’s get page views, unique page views , and average time on page for different pages for a month.

Single request with single date range.

Note: It is also possible to create a custom calculated metric, by combining existing metrics into a new metrics expression.

# Create the Metrics object.
metric_expression = {
'expression': 'ga:users/ga:pageviews',
'formattingType': 'FLOAT',
'alias': 'Metric Expression'
}

Sample 2: Multiple Date Ranges

We will get page views, unique page views , and average time on page for two date ranges.

Multiple date range sample

Sample 3: Multiple Dimensions

In Multi dimensions request, metrics values are grouped by dimensions in their order. You can understand this by comparing this to pandas DataFrame groupby with more than one column and aggregate other columns by sum.

Multiple dimensions sample

Sample 4: Histogram Buckets

We add list of histogramBuckets to dimensions, where each value in list, is starting number of bucket. Also, add orderBys to request to define order of buckets. Following code we will return a dictionary of session count buckets and user counts with respect to bucket.

Histogram buckets sample

Note: It is also possible to get segments, pivots, and cohorts through Google Analytics Reporting API. See samples here.

4. Data processing

Request submission return a dictionary object.

A data sample

Let’s process data from samples showed in step 3.

  1. Dimensions and Metrics
Data Processing for single request and single date range

Above code will create a pandas DataFrame from dictionary object.

Output pandas DataFrame

2. Multiple Date Ranges

Code to process a multi date output
Multi date range DataFrame

3. Multiple Dimensions

Data processing code for multi dimensions
Multi dimensions range DataFrame

4. Histogram Histogram Buckets

Data processing code for histogram buckets
Histogram buckets DataFrame

Jupyter Notebook for this tutorial, is available on Github.

Summary:

  1. Create a project & a service account on Google Cloud Platform, enable Google Analytics API, and download credentials.
  2. Do authentication.
  3. Write requests and execute them.
  4. Process dictionary object to get a pandas DataFrame.

Notes:

  1. It is possible to define specific date ranges using python libraries. Read date and time related article to see use cases.

2. We can store pandas DataFrame in Google Sheets. How? Read my another article on Medium.

Thank you for reading! If you have any query, please comment below or reach out to me on LinkedIn or Twitter.

References:

  1. https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/service-py
  2. https://developers.google.com/analytics/devguides/reporting/core/v4/samples
  3. https://cloud.google.com/iam/docs/creating-managing-service-accounts

--

--

Dayal Chand Aichara
Analytics Vidhya

Data Scientist at KPMG Ignition Tokyo , Blockchain Enthusiast, Traveller, Trekker — https://www.linkedin.com/in/dcaichara/