How to Get Google Analytics Report Data as Pandas DataFrame?
Getting Google Analytics report data with python through Google Analytics Reporting API V4
Google Analytics tracks and reports website traffic which helps to understand customers as well as website’s performance.
Google Analytics report data can be fetched through Google Analytics Reporting API V4 using python with few lines of code. I will explain whole process in 4 steps.
1. Set up the project in Google Developer Console and get client token and secret
In order to access Google APIs, user need a project on Google Cloud Platform and a service account. Follow the steps written below to set up project and service account in Google Developer Console.
I. Create a project here https://console.cloud.google.com/
II. Create a service account and save credentials (.json file). Read more about service account here.
III. Enable Google Analytics Reporting API here https://console.developers.google.com/apis/dashboard?
If you face any issue with this step , watch Google Cloud Platform’s video guide on their YouTube Channel.
Note: This is one time setup.
2. Authentication
Once you have clients secrets (credentials file), you can access Google Analytics Reporting API by authentication using OAuth 2.0. We will use oauth2client
and httplib2
python libraries for authentication.
ga_auth = authenticate_ga_api()
Note: When you run authenticate_ga_api()
for the first time , it will redirect to authentication link in default browser. Choose/enter gmail id registered with Google project and click on allow
button. This will create and save authentication credential file “ analyticsreporting.dat” for Google Analytics Reporting API. See sample authentication credentials file below.
Note: The authentication access token expires in 1 hour, we have to rerun
authenticat_ga_api
function to refreshaccess_token
.
3. Request submission
In this step we define requests and execute them using reports.batchget
method in the Google API Client.
Request must be a dictionary
object. A request usually includes viewId
, dateRanges
, dimensions
, and metrics
.
View Id : View id can be found in Google Analytics under view tab.
Date Ranges: start date and end date in
YYYY-MM-DD
format.Dimensions: Variables which are used to group metrics (secondary variables). example: user type, data type : string
Metrics : Secondary variables, example: users, datatype: numeric
Explore all dimensions and metrics combinations at https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/ .
Execute requests to get data.
I will provide code for few samples in this step and their processing in step 4.
Sample 1: Dimensions and Metrics
Let’s get page views, unique page views , and average time on page for different pages for a month.
Note: It is also possible to create a custom calculated metric, by combining existing metrics into a new metrics expression.
# Create the Metrics object.
metric_expression = {
'expression': 'ga:users/ga:pageviews',
'formattingType': 'FLOAT',
'alias': 'Metric Expression'
}
Sample 2: Multiple Date Ranges
We will get page views, unique page views , and average time on page for two date ranges.
Sample 3: Multiple Dimensions
In Multi dimensions request, metrics values are grouped by dimensions in their order. You can understand this by comparing this to pandas DataFrame groupby
with more than one column and aggregate other columns by sum.
Sample 4: Histogram Buckets
We add list of histogramBuckets
to dimensions, where each value in list, is starting number of bucket. Also, add orderBys
to request to define order of buckets. Following code we will return a dictionary of session count buckets and user counts with respect to bucket.
Note: It is also possible to get segments, pivots, and cohorts through Google Analytics Reporting API. See samples here.
4. Data processing
Request submission return a dictionary
object.
Let’s process data from samples showed in step 3.
- Dimensions and Metrics
Above code will create a pandas DataFrame from dictionary object.
2. Multiple Date Ranges
3. Multiple Dimensions
4. Histogram Histogram Buckets
Jupyter Notebook for this tutorial, is available on Github.
Summary:
- Create a project & a service account on Google Cloud Platform, enable Google Analytics API, and download credentials.
- Do authentication.
- Write requests and execute them.
- Process dictionary object to get a pandas DataFrame.
Notes:
- It is possible to define specific date ranges using python libraries. Read date and time related article to see use cases.
2. We can store pandas DataFrame in Google Sheets. How? Read my another article on Medium.
Thank you for reading! If you have any query, please comment below or reach out to me on LinkedIn or Twitter.