Export Google Analytics Raw Data
Why you need raw unsampled data from Google Analytics
Google Analytics is an awesome tool that has become the standard for web analytics. It provides valuable insights about an online business, it is easy to set up, and it ships with a large variety of out-of-the-box reports and dashboards. Add to that the ability to build segments and funnels and to track goal conversions. Awesome, right?
However, there are limitations in Google Analytics data that prevent you from digging deeper. Google Analytics only makes aggregated data available from the API. Effectively, you end up downloading a Google Analytics report with each request. This makes tasks like segmenting users by behavior with machine learning tools challenging, because you are limited by the granularity of the data exported via the Google Analytics API.
Key limitations of Google Analytics Data exported via Google Analytics API.
Data Sampling: Even Google’s processing servers can’t always handle endlessly large volumes of data in a finite amount of time. Thus, Google Analytics applies sampling when you request a large amount of data.
This sampling is separate from the limit of 10 million hits per property per month.
Aggregated Data: The Google Analytics API does not provide access to hit-level data. While you can find out the number of visitors in a particular segment on a particular day on your website and fetch a variety of dimensions and metrics for those users (such as source/medium, browser, or session duration), you cannot get the underlying, user-level data to follow an individual user’s journey.
Fragmented Data: Google Analytics limits the number of dimensions and metrics you can include in a Google Analytics API request. Not every metric can be combined with every dimension, because each dimension and metric has a scope (user-level, session-level, or hit-level) and it generally only makes sense to combine fields that share the same scope. The full picture of a hit therefore ends up spread across several reports.
Fortunately, the answer to all these problems is raw data collected at hit level. Having hit-level data means that you can access the underlying hits that were sent to Google Analytics and do your own aggregation as you wish, based on any criteria or dimension.
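To give a feel for what "doing your own aggregation" means, here is a minimal Python sketch. The hit records and field names (client_id, session_id, hit_type, page_path) are made up for illustration and assume the hits are already sorted by time.

```python
from collections import Counter, defaultdict

# Hypothetical hit-level records, one dict per hit, in time order.
hits = [
    {"client_id": "A", "session_id": "S1", "hit_type": "pageview", "page_path": "/home"},
    {"client_id": "A", "session_id": "S1", "hit_type": "event", "page_path": "/home"},
    {"client_id": "A", "session_id": "S1", "hit_type": "pageview", "page_path": "/pricing"},
    {"client_id": "B", "session_id": "S2", "hit_type": "pageview", "page_path": "/home"},
]

# Aggregation 1: number of pageviews in each session.
pageviews_per_session = Counter(
    hit["session_id"] for hit in hits if hit["hit_type"] == "pageview"
)

# Aggregation 2: the ordered list of pages each client viewed.
journeys = defaultdict(list)
for hit in hits:
    if hit["hit_type"] == "pageview":
        journeys[hit["client_id"]].append(hit["page_path"])
```

Neither of these views has to be requested as a separate report. Both are computed from the same underlying hits, which is the whole point of working at hit level.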
How to export raw data from Google Analytics
As mentioned before, you can’t get raw hit-level data from Google Analytics out of the box. The premium version of Google Analytics (360) and its BigQuery export feature will get you closer, but at $150,000 it is an awfully expensive option.
What if I told you there is a way to access raw data for each hit from Google Analytics for free? It can be achieved by combining the Google Analytics API with custom dimensions.
Custom dimensions can be used to capture, analyze, and visualize information that is not available in Google Analytics by default. You can use custom dimensions as keys for combining information from GA and other systems, as well as to enhance your reports with information that is relevant to your business. For example, you can save the User Login ID from your website and use it for integrating offline and online actions.
Some useful custom dimensions that improve your Google Analytics data collection:
1. Hit Timestamp : a hit-scoped custom dimension that captures the exact timestamp when the hit happened, in the yyyy-mm-ddThh:mm:ss format with the timezone offset.
2. Session ID : a session-scoped custom dimension that collects a unique, random value, used to identify hits that belong to the same session.
3. Client ID : a session-scoped custom dimension that collects the unique value assigned to the client’s device from the _ga cookie.
4. User ID : a hit-scoped custom dimension that collects the value representing a user who has logged in to your website, allowing you to identify all the sessions and hits of a user.
All you need is a JavaScript snippet that sends these custom dimensions to Google Analytics along with the data your website already sends. Easy, right?
Check out this awesome post by Simo Ahava on improving data collection with custom dimensions and Google Tag Manager for more details.
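To make the four custom dimensions concrete, here is a minimal Python sketch of the kind of values they would carry. In practice these values are set in the browser by JavaScript. The function names, the session ID layout, and the cookie-parsing rule below are illustrative assumptions, not the code from the post mentioned above.

```python
import secrets
from datetime import datetime, timedelta, timezone


def hit_timestamp(moment: datetime) -> str:
    """Format an aware datetime as ISO 8601 with milliseconds and offset."""
    return moment.isoformat(timespec="milliseconds")


def new_session_id(moment: datetime) -> str:
    """Hypothetical session ID: date prefix plus a random 8-digit suffix."""
    return f"SID-{moment:%Y%m%d}-{secrets.randbelow(10**8):08d}"


def client_id_from_ga_cookie(cookie_value: str) -> str:
    """Keep the last two dot-separated fields of a _ga cookie value.

    Assumes the common layout 'GA1.2.<random>.<timestamp>'.
    """
    return ".".join(cookie_value.split(".")[-2:])


# A fixed moment in a UTC-05:00 timezone, so the output is reproducible.
moment = datetime(2020, 7, 27, 10, 15, 41, 267000,
                  tzinfo=timezone(timedelta(hours=-5)))

custom_dimensions = {
    "hit_timestamp": hit_timestamp(moment),
    "session_id": new_session_id(moment),
    "client_id": client_id_from_ga_cookie("GA1.2.1802577120.1595862941"),
    "user_id": "example-login-id",  # placeholder for a logged-in user's ID
}
```

Which custom dimension index each value is sent under depends on how the dimensions are configured in your own Google Analytics property.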
Now comes the difficult part, exporting this data out of Google Analytics. You will need an understanding of how APIs work, a Google Analytics account, and a Data Warehouse of your choice. The steps involved in loading the data from Google Analytics to a data warehouse are as follows:
Step 1: Identify Your Data
The first step is to identify the dimension and metric combinations allowed by the Google Analytics API. Lucky for us, Google Analytics provides a tool that makes this easy. Along with the dimension and metric combinations, you also need to choose the time period for which you want to pull data. The trick is to make multiple requests for small time periods rather than one request for a long period, since Google Analytics applies sampling when you request a large amount of data.
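Splitting a long period into small windows can be sketched in a few lines of Python. The helper name date_windows and the choice of window size are assumptions for illustration. One day per request is a common starting point, but the right size depends on your traffic.

```python
from datetime import date, timedelta


def date_windows(start: date, end: date, days_per_window: int = 1):
    """Yield inclusive (start, end) ISO date pairs that cover start..end."""
    if days_per_window < 1:
        raise ValueError("days_per_window must be at least 1")
    cursor = start
    while cursor <= end:
        # The last window is cut short so it never runs past `end`.
        window_end = min(cursor + timedelta(days=days_per_window - 1), end)
        yield cursor.isoformat(), window_end.isoformat()
        cursor = window_end + timedelta(days=1)


# Five days split into two-day windows; the final window holds one day.
windows = list(date_windows(date(2020, 7, 27), date(2020, 7, 31), 2))
```

Each pair can then be used as the start and end date of one API request.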
Step 2: Extract Your Data
Now that you have identified the data you want to export, you can use the Google Analytics Reporting API to export it out of Google Analytics. This involves making multiple requests with different combinations of dimensions and metrics. However, every request you send to Google Analytics should include the same common dimensions: the custom dimensions we discussed earlier in this post.
Step 3: Transform Your Data
You must first transform your data so that it is in a format your data warehouse accepts. For example, JSON is easy to use with Google BigQuery, but you may have to convert to CSV or SQL for more traditional relational databases like Microsoft SQL Server. The most important step, however, is to join the data exported by the multiple requests on the common custom dimensions, so that you get one composite record with the hit-level data of a user.
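As a rough sketch of what "multiple requests with common dimensions" can look like, the Python below builds request bodies in the shape used by the Reporting API v4 (viewId, dateRanges, dimensions, metrics). It only builds the bodies and makes no network call. The view ID, the custom dimension indexes (ga:dimension1 to ga:dimension3), and the default limit of 9 dimensions per request are assumptions that you should check against your own property and the API documentation.

```python
def chunked(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_report_requests(view_id, start_date, end_date, key_dimensions,
                          other_dimensions, metrics, max_dimensions=9):
    """Build one request body per group of dimensions.

    Every request repeats the key (custom) dimensions so that the
    resulting reports can later be joined on them.
    """
    room = max_dimensions - len(key_dimensions)
    if room < 1:
        raise ValueError("no room left for additional dimensions")
    requests = []
    for group in chunked(other_dimensions, room):
        requests.append({
            "viewId": view_id,
            "dateRanges": [{"startDate": start_date, "endDate": end_date}],
            "dimensions": [{"name": name} for name in key_dimensions + group],
            "metrics": [{"expression": expr} for expr in metrics],
            "samplingLevel": "LARGE",
            "pageSize": 10000,
        })
    return requests


# Assumed index assignment: timestamp, session ID, client ID.
key_dimensions = ["ga:dimension1", "ga:dimension2", "ga:dimension3"]
other_dimensions = ["ga:pagePath", "ga:pageTitle", "ga:hostname", "ga:browser",
                    "ga:deviceCategory", "ga:country", "ga:city", "ga:sourceMedium"]
report_requests = build_report_requests(
    "123456789", "2020-07-27", "2020-07-27",
    key_dimensions, other_dimensions, ["ga:hits"])
```

Grouping dimensions this way does not guarantee that each combination is valid. That still has to be checked as described in Step 1.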
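The join itself can be sketched in plain Python. The example assumes each report has already been flattened into a list of dicts and that the custom dimension values (here named hit_timestamp, session_id, and client_id for readability) together identify a single hit. Both the field names and the sample rows are illustrative.

```python
def join_reports(reports, key_fields):
    """Merge rows from several reports that share the same key values."""
    merged = {}
    for rows in reports:
        for row in rows:
            key = tuple(row[field] for field in key_fields)
            # Rows with the same key are folded into one composite record.
            merged.setdefault(key, {}).update(row)
    return list(merged.values())


KEY_FIELDS = ["hit_timestamp", "session_id", "client_id"]

page_report = [
    {"hit_timestamp": "2020-07-27T10:15:41.267-05:00", "session_id": "S1",
     "client_id": "C1", "page_path": "/home"},
]
device_report = [
    {"hit_timestamp": "2020-07-27T10:15:41.267-05:00", "session_id": "S1",
     "client_id": "C1", "browser": "Chrome", "device_category": "desktop"},
]

composite = join_reports([page_report, device_report], KEY_FIELDS)
```

One caveat: if two hits ever share the same key values, they would be folded into one record, so the key needs to be fine-grained enough to separate individual hits.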
Step 4: Create a Data Receiving Repository in Your Data Warehouse
Creating a staging area for your data can make it easier to transform before it is finally ingested for analysis and reporting. A staging area is easy to create in data warehouses like Google BigQuery or Snowflake.
Step 5: Load Your Data
Design a schema for your chosen data warehouse and map your Google Analytics fields to it. Once the previous steps are complete and fit your needs, you are ready to load your raw data from Google Analytics into the data warehouse.
Congratulations, you have now exported raw data from Google Analytics to your data warehouse. The raw data will look something like this:
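Here is a minimal sketch of the schema mapping and load, using Python's built-in sqlite3 module as a stand-in for a real data warehouse. The column names, the table name ga_hits, and the mapping from ga:dimension indexes to columns are all assumptions. A real warehouse would use its own client library and data types.

```python
import sqlite3

# Hypothetical mapping: Google Analytics field -> (column name, column type).
SCHEMA = {
    "ga:dimension1": ("hit_timestamp", "TEXT"),
    "ga:dimension2": ("session_id", "TEXT"),
    "ga:dimension3": ("client_id", "TEXT"),
    "ga:pagePath": ("page_path", "TEXT"),
    "ga:hits": ("hits", "INTEGER"),
}


def load_hits(connection, records, table="ga_hits"):
    """Create the table if needed and insert one row per record."""
    names = [name for name, _ in SCHEMA.values()]
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in SCHEMA.values())
    connection.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})")
    placeholders = ", ".join("?" for _ in names)
    # Fields missing from a record are loaded as NULL.
    rows = [tuple(record.get(field) for field in SCHEMA) for record in records]
    connection.executemany(
        f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})", rows)
    connection.commit()
    return len(rows)


connection = sqlite3.connect(":memory:")
records = [{
    "ga:dimension1": "2020-07-27T10:15:41.267-05:00",
    "ga:dimension2": "SID-20200727-08324964",
    "ga:dimension3": "1802577120.1595862941",
    "ga:pagePath": "/google-analytics-hit-data-extractor",
    "ga:hits": "1",
}]
loaded = load_hits(connection, records)
```

Keeping the mapping in one place makes it easier to adjust when you add a dimension or when a field is renamed.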
Client Id : 1802577120.1595862941
Hit Timestamp : 2020-07-27T10:15:41.267-05:00
Hit Date : 2020-07-27
Session Id : SID-20200727-08324964
Visitor Id : VIDc7ee42d2-c22f-690f-548a-61c5efdbbddd
Hashed IP Address : 05d11e92511d7f7b1bcda8327b855aa79f30cf3631f309d98dc78b84d79e5c16
Hit Type : pageview
Hit Order : 1
Pageview Order : 1
Property Id : UA-34208182-4
View Id : 207093576
Ad Group : GA Data Extract
Ad Query Word Count : 5
Ad Slot : Google search: Top
Ad Targeting Type : Keyword
Ad Group Id : 85404108936
Ad Campaign Id : 8247077736
Ad Creative Id : 452094560289
Ad Criteria Id : 301206554611
Ad Customer Id : 9227698748
Channel Grouping : Paid Search
City : Elk River
Continent : Americas
Country : United States
Country ISO Code : US
Latitude : 45.3377
Longitude : -93.5691
Metro : Minneapolis-St. Paul MN
Region : Minnesota
Sub Continent : Northern America
Exit Pagepath : /google-analytics-hit-data-extractor
Hostname : electrik.ai
Landing Pagepath : /google-analytics-hit-data-extractor
Pagepath : /google-analytics-hit-data-extractor
Page Title : Google Analytics Hit Level Data Extractor | Electrik.AI
Previous Pagepath : (entrance)
Browser : Chrome
Browser Size : 1580x760
Browser Version : 84.0.4147.89
Data Source : web
Device Category : desktop
Operating System : Windows
Operating System Version : 10
Ad Content : Export Google Analytics Data
Campaign : Evergreen_GA Hit Data Extractor Cmpgn
Full Referrer : google
Social Source Referral : No
Keyword : +export +google +analytics +data
Medium : cpc
Source : google
Source Medium : google / cpc
Days Since Last Session : 0
User Type : New Visitor
and more…
Awesome, right? But it does not end here. You need to run this process daily to export hit-level data from Google Analytics, and you must keep in mind that technologies like Google Analytics keep evolving, so what worked yesterday might not work today. Trust us, we have been following this space closely.
So far we have just scratched the surface of how you can export raw data from Google Analytics. It gets even more complicated when you integrate data from different marketing sources with Google Analytics. So instead of building and maintaining your own solution or paying $150,000 for Google Analytics 360, try Electrik.AI’s Google Analytics Hit Data Extractor.
Wrapping Up…
Give hit-level data a try. You will be surprised by the depth it adds to your marketing data analysis.
Using Electrik.AI, marketing professionals with no programming experience can export raw data from Google Analytics at hit-level granularity in a few minutes. You can view the list of all dimensions/metrics exported from Google Analytics here.
Electrik.AI’s Google Analytics Hit Data Extractor uses Google Analytics to track raw hit-level data on your website and exports it to any data warehouse of your choice. Along with raw, unsampled hit-level data, you also get the following in your exported data:
- Hashed IP Address of the Visitor on your website.
- Unique Visitor ID for each user on your website.
- Unique Session ID for each period a user is active on your site.
- Client ID created and assigned by the Google Analytics cookie.
- Order of Pages viewed by a user in a session.