Julien Kervizic
Jul 22 · 7 min read

Implementing Google Analytics involves more than placing a small snippet on a website. There are different integration patterns for capturing data into Google Analytics, and each is subject to pitfalls and potential regressions that need to be guarded against. There are also questions as to whether and how to use the different APIs provided by GA.

Integration patterns

Besides a hardcoded implementation of Google Analytics, there are three main integration patterns for tracking data using a tag manager and pushing it to Google Analytics.

Scraping & event listener

One potential integration pattern for Google Analytics revolves around scraping information from the website, typically through calls on the HTML DOM but also by extracting data from URL structures, and attaching event listeners to capture users’ interactions with certain components on the website.

With this type of integration pattern, it is typical to enhance the information already available on the website with data attributes: essentially hidden information added to the different elements of the HTML page, which allows these pieces of information to be surfaced to Google Analytics.

<ul>
<li data-category="all">All</li>
<li data-category="beer">Beer</li>
<li data-category="cola">Cola</li>
</ul>

Above, an example setup of a list using data attributes.

The typical way of extracting this information and passing it to Google Analytics is through the use of jQuery and a tag manager. Certain tag managers, such as Tealium and Ensighten, have facilities within their UI to make these jQuery calls without having to program the code snippets directly in JavaScript, while GTM has specific selectors for events and DOM elements:

GTM Auto-Event and DOM Element selectors

Choosing this integration pattern makes it possible to start pulling data from the information already available on the page and to improve it as new information is progressively surfaced. It has two disadvantages: it is extremely brittle, being dependent on front-end components such as CSS classes that can change over the lifecycle of a website, and it increases the size of the container, since all the scraping logic is added to it.
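As a rough illustration of the scraping pattern, the sketch below reads the data-category attribute from a clicked list item and turns it into an event pushed onto a GTM-style dataLayer array. The event name categoryClick and both helper names are hypothetical, and in practice a tag manager's built-in click triggers would often replace the hand-written listener.

```javascript
// Build an event object from an element carrying a data-category attribute.
// The event name "categoryClick" is a hypothetical name chosen for this sketch.
function buildCategoryEvent(element) {
  const category = element && element.dataset ? element.dataset.category : undefined;
  if (!category) {
    return null; // nothing to report: the attribute is missing
  }
  return { event: 'categoryClick', category: category };
}

// Attach one delegated click listener to the list and push events as clicks occur.
function trackCategoryClicks(listElement, dataLayer) {
  listElement.addEventListener('click', function (domEvent) {
    const payload = buildCategoryEvent(domEvent.target);
    if (payload) {
      dataLayer.push(payload);
    }
  });
}

// In a browser this could be wired up as follows (assumes the list is the first <ul>):
// window.dataLayer = window.dataLayer || [];
// trackCategoryClicks(document.querySelector('ul'), window.dataLayer);
```

Keeping the extraction in a small pure function makes the brittle part, the dependency on the markup, easy to spot and to test.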

DataLayer

The “dataLayer” is a JavaScript object that serves to provide information to the tag manager. Google Tag Manager uses an object named “dataLayer”, while other tag managers use their own objects, such as Tealium’s utag_data or Qubit’s universal_variable, and some tag managers, like Ensighten, can interact with any JS object present on your page.

Events can be pushed directly onto the dataLayer with the help of some JavaScript:

GTM: dataLayer.push({'event': 'myEventName'});
TEALIUM: utag.link({"tealium_event": "myEventName"});
QUBIT: uvHelpers.trackStructEvent(category, action, label, property, value);
ENSIGHTEN: Bootstrapper.ensEvent.trigger("myEventName");

This type of integration relies on code being placed on the website and directly integrated as part of the different classes and functions used.
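To make the idea of tracking code living inside the site's own classes more concrete, here is a minimal sketch of a cart that pushes to a GTM-style dataLayer whenever an item is added. The Cart class, the addToCart event name and the product field names are all assumptions made for illustration, not part of any tag manager's API.

```javascript
// Push an add-to-cart event onto a GTM-style dataLayer from application code.
// The event name "addToCart" and the field names are assumptions for this sketch.
function trackAddToCart(dataLayer, product, quantity) {
  dataLayer.push({
    event: 'addToCart',
    productId: product.id,
    productName: product.name,
    price: product.price,
    quantity: quantity
  });
}

// Hypothetical cart class that calls the tracking helper as part of its own logic.
class Cart {
  constructor(dataLayer) {
    this.items = [];
    this.dataLayer = dataLayer;
  }

  add(product, quantity) {
    this.items.push({ product: product, quantity: quantity });
    trackAddToCart(this.dataLayer, product, quantity);
  }
}

// In a browser:
// window.dataLayer = window.dataLayer || [];
// const cart = new Cart(window.dataLayer);
// cart.add({ id: 'SKU-123', name: 'Example beer', price: '99.00' }, 2);
```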

Relying on a direct dataLayer integration is generally more reliable than the alternatives: the events and attributes exposed can more easily be generalized, and it typically offers higher performance than adding scraping/listening scripts within a tag manager’s container.

The drawback is that it requires the website’s developers to integrate the logging of attributes and events directly into their code base, which, depending on the organization, might not be the quickest or most flexible option and might require a “deployment” to be slotted in.

Structured Data

Structured data is a way to provide information related to your page that arose to help search engines index websites. The different data points abide by specified schemas that are meant to represent the most common actions and entities. This information is typically included in pages as linked data in JSON format (ld+json).

This approach can also be leveraged for capturing data for analytics purposes, effectively tying the analytics implementation to the SEO one. This is for good and bad: it allows for re-use but puts extra constraints on the analytics implementation.

Besides this, this integration pattern presents a number of advantages:

  • The data is structured and doesn’t need to be scraped, meaning less work should be needed in the tag manager to implement the analytics tag.
  • The data follows a strict, universal schema, simplifying deployment across multiple websites.
  • The integration pattern is not directly dependent on visible UI code.

Like the dataLayer pattern, this integration pattern does require development on the website and is subject to the same types of development pitfalls. Another drawback is that it requires a custom implementation in the tag manager in order to read the structured data object.
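The custom reading step mentioned above could look roughly like the sketch below, which parses the text of ld+json script tags and keeps the entities of one type. The function name and the sample Product block are hypothetical, and the sketch assumes a simple string @type, so blocks using an array of types or an @graph wrapper would need extra handling.

```javascript
// Parse the text content of JSON-LD script tags and return the entities of a given type.
// Blocks that fail to parse are skipped rather than breaking the tag.
function extractStructuredData(jsonLdTexts, type) {
  const results = [];
  for (const text of jsonLdTexts) {
    let parsed;
    try {
      parsed = JSON.parse(text);
    } catch (err) {
      continue; // malformed block: ignore it
    }
    // A block may hold a single entity or an array of entities.
    const entities = Array.isArray(parsed) ? parsed : [parsed];
    for (const entity of entities) {
      if (entity && entity['@type'] === type) {
        results.push(entity);
      }
    }
  }
  return results;
}

// In a browser, the texts could be collected with:
// const texts = Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
//   .map(function (node) { return node.textContent; });

// Example with a hypothetical schema.org Product block.
const sample = JSON.stringify({
  '@context': 'https://schema.org',
  '@type': 'Product',
  name: 'Example beer',
  sku: 'SKU-123',
  offers: { '@type': 'Offer', price: '99.00', priceCurrency: 'USD' }
});
const products = extractStructuredData([sample, 'not json'], 'Product');
console.log(products[0].offers.price); // prints "99.00"
```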

Implementation pitfalls

Even after choosing a robust integration pattern, there are quite a lot of pitfalls that one needs to avoid in order to gather useful and accurate analytics:

Non-Interaction Events: Not properly defining certain events as non-interaction will affect core metrics such as bounce rates.
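As an illustration, with the analytics.js library an event can be flagged through the nonInteraction field. The sketch below builds such a fields object; the list of actions treated as automatic is purely an assumption and would differ per site.

```javascript
// Build an analytics.js event fields object, flagging automatic events as non-interaction.
// Which actions count as automatic is an assumption made for this sketch.
const AUTOMATIC_ACTIONS = ['impression', 'autoplay', 'scrollDepth'];

function buildEventFields(category, action, label) {
  return {
    hitType: 'event',
    eventCategory: category,
    eventAction: action,
    eventLabel: label,
    // Non-interaction hits are not treated as user engagement for bounce rate purposes.
    nonInteraction: AUTOMATIC_ACTIONS.indexOf(action) !== -1
  };
}

// With analytics.js loaded on the page, the object could be sent as:
// ga('send', buildEventFields('Video', 'autoplay', 'homepage-hero'));
```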

Duplicate events: Duplicate events might be sent to Google Analytics due to an implementation error. They can affect bounce rates and make funnels more complicated to analyze.
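One possible defensive measure, sketched below, is to wrap the push function so that an identical event repeated within a short window is dropped. The helper name, the window length and the JSON-based comparison are assumptions; a guard like this hides the symptom and does not replace fixing the underlying implementation error.

```javascript
// Wrap a push function so that an identical event repeated within windowMs is dropped.
// Events are compared through JSON.stringify, so key order matters (an accepted limit here).
function createDedupedPush(push, windowMs, now) {
  const lastPushed = new Map(); // serialized event -> time of the last accepted push
  const clock = now || Date.now;
  return function (event) {
    const key = JSON.stringify(event);
    const time = clock();
    const previous = lastPushed.get(key);
    if (previous !== undefined && time - previous < windowMs) {
      return false; // duplicate within the window: skip it
    }
    lastPushed.set(key, time);
    push(event);
    return true;
  };
}

// In a browser, with a GTM-style dataLayer:
// const safePush = createDedupedPush(function (e) { window.dataLayer.push(e); }, 500);
// safePush({ event: 'purchase', transactionId: 'T1001' });
```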

Unexpected page behavior: Unexpected behavior can be the cause of some implementation issues. Imagine that entering a coupon or updating the quantity of an item on the cart page triggers a refresh: each refresh would generate a page view. This type of unexpected behavior, while not technically a wrong implementation, produces data that is difficult to interpret.

Mismatching ids: One potential issue that may arise is an inconsistent use of ids across events/pages. For example, what is sent as the product id on view content might be an internal product id, while a SKU is used on add to cart and a variant id is sent to Google Analytics on purchase. These inconsistencies can impact your different reports and make it impossible to properly track behavior across the funnel.

Inconsistent price formats: In certain cases, an implementation can contain an inconsistent price format across pages. Sometimes even an invalid price is pushed. For instance, in one implementation the following three product price formats had been implemented on different pages:

data-product-price="99"
data-product-price="£99.00"
<meta itemprop="price" content="99.00">

Product price should be a numeric string without a currency symbol, in US number format (a period as the decimal separator).
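A small normalization helper can bring the three variants above to a single format before they are sent. The sketch below is one possible approach; the function name is hypothetical, and it assumes the input already uses a period as the decimal separator and at most commas for thousands.

```javascript
// Normalize a scraped price into a plain decimal string such as "99.00".
// Assumes "." is the decimal separator and "," is only used for thousands.
function normalizePrice(raw) {
  const cleaned = String(raw).replace(/[^0-9.]/g, ''); // drop currency symbols, spaces, commas
  const value = Number(cleaned);
  if (cleaned === '' || Number.isNaN(value)) {
    return null; // not a usable price
  }
  return value.toFixed(2);
}

console.log(normalizePrice('99'));      // prints "99.00"
console.log(normalizePrice('£99.00'));  // prints "99.00"
console.log(normalizePrice('1,299.5')); // prints "1299.50"
```

Prices written with a comma as the decimal separator would be misread by this sketch and need separate handling.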

X-domain tracking: Cross-domain measurement can be complicated to start with, but nowadays some browsers have implemented measures against cross-domain tracking: Safari introduced ITP (“Intelligent Tracking Prevention”), and Firefox introduced the blocking of different trackers in “Private Browsing”.

One way around some of these cross-domain tracking protections is to host the resources on the same domain:

Cross-origin resources loaded from the same eTLD+1 as the top-level context will still have access to their storage.

Hit counts: If there is a plan to use raw data such as hit count or max hit count, you can run into issues in cases where you have implemented tracking for multiple properties based on the same clientId. Google Analytics increments the hit count even for hits that are not pushed to a specific property, in order to allow for the merging of sessions in roll-up properties.

PII data: PII data or tokens could end up being sent to Google Analytics, often inadvertently. By default, the full page URL is sent to Google Analytics on page view. Google Analytics explicitly mandates that no PII data be sent, and sending PII data to Google Analytics might result in your account being blocked.
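One way to reduce this risk is to scrub the URL before it is reported. The sketch below removes a configurable list of query parameters and any parameter whose value looks like an email address; the function name and the default parameter names are assumptions to adapt to the site at hand.

```javascript
// Remove query parameters that may carry PII before the URL is sent with a page view.
// The parameter names in the default list are assumptions; adapt them to the site.
function scrubUrl(rawUrl, blockedParams) {
  const blocked = blockedParams || ['email', 'token', 'phone', 'name'];
  const url = new URL(rawUrl);
  for (const param of blocked) {
    url.searchParams.delete(param);
  }
  // Also drop any remaining parameter whose value looks like an email address.
  for (const [key, value] of Array.from(url.searchParams.entries())) {
    if (/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(value)) {
      url.searchParams.delete(key);
    }
  }
  return url.pathname + url.search;
}

console.log(scrubUrl('https://example.com/reset?token=abc123&step=2')); // prints "/reset?step=2"

// With analytics.js, the cleaned path could then be set before sending the page view:
// ga('set', 'page', scrubUrl(window.location.href));
// ga('send', 'pageview');
```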

Maintaining

Once the tracking is implemented, it also needs to be maintained. Being an invisible component, its regressions can often be overlooked. Setting up some protection against these regressions, through automated tests and monitoring on the website, can be essential to ensure good quality tracking.

Automated tests

It is possible to set up automated tests on the dataLayer implementation. These can be set up as a defensive measure, as part of the deployment flow, in order to ensure that code changes don’t negatively impact the tracking currently present on the website.
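Such a test could be as simple as comparing the events recorded on a page with a specification of required keys. The sketch below shows one way of doing it; the specification format, the function name and the event names are assumptions, and how the recorded dataLayer is obtained (for example from a browser automation run) is left out.

```javascript
// Check recorded dataLayer events against a simple specification of required keys.
// spec maps an event name to the list of keys every such event must carry.
function validateDataLayer(events, spec) {
  const errors = [];
  for (const eventName of Object.keys(spec)) {
    const matches = events.filter(function (e) { return e.event === eventName; });
    if (matches.length === 0) {
      errors.push('missing event: ' + eventName);
      continue;
    }
    for (const match of matches) {
      for (const key of spec[eventName]) {
        if (match[key] === undefined) {
          errors.push(eventName + ' is missing key: ' + key);
        }
      }
    }
  }
  return errors;
}

// Hypothetical usage in a deployment check:
// const errors = validateDataLayer(recordedDataLayer, { addToCart: ['productId', 'price'] });
// if (errors.length > 0) { throw new Error(errors.join('\n')); }
```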

Monitoring

Setting up tracking monitoring for Google Analytics can be done in multiple ways. One is to rely on Google Analytics alerts; this, however, requires a well-trafficked website in order to get relevant tracking alerts.

Website monitoring tools such as updown.io offer an alternative way to monitor that tracking is still present on the website.

Scheduled Selenium tests are another possibility to check that the tracking is still present on the website and conforms to the specific tracking requirements.
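At its simplest, a presence check only needs to confirm that the page source still references the tag manager container. The sketch below assumes a GTM setup and a hypothetical helper name; GTM-XXXXXXX is a placeholder container id, and fetching the HTML on a schedule is left to whichever monitoring tool is used.

```javascript
// Check that fetched page HTML still references the expected GTM container.
// Looks for the standard GTM script or noscript URL and for the container id.
function hasGtmContainer(html, containerId) {
  const loadsGtm = html.indexOf('googletagmanager.com/gtm.js') !== -1 ||
    html.indexOf('googletagmanager.com/ns.html') !== -1;
  return loadsGtm && html.indexOf(containerId) !== -1;
}

// Hypothetical usage once the page source has been fetched:
// if (!hasGtmContainer(pageHtml, 'GTM-XXXXXXX')) { /* raise an alert */ }
```

A check like this only proves the container is referenced, not that the tags inside it fire correctly, which is where browser-based tests come in.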

Google Analytics APIs

Implementing Google Analytics sometimes requires integrating with Google Analytics APIs, be it for reporting purposes, to push some backend data, or to provide cost or product information. Google Analytics has three main APIs for these purposes.

Reporting API

The Reporting API is Google’s way to allow for programmatic reporting using GA. The Reporting API lets you query the datasets in a similar manner to a custom report in Google Analytics. The fields are, however, named differently, which can be checked using the dimension explorer. Users of Google Analytics 360 may opt to use its BigQuery export capabilities rather than the Reporting API.
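To give an idea of the different field naming, the sketch below builds a request body in the shape used by version 4 of the Reporting API (the text does not say which version is meant, so v4 is an assumption). The function name is hypothetical and the view id is a placeholder; authentication and the HTTP call itself are left out.

```javascript
// Build a Reporting API v4 style request body for a single report.
// Metrics and dimensions use the API's "ga:" prefixed names, e.g. ga:sessions.
function buildReportRequest(viewId, startDate, endDate, metrics, dimensions) {
  return {
    reportRequests: [{
      viewId: viewId,
      dateRanges: [{ startDate: startDate, endDate: endDate }],
      metrics: metrics.map(function (m) { return { expression: m }; }),
      dimensions: dimensions.map(function (d) { return { name: d }; })
    }]
  };
}

// Placeholder view id; the body would be sent to the API's reports:batchGet method.
const body = buildReportRequest('XXXXXXXX', '7daysAgo', 'today', ['ga:sessions'], ['ga:country']);
console.log(JSON.stringify(body, null, 2));
```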

Measurement protocol

The Measurement Protocol allows you to push data directly to Google Analytics. This can be used, for example, to push transaction data for transactions that did not occur directly on the website, or to have a backend service handle all the clickstream logging.

Google provides a tool, the Hit Builder, to set up the different types of requests using the Measurement Protocol.
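For the offline transaction case, a hit body could be assembled roughly as in the sketch below, which uses Universal Analytics Measurement Protocol parameters. The function name is hypothetical and the tracking id, client id and transaction values are placeholders; sending the body as a POST request to the collection endpoint is left out.

```javascript
// Build the body of a Universal Analytics Measurement Protocol "transaction" hit.
function buildTransactionHit(trackingId, clientId, transactionId, revenue, currency) {
  const params = new URLSearchParams();
  params.set('v', '1');              // protocol version
  params.set('tid', trackingId);     // property id, e.g. UA-XXXXX-Y
  params.set('cid', clientId);       // anonymous client id
  params.set('t', 'transaction');    // hit type
  params.set('ti', transactionId);   // transaction id
  params.set('tr', String(revenue)); // transaction revenue
  params.set('cu', currency);        // currency code
  return params.toString();
}

console.log(buildTransactionHit('UA-XXXXX-Y', '555', 'T1001', 99, 'USD'));
// prints "v=1&tid=UA-XXXXX-Y&cid=555&t=transaction&ti=T1001&tr=99&cu=USD"
```

The same payload can be pasted into the Hit Builder to validate it before wiring it into a backend service.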

Management API

The Management API allows you to perform tasks such as automating data imports or managing remarketing audiences. This API allows you to upload cost information from external ad providers such as Facebook, product catalogues, or additional user data.

If you have Google Analytics 360 available, this functionality further allows you to perform query-time imports and essentially to combine some of the data available within Google Analytics with master data.


Hacking Analytics

All around data & analytics topics

Julien Kervizic

Written by

Living at the interstice of business, data and technology | Solution Architect & Head of Data | Heineken, Facebook and Amazon | linkedin: https://bit.ly/2XbDffo
