Using clickstream event collectors to complement your Google Analytics

Clickstream collectors such as Snowplow and Divolte can provide a good complement to Google Analytics

Julien Kervizic
Dec 17, 2019
Photo by Cleo Vermij on Unsplash

Clickstream event collectors are applications that let you collect raw clickstream data from the front end of an application. There are several reasons to rely on these event collectors, and setting them up is not that complex.

Why use clickstream event collectors

Clickstream event collectors let you leverage raw data

These solutions both overlap with Google Analytics and complement it. Snowplow and Divolte are event collectors: they collect the raw data needed for deep analysis. Raw data increases the depth of analysis that is possible. It can, for instance, be leveraged for conversion rate optimization (CRO), where click path analysis allows a granular look at the different customer touch points.
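To make click path analysis concrete, here is a minimal JavaScript sketch of what raw event data enables: grouping each user's events into sessions and counting page-to-page transitions. The event shape ({ userId, timestamp, page }) and the 30-minute inactivity timeout are assumptions for illustration, not Snowplow's or Divolte's actual schema.

```javascript
// Hypothetical raw event shape: { userId, timestamp (ms since epoch), page }
const SESSION_GAP_MS = 30 * 60 * 1000; // assumed 30-minute inactivity timeout

// Split each user's events into sessions, ordered by time.
function sessionize(events, gapMs = SESSION_GAP_MS) {
  const byUser = new Map();
  for (const e of events) {
    if (!byUser.has(e.userId)) byUser.set(e.userId, []);
    byUser.get(e.userId).push(e);
  }
  const sessions = [];
  for (const userEvents of byUser.values()) {
    userEvents.sort((a, b) => a.timestamp - b.timestamp);
    let current = [];
    for (const e of userEvents) {
      const last = current[current.length - 1];
      // A gap longer than the timeout closes the current session.
      if (last && e.timestamp - last.timestamp > gapMs) {
        sessions.push(current);
        current = [];
      }
      current.push(e);
    }
    if (current.length > 0) sessions.push(current);
  }
  return sessions;
}

// Count page-to-page transitions ("A -> B") across all sessions.
function countTransitions(sessions) {
  const counts = new Map();
  for (const session of sessions) {
    for (let i = 1; i < session.length; i++) {
      const key = `${session[i - 1].page} -> ${session[i].page}`;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

const sampleEvents = [
  { userId: "u1", timestamp: 0, page: "/home" },
  { userId: "u1", timestamp: 60000, page: "/product" },
  { userId: "u2", timestamp: 5000, page: "/home" },
  { userId: "u2", timestamp: 20000, page: "/product" },
  { userId: "u1", timestamp: 90000, page: "/cart" },
];
console.log(countTransitions(sessionize(sampleEvents)));
```

At scale this logic would typically run in Spark, Presto or SQL over the stored events; having every individual event is what makes custom sessionization and path counting like this possible.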

Exporting raw clickstream data is a feature offered in Google Analytics 360 but not in the free version. If you are only considering Google Analytics 360 for this feature, going the Snowplow route might turn out to be more cost-effective.

and they make it easier to bypass ad-blockers

The second complementary value lies in potentially bypassing some of the tracking restrictions imposed by ad blockers. Because these are open-source solutions that need to be self-hosted, you can easily set them up on your own domain.

Serving them from your own domain lets the tracking bypass domain blacklists, and modifying the tracking script, or using a different one, lets it avoid checksum-based detection. Customizing the tracking script's file name further allows it to get past ad blockers that look for “track” or “analytics” in script names.

These solutions ingest the data as a stream, so by building an application on top of that stream, it is possible to push the data back to Google Analytics or other analytics tools.
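As a hedged sketch of such a forwarding application, the function below turns a collector event into a hit for the Universal Analytics Measurement Protocol (v1), the version in use when this was written. The input event shape is hypothetical; v, tid, cid, t, ec, ea, el and ev are Measurement Protocol parameters. Batching, retries and error handling are omitted.

```javascript
// Hypothetical collector event: { clientId, category, action, label?, value? }
// Converts it into a Universal Analytics Measurement Protocol (v1) event hit.
function toMeasurementProtocolHit(event, trackingId) {
  const params = new URLSearchParams({
    v: "1",              // protocol version
    tid: trackingId,     // property ID, e.g. "UA-XXXXX-Y"
    cid: event.clientId, // anonymous client ID
    t: "event",          // hit type
    ec: event.category,  // event category
    ea: event.action,    // event action
  });
  if (event.label !== undefined) params.set("el", event.label); // event label
  // The event value must be a non-negative integer.
  if (event.value !== undefined) params.set("ev", String(Math.max(0, Math.round(event.value))));
  return params.toString();
}

// Forward one hit; /collect accepts the payload as the body of a POST request.
// Uses the global fetch available in browsers and Node 18+.
async function sendHit(payload) {
  const res = await fetch("https://www.google-analytics.com/collect", {
    method: "POST",
    body: payload,
  });
  return res.ok;
}

const payload = toMeasurementProtocolHit(
  { clientId: "555", category: "video", action: "play", label: "intro" },
  "UA-XXXXX-Y"
);
console.log(payload);
// v=1&tid=UA-XXXXX-Y&cid=555&t=event&ec=video&ea=play&el=intro
```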

they can make clickstream an integral part of an application

Because clickstream collectors can push data to a message broker such as Kafka, they can become an integral part of an application. Applications that could rely on this type of data range from real-time reporting and real-time marketing triggers to real-time evaluation of prediction models.
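A real-time marketing trigger can be as simple as a stateful function called for every message consumed from the broker. The sketch below is hypothetical: the event shape, the "product_view" event type, the threshold of three views and the 10-minute window are all assumptions. It keeps its state in memory, which a production consumer would need to persist or partition by user.

```javascript
// Fires once when a user views the same product `threshold` times within `windowMs`.
class RepeatViewTrigger {
  constructor({ threshold = 3, windowMs = 10 * 60 * 1000, onTrigger }) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.onTrigger = onTrigger;
    this.views = new Map(); // "userId|productId" -> recent view timestamps
    this.fired = new Set(); // keys that already triggered, to avoid duplicates
  }

  // Call once per consumed message (e.g. from a Kafka consumer loop).
  process(event) {
    if (event.type !== "product_view") return;
    const key = `${event.userId}|${event.productId}`;
    // Keep only views inside the sliding window ending at this event.
    const recent = (this.views.get(key) || []).filter(
      (ts) => event.timestamp - ts < this.windowMs
    );
    recent.push(event.timestamp);
    this.views.set(key, recent);
    if (recent.length >= this.threshold && !this.fired.has(key)) {
      this.fired.add(key);
      this.onTrigger({ userId: event.userId, productId: event.productId, at: event.timestamp });
    }
  }
}

const trigger = new RepeatViewTrigger({
  onTrigger: (t) => console.log(`send offer for ${t.productId} to ${t.userId}`),
});
[0, 60000, 120000].forEach((ts) =>
  trigger.process({ type: "product_view", userId: "u1", productId: "p42", timestamp: ts })
);
// send offer for p42 to u1
```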

Setting up a clickstream event collector

Components


Tracking Script: The tracking script's role is to capture the different actions performed by users browsing the website and to push these events to the clickstream collector API.
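To illustrate that role, here is a minimal, hypothetical tracker that queues events and sends them in batches. It is not the Snowplow or Divolte tracker: the collector URL, payload format and batch size are assumptions, and the transport is injected so the logic also runs outside a browser.

```javascript
class Tracker {
  constructor({ collectorUrl, send, batchSize = 10 }) {
    this.collectorUrl = collectorUrl;
    this.send = send; // transport function (url, body), e.g. wrapping navigator.sendBeacon
    this.batchSize = batchSize;
    this.queue = [];
  }

  // Record one user action; flush automatically once the batch is full.
  track(eventType, params = {}) {
    this.queue.push({ eventType, params, timestamp: Date.now() });
    if (this.queue.length >= this.batchSize) this.flush();
  }

  // Send all queued events as a single JSON array.
  flush() {
    if (this.queue.length === 0) return;
    const body = JSON.stringify(this.queue);
    this.queue = [];
    this.send(this.collectorUrl, body);
  }
}

// Browser wiring (illustrative, not executed here):
// const tracker = new Tracker({
//   collectorUrl: "https://events.example.com/collect",
//   send: (url, body) => navigator.sendBeacon(url, body),
// });
// document.addEventListener("click", (e) => tracker.track("click", { id: e.target.id }));
// window.addEventListener("pagehide", () => tracker.flush());
```

Flushing on pagehide with sendBeacon is a common way to avoid losing the last events when the user leaves the page.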

Clickstream collector API: A clickstream collector API is merely a receiving endpoint that may perform 1) request authorization and 2) schema validation before pushing the data to a message broker for ingestion.
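Those steps can be sketched as a plain function that authorizes the request, validates each event and only then publishes to the broker. The API-key check, the three required fields and the in-memory broker are illustrative assumptions; Snowplow and Divolte each come with their own schema mechanisms.

```javascript
const ALLOWED_KEYS = new Set(["demo-key"]); // hypothetical API-key store
const REQUIRED_FIELDS = { eventType: "string", userId: "string", timestamp: "number" };

// Schema validation: return a list of problems (empty when the event is valid).
function validateEvent(event) {
  if (event === null || typeof event !== "object") return ["event must be an object"];
  const errors = [];
  for (const [field, type] of Object.entries(REQUIRED_FIELDS)) {
    if (typeof event[field] !== type) errors.push(`${field} must be a ${type}`);
  }
  return errors;
}

// In-memory stand-in for a message broker such as Kafka.
class InMemoryBroker {
  constructor() {
    this.topics = new Map();
  }
  publish(topic, message) {
    if (!this.topics.has(topic)) this.topics.set(topic, []);
    this.topics.get(topic).push(message);
  }
}

// Framework-agnostic handler returning an HTTP-style { status, body }.
function handleCollect({ apiKey, body }, broker) {
  // 1) Request authorization
  if (!ALLOWED_KEYS.has(apiKey)) return { status: 401, body: "unauthorized" };
  let events;
  try {
    events = JSON.parse(body);
  } catch {
    return { status: 400, body: "invalid JSON" };
  }
  if (!Array.isArray(events)) events = [events];
  // 2) Schema validation: reject the whole batch if any event is invalid
  for (const event of events) {
    const errors = validateEvent(event);
    if (errors.length > 0) return { status: 400, body: errors.join("; ") };
  }
  // 3) Push to the message broker for asynchronous ingestion
  for (const event of events) broker.publish("clickstream", event);
  return { status: 204, body: "" };
}
```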

Message broker: A message broker allows the data to be processed asynchronously. One of the most popular message brokers for data is Apache Kafka. Applications can consume the data stream directly to compute real-time aggregates or to filter the stream.
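As an example of a real-time aggregate computed by a broker consumer, the sketch below counts events per type in fixed, non-overlapping (tumbling) windows. It assumes events arrive roughly in timestamp order and simply drops late events; stream processors such as Kafka Streams handle out-of-order data more carefully. The event shape is hypothetical.

```javascript
// Counts events per type in tumbling time windows of `windowMs` milliseconds.
class TumblingWindowCounter {
  constructor(windowMs, onWindowClosed) {
    this.windowMs = windowMs;
    this.onWindowClosed = onWindowClosed;
    this.windowStart = null;
    this.counts = new Map();
  }

  add(event) {
    const start = Math.floor(event.timestamp / this.windowMs) * this.windowMs;
    if (this.windowStart === null) {
      this.windowStart = start;
    } else if (start > this.windowStart) {
      this.close(); // an event from a later window closes the current one
      this.windowStart = start;
    } else if (start < this.windowStart) {
      return; // late event for a window already emitted: dropped in this sketch
    }
    this.counts.set(event.eventType, (this.counts.get(event.eventType) || 0) + 1);
  }

  // Emit the current window's counts and reset the state.
  close() {
    if (this.windowStart === null) return;
    this.onWindowClosed({ windowStart: this.windowStart, counts: Object.fromEntries(this.counts) });
    this.windowStart = null;
    this.counts = new Map();
  }
}

const counter = new TumblingWindowCounter(60000, (w) => console.log(w));
counter.add({ eventType: "page_view", timestamp: 1000 });
counter.add({ eventType: "click", timestamp: 30000 });
counter.add({ eventType: "page_view", timestamp: 61000 }); // closes the [0, 60000) window
counter.close();
```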

Data Sink: A data sink takes the incoming data from the message broker and pushes it to the storage layer, usually an S3 bucket on AWS, Azure Data Lake Storage on Azure, or plain HDFS files.
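Much of a data sink's job is deciding how incoming events are laid out in storage. The hypothetical sketch below groups a consumed batch into Hive-style date and hour partitions and serializes each group as newline-delimited JSON; the prefix, batch ID and file naming are assumptions, and the actual upload (S3, Azure Data Lake Storage or HDFS) is left out.

```javascript
// Hive-style partition path for an event timestamp (UTC).
function partitionPath(prefix, timestampMs) {
  const d = new Date(timestampMs);
  const dt = d.toISOString().slice(0, 10); // YYYY-MM-DD
  const hour = String(d.getUTCHours()).padStart(2, "0");
  return `${prefix}/dt=${dt}/hour=${hour}`;
}

// Group a batch consumed from the broker into one NDJSON object per partition.
function buildObjects(events, prefix, batchId) {
  const groups = new Map();
  for (const e of events) {
    const path = partitionPath(prefix, e.timestamp);
    if (!groups.has(path)) groups.set(path, []);
    groups.get(path).push(JSON.stringify(e));
  }
  // The upload (S3 PutObject, ADLS, HDFS write) is intentionally omitted.
  return [...groups].map(([path, lines]) => ({
    key: `${path}/part-${batchId}.jsonl`,
    body: lines.join("\n") + "\n",
  }));
}

const objects = buildObjects(
  [
    { eventType: "page_view", timestamp: Date.UTC(2019, 11, 17, 9, 30) },
    { eventType: "click", timestamp: Date.UTC(2019, 11, 17, 10, 5) },
  ],
  "clickstream",
  "0001"
);
console.log(objects.map((o) => o.key));
```

Partitioned paths like these let engines such as Athena, Presto or Spark skip irrelevant files when queries filter on dt and hour, provided the table is declared with those partition columns.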

Storage Layer: The storage layer provides long-term storage for the incoming data. Most Hadoop-ecosystem compute engines, such as Presto or Spark, and their cloud equivalents, such as AWS Athena, can query files stored in bucket storage.

Containerized Applications

At their core, these solutions can be set up as containerized applications: both Snowplow and Divolte provide Docker images. These can run on a VM, on a container service such as Azure Container Instances, or on Kubernetes or Docker Swarm.
The containers should sit behind a load balancer so that they can be scaled out automatically, and they need a message queue to write to. Divolte's docker-compose file, for example, bootstraps a Kafka instance.

Tracking Script

From a front-end perspective, there are different ways to implement these clickstream loggers:

  • Leverage their tracker SDK directly: for Divolte, this can be done with divolte.signal('myEvent', { param: 'foo', param2: 'bar' })
  • Leverage the Snowplow tracker protocol: similar to the Google Analytics Measurement Protocol, it allows direct calls to the Snowplow collector API.
  • Leverage a custom plugin for Google Analytics: this reuses your current Google Analytics tracking and duplicates it to Snowplow/Divolte.
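For the tracker protocol option, a direct call is just an HTTP request carrying the event as query parameters. The sketch below builds a GET request to the Snowplow collector's /i pixel endpoint for a structured event (e=se with the se_ca, se_ac, se_la, se_pr and se_va fields). The collector host, application ID and tracker version string are hypothetical, and the protocol defines many more fields than shown here.

```javascript
const { randomUUID } = require("crypto");

// Builds a Snowplow tracker-protocol GET request for a structured event.
function snowplowStructEventUrl(collectorHost, appId, fields, eventId = randomUUID()) {
  const { category, action, label, property, value } = fields;
  const params = new URLSearchParams({
    e: "se",                 // event type: structured event
    p: "web",                // platform
    tv: "sketch-0.1",        // tracker version string (hypothetical)
    aid: appId,              // application ID
    eid: eventId,            // unique event ID
    dtm: String(Date.now()), // device-created timestamp, ms since epoch
    se_ca: category,         // event category
    se_ac: action,           // event action
  });
  // Optional structured-event fields
  if (label !== undefined) params.set("se_la", label);
  if (property !== undefined) params.set("se_pr", property);
  if (value !== undefined) params.set("se_va", String(value));
  return `https://${collectorHost}/i?${params}`;
}

console.log(
  snowplowStructEventUrl("collector.example.com", "my-shop", {
    category: "checkout",
    action: "click",
    label: "pay-button",
  })
);
```

In a browser, the classic way to fire such a request is a tracking pixel (new Image().src = url); from a server, any HTTP client will do.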

Conclusion

Using a clickstream event collector can bring many benefits: more control over how your data is tracked, more granular data, and real-time availability of that data to applications. There are open-source solutions for this that can easily be deployed as Docker containers.

Hacking Analytics

All around data & analytics topics
