Using clickstream event collectors to complement your Google Analytics

Clickstream collectors such as Snowplow and Divolte can provide a good complement to Google Analytics

Julien Kervizic
Dec 17, 2019 · 4 min read
Photo by Cleo Vermij on Unsplash

Clickstream events collectors are applications that let you collect raw clickstream data from the front-end of an application. There are multiple reasons why you should rely on these event collectors, and setting them up isn’t that complex.

Why use clickstream event collectors

ClickStream event collectors let you leverage raw data

Exporting clickstream to raw data is a feature that is offered within Google Analytics 360, but not the free version, and if you are only looking at acquiring the 360 version for this feature, going the snowplow route might turn out to be more cost effective.

and they make it easier to bypass ad-blockers

Setting them up on your own domain allows to bypass domain black-lists and modifying or having a different tracking script lets it avoid checksum detection. The ability of further customizing the tracking script name, allows it to bypass ad-blockers looking for “track” or “analytics” in their name.

These solutions allow the data to be ingested as a data stream, and creating an application, it is possible to push back this data to Google analytics or other analytics tools.

they can make clickstream an integral part of an application

Setting up a clickstream event collector

Components

Tracking Script: The tracking script role is to capture the different actions performed by users browsing the website and pushes these events to the clickstream collector API.

Clickstream collector API: A clickstream collector API, is merely a receiving end point of an API, that might perform 1) request authorization 2) schema validation and push the data to a message broker for ingestion.

Message broker: A message broker is there to allow for the asynchronous processing of the data. One of the most popular message broker for data is Apache Kafka. Applications can directly consume the data stream to compute real-time aggregates or filter the stream.

Data Sink: A data sink will take the incoming data from the message broker and push it to the storage layer. This is usually a S3 bucket on AWS, a DataLake storage on Azure or a plain HDFS file.

Storage Layer: The storage layer provides a long term storage for the incoming data. Most compute engine on Hadoop such as Presto, Spark or their cloud equivalent such as AWS Athena are able to query files on a bucket storage.

Containerized Applications

Tracking Script

  • Leverage their tracker SDK directly: this can be done in the following way for divolte: divolte.signal(‘myEvent’, { param: ‘foo’, param2: ‘bar’ })
  • Leverage the Snowplow tracker protocol: This is a similar setup to Google Analytics Measurement protocol but for snowplow, allowing direct calls to the tracking API
  • Leverage custom plugin for Google analytics: This lets you leverage your current tracking for Google Analytics and duplicates the tracking for Snowplow/Divolte

Conclusion


Hacking Analytics

All around data & analytics topics

Julien Kervizic

Written by

Living at the interstice of business, data and technology | Solution Architect & Head of Data | Heineken, Facebook and Amazon | linkedin: https://bit.ly/2XbDffo

Hacking Analytics

All around data & analytics topics

More From Medium

More from Hacking Analytics

More from Hacking Analytics

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade