Cookies, Tracking, and Pixels: Where Does Your Web Data Come From?

Julien Kervizic
Oct 23, 2018 · 10 min read
“chocolate cookies on blue textile” by Yullina D on Unsplash

Cookies, tracking scripts, and pixels are different tools used to get a better understanding of the users on your website. They help identify users and collect data, then transfer that data from a given site to analytics and advertising services.

Cookies

Cookies are a form of storage within your browser. They are generally used to hold identifiers such as user IDs and session IDs, session parameters (for instance, whether you have already accepted a cookie banner), or personalization parameters.

They are generally split into two categories:

  • First-party cookies: These are cookies set on your own domain, which means you can easily create, retrieve, and edit their content as you see fit when a user visits your website. First-party cookies can also carry the “SameSite” attribute, which provides a layer of protection against cross-site request forgery: with SameSite set to Strict, the browser only sends the cookie on requests originating from the site that set it (Lax additionally allows top-level navigations coming from other sites).
  • Third-party cookies: These are cookies set by external domains. Browsers restrict a page from accessing cookies that belong to other domains, although you can usually inspect the information being created within a session on these websites. Third-party cookies are generally placed for cookie syncing and for matching and stitching identities across websites, typically by Data Management Platforms (DMPs), Customer Data Platforms (CDPs), and ad exchanges.
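To make the first-party case concrete, here is a minimal sketch of how a site might compose the string it assigns to document.cookie when setting its own identifier. The helper name buildCookie, the cookie name uid, and the default attribute values are illustrative assumptions; Path, Max-Age, SameSite, and Secure themselves are standard cookie attributes.

```javascript
// Hypothetical helper: compose the string a site assigns to document.cookie.
function buildCookie(name, value, { maxAgeSeconds = 60 * 60 * 24 * 365, sameSite = "Lax", secure = true } = {}) {
  // Encode the value so characters such as ";" or "=" cannot break the cookie syntax.
  const parts = [
    `${name}=${encodeURIComponent(value)}`,
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    `SameSite=${sameSite}`,
  ];
  // Secure limits the cookie to HTTPS; browsers require it whenever SameSite=None.
  if (secure || sameSite === "None") parts.push("Secure");
  return parts.join("; ");
}

const cookie = buildCookie("uid", "user-123");
console.log(cookie); // uid=user-123; Path=/; Max-Age=31536000; SameSite=Lax; Secure

// In a browser, the site would then set it on its own domain with:
// document.cookie = cookie;
```

Choosing Strict instead of Lax (buildCookie("uid", id, { sameSite: "Strict" })) would stop the browser from sending the cookie even when a visitor follows a link from another site.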

For instance, Chrome's Developer Tools show the different cookies being set. They are available from Chrome's menu under More Tools.


Once there, the “Application” tab displays the different storage components in your browser. In the example below from Medium, we can unfold the cookies being set and see all the first-party cookies. They are considered first-party because the highlighted domain is the same as the one accessing them.

First-party cookies for Medium, as displayed in Google Chrome Developer Tools

Tracking Pixels & Tracking Scripts

Tracking pixels are pieces of code that usually rely on an image to serve as a bridge between websites. They are typically set up as 1×1 pixel GIFs to save bandwidth, hence the name tracking pixel. Tracking scripts, on the other hand, are pieces of JavaScript code that usually implement a tracking pixel on a website and are responsible for creating different types of requests to external domains, ultimately passing data to them.
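The mechanism can be sketched in a few lines: a script serializes the data into the query string of an image URL, and simply requesting that image delivers the data to the external domain. This is a minimal illustration, not any vendor's actual script; the endpoint tracker.example.com and the parameter names are hypothetical.

```javascript
// Hypothetical endpoint standing in for a third-party tracking pixel.
const PIXEL_ENDPOINT = "https://tracker.example.com/pixel.gif";

// Serialize the tracked data as query parameters on the pixel's image URL.
function buildPixelUrl(endpoint, data) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(data)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// What a tracking script does in a browser: requesting the image sends the data.
function firePixel(data) {
  const src = buildPixelUrl(PIXEL_ENDPOINT, data);
  if (typeof Image !== "undefined") {
    const img = new Image(1, 1); // 1x1 image, never inserted into the page
    img.src = src;               // assigning src triggers the GET request
  }
  return src;
}

console.log(firePixel({ event: "pageview", page: "/checkout" }));
```

Outside a browser the Image constructor does not exist, so the sketch only returns the URL it would have requested.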


In the above example, let's say a website such as Medium has implemented the Google Analytics tracking script “analytics.js”, whose code is supposed to inject a tracking pixel on the page. The tracking script can access and create requests using the data available on the page, and can set different cookies that serve as identifiers. The data passed along typically comes from the HTML, the URL, a data layer, or cookies, or is obtained through an event listener or an API call. The tracking script can make different types of requests to pass that information to the tracking pixel.

In the above example, the data for a given Google Analytics user ID is being passed to Google.

The above example shows how Google Analytics extracts the _gid value set in a cookie and passes it to Google Analytics for a page view event. The particularity of this _gid is that it was set by Google Analytics itself, as an identifier used to track unique users anonymously.

The tracking pixel is meant to handle the communication between the webpage you are hosting and the external site.


Because the pixel is itself hosted on an external domain, it can act as a bridge between the webpage and that domain. Being hosted there, it can access the external domain's first-party cookies and merge the data provided by the website with them. In this sense, it is able to match identities.
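From the external domain's side, this matching can be modeled as a join: the pixel request carries the website's data in its query string, while the browser attaches the external domain's own first-party cookie in the Cookie header. The sketch below is an illustrative model only; the parameter names page and site_uid and the cookie name tracker_id are hypothetical.

```javascript
// Parse a Cookie request header ("a=1; b=2") into an object.
function parseCookieHeader(header) {
  const cookies = {};
  for (const pair of (header || "").split(";")) {
    const index = pair.indexOf("=");
    if (index < 0) continue;
    const name = pair.slice(0, index).trim();
    cookies[name] = decodeURIComponent(pair.slice(index + 1).trim());
  }
  return cookies;
}

// Join the data sent by the website with the tracker's own first-party identifier.
function matchIdentity(requestUrl, cookieHeader, ownCookieName) {
  const params = Object.fromEntries(new URL(requestUrl).searchParams);
  const trackerId = parseCookieHeader(cookieHeader)[ownCookieName] || null;
  return { trackerId, ...params };
}

// A made-up pixel request as the external server would receive it.
const record = matchIdentity(
  "https://tracker.example.com/pixel.gif?page=%2Farticle&site_uid=abc",
  "tracker_id=T-42; other=x",
  "tracker_id"
);
console.log(record);
```

The resulting record ties the site's own user ID (site_uid) to the tracker's cross-site ID (trackerId), which is the essence of identity stitching.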

In the case of Facebook, for instance, we can see that a specific Facebook user ID is passed as a cookie alongside a request generated by an external site:

My user ID is passed in the request headers of this specific PageView event

Typing this ID into Facebook, for instance facebook.com/500985020, resolves to my personal Facebook page.

For the Facebook pixel to have access to this data, Facebook Connect or another connector does not need to be installed on the site. It only requires the cookie to already be set (i.e., you are logged in to Facebook) and the tracking pixel to be implemented on the site.


To understand how tracking is implemented, we can look at Chrome's Developer Tools again, this time at the Network tab, to see how data is passed between the website and Google Analytics. In some cases, Google Analytics tracks events using the collect call shown in the picture above. A few things can be identified from there:

  • From the type, we can see that it is a request of type “gif”, indicating a call to a tracking pixel.
  • The initiator is “analytics.js”, Google's tracking script, which ultimately decides what data needs to be passed.
  • Since the call is a GET request, the request parameters are visible in the request URL. These parameters represent the information pushed from our website to the external Google Analytics website; Google's own first-party cookies are not part of them.
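The same inspection can be done in code: since a collect call is a plain GET request, its query string can be decoded with the standard URL API. The sample URL is made up for illustration; v, t, tid, cid, and dp are Measurement Protocol parameter names (protocol version, hit type, property ID, client ID, and document path), while their values here are placeholders.

```javascript
// A made-up collect call, similar in shape to what the Network tab shows.
const hit = "https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXXX-Y&cid=555.1234&dp=%2Fpricing";

// Decode the query string into a plain object of parameter -> value.
function decodeHit(hitUrl) {
  return Object.fromEntries(new URL(hitUrl).searchParams);
}

const params = decodeHit(hit);
console.log(params.t, params.tid, params.dp); // pageview UA-XXXXX-Y /pricing
```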
For Google Analytics, the Measurement Protocol documentation explains these query parameters

Tracking vendors such as Google Analytics, with its Measurement Protocol, usually document the different parameters being used.

Implementing GA Tracking

The first step of implementing Google Analytics tracking, after setting up analytics.js on your website, is to initialize Google Analytics for a given property.

This is usually done with the create command, where the second parameter represents what Google Analytics calls a “property”, i.e., a specific individual space within an account in which data is collected.
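For reference, the standard analytics.js setup queues commands through a global ga function until the library has loaded. The sketch below reproduces only that command queue, not the library itself, so it runs without loading analytics.js; the tracking ID UA-XXXXX-Y is a placeholder.

```javascript
// Command queue in the style of the analytics.js snippet: calls made before the
// library loads are stored in ga.q, to be processed once analytics.js arrives.
globalThis.ga = globalThis.ga || function () {
  (globalThis.ga.q = globalThis.ga.q || []).push(arguments);
};

// Initialize tracking for a property; "UA-XXXXX-Y" is a placeholder tracking ID.
ga("create", "UA-XXXXX-Y", "auto");
// Record a page view for that property.
ga("send", "pageview");

console.log(ga.q.length); // 2
```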

The above snippet shows how a variable is initialized from JavaScript data available on a given page. In this example, the variable checkoutType is defined.

This piece of data, collected from the JavaScript available on the page, can then be sent to Google Analytics as a custom event.
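A minimal sketch of these two steps, assuming the page exposes a hypothetical pageData object (real pages will name their variables differently) and using the analytics.js event signature of category, action, and label. The category "checkout" and action "start" are illustrative values, and the ga function is stubbed as a command queue so the sketch runs on its own.

```javascript
// Stub of the analytics.js command queue so this sketch runs without the library.
globalThis.ga = globalThis.ga || function () {
  (globalThis.ga.q = globalThis.ga.q || []).push(arguments);
};

// Hypothetical page data: a real page would expose its own variables.
const pageData = { checkout: { type: "guest" } };

// Initialize the variable from JavaScript data available on the page.
const checkoutType = (pageData.checkout && pageData.checkout.type) || "unknown";

// Send it as a custom event: category, action, label.
ga("send", "event", "checkout", "start", checkoutType);
```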

Besides custom implementations, tag managers such as Google Tag Manager, Tealium, or Ensighten can be used to map available data and events to specific tracking within Google Analytics. They work by defining relationships between cookies, JavaScript variables, and other information in a website's data layer and their use within a tag.

Mapping the page_category JavaScript variable on a website to the Google Analytics Category variable within Tealium's Google Analytics tag
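Conceptually, such a mapping is a lookup table from data layer variables to tag fields. A minimal sketch, assuming a Tealium-style utag_data object; the mapping table and the target field names (eventCategory, dimension1) are illustrative choices, not Tealium's actual configuration format.

```javascript
// Hypothetical mapping: data layer variable -> Google Analytics field.
const mapping = {
  page_category: "eventCategory",
  customer_type: "dimension1",
};

// Copy every mapped variable that is present into the fields sent by the tag.
function applyMapping(dataLayer, map) {
  const fields = {};
  for (const [source, target] of Object.entries(map)) {
    if (dataLayer[source] !== undefined) fields[target] = dataLayer[source];
  }
  return fields;
}

// Example Tealium-style data layer with made-up values; unmapped keys are ignored.
const utag_data = { page_category: "shoes", customer_type: "returning", ignored: 1 };
console.log(applyMapping(utag_data, mapping)); // { eventCategory: 'shoes', dimension1: 'returning' }
```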

What can be tracked?

There is a wide range of data sources that can be tracked from a given user visit: the URL of the visited page; data surfaced on the page, either directly through HTML elements or through JavaScript variables; IDs or personalization parameters set in cookies; actions performed on the page; and data enriched through API calls, potentially from external sources.

URLs

URLs are one of the primary sources of information used in tracking. The current URL can easily be extracted using a JavaScript call or a tag manager.

The current URL can be extracted with a JavaScript call to window.location.href

Google Analytics notably uses Urchin Tracking Module (UTM) parameters to tie the source of traffic to a given marketing campaign. The five UTM parameters are:

  • Source (utm_source): identifies the origin of the traffic, e.g. google, bing, facebook, email
  • Medium (utm_medium): identifies the type of channel used to bring traffic to the website, e.g. cpc, cpm, email, social, …
  • Campaign (utm_campaign): the campaign that brought the traffic to the website
  • Content (utm_content): usually provides deeper information on the source of the click that led to the visit, for instance which particular piece of content in a specific email or page led to it
  • Term (utm_term): typically only provided for paid search, to identify the keywords that brought about the visit
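Reading these parameters from a landing page URL takes only the standard URL API. A minimal sketch; the landing URL and its values are made up, and in a browser the function would receive window.location.href.

```javascript
// The five standard UTM query parameters.
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term"];

// Pull the UTM values present on a landing page URL.
function getUtmParams(pageUrl) {
  const query = new URL(pageUrl).searchParams;
  const utm = {};
  for (const key of UTM_KEYS) {
    const value = query.get(key);
    if (value !== null) utm[key.slice(4)] = value; // "utm_source" -> "source"
  }
  return utm;
}

// Made-up landing URL; "+" in the query string decodes to a space.
const landing = "https://shop.example.com/?utm_source=google&utm_medium=cpc&utm_campaign=spring_sale&utm_term=running+shoes";
console.log(getUtmParams(landing));
```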

Not only parameters but also full page URLs provide value. Visibility on full page URLs notably makes funnel analysis possible, and allows the data to be split by domain when consolidating data from multiple websites into a single Google Analytics property.
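As a small illustration of the domain split, the sketch below counts page views per hostname from their full URLs. The page view list is hypothetical; in practice these URLs would come from the tracked hits.

```javascript
// Hypothetical page views consolidated from several sites into one property.
const pageViews = [
  "https://www.example.com/cart",
  "https://www.example.com/checkout",
  "https://www.example.fr/panier",
];

// Count page views per hostname, using the full URL of each hit.
function countByDomain(urls) {
  const counts = {};
  for (const href of urls) {
    const host = new URL(href).hostname;
    counts[host] = (counts[host] || 0) + 1;
  }
  return counts;
}

console.log(countByDomain(pageViews)); // { 'www.example.com': 2, 'www.example.fr': 1 }
```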

JavaScript Variables

JavaScript variables can also be used for tracking purposes. Google Tag Manager (GTM) notably consolidates the data available for tracking within a dataLayer JavaScript variable, and Tealium within a utag_data JavaScript variable. Other variables may also have been set on the page and be accessible to a script.

The GTM documentation, for instance, defines two variables in the data layer:

<script>
  dataLayer = [{
    'pageCategory': 'signup',
    'visitorType': 'high-value'
  }];
</script>

These can easily be accessed to provide additional context about the page category and the visitor type of a given page view. Accessing and passing these variables works similarly to the checkoutType example seen previously.
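A simplified way to read such values is sketched below: it scans the dataLayer array from the most recent entry backwards, approximating GTM's behavior where later pushes take precedence. GTM's own data model merges pushes in a more involved way, so treat this as an illustration; the readDataLayer helper and the later push are hypothetical.

```javascript
// The data layer from the GTM documentation example.
const dataLayer = [{ pageCategory: "signup", visitorType: "high-value" }];

// Return the most recently pushed value for a key (later pushes override earlier ones).
function readDataLayer(layer, key) {
  for (let i = layer.length - 1; i >= 0; i--) {
    if (Object.prototype.hasOwnProperty.call(layer[i], key)) return layer[i][key];
  }
  return undefined;
}

dataLayer.push({ visitorType: "returning" }); // hypothetical later update
console.log(readDataLayer(dataLayer, "pageCategory"), readDataLayer(dataLayer, "visitorType")); // signup returning
```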

HTML Data

Generally, all the data within a page's HTML structure is accessible for tracking purposes. jQuery notably provides an easy way to access the data contained in the HTML through an element's CSS path, for example:


In the above example, a jQuery command is used to access an HTML component through its CSS path.

It is also customary to include, within certain HTML elements, more data than is displayed, using data- attributes that are present in the code but not rendered on the page.
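Both ideas can be sketched with the standard DOM API instead of jQuery (the jQuery equivalents being .text() and .data()): querySelector finds the element by CSS selector, and dataset exposes its data- attributes in camelCase. The markup, selector, and attribute names below are hypothetical.

```javascript
// Hypothetical markup:
// <div class="product-tile" data-product-id="123" data-price="19.99">Running shoe</div>

// Read the visible text and the data- attributes of an element found by CSS selector.
// Works with any object exposing querySelector, such as the browser's document.
function readProduct(root, selector) {
  const element = root.querySelector(selector);
  if (!element) return null;
  return {
    name: element.textContent.trim(),
    // data-product-id and data-price surface as dataset.productId and dataset.price.
    productId: element.dataset.productId,
    price: Number(element.dataset.price),
  };
}

// In a browser: readProduct(document, "#main .product-tile");
```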

Cookies

As seen previously, data contained in cookies can be passed along for tracking purposes. Cookie data usually consists of identifiers, such as a user ID sent for tracking or identity-stitching purposes, or of personalization parameters such as your gender. Within JavaScript, a call to

document.cookie

returns a semicolon-separated list of cookievar=cookievalue pairs, which can then be parsed to retrieve specific cookies' values. The jquery-cookie plugin for jQuery offers a more accessible way to extract a given cookie's value:

$.cookie("cookievar")
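Without a plugin, the parsing step can be done by hand. A minimal sketch of a hypothetical getCookie helper; the sample string and its values are made up, and in a browser you would pass document.cookie instead.

```javascript
// Parse a document.cookie-style string ("a=1; b=2") and return one cookie's value.
function getCookie(cookieString, name) {
  for (const pair of cookieString.split(";")) {
    const index = pair.indexOf("=");
    if (index < 0) continue;
    if (pair.slice(0, index).trim() === name) {
      // Split on the first "=" only, since values may themselves contain "=".
      return decodeURIComponent(pair.slice(index + 1).trim());
    }
  }
  return null;
}

// Sample string with made-up values; in a browser, pass document.cookie instead.
const sample = "_ga=GA1.2.111.222; _gid=GA1.2.333.444; theme=dark";
console.log(getCookie(sample, "_gid")); // GA1.2.333.444
```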

JavaScript Calls and Event Listeners

The cleanest way to generate and track events is to trigger the tracking when a specific event occurs. This usually involves binding a given action to a JavaScript function that provides the tracking. For instance, let's look at the code of a buy button:

The buy button above would call the JavaScript function “buy” with the parameter 123, presumably a product ID.

This function could then be split in two: one function handling the functionality (in this example, adding to cart) and one responsible for providing data to a tracker.
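That split might look like the sketch below. The helper names addToCart and trackAddToCart, the cart array, the trackedEvents queue, and the button markup in the comment are all hypothetical; a real site would call its own cart logic and tracker.

```javascript
// Hypothetical page state and tracker queue, for illustration only.
const cart = [];
const trackedEvents = [];

// Functional part: add the product to the cart.
function addToCart(productId) {
  cart.push(productId);
}

// Tracking part: hand the data over to the tracker.
function trackAddToCart(productId) {
  trackedEvents.push({ event: "add_to_cart", productId });
}

// Handler bound to the button, e.g. <button onclick="buy(123)">Buy</button>.
function buy(productId) {
  addToCart(productId);
  trackAddToCart(productId);
}

buy(123);
```

Keeping the two concerns apart means the tracking call can be changed or removed without touching the cart logic.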

JavaScript, and more specifically jQuery, provides an easy way to track actions that were not originally built in, by using event listeners. Event listeners attach specific actions that run when certain events occur, for instance:

The above JavaScript adds a click event listener to links with the class up, which triggers a function that outputs “Someone clicked” to the JavaScript console. jQuery offers event listeners for different triggers, such as page load or mouse hover. These listeners can be either coded directly or set up within tag management systems.
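The same idea with the standard addEventListener API rather than jQuery's .on() is sketched below. The trackClicks helper and its label are hypothetical; outside a browser, a plain EventTarget (available in Node.js 15+) stands in for the link so the sketch can run.

```javascript
// Attach click tracking to any element (or other EventTarget) without touching its code.
function trackClicks(target, label, log = console.log) {
  target.addEventListener("click", () => log(`Someone clicked: ${label}`));
}

// In a browser, the equivalent of the jQuery example would be:
// document.querySelectorAll("a.up").forEach((link) => trackClicks(link, "up link"));

// Outside a browser, a plain EventTarget stands in for the link.
const link = new EventTarget();
const messages = [];
trackClicks(link, "up link", (message) => messages.push(message));
link.dispatchEvent(new Event("click"));
console.log(messages); // [ 'Someone clicked: up link' ]
```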

API Calls

API calls can be used to obtain additional data from different sources when the data available on the page itself is not sufficient. Imagine, for instance, that you want to track competitors' prices, or a foreign currency exchange rate, when your website visitors see certain pages.

In the above function, a request is made using jQuery to fetch the data provided by a given URL and output it to the console. The data obtained this way can easily be parsed into JavaScript variables to enrich the data being tracked on your website.
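The same enrichment can be written with the standard fetch API instead of jQuery. A minimal sketch, assuming a hypothetical endpoint at api.example.com that returns JSON such as {"rate": 1.08}; the fetch function is a parameter so it can be replaced in tests.

```javascript
// Hypothetical endpoint returning JSON such as {"rate": 1.08}.
const RATE_URL = "https://api.example.com/rates?from=EUR&to=USD";

// Enrich a tracking event with data fetched from an external API.
// fetchFn defaults to the built-in fetch (browsers, Node.js 18+).
async function enrichEvent(event, fetchFn = globalThis.fetch) {
  const response = await fetchFn(RATE_URL);
  if (!response.ok) {
    throw new Error(`Rate lookup failed with status ${response.status}`);
  }
  const body = await response.json();
  return { ...event, exchangeRate: body.rate };
}
```

In a page, enrichEvent({ event: "pageview", page: "/pricing" }) would resolve to the same event with an exchangeRate field, ready to be handed to the tracker.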


Hacking Analytics

All around data & analytics topics

Julien Kervizic

Written by

Living at the interstice of business, data and technology | Solution Architect & Head of Data | Heineken, Facebook and Amazon | linkedin: https://bit.ly/2XbDffo
