Cookies, Tracking and Pixels: Where Does Your Web Data Come From?

“chocolate cookies on blue textile” by Yullina D on Unsplash

Cookies, tracking scripts and pixels are different tools used to get a better understanding of the users on your website. They are used to help identify, collect and transfer data from a given website to software catering to analytics and advertising services.

Cookies

Cookies are a form of storage within your browser, generally used to store IDs such as user IDs and session IDs, session parameters (for instance, whether you already agreed to a cookie gate) or personalization parameters.

They are generally split into two categories:

  • First-party cookies: These are cookies set on your own domain, which means you can easily create, retrieve and edit their content as you see fit when a user visits your website. Cookies can additionally carry a “SameSite” attribute, which provides a layer of protection against cross-site request forgery. With SameSite set to Strict, the cookie is only sent with requests originating from the first-party site.
  • Third-party cookies: These are cookies set on external domains. Browsers usually restrict access to cookies on external domains; you can, however, usually inspect the information created within a session on these websites. Third-party cookies are usually placed to do cookie syncing, that is, matching and stitching identities across websites. They typically belong to the domains of Data Management Platforms (DMPs), Customer Data Platforms (CDPs) and ad exchanges.
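As a minimal sketch of how a first-party cookie with a SameSite attribute might be written from JavaScript: the helper name buildCookieString and the cookie name cookie_consent are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: builds the string a page would assign to document.cookie.
function buildCookieString(name, value, options = {}) {
  const parts = [`${encodeURIComponent(name)}=${encodeURIComponent(value)}`];
  if (options.maxAgeSeconds !== undefined) {
    parts.push(`Max-Age=${options.maxAgeSeconds}`);
  }
  parts.push(`Path=${options.path || "/"}`);
  if (options.sameSite) parts.push(`SameSite=${options.sameSite}`);
  if (options.secure) parts.push("Secure");
  return parts.join("; ");
}

// Remember for one year that the visitor accepted the cookie gate.
const consentCookie = buildCookieString("cookie_consent", "accepted", {
  maxAgeSeconds: 60 * 60 * 24 * 365,
  sameSite: "Strict",
});

// In a browser, assigning to document.cookie sets the cookie on the
// current (first-party) domain. Outside a browser this step is skipped.
if (typeof document !== "undefined") {
  document.cookie = consentCookie;
}
```

Keeping the string building separate from the assignment makes the attributes easy to inspect before anything is stored.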

For instance, Chrome’s Developer Tools let you identify the different cookies being set. They are available from the Google Chrome menu under “More tools”.

Once there, the “Application” tab displays the different storage components in your browser. Unfolding the cookies section, in the example below from Medium, shows all the first-party cookies being set. They are considered first-party because the highlighted domain is the same as the one accessing them.

First-party cookies for Medium, as displayed in Google Chrome Developer Tools

Tracking Pixels & Tracking Scripts

Tracking pixels are pieces of code, usually using an image, that serve as a bridge between websites. They are usually set up as 1-by-1-pixel GIFs to save bandwidth, hence the name tracking pixel. Tracking scripts, on the other hand, are pieces of JavaScript code that usually implement a tracking pixel on a website and are responsible for creating different types of requests to external domains, ultimately passing data to them.

In the above example, let’s say a website such as Medium has implemented the Google Analytics tracking script “analytics.js”, the code of which is supposed to inject a tracking pixel on the page. The tracking script can access the data available on the page and create requests from it, as well as set cookies that can be used as identifiers. The data passed through normally consists of data available within the HTML, within a URL, within a data layer, within cookies, or obtained through an event listener or an API call. The tracking script can make different types of requests to pass that information to the tracking pixel.

In the above example the data for a given Google Analytics user id is being passed to Google

The above example shows how Google Analytics extracts the _gid set in a cookie and passes it through to Google Analytics for a page view event. The particularity of this _gid is that it was itself set by Google Analytics as an identifier to track unique users anonymously.

The tracking pixel is meant to handle the communication between the webpage you are hosting and the external site.

Because the pixel is itself hosted on an external site, it can act as a bridge between the webpage and the external domain. Being hosted on that external domain, it can access the domain’s first-party cookies and merge the data provided by the external website with them. In this sense it is able to match identities.
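The mechanism can be sketched in a few lines. This is an illustration under assumptions, not any vendor’s actual code: the endpoint tracker.example.com, the parameter names and the helpers buildPixelUrl and firePixel are all hypothetical.

```javascript
// Hypothetical helper: encodes the data to send as query parameters.
function buildPixelUrl(endpoint, params) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Hypothetical helper: requests the pixel, which sends the data.
function firePixel(endpoint, params) {
  const src = buildPixelUrl(endpoint, params);
  // In a browser, requesting a 1x1 image is enough: the browser issues a
  // GET request to the external domain and, where its cookie policy allows,
  // attaches the cookies it holds for that domain.
  if (typeof Image !== "undefined") {
    const pixel = new Image(1, 1);
    pixel.src = src;
  }
  return src;
}

const pixelSrc = firePixel("https://tracker.example.com/pixel.gif", {
  event: "pageview",
  page: "/checkout",
});
console.log(pixelSrc);
```

The page never reads the external domain’s cookies itself; it only triggers a request, and the external server sees both the query parameters and its own cookies.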

In the case of Facebook, for instance, we can see that a specific Facebook user id is passed as a cookie alongside a request generated by an external site:

We can see that my user id is being passed along in the request headers of the specific PageView event

Typing this id into Facebook, for instance facebook.com/500985020, would resolve to my personal Facebook page.

For the Facebook pixel to have access to this data, the site does not need Facebook Connect or another connector installed; it merely needs the cookie to be already set (i.e. you are logged in to Facebook) and the tracking pixel to be implemented on the site.

Looking at Chrome Developer Tools again helps to understand how tracking is implemented. The Network tab shows how the data is passed between the website and Google Analytics, for instance. In some cases Google Analytics tracks different events using the “collect” call, which is represented in the picture above. We can identify a few things from there:

  • From the type, we can see that it is a request of type “gif”, indicating a call to a tracking pixel.
  • The initiator is “analytics.js”, Google’s tracking script and the one ultimately deciding what data needs to be passed.
  • Since the call is a GET request, we can see the different request parameters within the request URL. These parameters represent the information that we want to push from our website to the external Google Analytics website, excluding its first-party cookies.

For Google Analytics, the Measurement Protocol provides an explanation of these query parameters

Some tracking vendors, such as Google Analytics with its Measurement Protocol, document the different parameters being used.
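To make the query parameters more concrete, here is a small sketch that decodes a collect call. The URL is a made-up example (placeholder property ID and client ID); the parameter names v, t, tid, cid, dl and dt are taken from the Measurement Protocol, while the helper parseHit and its field names are my own.

```javascript
// Hypothetical collect call, similar in shape to what the Network tab shows.
const collectUrl =
  "https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXXX-Y" +
  "&cid=555.777&dl=https%3A%2F%2Fexample.com%2Fpricing&dt=Pricing";

// Maps the short Measurement Protocol parameters to readable names.
function parseHit(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  return {
    protocolVersion: params.get("v"),
    hitType: params.get("t"),
    trackingId: params.get("tid"),
    clientId: params.get("cid"),
    documentLocation: params.get("dl"),
    documentTitle: params.get("dt"),
  };
}

const hit = parseHit(collectUrl);
console.log(hit);
```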

Implementing GA Tracking

The first step in implementing Google Analytics tracking, after having set up analytics.js on your website, is to initialize Google Analytics for a given property.

This is usually done with the create command, where the second parameter represents what Google Analytics calls a “property”, i.e. a specific individual space within an account used to collect data.

The above snippet shows how a variable is initialized from some JavaScript data available on a given page. In the above example the variable checkoutType is defined.

This piece of data, collected from the available JavaScript, can then be transferred to Google using a custom event as shown above.
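The three steps described above (initialization, reading a variable, sending a custom event) can be sketched together. The property ID UA-XXXXX-Y is a placeholder, the pageData object and the event category and action names are hypothetical, and the ga function here is only a stand-in for the command queue that the analytics.js loading snippet normally defines.

```javascript
// Stand-in for the analytics.js command queue. On a real page the loading
// snippet defines it, and analytics.js later processes the queued commands.
const ga = function () {
  (ga.q = ga.q || []).push(Array.from(arguments));
};

// 1. Initialize a tracker for a property (placeholder ID) and send a page view.
ga("create", "UA-XXXXX-Y", "auto");
ga("send", "pageview");

// 2. Read a value from data available on the page.
//    pageData is hypothetical; a real site may expose this differently.
const pageData = { checkout: { type: "guest" } };
const checkoutType = pageData.checkout.type;

// 3. Pass it to Google Analytics as a custom event: category, action, label.
ga("send", "event", "Checkout", "start", checkoutType);

console.log(ga.q);
```

Because commands are only queued, the same calls work whether or not the library has finished loading.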

Besides custom implementations as shown above, tag managers such as Google Tag Manager, Tealium or Ensighten can be used to map available data and events to specific tracking within Google Analytics. These work by defining relationships between cookies, JavaScript variables and other information in a website’s data layer on one side, and their use within a tag on the other.

The mapping of the page_category JavaScript variable on a website to the Google Analytics Category variable within Tealium’s Google Analytics tag

What can be tracked?

A wide range of data sources can be tracked from a given user visit: the URL of the page visited, the data surfaced on the page either directly through HTML elements or through JavaScript variables, IDs or personalization parameters set in cookies, actions performed on pages, and data enriched through API calls, potentially from external sources.

URLs

URLs are one of the first sources of information used in tracking. The current URL can easily be extracted using a JavaScript call or a tag manager.

The current URL can be extracted using a JavaScript call to window.location.href

Google notably popularized Urchin Tracking Module (UTM) parameters to tie the source of traffic to a given marketing campaign. The five UTM parameters are described below:

  • Source: identifies the origin of the traffic, e.g. google, bing, facebook, email
  • Medium: identifies the type of channel used to bring traffic to the website, e.g. cpc, cpm, email, social, …
  • Campaign: the campaign that was used to bring the traffic to the website
  • Content: usually provides deeper information on the source of the click that led to the website visit, for instance which particular piece of content in a specific email or page led to that visit
  • Term: typically only provided for paid search, to give information on the keywords that brought the visit
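As an illustration, the five parameters listed above could be pulled out of a landing URL as follows. The helper extractUtm and the sample URL are made up for the example; the utm_ prefixed names are the conventional query parameter names.

```javascript
const UTM_KEYS = [
  "utm_source",
  "utm_medium",
  "utm_campaign",
  "utm_content",
  "utm_term",
];

// Returns only the UTM parameters that are present in the URL.
function extractUtm(pageUrl) {
  const params = new URL(pageUrl).searchParams;
  const utm = {};
  for (const key of UTM_KEYS) {
    if (params.has(key)) utm[key] = params.get(key);
  }
  return utm;
}

// In a browser the input would typically be window.location.href.
const landingUrl =
  "https://example.com/pricing?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale";
const utm = extractUtm(landingUrl);
console.log(utm);
```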

Not only parameters but also full page URLs provide value. Notably, having visibility on full page URLs makes it possible to do funnel analysis, or to split the data by domain when consolidating data from multiple websites in a given Google Analytics property.

JavaScript Variables

JavaScript variables can also be used for tracking purposes. Google Tag Manager (GTM) notably tries to consolidate the data available for tracking within a dataLayer JavaScript variable, and Tealium within a utag_data JavaScript variable. Other variables might also have been set on the page and be accessible to a script.

The GTM documentation, for instance, defines two variables in the data layer:

<script>
dataLayer = [{
'pageCategory': 'signup',
'visitorType': 'high-value'
}];
</script>

These can easily be accessed to provide additional context on the page category and the visitor type for a given page view. Accessing and passing these variables can be done in a similar way to the checkoutType example seen previously.
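A rough sketch of reading these values from a script is shown below. The helper getDataLayerValue is hypothetical and deliberately simplified: it just takes the most recent entry containing the key, which is not necessarily how GTM itself resolves values internally.

```javascript
// Same shape as the data layer defined in the example above.
const dataLayer = [{ pageCategory: "signup", visitorType: "high-value" }];

// Later pushes can add or override keys, so search from the most recent entry.
function getDataLayerValue(layer, key) {
  for (let i = layer.length - 1; i >= 0; i--) {
    const entry = layer[i];
    if (entry && Object.prototype.hasOwnProperty.call(entry, key)) {
      return entry[key];
    }
  }
  return undefined;
}

// A later push, for instance after a form submission (hypothetical values).
dataLayer.push({ event: "formSubmit", visitorType: "returning" });

const pageCategory = getDataLayerValue(dataLayer, "pageCategory");
const visitorType = getDataLayerValue(dataLayer, "visitorType");
console.log(pageCategory, visitorType);
```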

HTML Data

Generally, all the data set within a page’s HTML structure is accessible for tracking purposes. jQuery notably provides an easy way to access data contained in the HTML by selecting an element through its CSS path, for example:

In the above example we use a jQuery command to access an HTML component through its CSS path.

It is also customary to include additional data within certain HTML elements, beyond what is displayed, using data- attributes hidden in the code.
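Here is a small sketch of reading both the visible text and the data- attributes of an element. It uses the standard DOM API rather than jQuery; the helper readElementData, the #product selector and the markup shown in the comment are assumptions made for the example.

```javascript
// Hypothetical helper: collects the visible text and data- attributes.
function readElementData(element) {
  // dataset exposes data-product-id as productId, data-price as price, etc.
  return Object.assign({ text: element.textContent.trim() }, element.dataset);
}

// Assumed markup:
// <div id="product" data-product-id="123" data-price="19.99">Blue shirt</div>
if (typeof document !== "undefined") {
  // Select the element through its CSS path, as a jQuery selector would.
  const product = document.querySelector("#product");
  if (product) console.log(readElementData(product));
}
```

Outside a browser the lookup is skipped, so the helper can still be exercised with any object exposing textContent and dataset.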

Cookies

As we have previously seen, data contained in cookies can be passed along for tracking purposes. Cookie data usually consists of identifiers, such as a user id, that can be sent for tracking or identity stitching purposes, or of personalization parameters such as your gender. Within JavaScript, a call to

document.cookie

returns a semicolon-separated list of cookievar=cookievalue pairs, which can then be parsed to retrieve specific cookies’ values. The jquery-cookie plugin offers a more accessible way to extract a given cookie’s value:

$.cookie("cookievar")
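Without a plugin, the parsing can be done by hand. The sketch below assumes the usual name=value; name=value format; the helper parseCookies and the sample values are made up for illustration.

```javascript
// Turns a document.cookie style string into a { name: value } object.
function parseCookies(cookieString) {
  const result = {};
  for (const pair of cookieString.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip empty or malformed fragments
    const name = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    try {
      result[name] = decodeURIComponent(value);
    } catch {
      result[name] = value; // keep the raw value if it is not valid encoding
    }
  }
  return result;
}

// In a browser the input would be document.cookie; this sample is invented.
const sampleCookieString = "_gid=GA1.2.123456789.987654321; cookie_consent=accepted";
const cookies = parseCookies(sampleCookieString);
console.log(cookies._gid);
```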

JavaScript Calls and Event Listeners

The cleanest way to generate and track events is to fire them when a specific action occurs. It usually involves binding a given action to a JavaScript function providing the tracking. For instance, let’s look at the code of a buy button:

The buy button above would call the JavaScript function “buy” with the parameter 123, presumably denoting some sort of product id.

This function could then be decomposed in JavaScript into one function handling the functionality, in the above example adding to cart, and one responsible for providing data to a tracker.
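One possible decomposition is sketched below. Only the buy function name and the 123 parameter come from the example; the button markup in the comment, the helpers addToCart and trackAddToCart, and the event fields are hypothetical, and an array stands in for the real tracker call.

```javascript
const cart = [];
const trackedEvents = [];

// Functional part: adds the product to the cart.
function addToCart(productId) {
  cart.push(productId);
}

// Tracking part: hands the data to a tracker.
// Pushing to an array stands in for a call to a real tracking script.
function trackAddToCart(productId) {
  trackedEvents.push({
    category: "Cart",
    action: "add",
    label: String(productId),
  });
}

// Entry point bound to the button, e.g. (assumed markup):
// <button onclick="buy(123)">Buy</button>
function buy(productId) {
  addToCart(productId);
  trackAddToCart(productId);
}

buy(123);
console.log(cart, trackedEvents);
```

Keeping the two concerns in separate functions means tracking can be changed or removed without touching the cart logic.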

JavaScript, and more specifically jQuery, provides an easy way to track specific actions that were not originally built in, using event listeners. Event listeners allow specific actions to be triggered when certain events occur, for instance:

The above JavaScript adds a click event listener to the link of class “up”, which triggers a function that outputs “Someone clicked” in the JavaScript console. jQuery contains different classes of event listeners, which can focus on different triggers such as page load, mouse hover … This can either be coded by hand or set up within different tag management systems.
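The same idea can be expressed with the standard addEventListener API instead of jQuery. In this sketch a bare EventTarget stands in for the link so that it also runs outside a page; the helper trackClicks and the clickLog array are hypothetical.

```javascript
const clickLog = [];

// Attaches a click listener that records and logs each click.
function trackClicks(target) {
  target.addEventListener("click", () => {
    clickLog.push("Someone clicked");
    console.log("Someone clicked");
  });
}

// In a browser the target would be the link itself, for instance
// document.querySelector("a.up"). A plain EventTarget is used here instead.
const link = new EventTarget();
trackClicks(link);

// Simulate a click to show the listener firing.
link.dispatchEvent(new Event("click"));
```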

API Calls

API calls can be used to obtain additional data from different sources when the data available on the page itself is not sufficient. Imagine, for instance, that you want to track competitors’ prices, or a foreign currency exchange rate, when your website visitors see certain pages.

In the above function a request is made using jQuery to fetch the data provided by a given URL and output it to the console. The data obtained this way can easily be parsed and passed to JavaScript variables to enrich the data being tracked on your website.
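A sketch of this kind of enrichment, written with an injectable fetch function in place of jQuery so that it can run without a network: the endpoint api.example.com, the response shape { rate: number } and the helper enrichWithExchangeRate are all assumptions.

```javascript
// fetchJson is passed in so the sketch does not depend on a live API.
// In a browser one could pass: (url) => fetch(url).then((res) => res.json())
async function enrichWithExchangeRate(pageData, fetchJson, url) {
  const response = await fetchJson(url);
  // Copy the page data and add the fetched rate to it.
  return { ...pageData, eurUsdRate: response.rate };
}

// Fake fetcher returning a hypothetical response shape: { rate: number }.
const fakeFetchJson = async () => ({ rate: 1.1 });

const enriched = enrichWithExchangeRate(
  { page: "/pricing" },
  fakeFetchJson,
  "https://api.example.com/rates/EURUSD" // hypothetical endpoint
);

enriched.then((data) => console.log(data));
```

The enriched object can then be handed to the tracker in the same way as any other page variable.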