Measuring Attention on Advertising

Published in

GumGum Tech Blog

10 min read · Mar 21, 2023

Photo of a closeup of an eye with a rainbow shadow going through it — Photo by Harry Quan on Unsplash

In this post we will explore how we build scalable attention measurement models for digital advertising at Playground XYZ. The content is generally high level, but I have added some technical deep dive boxes throughout the post for readers who want more detail.

What is Attention?

Attention is a general English word that has a specific meaning in cognitive psychology, where it refers to the information that our mind is actively processing. We may or may not be consciously aware of where our attention is focused, but careful experimentation can reveal what we pay attention to.

In marketing and advertising, we are usually looking to measure whether a media consumer has their eyes fixated on the part of the screen that contains an ad. This is important because it signals that the consumer is cognitively processing the content of the advertising.

It is critical to recognize that attention is more than a measurement of viewability. The viewability of an ad is an important hygiene metric that determines whether the consumer had an opportunity to see the ad. Attention goes beyond that and measures the amount of time that a consumer’s eyes were fixated directly on the ad creative. This is one of the strongest external indications we have that someone’s interest was captured by an ad and that they devoted some cognition to its content. It is perhaps the first near-real-time signal we have that the ad is succeeding in building what modern marketing science calls mental availability. In essence, this means creating a memory structure for the brand, so that when the consumer wants to make a purchase, the advertised brand has a chance to be considered.

How can we measure attention?

Measuring attention requires multiple steps, which we outline below. The key to doing this well is to build a solid foundation of ground truth data using eye tracking technology. You need to collect this data for large numbers of consumers, across large volumes of different media, so that you understand the conditions under which people fixate on ads. Some terminology you will encounter in this blog post:

Panelist: A person who is paid to participate in a data collection process in which they consume media while their eyes are tracked.

Gaze Fixation: The point on a screen at which your eyes are focused. Typically measured as a pair of x, y coordinates on the screen.

Machine Learning: A set of technologies for building predictive models that learn relationships between data points.

Features: The individual data points that are provided to a machine learning algorithm to make a prediction.

Step One — Track Eye Movements

To measure attention we need an application that is able to track eye movements as people consume media. Historically this has been done using expensive headset devices that require panelists to come into an artificial lab environment to have their eyes tracked. Interestingly, some of the earliest eye tracking studies ever done were performed to understand which parts of an advertising creative captured the attention of consumers.

Increasingly, eye tracking data is being collected on personal computing devices using machine learning models that access the camera to map facial images to the point of gaze fixation on the screen. The advantage of these on-device eye tracking systems is that you can collect data in a more natural setting, since panelists use their own device in their own home.

Technical Deep Dive - Eye Tracking with Machine Learning
========================================================

The machine learning systems that learn the relationship between facial
images and eye fixation coordinates use a convolutional neural network.
This is an approach to designing neural networks for image processing
that learns a set of feature extractors directly from raw images. The
learned features respond to a pattern wherever it appears in the frame
and, given varied training data, become robust to changes in orientation.
This is critical because users can tilt their heads at a variety of
angles and appear at different positions within the camera's field of
view.

After the convolutional layers of the neural network, the model then learns
the relationship between the extracted features and the screen position
that the user is fixated on. This is output as a pair of x, y coordinates
within the viewport, indicating where we expect the user to be focusing.

The foundation of machine learning approaches to eye tracking is outlined 
in this paper from a research team at MIT.

[Eye Tracking for Everyone](https://gazecapture.csail.mit.edu/)
K. Krafka, A. Khosla, P. Kellnhofer, H. Kannan, S. Bhandarkar, 
W. Matusik and A. Torralba
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
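
To make the two stages described in this deep dive concrete, here is a minimal, untrained sketch in plain Python. A couple of convolutional filters slide over a tiny grayscale image, and a linear head maps the pooled features to a point in the viewport. Everything here is an illustrative assumption: the 5x5 image, the hand-written filters, the weights, and the function names (`conv2d`, `pooled_feature`, `predict_gaze`) are made up for the example and are not the model from the paper or our production system, where the filters and weights are learned from labelled eye tracking data.

```python
import math


def conv2d(image, kernel):
    """Slide one kernel over a grayscale image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def pooled_feature(image, kernel):
    """Convolution, then ReLU, then global average pooling: one number per filter."""
    feature_map = conv2d(image, kernel)
    activations = [max(0.0, v) for row in feature_map for v in row]
    return sum(activations) / len(activations)


def predict_gaze(image, kernels, weights, biases, viewport):
    """Map an image to an (x, y) point inside a viewport of (width, height) pixels."""
    features = [pooled_feature(image, k) for k in kernels]
    width, height = viewport

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Linear head: one weighted sum per coordinate, squashed to (0, 1)
    # and scaled to pixel coordinates so the output stays on screen.
    zx = sum(w * f for w, f in zip(weights[0], features)) + biases[0]
    zy = sum(w * f for w, f in zip(weights[1], features)) + biases[1]
    return sigmoid(zx) * width, sigmoid(zy) * height


# Hypothetical inputs: a tiny "camera frame" and two hand-written edge filters.
image = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
kernels = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],    # responds to vertical edges
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],    # responds to horizontal edges
]
weights = [[0.4, -0.2], [-0.3, 0.5]]         # made-up, untrained head weights
biases = [0.1, -0.1]

x, y = predict_gaze(image, kernels, weights, biases, viewport=(390, 844))
print(f"predicted fixation: ({x:.0f}, {y:.0f})")
```

Because the weights are not trained, the predicted point is meaningless; the sketch only shows how data flows from pixels, through feature extraction, to a viewport coordinate.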

The eye tracking data is the foundation of any attention measurement system. If you do not have eye tracking data to build from, then you do not have a ground truth of how much attention an advertisement garnered within a given media environment.

Step Two — Collect Panel Data

The second step to creating attention measurement technology is to apply your eye tracking model to large numbers of consumers who are reading media and being exposed to advertising. We do this to build up a data set of what people pay attention to, in different contexts, and how they behave in the process.

To achieve this, we create an application that will be used by panel members who agree to let us track their eye movements while they use their device. They are remunerated for participation, and we provide guidance on what we want them to do. Importantly, we collect this data from a large number of panelists, browsing multiple different websites or apps, so that we have representative sets of consumer behaviour. These panels of consumers allow us to build up a large data set of how people interact with media and how long they look directly at advertising.

Step Three — Collect Features

In this step, we want to collect additional information about our panelists’ browsing behaviour that can be used to estimate the attention time when eye tracking is not available. The reason for this requirement is that our scalable product requires attention measurement that extends beyond the panel members and can be applied to every impression we observe.

This means we need to find signals in the ad log data that are highly correlated with the attention time we measure for panel members. Once we do this, we can predict attention time with accuracy and extend our measurement into the space of campaign tracking and reporting.

What could these signals be?

There are two main sources of signal.

Environment: This is everything about the page and the ad itself: what webpage the user is looking at, at what time of day, and with what kind of device. These are all factors that can affect the amount of attention a consumer will pay to an ad. We can capture them directly with the ad tag.

Behaviour: Context can take us a long way, but ultimately it is the way that the user interacts with the page that gives us high quality signals of their attention. For example, how long is the ad viewable as they scroll through the page? Which part of the screen is the ad sitting on, the top or the bottom? How long did they spend on the page in total, and how far through the page did they get? All of these characteristics of the user interaction betray what they are paying attention to.
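
As a rough illustration of the two signal sources, the sketch below assembles one feature record from a hypothetical impression log. The field names (`page_url`, `viewable_ms`, `ad_center_y_px`, and so on) and the `ImpressionFeatures` class are invented for this example and are not the actual ad tag schema.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ImpressionFeatures:
    # Environment signals
    domain: str
    hour_of_day: int
    device: str
    # Behaviour signals
    viewable_seconds: float
    ad_in_top_half: bool
    time_on_page_seconds: float
    scroll_depth: float  # fraction of the page reached, 0.0 to 1.0


def build_features(log: dict) -> ImpressionFeatures:
    """Turn one raw (hypothetical) impression log into model features."""
    viewport_h = log["viewport_height_px"]
    # The deepest pixel seen is the scroll offset plus one viewport height.
    deepest_pixel = log["max_scroll_px"] + viewport_h
    return ImpressionFeatures(
        domain=urlparse(log["page_url"]).netloc,
        hour_of_day=log["local_hour"],
        device=log["device"],
        viewable_seconds=log["viewable_ms"] / 1000,
        ad_in_top_half=log["ad_center_y_px"] < viewport_h / 2,
        time_on_page_seconds=log["time_on_page_ms"] / 1000,
        scroll_depth=min(1.0, deepest_pixel / log["page_height_px"]),
    )


example_log = {
    "page_url": "https://news.example.com/article/123",
    "local_hour": 21,
    "device": "mobile",
    "viewable_ms": 3200,
    "ad_center_y_px": 180,
    "viewport_height_px": 800,
    "time_on_page_ms": 45000,
    "max_scroll_px": 2200,
    "page_height_px": 4000,
}
features = build_features(example_log)
print(features)
```

Keeping environment and behaviour fields in one flat record makes it straightforward to hand the same structure to a model at training time (with a measured attention time attached) and at prediction time (without one).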

Technical Deep Dive - Panel Data Collection
===========================================

In each panelist viewing session, we can control (or inspect) the ad
inventory on the websites that a panel member is viewing. We can then
identify the location of the ad unit and cross reference it with the eye
fixation coordinates from the eye tracking model. We take eye tracking
snapshots at intervals that depend on the device (40ms on average) and
match these to where the ad was on screen. This requires synchronizing
the timestamps between eye tracking and the AMT tag. The combination of
this data is how we calculate the attention time for each ad a panel
member is exposed to.

In addition, we collect a wide variety of other data points including the
user context (browser and local time), the media context (the page
location and title), and the behaviour of the user (scroll speed, session
time, etc). There are more than 40 of these environmental and behavioural
features collected during the panel session.

At the end of this process, we have large numbers of browsing sessions,
across a large number of panelists, from different parts of the world.
Each session contains multiple ad exposures with detailed measurements of
attention time, along with the environmental and behavioural signals. All
of these data points will be used to build the scalable attention model.
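
The cross-referencing step can be sketched roughly as follows. This is a simplified illustration, not our production logic: it assumes the gaze samples and ad position snapshots are already on a common clock, credits each gaze sample with the time until the next sample, and caps that credit with a hypothetical `max_gap_ms` so that a dropped frame does not inflate the total. The `GazeSample` and `AdBox` types are invented for the example.

```python
from bisect import bisect_right
from typing import NamedTuple


class GazeSample(NamedTuple):
    t_ms: int   # timestamp of the eye tracking snapshot
    x: float    # predicted fixation, viewport pixels
    y: float


class AdBox(NamedTuple):
    t_ms: int   # time from which this ad position is valid
    left: float
    top: float
    width: float
    height: float


def attention_time_ms(gaze, boxes, max_gap_ms=100):
    """Sum the time during which the gaze point falls inside the ad.

    Both lists must be sorted by timestamp. The last gaze sample has no
    successor, so it contributes no time.
    """
    box_times = [b.t_ms for b in boxes]
    total = 0
    for current, following in zip(gaze, gaze[1:]):
        # Most recent ad position reported at or before this gaze sample.
        idx = bisect_right(box_times, current.t_ms) - 1
        if idx < 0:
            continue  # the ad had not been located yet
        box = boxes[idx]
        inside = (box.left <= current.x <= box.left + box.width
                  and box.top <= current.y <= box.top + box.height)
        if inside:
            total += min(following.t_ms - current.t_ms, max_gap_ms)
    return total


# The ad starts at y=600 and moves up to y=400 when the user scrolls at t=80.
boxes = [AdBox(0, 0, 600, 300, 250), AdBox(80, 0, 400, 300, 250)]
gaze = [
    GazeSample(0, 150, 500),    # above the ad
    GazeSample(40, 150, 650),   # on the ad
    GazeSample(80, 150, 500),   # on the ad (it has scrolled up)
    GazeSample(120, 150, 700),  # below the ad
    GazeSample(160, 150, 700),
]
print(attention_time_ms(gaze, boxes))  # 80
```

Two of the five samples land on the ad, each followed by a 40ms gap, giving 80ms of attention time for this exposure.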

Step Four — Build An Attention Model

Our panel data is useful in-and-of-itself as a means to understand the distribution of attention of different ad creatives in different environments. But in order to apply it at scale across live campaigns, we need a model that can infer the attention time for an impression from the features collected in the previous section.

These features, along with the measurements of attention, are provided to an array of machine learning algorithms in a series of experiments. The goal of these experiments is to determine the best combination of data processing and machine learning algorithm to predict attention. A critical part of this process is determining whether the model is accurate enough to be used for measurement. Typically, attention time follows an unusual distribution, nothing like the bell curve you would have learned about in standard statistics courses. There is a big peak at zero (unsurprisingly, lots of ads get no attention at all), and then there may be multiple other peaks as well as a long tail, where a small number of ads get lots of attention.

The complexity of the attention distribution means that simple statistical techniques do not work well. We apply a range of techniques, some from actuarial science and some from computer science, to develop the right model for attention. These models are tuned so that they are able to distinguish well between high attention and low attention impressions. And, as you will see later, we ensure that they can be relied upon to provide accurate measurements of the mean attention on a block of impressions.
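
One common way to handle a distribution with a spike at zero is a two-part (hurdle) model of the kind used in actuarial work on insurance claims: first estimate the probability that an impression receives any attention at all, then estimate the expected attention given that it receives some. The sketch below illustrates that idea on made-up data grouped by a single hypothetical feature (ad position). It is offered as an assumption about the general family of techniques, not as a description of the production model.

```python
from collections import defaultdict


def fit_hurdle(rows):
    """Fit a per-segment two-part model.

    rows: iterable of (segment, attention_seconds) pairs.
    Returns {segment: {"p_any", "mean_if_any", "expected"}}.
    """
    by_segment = defaultdict(list)
    for segment, seconds in rows:
        by_segment[segment].append(seconds)

    model = {}
    for segment, values in by_segment.items():
        positives = [v for v in values if v > 0]
        # Part 1: probability of clearing the "hurdle" of any attention.
        p_any = len(positives) / len(values)
        # Part 2: mean attention among impressions that cleared it.
        mean_if_any = sum(positives) / len(positives) if positives else 0.0
        model[segment] = {
            "p_any": p_any,
            "mean_if_any": mean_if_any,
            "expected": p_any * mean_if_any,
        }
    return model


# Made-up panel measurements: (ad position, attention time in seconds).
panel_rows = [
    ("top", 0.0), ("top", 0.0), ("top", 1.0), ("top", 3.0),
    ("bottom", 0.0), ("bottom", 0.0), ("bottom", 0.0), ("bottom", 2.0),
]
model = fit_hurdle(panel_rows)
for segment, estimate in model.items():
    print(segment, estimate)
```

In this grouped form the expected value is simply the segment mean. The separation becomes useful when each part is fitted with its own model over many features, because the signals that predict whether an ad is looked at can differ from the signals that predict how long it holds the gaze.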

At the end of this process we have an Attention Model, a custom algorithm which can tell us the expected attention time for a specific ad impression based on user behaviour and context. This model is the source of our attention measurement for live campaigns.

Step Five — Validate the Model

There is a range of ways to measure the performance of any measurement system. The metric you choose depends on what you are using the model for. At Playground XYZ, we use a wide variety of metrics to understand how our models will help customers measure and optimize attention on their campaigns.

One of the most useful approaches is to look at a decile plot that compares the mean predicted attention against true mean attention on ranked deciles of impressions. These plots tell us how well the model can distinguish between high and low attention impressions, as well as the expected accuracy of the mean attention metrics used in campaign reporting. An example of a decile plot is shown below.

Figure 1. Decile plot of predicted attention time against actual attention time for MREC ads

This plot, from an earlier iteration of our model for IAB standard Medium Rectangle (MREC) ads, shows very close alignment between predicted and actual attention. The first decile is the only disparity: the model has clearly managed to separate out these high attention impressions, but it struggles to capture the full magnitude of the attention they receive.
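
The numbers behind a decile plot can be computed with a few lines of code. The sketch below ranks impressions by predicted attention (highest first, so the first decile holds the highest attention impressions), splits them into ten equal groups, and reports the mean predicted and mean actual attention for each group. The function name `decile_table` and the toy data are assumptions made for illustration.

```python
def decile_table(predicted, actual, n_bins=10):
    """Return a list of (mean_predicted, mean_actual), one per ranked bin.

    Impressions are ranked by predicted attention, highest first.
    """
    pairs = sorted(zip(predicted, actual), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer impressions than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_act = sum(a for _, a in chunk) / len(chunk)
        table.append((mean_pred, mean_act))
    return table


# Toy data in milliseconds: the "model" under-predicts every impression by 10ms.
predicted_ms = [100 * i for i in range(20)]
actual_ms = [p + 10 for p in predicted_ms]

table = decile_table(predicted_ms, actual_ms)
for rank, (mean_pred, mean_act) in enumerate(table, start=1):
    print(f"decile {rank}: predicted {mean_pred:.0f}ms, actual {mean_act:.0f}ms")
```

Plotting the two columns against the decile rank gives a chart in the style of Figure 1. A well-calibrated model shows the two lines lying close together and falling steadily from the first decile to the last.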

Another important part of building and using any measurement tool is understanding your expected error. In the measurement of advertising, this is typically required at an aggregated reporting level (as opposed to individual ad impressions). The goal of our measurement is to provide guidance on the mean attention on a specific platform, type of inventory or device. We want to know how far these measurements of mean attention are from the true values. We provide clear guidance on these expected errors using a technique called bootstrap sampling.

Figure 2. Estimated distribution of error in mean attention using bootstrap sampling.

In bootstrap sampling, we repeatedly draw random blocks of impressions from our data (sampling with replacement) and compare the mean predicted attention time with the mean true attention time on those blocks. We can then produce plots like Figure 2, which show us that 50% of the time our measurements will be within 50 milliseconds of the true mean attention time. Furthermore, the vast majority of the time, the measured mean attention time will be within approximately 100 milliseconds on either side of the true mean.
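
A minimal sketch of this procedure is shown below. It draws blocks of impressions with replacement, records the difference between mean predicted and mean true attention on each block, and then reads percentiles off the resulting error distribution. The synthetic data, the block size, the number of rounds, and the helper names are all assumptions for illustration; the figures quoted above come from our real panel data, not from this code.

```python
import random


def bootstrap_mean_errors(predicted, actual, block_size, n_rounds=1000, seed=0):
    """Sorted errors (mean predicted minus mean actual) over resampled blocks."""
    rng = random.Random(seed)
    n = len(predicted)
    errors = []
    for _ in range(n_rounds):
        # Sample impression indices with replacement to form one block.
        idx = [rng.randrange(n) for _ in range(block_size)]
        mean_pred = sum(predicted[i] for i in idx) / block_size
        mean_act = sum(actual[i] for i in idx) / block_size
        errors.append(mean_pred - mean_act)
    return sorted(errors)


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    k = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[max(0, min(len(sorted_values) - 1, k))]


# Synthetic stand-in for panel data: 60% of impressions get zero attention,
# the rest follow a long-tailed distribution (milliseconds).
data_rng = random.Random(1)
actual_ms = [
    0.0 if data_rng.random() < 0.6 else data_rng.expovariate(1 / 2000)
    for _ in range(5000)
]
# A made-up imperfect model: slightly biased, with noise on every prediction.
predicted_ms = [0.9 * a + data_rng.gauss(100, 200) for a in actual_ms]

errors = bootstrap_mean_errors(predicted_ms, actual_ms, block_size=500)
low, high = percentile(errors, 25), percentile(errors, 75)
print(f"middle 50% of errors fall between {low:.0f}ms and {high:.0f}ms")
```

A histogram of `errors` gives a chart in the style of Figure 2. Larger blocks of impressions produce a narrower error distribution, which is why expected error is best quoted for a given reporting volume.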

Conclusion

Measuring attention on digital advertising requires a series of well engineered processes and a lot of attention (no pun intended) to detail. It is important to build on ground truth data that uses eye tracking to establish the amount of time people look directly at advertising in different circumstances. Each stage of the pipeline needs robust methods to quantify the potential for error, and testing procedures to ensure the accuracy of models is sufficient to deliver business value.

The reward from building a rigorous attention measurement pipeline is a metric that allows us to measure and optimize digital advertising for brand outcomes. It is the first real-world step for programmatic away from being a purely performance ecosystem and towards a new framework for quantifiable brand advertising.
