The New Age of TV Advertising (Part II) — Re-Thinking Data-Driven Attribution Models


by Dr. Chi-Ju Wu & Sebastian Fetz

In our first article, we gave you a high-level overview of TopSpot, a project to uplift ROI by optimizing TV advertisement airings using the power of Deep Learning. So how do we do it? We achieve this in two steps: historical measurement and future prediction. In this article, we will focus on the first step: the attribution calculation for historical TV spots.

What is an attribution model?

Have you ever wondered what triggers your own purchases? Let’s say you receive an e-mail that promotes the new perfume “Elegant Fragrance,” then you see a poster about that perfume on the street. Two days later, you see an advertisement for this perfume on social media. Since you are now interested in it, you go to the website and purchase the product. This is what your path for this specific purchase looks like:

Fig. 1 Typical user path before engagement with the brand or product

All the channels (touch points) that you have encountered have more or less impact on your final decision. So how do we give credit for your purchase to each channel, and most importantly, to what extent? There are many existing models, the so-called attribution models, available to answer this question (Fig. 2). Each model has its own way of crediting the channels. For example, the linear model assumes each touch point has equal weight. The first touch model focuses on the first touch point, since it serves as the first impression. The last touch model puts more weight on the last touch point, since it is the closest one to the purchase. The position-based model puts higher weights on both the first and last touch points, whereas time-decay models gradually increase the weight along the sequence of touch points, so that later touch points receive more credit.

Fig. 2 Commonly used attribution models.
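To make the differences concrete, here is a minimal Python sketch (not the TopSpot code) of how these rule-based models could distribute one conversion across an ordered path of touch points. The 40% endpoint share in the position-based model and the 0.5 decay factor are illustrative assumptions, not fixed conventions.

```python
# Illustrative rule-based attribution models from Fig. 2.
# Each function takes an ordered path of touch points and returns
# (touch point, credit) pairs that sum to 1 for a single conversion.

def linear(path):
    # Equal weight for every touch point.
    return [(tp, 1 / len(path)) for tp in path]

def first_touch(path):
    # All credit goes to the first touch point.
    return [(tp, 1.0 if i == 0 else 0.0) for i, tp in enumerate(path)]

def last_touch(path):
    # All credit goes to the last touch point.
    return [(tp, 1.0 if i == len(path) - 1 else 0.0) for i, tp in enumerate(path)]

def position_based(path, endpoint_share=0.4):
    # First and last touch points get a fixed share each (40% here, an assumption);
    # the remaining credit is split evenly among the middle touch points.
    if len(path) <= 2:
        return linear(path)
    middle = (1 - 2 * endpoint_share) / (len(path) - 2)
    return [(tp, endpoint_share if i in (0, len(path) - 1) else middle)
            for i, tp in enumerate(path)]

def time_decay(path, decay=0.5):
    # Later touch points get exponentially more credit.
    raw = [decay ** (len(path) - 1 - i) for i in range(len(path))]
    return [(tp, w / sum(raw)) for tp, w in zip(path, raw)]

path = ["e-mail", "poster", "social media ad"]
print(time_decay(path))  # credits of roughly 0.14, 0.29, 0.57
```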

On top of these basic models, data-driven models are preferred by many big tech companies, such as Google and Facebook. A data-driven model takes advantage of big data from the real world, analyzes the engagement of each customer, and determines which features of the product or which touch points have the most impact. However, we need to keep in mind that there is no ‘one true’ data-driven model, nor a ‘truly correct’ attribution model. It all depends on the use case and the business goals.

Data-driven attribution model at ProSiebenSat.1

At ProSiebenSat.1, we air on average 10,000 TV advertisements per day to a vast number of Smart TVs with an internet connection. Additionally, we can track the website activities of some of the ProSiebenSat.1 NuCom assets. The connection between TV watching and engagement with the product is like a user journey. Only in this case, the touch points are the TV spots, and the user might have several engagements during the journey before the final conversion. We want to find out the impact of each advertisement on the customer, as shown in the graph below.

Fig. 3 An example of a user journey in the ProSiebenSat.1 world. TV screen colors represent different channels.
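To give a feeling for the data behind Fig. 3, here is a purely hypothetical shape of one such journey. The field names and channel labels are illustrative, not our actual data model.

```python
# Hypothetical structure of a single user journey (illustrative only).
# Touch points are TV spots; the journey ends either with an engagement
# (e.g., a website visit) or without one.
journey = {
    "user_id": "anonymous-4711",
    "touch_points": [
        {"timestamp": "2021-03-01T19:45:00", "channel": "ProSieben",  "spot_id": "spot-A"},
        {"timestamp": "2021-03-02T20:15:00", "channel": "SAT.1",      "spot_id": "spot-B"},
        {"timestamp": "2021-03-03T21:05:00", "channel": "kabel eins", "spot_id": "spot-C"},
    ],
    "converted": True,  # did the journey end in an engagement?
}
```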

However, our use case is much more complicated than a typical attribution problem: each user could have up to hundreds of touch points, i.e., see hundreds of TV ads, instead of a handful of traditionally defined channels such as banner ads, e-mails, and radio. The basic attribution models are no longer suitable for this level of complexity, and the amount of data actually allows us to take advantage of data-driven models. Among data-driven models, we chose the Markov chain model to help us gain more insights. A Markov chain is a stochastic model describing a sequence of possible events with transition probabilities between them. The outcome can be predicted with the Markov process, which consists of an initial state distribution and transition probabilities. The initial distribution describes the starting probabilities across the events, and the transition probabilities describe how likely it is to move from one event to another. The Markov process can be visualized with a Markov graph:

Fig. 4 An example of a Markov chain graph. The numbers represent the probabilities of moving from one node to another. Note that the arrows only point rightwards, as each TV ad at any given time is treated as an individual node.
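As a toy illustration of this idea, the following sketch fits such a chain from a handful of simplified journeys. In reality the nodes are individual TV spots; the state names “start”, “conversion”, and “null” (no conversion) are naming assumptions made for this example.

```python
from collections import defaultdict

# Toy journeys: each path starts in "start" and ends in either
# "conversion" or "null" (no conversion). A, B, C stand for TV spots.
journeys = [
    ["start", "A", "B", "conversion"],
    ["start", "A", "null"],
    ["start", "B", "C", "conversion"],
    ["start", "C", "null"],
]

# Count transitions between consecutive states.
counts = defaultdict(lambda: defaultdict(int))
for path in journeys:
    for src, dst in zip(path, path[1:]):
        counts[src][dst] += 1

# Normalize the counts into transition probabilities P(dst | src).
transition = {
    src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
    for src, dsts in counts.items()
}
print(transition["start"])  # {'A': 0.5, 'B': 0.25, 'C': 0.25}
```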

Each TV advertisement plays a different role in each unique path. For instance, although both Alice and Bob watched the advertisement for an item, Alice did not react at all, whereas Bob visited the website the next day. Naturally, the weight (i.e., attribution) values of this advertisement would be different in Alice’s and Bob’s journeys. Now, try to imagine we have 300,000 journeys from 50,000 users, who potentially watched 100 ads per day. The graph is getting complicated, isn’t it? Luckily, the Markov chain model can reduce hundreds of thousands of user journeys to pure matrix calculations.

Removal Effect

When calculating the importance of each touch point using the Markov chain, one straightforward method is the so-called removal effect. The idea of the removal effect is to see how many reactions we would lose if a certain touch point did not exist. Based on this core idea, we assume that the removed touch point leads directly to non-conversion, as shown in Fig. 5, and then calculate the difference in the final reactions. We do this N times, for N advertisements. The more important an advertisement is, the more reactions we would lose if we removed it.

Fig. 5 If a node has no effect (gray node), all the users who touch this node go directly to non-conversion.
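Continuing the toy chain from the sketch above (hard-coded here so the snippet stands on its own), the removal effect of a node can be computed by sending every visit to that node straight to non-conversion and comparing the resulting conversion probability with the baseline. This is a simplified sketch, not our production calculation.

```python
# Toy transition probabilities from the previous sketch; "conversion" and
# "null" are absorbing states, A, B, C stand for TV spots.
transition = {
    "start": {"A": 0.5, "B": 0.25, "C": 0.25},
    "A": {"B": 0.5, "null": 0.5},
    "B": {"conversion": 0.5, "C": 0.5},
    "C": {"conversion": 0.5, "null": 0.5},
}

def conversion_probability(transition, removed=None):
    """Probability of reaching "conversion" from "start".

    If `removed` is set, every visit to that node goes directly to
    non-conversion (the gray node in Fig. 5). Assumes an acyclic chain,
    as in Fig. 4.
    """
    def prob(state):
        if state == "conversion":
            return 1.0
        if state == "null" or state == removed or state not in transition:
            return 0.0
        return sum(p * prob(nxt) for nxt, p in transition[state].items())
    return prob("start")

baseline = conversion_probability(transition)
removal_effect = {
    node: baseline - conversion_probability(transition, removed=node)
    for node in ["A", "B", "C"]
}
print(baseline)        # 0.5
print(removal_effect)  # {'A': 0.1875, 'B': 0.375, 'C': 0.25}
```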

Let’s take a simple example: We have 100 people reacting to a certain product, and we aired three advertisements. If the reaction probability drops by 0.01, 0.03, and 0.06[1] after we remove the three touch points A, B, and C individually, we can consider the importance ratio of A:B:C to be 0.1:0.3:0.6. Based on these weights, we attribute credit for the final number of engagements. The weighted reactions 10, 30, and 60 would be the attribution values for the three touch points. Voilà!
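The arithmetic of this example in code form; the numbers are exactly those from the paragraph above.

```python
# Removal effects for the three touch points from the example above.
removal_effect = {"A": 0.01, "B": 0.03, "C": 0.06}
total_reactions = 100

# Normalize the removal effects into weights, then split the observed
# reactions according to those weights.
total_effect = sum(removal_effect.values())
weights = {tp: e / total_effect for tp, e in removal_effect.items()}
attribution = {tp: round(w * total_reactions) for tp, w in weights.items()}
print(weights)      # roughly {'A': 0.1, 'B': 0.3, 'C': 0.6}
print(attribution)  # {'A': 10, 'B': 30, 'C': 60}
```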

What next?

Now that we have the attributions (or credits) of all the advertisements aired by ProSiebenSat.1, what is the next step? The answer is prediction. Since we can treat the attribution values as the level of importance in users’ journeys, we would like to air more of such important advertisements in the future. For example, if we could somehow measure which features of the advertisements are highly correlated with the attribution values, we could make use of such features in the future to increase customer engagement. For this to happen, we built an AI algorithm on the AWS platform. The AI model takes the attribution values and the features of the advertisements as input and predicts the potential visits. This part will be covered in more detail in the next article — stay tuned!

[1] The sum of the removal effects does not necessarily have to equal 1.
