The New Age of TV Advertising — Bringing Performance Marketing to TV with the Power of Deep Learning

Tech@ProSiebenSat.1
ProSiebenSat.1 Tech Blog

by Sebastian Fetz and Dr. Chi-Ju Wu

Welcome to our series about AI in TV advertising. We are members of ProSiebenSat.1 Digital Data, a team of 40 AI, Big Data and Cloud specialists who work on our cloud-based ML / DL platform, which receives 300 GB of incoming data every day. We have a passion for AI-driven digital products and will give you some insights into the work we do. In this blog post and the following articles of this series, we will walk you through one of our core projects, called “TopSpot”.

Motivation

Advertisers are accustomed to performance marketing in online advertising. Their target audience doesn’t get advertisements randomly. Instead, consumers are chosen based on characteristics and preferences that make them very likely to react in line with a predefined business goal (e.g. clicks, purchases). Advertisers value this approach not only because they get better results but also because they can monitor and compare the performance of every campaign. In contrast, TV is mainly used to reach a large audience. This is tremendously valuable for advertisers: there is hardly any other marketing channel where they can deliver a rich audio-visual advertisement to an audience of this size. The question we asked ourselves was: can we combine the best of both worlds and bring performance marketing to TV to make it even more attractive?

There are several ways to achieve this. One way is to use Addressable TV to target specific audiences. This is a promising approach but still needs to grow because not all TVs can be reached by Addressable TV solutions. We thus developed a different approach with TopSpot. The goal is to bring AI-driven performance marketing to TV in general: not only addressable but also linear TV. Two main steps are necessary to achieve that goal:

  1. Measuring the advertising effectiveness (visits, purchases) per TV spot on a data-driven basis.
  2. Using Deep Learning to predict the highest-ROI slots for a TV spot to increase performance.

Measuring Advertising Effectiveness — From Reach Measurement to Deterministic Attribution

The first step to bringing performance marketing to TV is to measure the effectiveness of TV spots. Historically this has been limited to measuring reach, i.e. how many people watched the TV spot. Media companies relied on traditional market research methods such as building household panels. This involved regularly measuring the viewing behavior of a representative sample of households using technical aids, and additionally interviewing household members in writing or verbally. This model, which was developed in the USA in the 1950s, is still used successfully today to calculate ratings and reach of TV programming and thus to determine the price of advertising space (for an overview of further developments in audience measurement, see egta insight 2020, “Advances in Hybrid TV Audience Measurement”). Nevertheless, it doesn’t provide any insight into actual consumer reactions (visits to a website or purchases).

With the emergence of the internet, new possibilities arose to measure not only reach but also consumer reactions to TV advertising, such as visits and purchases. Online companies in particular can track their web traffic and establish a baseline of user visits and orders before each TV spot airs. In a short time window (minutes) after the spot, the visits and orders that exceed the baseline can be tracked and attributed to the TV spot. This approach extended the measurement of TV effectiveness from reach to reactions but came with a downside: long-term effects (over days) couldn’t be measured.
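The baseline idea can be illustrated with a minimal Python sketch. It assumes per-minute visit counts and uses the mean of the minutes before the spot as the baseline; the function name, the window lengths and the sample numbers are hypothetical and only serve the illustration.

```python
from statistics import mean

def spot_uplift(visits_per_minute, spot_minute, baseline_window=10, effect_window=5):
    """Visits above the pre-spot baseline in the minutes right after a spot airs."""
    if spot_minute < baseline_window:
        raise ValueError("not enough history before the spot to build a baseline")
    before = visits_per_minute[spot_minute - baseline_window:spot_minute]
    after = visits_per_minute[spot_minute:spot_minute + effect_window]
    baseline = mean(before)
    # Only traffic exceeding the baseline is attributed; minutes at or below
    # the baseline contribute nothing rather than a negative amount.
    return sum(max(v - baseline, 0) for v in after)

# Hypothetical visit counts: ten quiet minutes, then the spot airs at minute 10.
visits = [100, 102, 98, 100, 100, 100, 99, 101, 100, 100, 160, 140, 120, 105, 100]
uplift = spot_uplift(visits, spot_minute=10)
```

With these numbers the baseline is 100 visits per minute, so 125 visits are attributed to the spot. Anything that happens after the effect window is invisible to this method, which is exactly the limitation described above.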

To measure long-term effects, one must move away from aggregated data towards individual, deterministic measurements. Specifically, two pieces of information are of interest:

  1. Which household saw which ad on a Smart TV?
  2. Did the viewers in this household then visit the website of the advertiser and purchase something?

Fortunately, we can make use of a new data source: our Cross Device Bridge. It allows us to map Smart TVs and digital devices (e.g. smartphone, laptop) within a household based on pseudonymous data. This makes it possible to determine which household watched a specific ad and whether one or more of its digital devices visited the website of the advertiser afterwards.
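To make the matching step concrete, here is a small Python sketch. It assumes the bridge can be represented as a plain mapping from pseudonymous device IDs to pseudonymous household IDs; the IDs, the data layout, the function name and the 10-day window are all hypothetical and do not describe the actual Cross Device Bridge.

```python
from datetime import datetime, timedelta

# Hypothetical bridge: pseudonymous device ID -> pseudonymous household ID.
bridge = {"tv-1": "hh-A", "phone-7": "hh-A", "tv-2": "hh-B", "laptop-3": "hh-B"}

exposures = [  # (Smart TV device, spot, time the ad was shown)
    ("tv-1", "spot-42", datetime(2021, 3, 1, 20, 15)),
    ("tv-2", "spot-42", datetime(2021, 3, 1, 20, 15)),
]
visits = [  # (digital device, time of the website visit)
    ("phone-7", datetime(2021, 3, 3, 9, 0)),
    ("laptop-3", datetime(2021, 2, 27, 18, 30)),  # before the spot was shown
]

def households_reacting(exposures, visits, bridge, window=timedelta(days=10)):
    """Return (household, spot) pairs with a visit within `window` after the ad."""
    visits_by_household = {}
    for device, visited_at in visits:
        household = bridge.get(device)
        if household is not None:  # devices unknown to the bridge are ignored
            visits_by_household.setdefault(household, []).append(visited_at)
    reacting = set()
    for device, spot, shown_at in exposures:
        household = bridge.get(device)
        for visited_at in visits_by_household.get(household, []):
            if shown_at < visited_at <= shown_at + window:
                reacting.add((household, spot))
    return reacting

reacting = households_reacting(exposures, visits, bridge)
```

In this toy data only household hh-A counts as reacting to spot-42: its phone visited the site two days after the ad, whereas the laptop in hh-B visited before the ad was shown.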

Based on this, household journeys are created. Each journey consists of all the TV ads that were shown on a Smart TV in a household within a specific time window (e.g. 10 days), followed by either a reaction (visit or purchase) or no reaction at the end of the journey. To determine the aggregated effect a single TV ad has across all household journeys, a stochastic attribution model based on a Markov chain is used. Standard attribution approaches allocate the visits and purchases of users to a fixed position in the journey (e.g. the last ad seen gets the full attribution). In contrast, the Markov chain takes a data-driven approach and calculates the removal effect, i.e. how much the number of visits and purchases would change if this TV spot had no impact.
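The removal effect can be sketched in a few lines of Python. The sketch assumes a first-order Markov chain with artificial start, conversion and null states, and treats a removed spot as a dead end; the state names, helper functions and toy journeys are hypothetical, and the production model may well differ (e.g. in the order of the chain).

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

def transition_probs(journeys):
    """Estimate first-order transition probabilities from (spots, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for spots, converted in journeys:
        path = [START, *spots, CONV if converted else NULL]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

def conversion_prob(probs, removed=None, sweeps=200):
    """Probability of reaching CONV from START; a removed spot leads nowhere."""
    reach = defaultdict(float)  # unknown states (including NULL) default to 0.0
    reach[CONV] = 1.0
    for _ in range(sweeps):  # fixed-point iteration over the chain
        for src, dsts in probs.items():
            if src == removed:
                continue
            reach[src] = sum(p * reach[dst] for dst, p in dsts.items() if dst != removed)
    return reach[START]

def removal_effects(journeys):
    """Share of conversions lost when each spot is removed from the chain."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    spots = [s for s in probs if s != START]
    if base == 0:  # no conversions at all, nothing to attribute
        return {s: 0.0 for s in spots}
    return {s: 1 - conversion_prob(probs, removed=s) / base for s in spots}

# Toy journeys: (spots seen in order, whether the household reacted at the end).
journeys = [(["A", "B"], True), (["A"], False), (["B"], True), (["B"], False)]
effects = removal_effects(journeys)
```

For these four journeys, removing spot B eliminates every path to a conversion (removal effect 1.0), while removing spot A loses about a third of the conversions. A last-touch rule would have given A no credit at all, which is the difference the data-driven approach is meant to capture.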

This approach has two benefits. First, it reduces attribution noise (wrongly attributed visits/purchases), because we know that the TV spot was shown in the household and that someone in the household actually visited the website of the advertiser. Second, devices can be tracked over a longer period of time (e.g. days or weeks), so the long-term advertising effect of each TV spot can be derived. With the first step, calculating the performance of TV spots, completed, we can move on to improving performance, which means allocating TV spots to the best time and channel.

Optimizing TV Advertising Effectiveness & Testing it in Reality

Optimizing performance can be done in two steps: measurement and prediction. Measurement is done with the data-driven approach described above. The stochastic model helps us to determine which channels, time slots, weekdays or contents worked best in the past, and we assume that these patterns will be similar in the future. The historically aired spots, together with the scores we determined for them, are fed into a neural network to perform the prediction.

The prediction algorithm is designed to predict the reactions (either visits or purchases) on the website of a given advertiser who may book ad blocks in the next six weeks. With almost a hundred features, including weather, surrounding programs, and genres, the algorithm learns the hidden patterns through multiple hidden layers and hundreds of thousands of parameters. It is also tested on various data sets. Among all the models we train in each run, the one with the best performance on the test set is picked. What counts as “best performance” is an objective KPI that depends on the use case. Since every advertiser can only book a small proportion of the available advertising blocks with a given budget, a model that picks the right group of ad blocks is more valuable than one with the smallest deviations in absolute numbers. Therefore, even though the neural networks are trained on MAE[1], the best model is the one with the highest NDCG[2].
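A short Python sketch shows why the two metrics can disagree. It assumes a linear-gain variant of NDCG evaluated on the top k ad blocks; the function names and all numbers are made up for illustration and do not come from TopSpot.

```python
from math import log2

def mae(actual, predicted):
    """Mean absolute error between actual and predicted reactions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def dcg(relevances):
    """Discounted cumulative gain: items further down the ranking count less."""
    return sum(rel / log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(actual, predicted, k):
    """Rank ad blocks by predicted reactions, score the top k by actual reactions.

    Assumes at least one ad block has a positive actual value.
    """
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    top_actual = [actual[i] for i in order[:k]]
    ideal = sorted(actual, reverse=True)[:k]
    return dcg(top_actual) / dcg(ideal)

# Hypothetical actual visits for four ad blocks and two competing models.
actual = [50, 40, 10, 5]
model_a = [44, 46, 10, 5]    # close in absolute numbers, but swaps the top two blocks
model_b = [90, 70, 30, 20]   # overestimates everything, but ranks the blocks correctly

mae_a, mae_b = mae(actual, model_a), mae(actual, model_b)
ndcg_a, ndcg_b = ndcg_at_k(actual, model_a, k=2), ndcg_at_k(actual, model_b, k=2)
```

Here model A has the much lower MAE (3.0 versus 26.25), yet model B reaches an NDCG of 1.0 while model A reaches only about 0.95. For an advertiser who can book just the top two blocks, model B's ordering is the more useful result.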

The model is constantly monitored and automatically retrained on two occasions: when more data becomes available and when its performance falls below a certain threshold. Even though the results on historic data are promising, every AI product needs to be validated in reality. Therefore, the model is currently being tested in practice (Q1/Q2 of 2021).

Conclusion

This blog post showed that it is possible to bring performance marketing to TV and how this can be done. In the next blog post, we will go into more detail on the different parts of TopSpot, starting with data-driven attribution.

We would also like to thank the team members who contributed to developing the project, especially Richard Seitz, Manuel Jockenhöfer, Dr. Rostyslav Shevchenko and Manuel Heller.

[1] Mean Absolute Error: average absolute deviation of the prediction (e.g. predicted visits) from the actual value.

[2] Normalized Discounted Cumulative Gain: normalized measurement of ranking quality which takes both position and relevance into account.
