Simplified Technical Paper: ML-based Contextual Video Ads (on-demand & live)

How to do contextual video ads in live & on-demand content streaming (2020/2021)…

Shane Austrie
This Week In AdTech
5 min read · Jul 22, 2020


This Machine Learning system is based on what’s used at streaming companies/services like CBS Interactive and several other top ad-supported streaming platforms.

Companies like CBS Interactive (as well as their sibling companies) have seen a gain of over 14% in ad engagement when using the predecessor of this system.

This system focuses on ensuring that video ads are relevant to the last scene and/or frame.

Additionally, this system can be used for both video-on-demand (VOD) content, as well as live-stream content (e.g. live baseball, breaking news, etc.).

Table of Contents:

* Offline Video Ad Pre-Processing

* Ensemble Hybrid Recommendation System

* On-Demand & Live Video Ad Serving

* Conclusion

For the sake of brevity, and to keep this Simplified White Paper a 5-minute quick read, we've simplified some of the ML terminology and generalized the models.

Photo by Clarisse Croset on Unsplash

Offline Video Ad Pre-Processing

When an advertiser uploads an ad, it is specially encoded using an auto-trained generative model (e.g. a modified version of a GAN) in order to reduce its size and allow for frame-by-frame and scene-by-scene ML analysis.

The video ad being converted into compressed numerical data
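The encoding step can be sketched as below. This is a minimal stand-in using a fixed random projection; the article's actual modified-GAN encoder is proprietary and not specified, so everything here is illustrative.

```python
import numpy as np

def encode_ad(frames: list, dim: int = 64) -> np.ndarray:
    """Compress raw video frames into fixed-size vectors for ML analysis.

    Stand-in for the article's generative encoder: each frame is reduced
    to a `dim`-length vector via a fixed random projection, so downstream
    models can work frame-by-frame on small numerical data.
    """
    rng = np.random.default_rng(0)                    # fixed seed -> stable projection
    flat = np.stack([f.ravel() for f in frames])      # (n_frames, h*w*c)
    proj = rng.standard_normal((flat.shape[1], dim))  # random projection matrix
    return flat @ proj / np.sqrt(flat.shape[1])       # (n_frames, dim)

# a tiny fake ad: 10 RGB frames of 8x8 pixels
frames = [np.random.rand(8, 8, 3) for _ in range(10)]
vectors = encode_ad(frames)
print(vectors.shape)  # (10, 64) -- far smaller than the raw frames
```

The key property is that every ad ends up as a small, fixed-width matrix, which is what makes the later frame-by-frame analysis cheap.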

At the end of each day (or more frequently, depending on your business needs), an offline batch/Spark job turns the encoded video ad into usable numerical data for live recommendations. It does this by combining advertiser data (produced on an advertiser-by-advertiser basis by pre-trained models using various proprietary data sources), NLP (Natural Language Processing) models, and Video Analysis models.

The encoded video data is used by both the Video Analysis models and the NLP models, then turned into usable data that will be stored in a database for live production
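The nightly job can be sketched as a loop over encoded ads that merges the three data sources into one serving-ready record per ad. The `nlp_model`, `video_model`, and `advertiser_profiles` names here are illustrative stand-ins, not the production interfaces.

```python
import numpy as np

def nightly_batch(encoded_ads, advertiser_profiles, nlp_model, video_model):
    """Combine advertiser data, NLP output, and video analysis into one
    record per ad (stored in a database for live recommendations)."""
    records = {}
    for ad_id, vectors in encoded_ads.items():
        records[ad_id] = {
            "advertiser": advertiser_profiles.get(ad_id, {}),
            "nlp": nlp_model(vectors),      # e.g. sentiment/topic features
            "video": video_model(vectors),  # e.g. entity/story features
        }
    return records

# toy stand-ins for the trained models
nlp_model = lambda v: v.mean(axis=0)   # pretend: language features
video_model = lambda v: v.max(axis=0)  # pretend: visual features

encoded_ads = {"ad_1": np.random.rand(10, 64)}
records = nightly_batch(encoded_ads, {"ad_1": {"vertical": "sports"}},
                        nlp_model, video_model)
print(sorted(records["ad_1"].keys()))  # ['advertiser', 'nlp', 'video']
```

In production this loop would be a distributed Spark job rather than in-process Python, but the record shape is the point.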

The NLP models handle sentiment analysis, topic detection, and other language subtleties (double meanings, accents, sarcasm, etc.).

The Video Analysis model performs frame-by-frame (as well as scene-by-scene) entity detection, plus multi-dimensional engaging-story detection (what's going on in the foreground of the scene, what's happening in the background, and how "engaging" each story is).

Both of these models are trained using a combination of private company-specific data, as well as through Transfer Learning from open models.

There are also a few key post-processing steps that happen after each model is done analyzing, in order to *effectively* combine the two system outputs (i.e. NLP Result + Video Result).
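One common way to combine two model outputs, used here purely as an assumption since the article does not specify its post-processing, is to concatenate the NLP and video feature vectors and L2-normalize the result so that neither modality dominates downstream similarity scores:

```python
import numpy as np

def combine_outputs(nlp_vec: np.ndarray, video_vec: np.ndarray) -> np.ndarray:
    """Post-processing sketch: fuse NLP and video features into one
    unit-length vector for storage and live similarity lookups."""
    fused = np.concatenate([nlp_vec, video_vec])
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused

combined = combine_outputs(np.array([0.2, 0.5]), np.array([0.9, 0.1]))
print(round(float(np.linalg.norm(combined)), 6))  # 1.0
```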


Ensemble Hybrid Recommendation System

For On-Demand Video Content: Several video ads are matched with various pieces of content; these matches are then filtered on a scene-by-scene basis via the heavy-duty version of the recommender system.

For Live Video Content: A large number of ads are matched based on the live video's entertainment category, then filtered on a frame-by-frame basis via our lightweight version of the recommender system.


Recommender System Overview (both On-Demand and Live):

For contextual video ads, a Forest-based Recommender Ecosystem should be used (several shallowly trained/generalized recommender systems, controlled by a head recommender system).

Note that this does not mean the recommender systems are Tree-based; rather, the overall architecture and voting process of this system are similar to those of an ensemble, specifically models like Random Forest, Gradient Boosting Machines, and XGBoost.
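The forest-style voting can be sketched as several shallow scorers whose votes a head model aggregates with weights. The shallow scorers below are trivially simple stand-ins; the real generalized recommenders are proprietary.

```python
import numpy as np

def forest_score(ad_vec, scene_vec, shallow_models, weights):
    """Each shallow recommender scores the (ad, scene) pair; the head
    model combines the votes as a weighted average, similar in spirit
    to how ensembles like Random Forest or GBMs aggregate weak learners."""
    votes = np.array([m(ad_vec, scene_vec) for m in shallow_models])
    return float(votes @ weights / weights.sum())

# shallow, generalized recommenders (illustrative only)
shallow = [
    lambda a, s: float(a @ s),                       # raw dot-product match
    lambda a, s: 1.0 - float(np.abs(a - s).mean()),  # feature closeness
]
weights = np.array([0.7, 0.3])
score = forest_score(np.array([0.9, 0.1]), np.array([0.8, 0.2]),
                     shallow, weights)
print(0.0 < score <= 1.0)  # True
```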

*Generalized Recommender System: A recommender system to obtain a generalized video ad to video content compatibility score, as well as a video ad to scene/frame compatibility score. We’ll save the exact model used for this system for a different article.

For on-demand content, the final output of this model is a list of best content and scenes for a video ad. This list adjusts based on the popularity of a piece of content. We prefer content that is being viewed frequently, in order to give advertisers their ad views as soon as possible.

For live content, the final output of this model is a list of video ads for the current frame, allowing for instantly relevant video ads. If relevancy is flexible, this model can also output the best video ads for a scene (last ~45 seconds).
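The popularity adjustment for on-demand content can be sketched as blending the compatibility score with a current-viewership weight, so heavily watched content rises toward the top and advertisers get their views sooner. The blend weight `alpha` is an invented parameter for illustration.

```python
def rank_placements(compat, popularity, alpha=0.5):
    """compat: {content_id: compatibility score in [0, 1]}
    popularity: {content_id: current-viewership weight in [0, 1]}
    Blends relevance with popularity so ads are placed in content
    that is actually being watched right now."""
    blended = {cid: (1 - alpha) * compat[cid] + alpha * popularity.get(cid, 0.0)
               for cid in compat}
    return sorted(blended, key=blended.get, reverse=True)

compat = {"show_a": 0.9, "show_b": 0.8}
popularity = {"show_a": 0.1, "show_b": 0.9}
print(rank_placements(compat, popularity))  # ['show_b', 'show_a']
```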

If the similarity score is below a certain threshold and no video ads are compatible, we have the option to either show a default ad (e.g. an internal ad, like how Hulu advertises its other original programming) or simply not show any ad for this time slot.
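That fallback logic is simple enough to show directly; the threshold value and the `house_promo` default are illustrative assumptions.

```python
def pick_ad(scored_ads, threshold=0.6, default_ad="house_promo"):
    """scored_ads: list of (ad_id, compatibility score), best first.
    Below the threshold we fall back to a default/internal ad; returning
    None instead would leave the slot empty."""
    if scored_ads and scored_ads[0][1] >= threshold:
        return scored_ads[0][0]
    return default_ad

print(pick_ad([("sneaker_ad", 0.82)]))  # sneaker_ad
print(pick_ad([("sneaker_ad", 0.31)]))  # house_promo
```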

*User-Specific, Head/Master Recommender Model: This model is partially based on the Contextual Bandit ML model in order to offer high, user-level personalization (e.g. personalizing based on the user's most recent data, such as mood or recent viewing history).

This model uses user data to filter down the results of the several general recommender models that it controls. Once the choices are filtered, this model will produce a list of video ads to show to the user — with the best one being the one that the user actually sees.
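A minimal epsilon-greedy sketch of the head model's bandit-style choice follows. The article says the real system is only partially based on contextual bandits, and the user features and ad features here are invented for illustration.

```python
import random

def head_pick(general_candidates, user_vec, ad_features,
              epsilon=0.1, rng=random.Random(42)):
    """Filter the general recommenders' candidates with user-level data,
    exploiting the best personalized match most of the time while
    occasionally exploring another candidate (bandit-style)."""
    scored = sorted(
        general_candidates,
        key=lambda ad: sum(u * f for u, f in zip(user_vec, ad_features[ad])),
        reverse=True,
    )
    if rng.random() < epsilon and len(scored) > 1:
        return rng.choice(scored[1:])  # explore: try a non-best ad
    return scored[0]                   # exploit: best personalized match

user_vec = [0.9, 0.1]  # e.g. recent mood / viewing-history features
ad_features = {"calm_ad": [0.1, 0.9], "upbeat_ad": [0.9, 0.2]}
choice = head_pick(["calm_ad", "upbeat_ad"], user_vec, ad_features)
print(choice)  # upbeat_ad
```

The exploration branch is what lets the system keep learning which ads actually engage each user, rather than always serving the current best guess.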

The recommender system choosing a video ad to place in the live content stream

Conclusion

For the sake of brevity, this article leaves out the exact infrastructural details that ensure speed and reliability — especially when working with live-stream content.

The infrastructure needed goes beyond simple redundancy and pre-computed offline results; it heavily utilizes both in-memory storage (e.g. Redis for handling millions of API requests per second) and a Functional Infrastructure architecture (e.g. Cloud-Based Infrastructure, Spark on Kubernetes, and Microservices).

Additionally, we chose not to cover how to effectively pre-process and post-process your content. However, these steps are necessary to ensure that the content is numerically compatible with the generalized recommender systems that produce the compatibility score.

Supplemental Material:



Gen Z AdTech Expert | ML/AI Consultant | SiliconValleyConsulting.io | Casual writer about techy & non-techy things | Connect with me on LinkedIn!