Data Science in AdTech Industry (Real Time Bidding with Demand Side Platform)

8 min read · Jun 23, 2024

The Ad Tech industry was valued at approximately $987.52 billion in 2023, and it is rapidly expanding. Demand Side Platforms (DSPs) are one of the most integral and interesting parts of the AdTech industry, and the sector has seen significant development. In this article, we are going to demystify how a DSP works and how Machine Learning improves the process of bidding via a DSP. So let’s move ahead.

How the Advertising Industry Works (A demand side platform perspective):

Imagine you are searching for a gaming console you want to buy. You search for it on Google, and suddenly, you see numerous ads appearing in the search results. This way, your browser connects you with potential sellers who have just what you need.

This process of connecting you to sellers is much more complex than it seems. These advertisements reach you through a real-time bidding process, which this article explains. We will also delve into how machine learning helps optimize the process of connecting advertisers with their potential clients.

The aforementioned use case is just one way of displaying ads and connecting advertisers to their audience. Other methods include showing video ads and display ads on websites and blogs. Sometimes, it involves branding a company rather than promoting a specific product.

The brand and product ads that appear on your screen depend on several factors, such as the user’s browsing history, location, environment, and more.

Real-Time Bidding (RTB):

When a user opens a browser or searches for a product, a unique user ID, along with some useful information about the user, is sent to a virtual auction house. This auction house then sends the information to different demand-side platforms (DSPs) to find advertisers interested in displaying an ad to that user.

The demand-side platforms analyze the user information and make bid offers based on the data. The highest bidder wins the right to display an ad to the specific user.

Interestingly, the winning DSP pays the second-highest auction bid instead of their original bid. This mechanism is known as a second-price auction and ensures fair pricing.
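The second-price rule described above can be sketched in a few lines. This is a minimal illustration, not how any particular ad exchange implements its auction: the DSP names and bid amounts are made up, and the fallback for a single bidder (paying its own bid) is an assumption of this sketch, since real exchanges typically apply a floor price instead.

```python
from typing import Dict, Optional, Tuple


def run_second_price_auction(bids: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (winner, price_paid), or None if nobody placed a bid."""
    if not bids:
        return None
    # Rank bidders from the highest bid to the lowest.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    # The winner pays the second-highest bid. With a single bidder there is
    # no second price, so this sketch falls back to the bid itself (assumption).
    price_paid = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, price_paid


# Hypothetical bids from three DSPs for one impression.
bids = {"dsp_a": 2.50, "dsp_b": 1.80, "dsp_c": 0.90}
print(run_second_price_auction(bids))  # ('dsp_a', 1.8)
```

Here dsp_a wins with a bid of 2.50 but pays only 1.80, the bid of the runner-up.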

Once an advertiser wins the bid, they can decide which ad to display to the targeted user.

A critical factor in this process is the time taken. The response time in an RTB process is typically 150 milliseconds, meaning this entire process happens very quickly. Consequently, bidders must come up with a bid in a matter of milliseconds to win the auction for their targeted user.

Machine Learning in Real-Time Bidding

Now comes the curious case of where Machine Learning and Data Science fit into Real-Time Bidding (RTB).

To put it simply, Machine Learning optimizes the RTB process by making bidding decisions fast and cost-effective, which saves time and resources for advertising companies.

When a user opens a browser, a Demand-Side Platform (DSP) can use Machine Learning and Data Science to:

Determine Whether to Bid on the User or Not:

First, the system needs to decide if the user is a potential customer worth bidding on. A Machine Learning model, equipped with various filters, can help make this determination. For example, if our clients want to target users from the “USA” or only show ads to users “aged 18 and above”, the bid process initially undergoes these filters. Machine Learning models assess these aspects to determine the relevance of the user for the ad.

Note: Sometimes the filters alone are enough to decide whether the user information should be passed on to the ad models.
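The filtering step can be pictured as a set of simple rules checked before any model is called. The sketch below uses the two example rules from above (country and minimum age); the `BidRequest` and `Campaign` classes and their fields are hypothetical names chosen for illustration, not part of any real DSP API.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class BidRequest:
    # Hypothetical, heavily simplified view of an incoming bid request.
    user_id: str
    country: str
    age: int


@dataclass(frozen=True)
class Campaign:
    # Hypothetical targeting rules supplied by the client.
    name: str
    allowed_countries: FrozenSet[str]
    min_age: int


def passes_filters(request: BidRequest, campaign: Campaign) -> bool:
    """Return True only if the request satisfies every targeting rule."""
    return (
        request.country in campaign.allowed_countries
        and request.age >= campaign.min_age
    )


campaign = Campaign("console_launch", frozenset({"USA"}), min_age=18)
print(passes_filters(BidRequest("u1", "USA", 25), campaign))  # True
print(passes_filters(BidRequest("u2", "USA", 16), campaign))  # False
```

Because these checks are cheap, running them first means the slower ad models only see requests that could actually be served.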

Determine the Bid Amount:

Once it’s established that a certain user is worth bidding on, the next step is to decide how much to bid. This involves calculating the actual value of the user.

As mentioned earlier, in a second-price auction, the highest bidder pays the amount of the second-highest bid. Therefore, it’s crucial to determine if the bid is profitable.

For each user, there is a true value (i.e., the actual value of the user), which is derived from past user behavior, such as click-through rates (CTR), conversion rates, purchases, app installs, video completions, previous browsing data, and user look-alike segments. Understanding this true value is essential.
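One common simplification, used here purely as an assumption for illustration, is to treat the true value as an expected value: the probability of a click, times the probability of a conversion given a click, times what a conversion is worth to the advertiser. In practice the two probabilities would come from trained models; in this sketch they are plain numbers.

```python
def estimate_true_value(
    p_click: float,
    p_conversion_given_click: float,
    value_per_conversion: float,
) -> float:
    """Expected value of showing one ad to one user (simplified assumption)."""
    for p in (p_click, p_conversion_given_click):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_click * p_conversion_given_click * value_per_conversion


# Hypothetical numbers: 2% CTR, 10% conversion rate, $50 per conversion.
value = estimate_true_value(0.02, 0.10, 50.0)
print(round(value, 4))
```

With these made-up numbers, a single impression is worth roughly ten cents, which gives an upper bound on what it makes sense to pay for it.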

For instance, there are different scenarios where a bid can result in a profit, loss, or missed opportunity:

Profit: When the bid amount is less than the true value of the user, resulting in a positive return on investment (ROI).

Loss: When the bid amount exceeds the true value, leading to a negative ROI.

Missed Profit: When no bid is placed, but the user had a high true value, resulting in a missed opportunity for profit.

Machine Learning models help in making these decisions accurately and quickly, ensuring that the bidding process is both efficient and effective.
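The three scenarios above can be written down as a small classification function. Note one assumption of this sketch: it compares the true value with the price actually paid, which in a second-price auction is the second-highest bid rather than the DSP's own bid. The `min_worthwhile_value` threshold that decides what counts as a "high" true value is also hypothetical.

```python
from typing import Optional


def classify_outcome(
    true_value: float,
    price_paid: Optional[float],
    min_worthwhile_value: float = 0.0,
) -> str:
    """Label one auction outcome; price_paid is None when no bid was placed."""
    if price_paid is None:
        # No bid: only a missed opportunity if the user was actually valuable.
        if true_value > min_worthwhile_value:
            return "missed profit"
        return "no bid, low value"
    if price_paid < true_value:
        return "profit"
    if price_paid > true_value:
        return "loss"
    return "break-even"


print(classify_outcome(1.0, 0.6))   # profit
print(classify_outcome(1.0, 1.4))   # loss
print(classify_outcome(1.0, None))  # missed profit
```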

Determining the Best Ad to Recommend to the User

Once a decision has been made to bid on a user, the next step is to determine which ad to recommend. This process can sometimes be integrated along with the process of deducing the true value of the user, but often it is treated as a separate problem. After the user information has passed through the initial filters and models, and it has been decided to bid on the user, the information is then processed by various ad models deployed for each advertisement.

Each ad model calculates a probability representing how compatible the user is with each advertisement. These probabilities help in selecting the most relevant ad for the user. The decision-making process involves analyzing several features and input data points such as:

Environment
User, look-alikes
Video/Assets
Derived features
Deliver Insights
User feature matrix for recommendations
User browsing history

An example instance might look as follows:

Date: 20160320
Hour: 14
Weekday: 7
IP Address: 119.163.222.*
Region: England
City: London
Ad Exchange: Google
Domain: yahoo.com
URL: yahoo.com/xyz.html
Operating System: Windows
Browser: Chrome
Ad Size: 300 * 250
Ad ID: a9604
User Tags: Sports, Electronics
User Feature Matrix: [3.7, 3.2, 0.6, 0.1, 10]
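Before a model can use an instance like the one above, its categorical fields have to be turned into numbers. One common option for high-cardinality fields such as domains and cities is the hashing trick, sketched below. This is an illustrative choice, not necessarily what any given DSP uses; the bucket count and field names are assumptions.

```python
import hashlib
from typing import Dict, List

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed feature space


def hash_feature(name: str, value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map one 'field=value' pair to a stable bucket index."""
    digest = hashlib.md5(f"{name}={value}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def encode_request(request: Dict[str, str], num_buckets: int = NUM_BUCKETS) -> List[int]:
    """Return the sorted indices of the active (non-zero) sparse features."""
    return sorted({hash_feature(k, v, num_buckets) for k, v in request.items()})


# A subset of the fields from the example instance.
request = {
    "hour": "14",
    "weekday": "7",
    "region": "England",
    "city": "London",
    "ad_exchange": "Google",
    "domain": "yahoo.com",
    "os": "Windows",
    "browser": "Chrome",
    "ad_size": "300x250",
}
active_indices = encode_request(request)
print(len(active_indices), "active features out of", NUM_BUCKETS)
```

The appeal of this encoding for RTB is that it needs no vocabulary lookup at inference time, so unseen domains or cities never break the pipeline; the cost is occasional collisions between unrelated values.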

Model Targets:

The ad models aim to optimize various target variables, which can include:

Funnel of Engagement Metrics: These metrics track user engagement with the ad, such as:

Delivered Views: Number of times the ad is delivered.
Completed Views: Number of times the ad is viewed completely.

Client-Specific KPIs: These are key performance indicators specific to the client’s goals, such as:

Click-Through Rates (CTR): Ensuring users click on the ad.
Shares: Ensuring users share the ad.

These models are designed to optimize for each of the above target variables.
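As a concrete reference, the target variables listed above are usually expressed as rates over delivered views. The sketch below computes them from raw counts; using delivered views as the denominator for every rate is an assumption of this example, and the counts are made up.

```python
from typing import Dict


def safe_rate(numerator: int, denominator: int) -> float:
    """Divide, returning 0.0 when there is nothing in the denominator."""
    return numerator / denominator if denominator else 0.0


def funnel_metrics(delivered: int, completed: int, clicks: int, shares: int) -> Dict[str, float]:
    """Turn raw event counts into the rates an ad model might be trained on."""
    return {
        "completion_rate": safe_rate(completed, delivered),
        "ctr": safe_rate(clicks, delivered),
        "share_rate": safe_rate(shares, delivered),
    }


# Hypothetical campaign counts.
metrics = funnel_metrics(delivered=10_000, completed=2_500, clicks=150, shares=20)
for name, rate in metrics.items():
    print(f"{name}: {rate:.2%}")
```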

Segmentation by Vertical: Ads can be segmented based on different verticals, such as:

Retail: Ads related to consumer goods.
Travel: Ads related to travel services.
Technology: Ads related to gadgets and software.

By segmenting and optimizing each type of ad, Machine Learning models ensure that the most relevant and effective advertisements are shown to users.

Ad Selection Models:

The models used to estimate how compatible an ad is with a user and to determine the optimal bid price range from traditional Machine Learning models, such as Logistic Regression, Bagging, Boosting, and Support Vector Machines, to Deep Learning approaches, such as neural networks and deep reinforcement learning.

Recent research papers suggest that deep reinforcement learning, which formulates bidding as a Markov Decision Process, can be a state-of-the-art technique for determining the optimal bid price. Here is a paper on it: https://arxiv.org/abs/2305.04889 . I will be adding the paper review soon.
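To make the idea of per-ad models concrete, here is a minimal sketch in which each ad has its own logistic regression scorer and the ad with the highest predicted probability is selected. The weights, the second ad ID, and the three-element feature vector are invented for illustration; in a real system the weights would be learned from historical data.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(weights: List[float], bias: float, features: List[float]) -> float:
    """Logistic regression: probability that the user engages with the ad."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def select_ad(
    ad_models: Dict[str, Tuple[List[float], float]],
    features: List[float],
) -> Tuple[str, float]:
    """Score the user against every ad model and return the best match."""
    scores = {ad_id: score(w, b, features) for ad_id, (w, b) in ad_models.items()}
    best_ad = max(scores, key=scores.get)
    return best_ad, scores[best_ad]


# Hypothetical (weights, bias) pairs, one per advertisement.
ad_models = {
    "a9604": ([0.8, -0.2, 0.5], -1.0),
    "b1200": ([0.1, 0.4, -0.3], -0.5),
}
user_features = [1.0, 0.5, 2.0]
best_ad, probability = select_ad(ad_models, user_features)
print(best_ad, round(probability, 3))
```

A linear scorer like this costs one dot product per ad, which is part of why such models remain attractive under tight latency limits.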

I also found a very useful collection of research on model selection in various areas of AdTech. It covers research papers ranging from real-time bid price optimization to user ad recommendations (CTR/CVR estimation) in DSPs, and it includes papers on other aspects not covered in this article. Here is the link:

GitHub - wnzhang/rtb-papers (github.com): A collection of research and survey papers of real-time bidding (RTB) based display advertising techniques. …

While I will delve into specific models used in Demand Side Platforms in later articles, a crucial takeaway when selecting models for the bidding process is the speed-accuracy tradeoff. Real-time bidding demands an average response time of 100-150 ms, so each bid request must be processed in milliseconds. Our models therefore need to be accurate while still being fast at inference, and this tradeoff should drive the choice of model.

There is a blog post published by Google that uses a user-click dataset from another company, Criteo, to compare linear models with complex deep neural networks.
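One simple way to reason about this constraint is to time the model call and skip the auction whenever scoring exceeds the latency budget. The sketch below is only an illustration of that idea: the 100 ms budget is taken from the lower end of the range above, and the `model` argument stands in for any callable that maps features to a bid.

```python
import time
from typing import Callable, List, Optional, Tuple

LATENCY_BUDGET_MS = 100.0  # lower end of the 100-150 ms range


def timed_call(fn: Callable[[List[float]], float], features: List[float]) -> Tuple[float, float]:
    """Run fn(features) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def bid_or_skip(
    model: Callable[[List[float]], float],
    features: List[float],
    budget_ms: float = LATENCY_BUDGET_MS,
) -> Optional[float]:
    """Return the model's bid, or None if scoring took longer than the budget."""
    bid, elapsed_ms = timed_call(model, features)
    return bid if elapsed_ms <= budget_ms else None


def fast_linear_model(features: List[float]) -> float:
    # Hypothetical stand-in for a cheap linear scorer.
    return 0.5 * sum(features)


print(bid_or_skip(fast_linear_model, [0.2, 0.4, 0.6]))
```

In production the budget also has to cover network round trips and feature lookups, so the share available to the model itself is smaller than the headline figure.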

The comparison reveals that while deep neural networks offer slightly higher accuracy, they come with significant speed trade-offs. Linear models, which take approximately 70 minutes to train, enable faster inferences due to their simpler structure. On the other hand, neural networks involve high training and inference complexities.

If we take speed into consideration, then we’ll probably prefer linear models over neural networks. However, it’s important to note that the blog dates back to 2017, and since then, there have been substantial advancements in AdTech models. Today, deep neural networks are increasingly viable for predictions owing to their superior accuracy, despite their higher complexity.

Other impact areas in Demand Side Platforms (DSPs):

Covariate Shifts and Segment Retargeting

Over time, data evolves with changing trends and patterns. For instance, in the gaming console market, consumer preferences have shifted from Xbox-360 and PS4 to the newer PS5. This change in the distribution of the input data, known as covariate shift, means we must adapt our models to stay relevant. We often need to retarget specific segments to effectively address these shifts, ensuring our models remain accurate and responsive to current trends.
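A shift like this can be monitored by comparing the distribution of a feature in the training data with its distribution in recent traffic. The sketch below uses the Population Stability Index (PSI), which is one common drift measure and is chosen here only as an example; the console counts are made up, and the small `eps` floor for unseen categories is an assumption of this sketch.

```python
import math
from collections import Counter
from typing import Iterable


def population_stability_index(
    baseline: Iterable[str],
    current: Iterable[str],
    eps: float = 1e-6,
) -> float:
    """PSI between two samples of a categorical feature (0.0 means identical)."""
    base_counts = Counter(baseline)
    curr_counts = Counter(current)
    base_total = sum(base_counts.values())
    curr_total = sum(curr_counts.values())
    if base_total == 0 or curr_total == 0:
        raise ValueError("both samples must be non-empty")
    psi = 0.0
    for category in set(base_counts) | set(curr_counts):
        # Floor the shares at eps so unseen categories do not cause log(0).
        p = max(base_counts[category] / base_total, eps)
        q = max(curr_counts[category] / curr_total, eps)
        psi += (q - p) * math.log(q / p)
    return psi


# Hypothetical console interest in the training data versus recent traffic.
training_sample = ["PS4"] * 60 + ["Xbox-360"] * 40
recent_sample = ["PS5"] * 70 + ["PS4"] * 30
print(round(population_stability_index(training_sample, recent_sample), 2))
```

The larger the value, the further recent traffic has drifted from what the model was trained on, which can be used as a trigger for retraining or for revisiting the targeted segments.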

Micro versus Macro Goals

Once ad models predict a user’s likelihood of interaction with an ad, the ad with the highest predicted probability is not always the one that the Demand Side Platform (DSP) proceeds to display. Sometimes, there are macro goals dictated by client requirements. Despite the predicted probability, clients may insist on advertising based on different criteria. Therefore, DSPs must balance these macro goals alongside the micro goals of optimizing user-advertiser interactions cost-effectively.

Next Steps:

In the next installments of this series on Machine Learning in the AdTech industry, before delving into Supply Side Platforms (SSPs), we’ll focus on model selection for predicting true values and making ad recommendations. We’ll explore various state-of-the-art models and delve into their architectures. Once we’ve covered the essential models, we can move into the production side as well.

In that part, I will demonstrate how tools like Kafka, AWS, and Spark help us build an AdTech pipeline for optimal streaming, data storage, and inference. Then, finally, we can move on to the other aspects of AdTech.