Supply Path Optimization — TV Advertising

Ujwala Musku
MiQ Tech and Analytics
6 min read · May 14, 2020

With people moving away from traditional TV, and with almost two-thirds of households owning a Connected or Smart TV, eMarketer predicts that over 80% of online ads will be video within the next couple of years. It is a smart move for advertisers to gradually shift budgets from Display to Video advertising. However, with so many players involved (DSPs, SSPs, publishers, resellers, and heaps of third-party vendors), auction duplication is getting worse by the day, resulting in a multiplexed path: a single impression is sold by numerous sellers at various prices along multiple paths. On average, an impression is available through 19.8 distinct supply paths. A wise path for a buyer is one with a low CPM (cost per 1,000 impressions), yielding a better ROI. The question of which route a buyer should choose, and what a fair price to pay is, has persisted for a couple of years now.

The goal is to find the path that is optimal in terms of cost without compromising on the quality of the auction; hence the term Supply Path Optimization (SPO).

Now that we have discussed the problem statement, let us get into the approach. It is important to clean and understand the data before applying any algorithms.

The sample data consists of performance details (impressions, starts, completes, cost spent) for each path.

Impressions: The number of ads shown to users.
Starts: The number of impressions for which the video ad began playing.
Completes: The number of impressions for which the user watched 100% of the video ad.
Cost Spent: The amount the advertiser spent to win a given number of impressions.
CPM: The cost spent per 1,000 impressions delivered.
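Since the sample table itself isn’t reproduced here, below is a minimal R sketch of what such path-level data might look like; all path names and numbers are hypothetical.

```r
# Hypothetical path-level performance data (values are illustrative only)
paths <- data.frame(
  path        = c("SSP_A>pub1", "SSP_B>pub1", "SSP_A>pub2", "reseller_X>pub2"),
  impressions = c(120000, 95000, 40000, 15000),
  starts      = c(110000, 90000, 36000, 13000),
  completes   = c(80000, 70000, 21000, 6000),
  cost        = c(720, 640, 300, 140)          # total spend in dollars
)

# CPM = cost per 1,000 impressions delivered
paths$cpm <- paths$cost / paths$impressions * 1000

# Completion rate as a simple quality signal
paths$completion_rate <- paths$completes / paths$impressions

paths[order(paths$cpm), ]   # cheaper paths first
```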

In the context of the above data, the intention is to choose a path that is the least expensive (lowest CPM) yet performs well (here, completes).

Outlier treatment is applied next. There are several ways to treat outliers depending on the problem statement and the structure of the data; given our problem statement and business understanding, we chose to remove outliers only on the lower end of the distribution.
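As an illustration of that lower-tail treatment, a percentile cutoff is one simple option; the 1st-percentile threshold and trimming on impression volume are assumptions for the sketch, not the exact rule used in our pipeline.

```r
# Keep only paths at or above the 1st percentile of impression volume.
# Both the 1% cutoff and trimming on impressions are assumed choices.
lower_cut   <- quantile(paths$impressions, probs = 0.01)
paths_clean <- paths[paths$impressions >= lower_cut, ]
```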

We approached this problem using two methods.

  1. Data Envelopment Analysis
  2. Scoring based on Classification Modelling

Data Envelopment Analysis (DEA):

DEA is a non-parametric method that ranks the entries of an unlabelled dataset by estimating their relative efficiencies. These efficiencies are calculated by comparing each decision-making unit (here, each supply path) against the production frontier formed by all the units.

Let’s get deeper into how DEA is used on the current data.

We have two inputs (impressions and cost spent) and one output (a key performance metric, not disclosed here) to come up with an efficiency value for each supply path. Using the rDEA library in R with these inputs and outputs, we obtain a relative efficiency between 0 and 1 for each supply path.
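A minimal sketch of that rDEA call, continuing the toy data above and assuming an input-oriented model with variable returns to scale, with completes standing in for the undisclosed KPI:

```r
library(rDEA)

# Inputs: impressions and cost spent; output: the key performance metric
# (completes stands in here for the undisclosed KPI).
X <- paths_clean[, c("impressions", "cost")]
Y <- paths_clean[, "completes", drop = FALSE]

# Input-oriented DEA under variable returns to scale (assumed settings);
# each path is evaluated against the frontier formed by all paths.
res <- dea(XREF = X, YREF = Y, X = X, Y = Y,
           model = "input", RTS = "variable")

# Relative efficiency in (0, 1] for each supply path
paths_clean$efficiency <- res$thetaOpt
```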

This yields a relative efficiency against each path in the data. Geometrically, the efficiency value of a path (say R or S in a two-input illustration) is the ratio of the distance from the origin to the path’s projection on the efficient frontier and the distance from the origin to the path itself.

Scoring based on Classification Modelling:

A hybrid of statistical and machine learning techniques, this scoring method calculates a score for each supply path, helping us decide whether a path is worth bidding on.

We use the output of DEA to flag each path. Based on our business understanding, each path is flagged according to its efficiency value, and the flagged dataset acts as the training data for the scoring methodology.
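As a sketch of the flagging step, the DEA efficiency can simply be thresholded; the 0.8 cutoff below is an assumed value, not the actual business rule.

```r
# Paths at or above the (assumed) 0.8 efficiency cutoff are flagged 1,
# the rest 0; this flag becomes the target for the scoring model.
paths_clean$flag <- ifelse(paths_clean$efficiency >= 0.8, 1L, 0L)
```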

The scoring methodology accounts for the importance of each player in a path. Each path is a combination of multiple components: buyers, publishers, vendors, devices, resellers, and so on. The methodology breaks down the path and measures the importance of each player within it. For the sake of simplicity, let’s assume each path is a combination of site domain, publisher, and vendor in this example. Let’s discuss how the scoring is done.
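For instance, assuming paths arrive as delimited strings with a ">" separator (a hypothetical format), the breakdown could look like this:

```r
# Hypothetical path string with an assumed ">" delimiter
path_str <- "news-site.com>publisher_A>vendor_X"

# Break the path into its players: site domain, publisher, vendor
parts    <- strsplit(path_str, ">", fixed = TRUE)[[1]]
features <- data.frame(site_domain = parts[1],
                       publisher   = parts[2],
                       vendor      = parts[3])
```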

  • Binning — Equal-width binning is used to bucket the data, minimizing observational errors and avoiding overfitting. The original data is divided into intervals known as bins: continuous variables are bucketed into ranges with respect to the flag values, whereas categorical variables are grouped into discrete buckets.
  • WOE and IV Calculation —
  1. For each bin, we calculate the event and non-event rates using the flag column.
    Event rate = (# of paths with flag 1 in the bin) / (total # of paths in the bin)
    Non-event rate = (# of paths with flag 0 in the bin) / (total # of paths in the bin)
  2. Using these rates, we compute the event and non-event distributions and, from them, the Weight of Evidence and the Information Value.
  3. Weight of Evidence (WOE) fundamentally measures the credibility of each bin. It is the logarithm of the ratio of the two distributions, WOE = ln(event distribution / non-event distribution), and aids in establishing a monotonic relationship between the independent and dependent variables.
  4. Information Value (IV) quickly surfaces the best features by measuring how well each one separates events from non-events; it is computed as IV = Σ (event distribution − non-event distribution) × WOE over all bins. As per business standards, the typical IV threshold is 0.02: features with IV < 0.02 are considered weak predictors and are dropped from the modeling process.
  • Classification Modelling using Logistic Regression — The data, with only the selected features (from the step above) as inputs and the flag as output, is fed into a logistic regression model to obtain a coefficient for each feature.
  • Logit values are then obtained for each bin by multiplying the bin’s WOE by the coefficient of the corresponding feature. From the logit values, we derive odds and probabilities for each bin.
  • To set up the score logic, we fix the range of scores and then calculate factor and offset values, arriving at a per-bin score via the following formula (an end-to-end code sketch follows the formula).

Score = (Logit + intercept/# of features) * factor + offset/# of features
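Here is an end-to-end sketch, in R, of the binning, WOE/IV, logistic regression, and scoring steps for a single continuous feature (CPM), continuing the sketches above. The bin count, the 0.02 IV cutoff, and the score scaling (600 points at 50:1 odds, 20 points to double the odds, a common scorecard convention) are assumptions; a realistically sized dataset is needed for the fit to be meaningful, as the toy data above is far too small.

```r
# --- 1. Equal-width binning of a continuous feature (5 bins assumed) ---
bin_edges <- seq(min(paths_clean$cpm), max(paths_clean$cpm), length.out = 6)
paths_clean$cpm_bin <- cut(paths_clean$cpm, breaks = bin_edges,
                           include.lowest = TRUE)

# --- 2. WOE and IV per bin, using the DEA-based flag as the target ---
# WOE_bin = ln(event distribution / non-event distribution)
# IV      = sum over bins of (event dist - non-event dist) * WOE
woe_table <- function(df, bin_col, flag_col = "flag") {
  tab        <- table(df[[bin_col]], df[[flag_col]])
  events     <- tab[, "1"]                     # flag = 1 paths per bin
  non_events <- tab[, "0"]                     # flag = 0 paths per bin
  # 0.5 smoothing guards against log(0) in sparse bins
  dist_event     <- (events     + 0.5) / sum(events     + 0.5)
  dist_non_event <- (non_events + 0.5) / sum(non_events + 0.5)
  woe <- log(dist_event / dist_non_event)
  iv  <- sum((dist_event - dist_non_event) * woe)
  list(woe = woe, iv = iv)
}

wt <- woe_table(paths_clean, "cpm_bin")
wt$iv                      # features with IV < 0.02 would be dropped

# --- 3. Replace bins with their WOE values, then fit logistic regression ---
paths_clean$cpm_woe <- wt$woe[as.character(paths_clean$cpm_bin)]
model <- glm(flag ~ cpm_woe, data = paths_clean, family = binomial)

# --- 4. Score scaling (assumed: 600 points at 50:1 odds, 20 points
#         to double the odds, a common scorecard convention) ---
scale_factor <- 20 / log(2)
scale_offset <- 600 - scale_factor * log(50)

n_features <- 1                                # single feature in this toy
coef_cpm   <- coef(model)["cpm_woe"]
intercept  <- coef(model)["(Intercept)"]

# Per-bin logit and score, following the article's formula:
# Score = (logit + intercept / n_features) * factor + offset / n_features
logit_bin <- wt$woe * coef_cpm
score_bin <- (logit_bin + intercept / n_features) * scale_factor +
             scale_offset / n_features
score_bin
```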

Now that we have a score for each feature, we calculate the final score of a new path from the test dataset by summing all the feature scores along that path.
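Continuing the single-feature sketch above, scoring a new test path then amounts to mapping each feature value to its training bin and summing the bin scores:

```r
# Hypothetical new path from the test set
new_cpm <- 7.2

# Assign it to a training bin (values outside the training range get NA)
new_bin <- cut(new_cpm, breaks = bin_edges, include.lowest = TRUE)

# Look up that bin's score; with several features the final path score
# is the sum of the individual feature scores, e.g.
#   final_score <- cpm_score + domain_score + vendor_score
cpm_score <- score_bin[as.character(new_bin)]
cpm_score
```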

As per business understanding, we set a specific score threshold that helps us decide whether a path is worth bidding on. This scoring methodology was tested in a live campaign, resulting in an 8% reduction in cost spent and thereby a better ROI. In the future, we plan to extend the methodology to other campaign types such as Display, Instream, and Outstream Video.
