Recommending Configuration Parameters for setting Contextual Ad Targeting

Manish Pathak
MiQ Tech and Analytics
6 min read · Feb 11, 2021

Introduction

In our last blog, we discussed how we leverage Contextual strategies to optimize online advertising KPIs such as CPA and CVR, using a myriad of data sources and the intelligence built on top of them. That was only the tip of the iceberg, i.e., a very small part of the product's entire ecosystem. Apart from the backend algorithm and model deployment, the Contextual targeting strategy also has a UI component from which the product's stakeholders configure their campaigns.

In order to set up a Contextual targeting strategy that renders a segment or a programmable tree, the user must provide several details related to the campaign, as seen in the picture below. The main parameters to configure are the following:

  • Country: The region where the campaign will run, with an advanced geo option to target by State/City, Postcode, etc.
  • Advertiser: The advertiser for which the Contextual targeting campaign will run.
  • Insertion Order ID: This is the ID of a contract for an advertiser. An Insertion Order has multiple Line Items, each of which may be a different strategy the user wants to use for targeting.
  • Seed segment: This is an ID corresponding to the pool of converted users' information, on top of which the inventory is fetched.
  • Viewability Threshold: The minimum expected viewability percent.
  • Max bid: This value ensures that the clearing CPMs do not exceed the input value.
  • Expected Daily Impressions: This value feeds into the internal workings of the algorithm, which then optimizes to clear as many impressions as specified here.
Fig. Lab Interface

The user can set up to 4 configurations/clusters in one request. The first configuration or cluster renders a contextual tree/segment that is high performance and low scale, while the last renders one that is high scale and low performance. The Expected Daily Impressions can be set to any value the user requires and comes with hard-coded default recommendations such as 50,000 or 200,000 impressions. Since this number is somewhat arbitrary, it sometimes ends up impacting the underlying working of the algorithm. Our goal is to make these default recommendations dynamic, so that whenever users configure these segments, they get a recommended value beforehand.
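To make the request shape above concrete, here is a minimal sketch of one request and its clusters as Python dataclasses. The class and field names are illustrative assumptions, not the Lab's actual schema; the hard-coded default of 50,000 impressions is the kind of static recommendation this post aims to replace.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClusterConfig:
    cluster_number: int            # 1 (high performance, low scale) .. 4 (high scale, low performance)
    max_bid: float                 # ceiling on clearing CPMs
    viewability_threshold: float   # minimum expected viewability percent
    expected_daily_impressions: int = 50_000  # today's hard-coded default

@dataclass
class ContextualRequest:
    country: str
    advertiser: str
    insertion_order_id: str
    seed_segment_id: str
    clusters: List[ClusterConfig] = field(default_factory=list)

    def add_cluster(self, cfg: ClusterConfig) -> None:
        # One request supports at most 4 configurations/clusters.
        if len(self.clusters) >= 4:
            raise ValueError("a request supports at most 4 clusters")
        self.clusters.append(cfg)
```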

Data

We treated this problem as a regression problem where our response variable (Y) is Expected Daily Impressions. The predictors (X) readily available to us comprised the information we could extract from the configuration settings. For example, the Country setting gave us the region in which the Contextual targeting would be set up, and the Advanced Geo-Filter gave us binary flags indicating whether a geography filter is applied at the postcode, DMA, or city level. The Advertiser column gave us some information about the type of advertiser, and using other data sources we also fetched the sub-type of the advertiser, indicating whether it belongs to the finance, banking, technology, or another sector. The seed segment gave us the number of users present in that segment. Max bid and IVR for each cluster, along with the cluster number (1 to 4), were also available.

Apart from these features, we also used the daily dollar budget allocated for these segments/trees to deliver. Since other strategies also deliver within an Insertion Order, we utilized the Insertion Order's average (over time) performance metrics, excluding the Contextual strategy itself, such as CVR, CPA, CPM, spends, conversions, and impressions. The cleaned data were aggregated over the last 15-day window, which covered 202 live Contextual strategy campaigns.
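The feature assembly described above can be sketched as a small function that merges the configuration-derived features with the Insertion Order's average metrics into one flat row. All column names here are assumptions for illustration, not the actual pipeline's schema.

```python
def build_feature_row(config: dict, io_metrics: dict) -> dict:
    """Assemble one training row from a cluster's configuration and
    the Insertion Order's averaged metrics (Contextual strategy excluded)."""
    row = {
        "country": config["country"],
        # Advanced Geo-Filter becomes binary flags per geography level.
        "geo_postcode_flag": int(config.get("geo_level") == "postcode"),
        "geo_dma_flag": int(config.get("geo_level") == "dma"),
        "geo_city_flag": int(config.get("geo_level") == "city"),
        "advertiser_sector": config["advertiser_sector"],   # finance, banking, tech, ...
        "seed_segment_users": config["seed_segment_users"], # pool size of the seed segment
        "max_bid": config["max_bid"],
        "ivr": config["ivr"],
        "cluster_number": config["cluster_number"],         # 1 to 4
        "daily_budget": config["daily_budget"],
    }
    # Insertion-order-level averages over time.
    for metric in ("cvr", "cpa", "cpm", "spends", "conversions", "impressions"):
        row[f"io_avg_{metric}"] = io_metrics[metric]
    return row
```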

Here’s a dummy feature vector of the data for a cluster in one request:

Fig. Data Snapshot

Model Building

We tried a plethora of regressors with RMSE as the evaluation metric to choose the best model.

Since tree-based regressors are known to outperform other regressors, as they can capture non-linear relationships and are robust to outliers and correlation in the data, we built a baseline model using a simple default XGBoost model, with 75 percent of the data points in the train set and the remaining 25 percent in the development set. We also tried LightGBM and Random Forest models apart from XGBoost. Boosting models often outperform other tree-based models because they combine a set of weak learners sequentially, delivering improved prediction accuracy by correcting the mistakes made in previous iterations.
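A minimal sketch of this baseline setup, assuming scikit-learn's GradientBoostingRegressor as a stand-in for the default XGBoost model (the exact XGBoost configuration isn't shown in the post): a 75/25 train-dev split with RMSE as the evaluation metric.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def baseline_rmse(X: np.ndarray, y: np.ndarray, seed: int = 0) -> float:
    """Fit a default boosted-tree baseline on a 75/25 split and
    return RMSE on the development set."""
    X_tr, X_dev, y_tr, y_dev = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    model = GradientBoostingRegressor(random_state=seed).fit(X_tr, y_tr)
    return mean_squared_error(y_dev, model.predict(X_dev)) ** 0.5
```

Swapping in `xgboost.XGBRegressor`, `lightgbm.LGBMRegressor`, or `sklearn.ensemble.RandomForestRegressor` for the model line gives the other candidates under the same evaluation.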

The categorical encoding used in the baseline model was one-hot encoding. Apart from the increased dimensionality that one-hot brought to the table, it was a challenge to encode categories not shared between the train and development sets, so it was imperative to use another approach to handle such cases. In theory, there are several ways to handle this, such as hash encoding, embeddings, or some other custom encoding technique; we decided to go with the advanced tree-based regressor CatBoost, which handles categorical data internally.
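The unseen-category problem is easy to see with a toy one-hot encoder fit on the train split only; this pure-Python sketch (not the pipeline's actual encoder) shows how a category that appears only in the dev split silently collapses to an all-zero vector.

```python
def fit_one_hot(train_values):
    """Fit a one-hot encoding on the training categories only.
    Returns the category list and an encode() function."""
    categories = sorted(set(train_values))
    index = {c: i for i, c in enumerate(categories)}

    def encode(value):
        vec = [0] * len(categories)
        # A category unseen at fit time has no column: it becomes all zeros,
        # losing the information that it was a distinct category at all.
        if value in index:
            vec[index[value]] = 1
        return vec

    return categories, encode
```

CatBoost sidesteps this by accepting the raw categorical columns (via its `cat_features` argument) and encoding them internally, rather than requiring a fixed one-hot vocabulary up front.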

Fig. CatBoost train-dev RMSE convergence for one of the runs

In order to improve the model's performance, we introduced grid-search tuning over several hyperparameters such as iterations, depth, and learning rate. To control overfitting, we also added regularization to the model and tuned it as one of the hyperparameters. We saw a gradual improvement in the model's performance after these steps, with train and dev errors staying close to each other, avoiding overfitting. The CatBoost model performed better than the other models tried in all iterations. The dev-set RMSE was around 19,554, which is almost 13% of the 95th-percentile value of the response variable.
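The tuning step above amounts to exhaustively scoring every hyperparameter combination on the dev set and keeping the best. Here is a hand-rolled sketch of that loop; `evaluate` stands in for "train a model with these parameters and return its dev-set RMSE", and the grid keys mirror CatBoost's hyperparameter names as an illustration.

```python
from itertools import product

def grid_search(evaluate, grid: dict):
    """Exhaustive grid search: evaluate(params) -> dev-set RMSE.
    Returns the best parameter dict and its RMSE."""
    best_params, best_rmse = None, float("inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        rmse = evaluate(params)
        if rmse < best_rmse:
            best_params, best_rmse = params, rmse
    return best_params, best_rmse
```

In practice, `evaluate` would wrap something like `CatBoostRegressor(**params).fit(...)` and score the held-out split; note that the regularization term (`l2_leaf_reg` in CatBoost) is tuned here like any other hyperparameter, as described above.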

Fig. Model Results

Validation

Apart from the machine learning model metric, we also validated this approach against the old default recommendations. For a set of 30 campaigns, we checked whether our recommended value was less than the value actually delivered historically, and repeated the same exercise with the old default recommended values. While the default recommendation met this success criterion in 30 percent of the cases, the new approach met it in 55 percent, confirming that this approach improves our estimates of daily impressions, which can be fed into the algorithm before configuring a contextual targeting segment or tree.
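The success criterion from this validation can be written as a one-liner: a recommendation "succeeds" when it stays below the impressions the campaign actually delivered. A sketch, with illustrative numbers rather than the real 30-campaign data:

```python
def success_rate(recommended, actual) -> float:
    """Fraction of campaigns where the recommended daily impressions
    came in below the historically delivered value."""
    hits = sum(r < a for r, a in zip(recommended, actual))
    return hits / len(actual)
```

Running this once over the default recommendations and once over the model's recommendations for the same campaigns gives the 30-percent-vs-55-percent comparison reported above.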

Future Work

Estimating a configuration parameter before setting up a strategy is a tricky task, as we don't yet have the strategy's performance data. But using other information, we were able to build a good model that beat the default hard-coded values we provide to the user. As part of future work, we will try to improve the model's performance further and add more powerful predictors to better estimate the configuration parameters. The same approach can be extended as a chained machine learning problem where, apart from daily impressions, we also predict the Max bid for the user, getting rid of the arbitrary values that go into the algorithm.
