Neural networks for marketing analytics: Predicting campaign performance

Mar 5, 2019

Written by Jose Luis Ricon Fernandez de la Puente — Machine learning engineer at aiden.ai.

At Aiden.ai we are building an AI-powered marketing analyst that automates time-consuming optimizations and produces improvement recommendations for busy marketers to proactively improve ROI. Aiden works across multiple platforms such as Facebook, Snapchat, Twitter, Apple Search Ads, and Google Ads.

For a marketer, being able to forecast the consequences of their actions would bring massive value. Imagine being able to answer: what would happen if I spent 20% more this week, and how should I spread that incremental spend?

A few months ago, we studied the possibility of using neural networks to model the performance of marketing ad sets to be able to both forecast performance and estimate what performance improvements our suggested changes would lead to.

Marketing campaigns are ultimately composed of individual ads, but within campaigns these are grouped into ad sets: groups of ads that share some commonalities. We decided to focus on the ad set level, as predicting individual ad performance is more challenging: ad set metrics tend to be more stable over time than those of individual ads.

For this test, we used data from a single channel and multiple clients, and designed a neural network architecture with two prediction “heads”: one for spend and one for a key metric we cared about in this case. This way, a single model could predict both metrics.

To train the model, we gathered both performance data (impressions, clicks, reach) and ad set setup data, such as audience or budget. Our training set comprised ~47k rows, with ~1.5k rows held out for our test set. Our initial goal was to predict next week’s metrics given the previous weeks’.
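As a rough illustration of this setup, the data preparation could look like the following pandas sketch; the column names and aggregations here are hypothetical, not our actual pipeline:

```python
import pandas as pd

def build_weekly_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Pair each ad set's weekly features with the following week's target."""
    weekly = df.groupby(["ad_set_id", "week"], as_index=False).agg(
        spend=("spend", "sum"),
        impressions=("impressions", "sum"),
        clicks=("clicks", "sum"),
        reach=("reach", "max"),
    )
    weekly = weekly.sort_values(["ad_set_id", "week"])
    # The prediction target: next week's spend for the same ad set.
    weekly["spend_next_week"] = weekly.groupby("ad_set_id")["spend"].shift(-1)
    return weekly.dropna(subset=["spend_next_week"])
```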

The features we used could be divided into continuous variables (which can take numerical values in a wide range) and categorical variables (which can only take values from a specific set), such as the gender of the targeted audience or the ID of the campaign that the ad sets belonged to. Categorical values cannot be fed directly into a model, so in our processing pipeline we split the features into two streams, as shown in the diagram below:
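In code, that split might look like this minimal sketch; the feature lists are hypothetical:

```python
import pandas as pd

# Hypothetical feature lists; the real pipeline's columns differ.
CONTINUOUS = ["spend", "impressions", "clicks", "reach", "budget"]
CATEGORICAL = ["gender", "campaign_id", "os_version", "country"]

def split_features(df: pd.DataFrame):
    """Split a feature frame into the two inputs the network expects."""
    x_cont = df[CONTINUOUS].astype("float32").to_numpy()
    # Integer-encode each categorical column so it can feed an embedding layer.
    x_cat = {c: df[c].astype("category").cat.codes.to_numpy() for c in CATEGORICAL}
    return x_cont, x_cat
```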

The categorical features go through an embedding layer, an approach pioneered by Guo and Berkhahn (2016), which transforms each categorical value into an N-dimensional numerical representation that the network can process. This way, if we had a feature such as the “os version” of a mobile device and mapped it into two dimensions, we might find that the Android versions cluster together and the iOS versions cluster together, meaning that the network inferred on its own that Android versions have more in common with each other than with iOS versions. To select the embedding dimension for each variable, we took the square root of the number of unique values for that feature.
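The sizing rule fits in a couple of lines; the example feature names are illustrative:

```python
import numpy as np

def embedding_dim(n_unique: int) -> int:
    """Embedding size = square root of the number of unique values."""
    return max(1, int(round(np.sqrt(n_unique))))

embedding_dim(4)    # e.g. an "os version" feature with 4 values -> 2 dimensions
embedding_dim(900)  # e.g. a campaign ID with 900 values -> 30 dimensions
```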

The neural network itself had three dense layers (that’s the “head block” in the diagram) with 120 units each, and two heads with 60 units each. Between the dense layers we inserted batch normalization layers to aid convergence, and dropout layers to avoid overfitting. We also placed a batch normalization layer immediately after the continuous variable input. This way, the network normalized those variables itself, rather than our having to normalize them manually: recording the mean and standard deviation of the training set features and then normalizing the test set features with those same parameters.
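Putting the pieces together, the architecture described above could be sketched in Keras roughly as follows; the dropout rate and activation choices are assumptions, not the exact values we used:

```python
import numpy as np
from tensorflow.keras import Model, layers

def build_model(n_continuous: int, cat_cardinalities: dict) -> Model:
    # Batch norm sits right after the continuous input, so the network
    # normalizes those features itself (no manual mean/std scaling).
    cont_in = layers.Input(shape=(n_continuous,), name="continuous")
    x_cont = layers.BatchNormalization()(cont_in)

    # One integer input plus embedding per categorical feature,
    # sized with the square-root rule described earlier.
    cat_inputs, cat_embs = [], []
    for name, n_unique in cat_cardinalities.items():
        inp = layers.Input(shape=(1,), name=f"{name}_in")
        dim = max(1, int(round(np.sqrt(n_unique))))
        emb = layers.Flatten()(layers.Embedding(n_unique, dim)(inp))
        cat_inputs.append(inp)
        cat_embs.append(emb)

    x = layers.Concatenate()([x_cont] + cat_embs)

    # Three shared dense layers of 120 units (the "head block" in the
    # diagram), with batch norm and dropout in between; the dropout
    # rate here is an assumption.
    for _ in range(3):
        x = layers.Dense(120, activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(0.3)(x)

    # Two prediction heads of 60 units each: spend and the key metric.
    spend = layers.Dense(1, name="spend")(layers.Dense(60, activation="relu")(x))
    metric = layers.Dense(1, name="key_metric")(layers.Dense(60, activation="relu")(x))
    return Model([cont_in] + cat_inputs, [spend, metric])
```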

Finally, even before attempting to train the model, we considered a simple baseline to assess if our model was picking up something above and beyond a simple rule; in this case, the baseline was to predict that next week’s metrics were equal to this week’s metrics. This “naive prediction” actually achieved decent results.
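For reference, the naive baseline amounts to a one-liner; here is a sketch of it, using relative error (MAPE) as the comparison metric:

```python
import numpy as np

def naive_mape(this_week: np.ndarray, next_week: np.ndarray) -> float:
    """Relative error of predicting next week's value as this week's value."""
    mask = next_week != 0  # skip zero-valued weeks to avoid division by zero
    rel_err = np.abs((next_week[mask] - this_week[mask]) / next_week[mask])
    return float(rel_err.mean() * 100)
```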

Spend, and our second metric too, varied wildly in magnitude, and this had implications for the kind of loss function we should use. The most common loss function for regression, RMSE, penalizes larger errors more, which may lead the model to care too little about smaller campaigns. If the goal is to model aggregate spend, this makes sense, but if one prefers to reduce relative error, one has to use a different loss function, such as MAPE, that accounts for it. The loss function we ultimately implemented, due to its satisfactory performance, was RMSE, but we applied a logarithmic transformation to the data first to soften the impact of larger campaigns on the training process; this way, we still cared more about those campaigns, but not as much as we would have otherwise.
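In Keras terms, this comes down to training on log-transformed targets with an RMSE loss. A sketch, assuming the model from the architecture snippet above and hypothetical training arrays:

```python
import numpy as np
import tensorflow as tf

def rmse(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

# log1p rather than log, so that zero-valued weeks stay finite.
targets = {
    "spend": np.log1p(y_spend_train),
    "key_metric": np.log1p(y_metric_train),
}
model.compile(optimizer="adam", loss=rmse)  # same loss applied to both heads
model.fit(train_inputs, targets, epochs=20, batch_size=256)

# Predictions come back in log space; invert the transform before reporting.
pred_spend, pred_metric = (np.expm1(p) for p in model.predict(test_inputs))
```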

We were able to beat the baseline model, and we checked the error across campaign sizes. We achieved a substantial reduction in average error for smaller campaigns, but while our model was marginally better than the naive forecast overall, the forecasts were not good enough for operational purposes. For larger campaigns our average relative error was still around 50%, so for a campaign where metric X would be 2000 next week, we might, on average (considering the absolute magnitude of the error), predict 1000 or 3000.
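A per-size breakdown like this can be produced with simple bucketing; a sketch with hypothetical arrays for the actuals and the two sets of predictions:

```python
import pandas as pd

results = pd.DataFrame({
    "actual": y_actual,     # next week's true metric
    "model": y_model_pred,  # our network's forecast
    "naive": y_naive_pred,  # this week's value, carried forward
    "spend": weekly_spend,  # used to bucket campaigns by size
})
results["size"] = pd.qcut(results["spend"], q=4,
                          labels=["small", "mid", "large", "largest"])
for col in ("model", "naive"):
    results[f"{col}_rel_err"] = (results[col] - results["actual"]).abs() / results["actual"]

print(results.groupby("size")[["model_rel_err", "naive_rel_err"]].mean())
```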

The results were encouraging, but not good enough to put into production. We had, of course, already tried different neural network architectures and hyperparameters, so we turned to other explanations. Perhaps we had to account for holidays in different countries; however, when we manually inspected the data for commonalities and patterns in ad sets that could aid our model design, no obvious pattern that the network should have picked up emerged from our analysis. Alternatively, changes in individual creatives (the images that appear in an ad), which we had not included in our dataset, may have been driving the changes; processing those images with deep neural networks (in particular, CNNs) was possible, but would have added considerable complexity.

Since these initial findings, we have augmented Aiden’s data-driven capabilities with a hybrid approach: leveraging the knowledge of our in-house marketing experts, applying it at scale, and using user feedback to fine-tune the parameters of the model. With this approach, Aiden is able to provide anomaly detection, budget allocation recommendations, and more, at scale.

A screenshot of our platform, where marketers receive daily insights on their data that they can implement in a single click.

If you think this article is interesting, please don’t hesitate to recommend it by clicking the 👏 button below. To learn more about Aiden: www.aiden.ai
