Generalized additive models: when engineers try to shed light on marketing effectiveness

Published in The Hands-on Advisors, Jan 4, 2018

Since the dawn of marketing, we’ve been wondering which half of the marketing budget goes to waste. It’s almost 2020 and many organizations are still scrambling with the same issue. What’s going on? We all know that it’s difficult to measure the effectiveness of radio, print, out-of-home or television advertising, since these channels lack the digital feedback loop of online display advertising, where we can simply follow the cookie/session trail from impression to purchase.

One of the prominent problems is that many brands more or less outsource their marketing to agencies, which then also become responsible for metrics and effectiveness measurement. It does not take a Nobel-winning scientist to figure out that it benefits the agency to show numbers that put its efforts in a good light. It just makes sense.

That said, it’s not all bad. Many agencies do econometrics and related modelling and try to optimize marketing budgets for the benefit of the brand. Agencies, however, are by no means data science and engineering powerhouses, and they may lack an understanding of the brand’s own data assets and thus of the feature engineering that is critical for modelling success. More importantly, when you buy insight instead of building insight capabilities (see Jim Collins and the concept of “time tellers” versus “clock builders”), you tend to outsource the understanding of your own function.

So, how do you build marketing effectiveness insight capabilities instead of buying possibly imperfect insights time after time?

First off, you need proper, granular data on your marketing investments and their reach. Consider the following examples:

# case 1: net/gross investments
{
 type: radio,
 media: some-media-brand,
 measure: net-investment,
 target_product: great-boot-sale-of-2017,
 tactical_vs_branding: tactical, 
 value_net: 550,
 value_gross: 550,
 date: 2017-12-30
},
...

# case 2: gross rating points (GRPs, a relative measure of approximately how many people were exposed to an ad).
{
 type: radio,
 media: some-media-brand,
 measure: grp,
 target_product: great-boot-sale-of-2017,
 tactical_vs_branding: tactical, 
 grp: 40,
 date: 2017-12-30
},
...

# case 3: row-level impressions (available for digital channels; can be exported from e.g. Adform).
{
 type: impression,
 media: adnetwork-123-some-media,
 timestamp: 2017-12-30 12:33:59,
 tactical_vs_branding: tactical, 
 bannersize: 600x400,
 bannertype: rich-media,
 target_product: great-boot-sale-of-2017,
 ...
},...

Going back to our main problem of measuring something without a direct digital feedback loop (like cases 1 and 2, with two different radio KPIs): we need a way of approximating the effect of investments without a digital trail. This means choosing a target metric and then fitting a model between that target and the explanatory variables, such as radio investments. It’s important to understand here that the target metric might not be sales after all, but traffic to a website, traffic to stores or something else that doesn’t depend on the performance of your sales pipelines. Marketing could be really effective, but if your sales channels fail to convert, a sales-based model will make it look ineffective. And sales pipeline effectiveness is not what we’re trying to model right now (so, in terms of data science, there is indeed a difference between sales and marketing). With this in mind, let’s say we have the following target metrics:

# target metric 1: sales
{ 
 type: transaction,
 product: yellow-pair-of-boots,
 productid: ABB1212,
 channel: online-web-1,
 productgroup: boots,
 timestamp: 2017-12-30 12:59:14,
 ...
}

# target metric 2: traffic
{ 
 type: pageview,
 product: yellow-pair-of-boots,
 productid: ABB1212,
 channel: online-web-1,
 productgroup: boots,
 timestamp: 2017-12-30 12:25:59,
 ...
}
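Because the media data in cases 1 and 2 is daily, the event-level target records above end up being counted per day. A minimal base-R sketch of that step, assuming a hypothetical `events` data frame that keeps only the `type` and `timestamp` fields of those records (the sample rows are made up for illustration):

```r
# Hypothetical event-level records, reduced to the two fields we need
events <- data.frame(
  type      = c("transaction", "pageview", "pageview", "transaction", "pageview"),
  timestamp = as.POSIXct(c("2017-12-30 12:59:14", "2017-12-30 12:25:59",
                           "2017-12-30 18:02:11", "2017-12-29 09:14:00",
                           "2017-12-29 10:01:30"), tz = "UTC")
)
# Truncate timestamps to calendar dates (in UTC)
events$date <- as.Date(events$timestamp, tz = "UTC")

# Cross-tabulate events by date and type: one row per date, one column per type
counts <- as.data.frame.matrix(table(events$date, events$type))
daily <- data.frame(
  date               = as.Date(rownames(counts)),
  traffic_pageviews  = counts[["pageview"]],
  sales_transactions = counts[["transaction"]]
)
print(daily)
```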

We can measure both sales and traffic down to the second, but our media GRPs and net/gross investments are recorded on a daily level, so we need to group the target metrics by date. The result is a simple enough dataframe of date + traffic_pageviews + sales_transactions. After that, we should find the right level of analysis for media investments: brand vs tactical in general (all media combined), different types of tactical in general (all media combined), media vs media (e.g. radio vs TV), media + brand/tactical vs media + brand/tactical (e.g. radio tactical vs TV branding) or a custom hierarchy best suited to the brand’s needs. As an example, this could yield:

{
 media: radio_tactical,
 sum_of_gross_investments: 4300,
 date: 2017-12-30
},
{
 media: tv_tactical,
 sum_of_gross_investments: 4300,
 date: 2017-12-30
},
{
 media: tv_branding,
 sum_of_gross_investments: 7700,
 date: 2017-12-30
},
{
 media: radio_tactical,
 sum_of_gross_investments: 3400,
 date: 2017-12-29
},
...
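As a minimal sketch of how rows like these could be produced, assuming the chosen level of analysis is media type + tactical/branding, the raw case 1 style rows can be summed per media level and date with base R’s `aggregate()`. The raw rows below are hypothetical, chosen so that they add up to the example figures above:

```r
# Hypothetical raw investment rows in the shape of case 1
investments <- data.frame(
  type                 = c("radio", "radio", "tv", "tv", "radio"),
  tactical_vs_branding = c("tactical", "tactical", "tactical", "branding", "tactical"),
  value_gross          = c(550, 3750, 4300, 7700, 3400),
  date                 = as.Date(c("2017-12-30", "2017-12-30", "2017-12-30",
                                   "2017-12-30", "2017-12-29"))
)
# Chosen level of analysis: media type + tactical/branding, e.g. "radio_tactical"
investments$media <- paste(investments$type, investments$tactical_vs_branding,
                           sep = "_")

# Sum gross investments per media level and date
media_daily <- aggregate(value_gross ~ media + date, data = investments, FUN = sum)
names(media_daily)[names(media_daily) == "value_gross"] <- "sum_of_gross_investments"
print(media_daily)
```

Changing the level of analysis only means changing how the `media` key is built, for example `investments$type` alone for media vs media.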

Then we merge this with the sales and traffic data and pivot the media investment rows at the chosen level (the “media” column here) into one column per unique media value:

# An example in R using the reshape2 package: dcast pivots the rows of the column after ~
# into one column per unique value and uses sum to aggregate values into the new columns.
library(reshape2)
data <- dcast(data, date + sales_transactions ~ media,
              fun.aggregate = sum, value.var = "sum_of_gross_investments")
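If you would rather stay in base R, the merge and pivot described above can also be sketched with `xtabs()` and `merge()`. The investment rows reuse the example figures above; the daily sales counts are hypothetical placeholder values:

```r
# Long-format daily investments (figures from the example above)
media_daily <- data.frame(
  media = c("radio_tactical", "tv_tactical", "tv_branding", "radio_tactical"),
  date  = as.Date(c("2017-12-30", "2017-12-30", "2017-12-30", "2017-12-29")),
  sum_of_gross_investments = c(4300, 4300, 7700, 3400)
)
# Hypothetical daily target metric
daily <- data.frame(
  date               = as.Date(c("2017-12-29", "2017-12-30")),
  sales_transactions = c(118, 131)
)

# Pivot: one row per date, one column per media level; missing cells become 0
wide <- as.data.frame.matrix(
  xtabs(sum_of_gross_investments ~ date + media, data = media_daily))
wide$date <- as.Date(rownames(wide))

# Join the daily sales onto the pivoted investments
model_data <- merge(daily, wide, by = "date")
print(model_data)
```

Filling days without spend with 0 (rather than NA) matters here, since "no investment" is itself a meaningful investment level for the model.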

We now have a dataframe with daily sales and the respective gross investments in our media instruments. It would be possible to fit a linear model to the data, but media investments are tricky, since their effect tends to follow an S-shaped curve: a start-up phase where you spend a little but gain little or no sales, followed by a point of diminishing returns. It might look a little like this (with generated data):

So, if the y-axis is sales and the x-axis is investment level, you don’t get much bang for the buck at the highest investment levels, but between 40% and 60% the investment performs well. This means that our largest investments in this type of media don’t add to sales as much as smaller ones. Furthermore, the smallest investments don’t really do anything for sales. So you’d want to spend a little more than your current average investment. A simple linear model cannot fit this type of curve, since the response is non-linear. This is why we use generalized additive models (GAMs) to fit proper response curves and find the optimal investment levels.
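The shape described above can be reproduced with a logistic curve. The sketch below is a toy with made-up parameters (midpoint at 50% of the investment range, steepness 10), not a fitted model; it shows why both ends of the curve add little per extra unit of spend while the middle adds the most:

```r
# Toy S-shaped (logistic) response: sales as a function of investment,
# both on a 0..1 scale; midpoint and steepness are made-up parameters
s_curve <- function(x, midpoint = 0.5, steepness = 10) {
  1 / (1 + exp(-steepness * (x - midpoint)))
}

investment <- seq(0, 1, by = 0.1)
sales <- s_curve(investment)

# Extra sales gained from each additional 0.1 step of investment
marginal_gain <- diff(sales)
print(round(marginal_gain, 3))
```

In this toy, the two steps from 0.4 to 0.6 give the largest gains, while the first and last steps add only about a twentieth as much.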

The usual problem with GAMs for marketing investments is that you need many unique data points to fit a curve. If your investment levels are always the same (daily marketing spend is constant throughout a week, a quarter or even a year), there is no variance and thus no unique data points, so we cannot fit an S curve. A smoother needs at least 3 unique points (you can lower the k parameter in a GAM to work with data that has few unique points, but fewer than 3 will not work), and preferably many more. To analyze optimal levels of marketing investment, you need different levels of marketing investment within your data. Given enough data points, fitting a GAM to our dataset could be written as:

library(mgcv)

mG <- gam(sales_transactions ~
            s(radio_tactical, k = 7) +
            s(tv_tactical, k = 7),
          data = data, family = poisson, method = "REML")

This would fit two smoothers, one for tactical radio and one for tactical TV investments, and estimate their effect on daily sales across the different amounts of investment observed (the unique data points; we restrict k to 7 here since we don’t have more unique levels across the media investment types). We use a Poisson family since our daily sales are counts and look Poisson-distributed when plotted, and REML as the smoothing parameter estimation method since it usually converges faster than GCV. When plotted, the results might look like this:

Result 2: Radio tactical response curve.

The orange markers along the bottom show the unique observed investment levels, and the curve is fitted to their effect on sales. From this we could interpret that using radio for tactical marketing seems to work, but only at really low investment levels (about one third of the average). Using TV for brand marketing seems to work better for large investments, but there is no accelerating (exponential-looking) part in its curve: TV works, but its relative contribution to sales is limited (so you’d need to look at the investment size and figure out whether it’s worth it).
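To try the approach before plugging in real data, the sketch below simulates a year of daily spend and Poisson-distributed sales, fits the two-smoother model with mgcv, and then locates the investment level where the fitted radio curve is steepest, i.e. where each extra unit of spend buys the most additional sales. Everything here is an assumption for illustration: the simulated effect sizes, the spend grids, and treating the steepest point as one possible reading of an "optimal" level (an economic optimum would also weigh the cost of the spend).

```r
library(mgcv)

set.seed(42)
n_days <- 365
# Hypothetical daily spend, drawn from 13 distinct levels so that k = 7 is feasible
radio_tactical <- sample(seq(0, 6000, by = 500), n_days, replace = TRUE)
tv_tactical    <- sample(seq(0, 9000, by = 750), n_days, replace = TRUE)

# Simulated truth on the log scale: S-shaped radio effect, saturating TV effect
log_mu <- 3 + 0.8 / (1 + exp(-(radio_tactical - 3000) / 600)) +
          0.4 * (1 - exp(-tv_tactical / 3000))
data <- data.frame(sales_transactions = rpois(n_days, exp(log_mu)),
                   radio_tactical, tv_tactical)

mG <- gam(sales_transactions ~ s(radio_tactical, k = 7) + s(tv_tactical, k = 7),
          data = data, family = poisson, method = "REML")
print(summary(mG)$s.table)
# plot(mG, pages = 1, rug = TRUE)  # response curves with observed levels as a rug

# Evaluate the fitted radio smooth on a fine grid, holding TV spend fixed
grid <- data.frame(radio_tactical = seq(0, 6000, length.out = 200),
                   tv_tactical    = median(data$tv_tactical))
radio_curve <- predict(mG, newdata = grid, type = "terms")[, "s(radio_tactical)"]

# Finite-difference slope: extra log-sales per unit of spend
slope <- diff(radio_curve) / diff(grid$radio_tactical)
steepest_at <- grid$radio_tactical[which.max(slope)]
print(steepest_at)
```

Because the model uses a Poisson family, the smooth contributions are on the log scale; the location of the steepest point is what matters here, not its absolute height.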

To make things better (or worse), we should add more marketing investments and more explanatory variables to our data to capture the effects of seasonality, weather, competitor investments and so forth. For easier data operations, let’s also rename the media investment columns with a “mediainv_” prefix:

library(mgcv)

mG <- gam(sales_transactions ~
            weather_daily_precipitation +
            seasonality_index +
            s(mediainv_instagram_branding, k = 7) +
            s(mediainv_instagram_tactical, k = 7) +
            s(mediainv_facebook_tactical, k = 7) +
            s(mediainv_facebook_branding, k = 7) +
            s(mediainv_display_advertising_branding, k = 7) +
            s(mediainv_display_advertising_tactical, k = 7) +
            s(mediainv_radio_branding, k = 7) +
            s(mediainv_tv_branding, k = 7) +
            s(mediainv_radio_tactical, k = 7) +
            s(mediainv_tv_tactical, k = 7),
          data = data, family = poisson, method = "REML")

Now, when you start building this as a formula, you’ll quickly find out that it’s not particularly fun, since the number of columns might run into the hundreds (remember, the right level of analysis may need to be really detailed to gain proper understanding: tactical vs branding per media is often insufficient). Since you need to work out which columns need smoothers and which do not, and even which columns need different parameters, building the formula programmatically is not trivial. There are ways of achieving this, but they are out of scope for this example.
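Although a full solution is out of scope, a minimal sketch of one approach follows. It assumes a hypothetical convention: every `mediainv_` column gets a smoother whose `k` is capped by its number of unique investment levels, columns with fewer than the 3 unique levels mentioned earlier enter linearly, constant columns are dropped, and all other explanatory columns enter linearly. `choose_k()` and `build_gam_formula()` are illustrative helpers, not part of mgcv; `reformulate()` is base R.

```r
# Hypothetical helper: basis dimension for a media column, capped by the
# number of unique investment levels; NA when fewer than 3 levels exist
choose_k <- function(x, k_max = 7) {
  n_unique <- length(unique(x))
  if (n_unique < 3) return(NA_integer_)
  as.integer(min(k_max, n_unique))
}

# Hypothetical helper: smoothers for mediainv_ columns, linear terms otherwise
build_gam_formula <- function(data, response, k_max = 7) {
  predictors <- setdiff(colnames(data), response)
  # Constant columns carry no information and would be aliased with the intercept
  varies <- vapply(data[predictors], function(x) length(unique(x)) > 1, logical(1))
  predictors <- predictors[varies]
  term_labels <- vapply(predictors, function(col) {
    k <- if (startsWith(col, "mediainv_")) choose_k(data[[col]], k_max) else NA_integer_
    if (is.na(k)) col else sprintf("s(%s, k = %d)", col, k)
  }, character(1))
  reformulate(unname(term_labels), response = response)
}

# Toy data: varied radio spend, two-level TV spend, one constant column
toy <- data.frame(
  sales_transactions      = c(10, 12, 15, 11, 14, 13),
  seasonality_index       = c(1.0, 1.1, 0.9, 1.0, 1.2, 0.8),
  mediainv_radio_tactical = c(0, 500, 1500, 3000, 4300, 500),
  mediainv_tv_branding    = c(0, 7700, 0, 7700, 0, 7700),
  mediainv_print_tactical = rep(200, 6)
)
f <- build_gam_formula(toy, "sales_transactions")
print(f)
```

The resulting formula can be passed straight to `gam(f, data = ..., family = poisson, method = "REML")`.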

For the marketer, we want to make things as easy as we can. This means summarizing all of this into something that can be visualized with your tool of choice. Luckily, we can build a dataframe that lets us interpret the GAM results (notice that we use the mediainv prefix to pick out the columns of interest; see https://github.com/thomhopmans/themarketingtechnologist for a more in-depth example of plotting GAM results):

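library(reshape2)  # provides melt()

summary_model      <- summary(mG)
model_coefficients <- summary_model$p.table
model_intercept    <- model_coefficients["(Intercept)", 1]
# Linear predictor matrix: one row per day, one column per model coefficient
predicted_levels   <- predict(mG, type = "lpmatrix")

id          <- 1:nrow(predicted_levels)
data_x_full <- data.frame(id)  # observed investment level per media column
data_y_full <- data.frame(id)  # fitted contribution of the matching smoother

for (iter_rs in unique(colnames(data)[grepl("mediainv", colnames(data))])) {
  data_x_full[[iter_rs]] <- data[[iter_rs]]
  # The smoother's lpmatrix columns times its coefficients give its contribution
  # on the link (log) scale, since we use a Poisson family
  smooth_cols <- grepl(iter_rs, colnames(predicted_levels), fixed = TRUE)
  data_y_full[[iter_rs]] <- drop(predicted_levels[, smooth_cols, drop = FALSE] %*%
                                 coef(mG)[smooth_cols])
}

data_x_full_melted <- melt(data_x_full, id.vars = "id")
data_y_full_melted <- melt(data_y_full, id.vars = "id")
# Both frames share the same column order, so the melted rows line up
data_melted <- data.frame(data_x_full_melted, data_y_full_melted)
data_melted <- data_melted[c("id", "variable", "value", "value.1")]
colnames(data_melted) <- c("id", "var", "MediaInv", "AdditionalSales")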

Pushing all the different media investment levels into one graph results in different axis ranges, so you need to rescale each media variable to a common range, say 0..1 (a minimum of zero and a maximum of one) or -1..1, so that they can all be plotted in one interpretable graph.
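A minimal sketch of that rescaling, applied per media variable to a `data_melted`-style frame with base R’s `ave()`. Min-max scaling to 0..1 is one choice among several, and the toy rows below are made up for illustration:

```r
# Min-max scaling to 0..1; a constant input maps to all zeros
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical rows in the shape of data_melted (id column omitted)
data_melted <- data.frame(
  var             = rep(c("mediainv_radio_tactical", "mediainv_tv_tactical"), each = 3),
  MediaInv        = c(0, 2000, 4000, 0, 5000, 10000),
  AdditionalSales = c(-0.2, 0.1, 0.3, -0.5, 0.0, 0.4)
)

# Rescale within each media variable so every curve spans the same range
data_melted$MediaInv_scaled <- ave(data_melted$MediaInv, data_melted$var,
                                   FUN = rescale01)
data_melted$AdditionalSales_scaled <- ave(data_melted$AdditionalSales,
                                          data_melted$var, FUN = rescale01)
print(data_melted)
```

Scaling within each variable keeps the shape of every response curve while discarding absolute magnitudes, so the scaled graph shows where each curve bends, not which media contributes most in absolute terms.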

Finally, when we push this to the visualization tool of choice, we can create compelling insights on the optimal investment levels for each media instrument and media type, given the task of optimizing sales (in general or for a particular product or product group), traffic or a combination of these.

Visualization of the results: the media investment response curves and their optimal levels when targeting sales of a particular product group. Note that the graph shows scaled values for both sales and investment levels.

Summing up: as engineers, we want to be clock builders instead of time tellers; it’s in our nature. It’s long overdue to bring the same thinking to marketing measurement. This approach is best suited for non-digital channels without feedback loops, but it may benefit digital channels as well, since we can measure the effects of display advertising on brick-and-mortar stores and thus take a more holistic approach to marketing measurement than basic attribution models allow.

Let’s create something brands can own and use to drive a more data-driven approach to marketing. It might just tell us which part of the marketing budget is wasted and what we should do about it.

Written by Jarno Kartela