A Paradox in MMM Time Effects
Marketing is a highly noisy environment. Naturally, measuring the precise effect of marketing spend can be tricky. Being able to predict how much revenue will be generated from marketing is extremely valuable. With advanced Media Mix Modeling (MMM), we can use these predictions to optimize marketing budget allocation precisely to maximize profit.
In this blog post, we detail a paradox in the modeling of marketing spend vs. revenue for DTC brands. Although the data is presented as a time series, we find that the incorporation of time as a variable can be less effective in some cases. The number of factors that can influence revenue is large, and their cumulative effect can sometimes outweigh the influence of time-based effects. Imprecision in time-based assumptions can lead to less effective models. However, even when we do not use time to improve modeling accuracy, it is still possible to gain an intuition about the time effects at work. Ultimately, these learnings can help refine our assumptions regarding time-dependent features.
Data in Time != Time Series
When trying to model the direct impact of marketing, we sometimes find that time is not one of the top predictive features. The factors influencing marketing performance are many, and it is not uncommon for time features to be overshadowed by these. In these situations, time series models do not offer the best performance. Rather, regular regression models achieve much better results.
Many sources of variance and noise exist in the advertising landscape. When coalescing information from different media, we inevitably stumble upon different competitors, advertising strategies, audiences and more. Advertising channels make decisions on our behalf given these factors, which we cannot observe. These decisions have much more to do with the competitive landscape in the present moment than what happened yesterday or a week before. How this competitive landscape varies is entirely out of our control, and we cannot begin to model the factors that go into these decisions. Still, these processes can introduce more variance in the observed data than the predictive power of seasonality. In this situation, we cannot recover the time-based effect from the surrounding noise.
This is not to say that the variance introduced makes the prediction of return vs. spend impossible. In fact, the modeling error remains more than satisfactory given all of this noise. What is important here is that the predictive power of time-based features can be obscured by the noise produced by the advertisers. When choosing to include time as a feature, we should verify conclusively that it has a positive impact. An increase in accuracy on its own does not always mean that we are using the right set of features.
Under specific circumstances, we can discern the effects of time on our revenue. If we can eliminate some of the noise or variance from our media mix, it should be easier to observe time effects. For instance, given a constant level of spending and a constant schedule, time effects should appear more clearly and have more predictive power.
In the scenario where variance clouds time effects, these uncontrollable effects are still predictable in aggregate, thanks to the law of large numbers. Given enough diverse examples and a varying spending strategy, the noise averages out, and it becomes possible to establish a temporal relationship between spend and return. The following sections present strategies that can help us grasp the time effects at hand.
Cross-Correlation Analysis
To assess whether prediction in time is possible, it helps to look at the cross-correlation of the revenue and spend series under consideration. Essentially, we are trying to see if historical spend impacts future revenue in discernible patterns. Cross-correlation is simply the correlation between two variables after shifting one of them in time, measured at different lags. For instance, if we want to see how spend affects return over time, we could compute the cross-correlation over 14 days by shifting the spend time series forward one day at a time and measuring the correlation at each of the 14 lags. We might see that return is correlated with spend from specific preceding days.
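The lagged correlation described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used for the analysis; the function names `pearson` and `cross_correlation` and the toy series are hypothetical. It pairs each day's return with the spend made `lag` days earlier and reports one correlation per lag.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)

def cross_correlation(spend, revenue, max_lag=14):
    """Correlation of revenue[t] with spend[t - lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        # Pair each day's revenue with the spend made `lag` days earlier.
        shifted_spend = spend[: len(spend) - lag]
        aligned_revenue = revenue[lag:]
        result[lag] = pearson(shifted_spend, aligned_revenue)
    return result

# Hypothetical toy data: revenue is exactly twice the previous day's spend.
spend = [10, 40, 20, 50, 30, 60, 10, 70, 20, 80, 30, 90]
revenue = [0] + [2 * s for s in spend[:-1]]
ccf = cross_correlation(spend, revenue, max_lag=3)
```

On this toy data the correlation peaks at lag 1, because revenue was built from the previous day's spend. On real data the values are far noisier, and it is the shape across lags that matters.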
Here, we are mostly interested in comparing return and spend. Specifically, we are trying to understand how much return a certain spend might generate, not only on the day of the spend, but further in the future. This concept is known as adstock. The impact of marketing is not only immediate: the effect builds up over time as customers are repeatedly exposed to ads. Logically, we should be able to attribute a portion of a day’s revenue to past spend. If there is an effect of spend on future earnings, it should be noticeable when observing the cross-correlation between spend and return.
If it is possible to predict adstock, we should be able to see it from the cross-correlation analysis. For instance, if adstock decays rapidly, so should the cross-correlation values. Likewise, if we are analyzing the effect of sending promotional emails, we should expect the cross-correlation to rise with time and then fall. When plotting cross-correlation, we should expect the shape of the plot to resemble the adstock curve.
It is important to look at the autocorrelation of spend and of return to verify that we are not misattributing the observed effect. Autocorrelation informs us about the dependence of a variable upon itself. It is computed the same way as cross-correlation, except that the same time series is used for both variables. We can use autocorrelation to see if seasonality exists in spend or return. Ideally, to observe adstock, there should not be seasonality in either spend or return. We can ensure this by verifying that the autocorrelations for spend and for return are both mostly flat.
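A flatness check of this kind can be sketched as follows. This is a minimal illustration using the standard sample autocorrelation estimator in plain Python; the function name `autocorrelation` and the toy series are hypothetical. A series with a weekly cycle shows a clear spike at lag 7, which is the kind of seasonality the text warns about.

```python
def autocorrelation(series, max_lag=14):
    """Sample autocorrelation of a series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [value - mean for value in series]
    variance_sum = sum(c * c for c in centered)
    if variance_sum == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    acf = []
    for lag in range(max_lag + 1):
        # Products of each centered value with the one `lag` days before it.
        lagged_product = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(lagged_product / variance_sum)
    return acf

# Hypothetical toy series with a strict 7-day cycle (higher spend on weekends).
weekly_spend = [10, 10, 10, 10, 10, 30, 30] * 8
acf = autocorrelation(weekly_spend, max_lag=14)
```

Here `acf[7]` is large and positive while `acf[3]` is negative, so the series is far from flat. For a spend or return series without seasonality, the values beyond lag 0 would stay close to zero.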
We need to be cognizant of the fact that a marketing strategy might interfere with this analysis. If spend is modulated on a schedule, then the results should be caveated because the difference in spend will influence cross-correlation with return.
Adstock Modeling
Although cross-correlation can give us an idea about the temporal relationship between spend and return, it would be great to have a definitive curve to model the decay of return coming from a specific spend. To give us this insight, we use the following approach. We build a model representing return as a sum of discounted spend from previous days.
To model the decay of a certain spend, the Log-Cauchy probability density function can be used. This family of curves is great to model many different types of advertising mediums. We do not want to limit our assumption about return to be in a decaying form. Some mediums might initially produce very little revenue, grow it over time and then decay. For instance, sending out direct mail might produce close to no effect for the first few days as people just haven’t gotten to it. For that reason, the Log-Cauchy curves are perfect for our use case as they allow anything from strictly decreasing, to strictly increasing and many other behaviours in between.
Log-Cauchy pdfs have two parameters, and we want the pair that best fits the shape of the decay. To fit the curve, we build a model in which return is estimated by summing the spend of each of the previous n days, each weighted by the value of the curve at a corresponding point. We limit ourselves to the (0, 1] range of the curve. If we want to use n=14, the total sum is calculated as follows: today’s spend times the curve at x=0, the previous day’s spend times the curve at x=1/14…
We can define the model mathematically as follows:
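One plausible way to write this model, based only on the description in the text, is given here as a sketch. The symbols are assumptions: s is daily spend, r-hat is the estimated return, k is taken to be a scalar multiplier on the weighted sum, and x_i is the point on the curve assigned to lag i (x_i = i/n in the description above). The second expression is the standard Log-Cauchy density. Since that density is only defined for x > 0, the point used for today's spend has to sit slightly above zero in practice.

```latex
\hat{r}_t = k \sum_{i=0}^{n} s_{t-i}\, f(x_i;\ \mu, \sigma),
\qquad
f(x;\ \mu, \sigma) = \frac{1}{x\pi} \cdot \frac{\sigma}{(\ln x - \mu)^2 + \sigma^2},
\quad x > 0
```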
We can use an optimizer to find parameters sigma, mu and k, by minimizing the objective function, in this case the mean squared error. This approach yields good results, consistently producing models with 15–20% MAPE. Considering that we are only trying to understand how return evolves given a certain spend and not predict exactly what return will be, these results are acceptable. With this method, we can estimate the expected timeline for starting to see an effect from a marketing medium.
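The fitting step can be illustrated with a small, self-contained sketch. Several things here are assumptions rather than the method used in this post: a coarse grid search over mu and sigma stands in for the optimizer, k is treated as a scalar multiplier and solved in closed form by least squares for each grid point, lag i is evaluated at x = (i + 1) / n so that x stays inside (0, 1], and the data is synthetic. All function names are hypothetical.

```python
import math
import random

def log_cauchy_pdf(x, mu, sigma):
    """Log-Cauchy density, defined for x > 0 and sigma > 0."""
    return sigma / (x * math.pi * ((math.log(x) - mu) ** 2 + sigma ** 2))

def adstock_weights(mu, sigma, n=14):
    """Curve values for lags 0..n-1, sampled at x = (lag + 1) / n (assumption)."""
    return [log_cauchy_pdf((lag + 1) / n, mu, sigma) for lag in range(n)]

def unscaled_prediction(spend, weights):
    """Curve-weighted sum of past spend, for every day with a full history."""
    n = len(weights)
    return [
        sum(weights[lag] * spend[t - lag] for lag in range(n))
        for t in range(n - 1, len(spend))
    ]

def fit_adstock(spend, revenue, mu_grid, sigma_grid, n=14):
    """Grid-search mu and sigma; solve the scale k by least squares; minimize MSE."""
    target = revenue[n - 1:]
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            z = unscaled_prediction(spend, adstock_weights(mu, sigma, n))
            k = sum(a * b for a, b in zip(z, target)) / sum(a * a for a in z)
            mse = sum((k * a - b) ** 2 for a, b in zip(z, target)) / len(target)
            if best is None or mse < best["mse"]:
                best = {"mu": mu, "sigma": sigma, "k": k, "mse": mse}
    return best

# Synthetic check: generate revenue from known parameters, then refit them.
rng = random.Random(0)
spend = [rng.uniform(20.0, 120.0) for _ in range(120)]
true_weights = adstock_weights(mu=-1.0, sigma=0.5)
revenue = [0.0] * 13 + [3.0 * z for z in unscaled_prediction(spend, true_weights)]

mu_grid = [-2.0, -1.5, -1.0, -0.5, 0.0]
sigma_grid = [0.25, 0.5, 1.0, 2.0]
best = fit_adstock(spend, revenue, mu_grid, sigma_grid)
```

Because the synthetic revenue is noise-free and spend varies from day to day, the search recovers the generating parameters. With real data, a continuous optimizer and a finer search would normally replace the grid.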
It should be noted that many curves will produce a relatively low error while still being completely wrong. Consider a time series where spend never varies. Every day in the window then contributes the same spend, so only the total weight of the curve matters, and there are infinitely many ways to combine these values to obtain the same predicted value. With this methodology, autocorrelation can be used to validate our results.
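The constant-spend case can be made concrete with a short sketch. The assumptions are the same kind as elsewhere in this illustration: k is a scalar multiplier, lag i is evaluated at x = (i + 1) / n, and the function names and numbers are hypothetical. Two very different curves, one concentrated on recent days and one on older days, reproduce the same return once k is rescaled.

```python
import math

def log_cauchy_pdf(x, mu, sigma):
    """Log-Cauchy density, defined for x > 0 and sigma > 0."""
    return sigma / (x * math.pi * ((math.log(x) - mu) ** 2 + sigma ** 2))

def predicted_return(spend_window, mu, sigma, k):
    """k times the curve-weighted sum of a spend window (most recent day first)."""
    n = len(spend_window)
    # Assumption: lag i is evaluated at x = (i + 1) / n so that x stays in (0, 1].
    return k * sum(
        spend * log_cauchy_pdf((lag + 1) / n, mu, sigma)
        for lag, spend in enumerate(spend_window)
    )

def scale_for_target(spend_window, mu, sigma, target):
    """The k that makes the prediction hit `target` for this window."""
    return target / predicted_return(spend_window, mu, sigma, k=1.0)

constant_spend = [100.0] * 14   # spend never varies
observed_return = 250.0         # hypothetical constant daily return

fast_decay = (-2.5, 0.3)        # (mu, sigma): weight concentrated on recent days
slow_build = (-0.2, 0.3)        # (mu, sigma): weight concentrated on older days

predictions = []
for mu, sigma in (fast_decay, slow_build):
    k = scale_for_target(constant_spend, mu, sigma, observed_return)
    predictions.append(predicted_return(constant_spend, mu, sigma, k))
```

Both entries of `predictions` match the observed return, so the error alone cannot tell the two curves apart. Only variation in spend makes the shape of the curve identifiable.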
VAR Model as a Test
Finally, to validate the intuitions gathered so far, we can train a Vector Autoregression (VAR) model and check that the fitted coefficients match the curves found. A VAR is a multivariate time series model that predicts each of the variables involved from past values of all of them. For instance, a VAR trained on spend and return can predict return from past spend and return.
We can easily fit a VAR model on our time series data. This will find the optimal coefficients to apply to past variables to give the best approximation of a current one. This model is prone to overfitting, which we want to avoid here. It might get caught up trying to predict the variance in the data instead of predicting the actual observed value. For our use case, we would ideally want a potentially under-fit model, because we want to generalize and not be dependent on specific changes in the data. We are trying to find a very general curve that describes revenue decay, not the exact value of a variable at any point in the series.
Still, the VAR model does a good job at modeling the decay in return from past spend. Through the learned coefficients, we can observe this behaviour. If we have a strictly decaying adstock curve, we should expect to see the same shape in the coefficients from the VAR model. The same would be true for other shapes of adstock curves.
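To show what reading the decay off the coefficients looks like, here is a hand-rolled sketch of the return equation of a VAR with a fixed number of lags, fitted by ordinary least squares. It is an assumption-laden illustration in plain Python rather than the model used in this post: it fits only the equation for return, the helper names are hypothetical, and the data is synthetic with a known decaying effect of past spend. A dedicated time series library would normally be used instead.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_return_equation(spend, revenue, lags):
    """OLS fit of revenue[t] on an intercept and `lags` past values of each series."""
    rows, targets = [], []
    for t in range(lags, len(revenue)):
        past_spend = [spend[t - i] for i in range(1, lags + 1)]
        past_revenue = [revenue[t - i] for i in range(1, lags + 1)]
        rows.append([1.0] + past_spend + past_revenue)
        targets.append(revenue[t])
    width = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)] for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(width)]
    beta = solve_linear_system(xtx, xty)
    return {"intercept": beta[0], "spend": beta[1:lags + 1], "revenue": beta[lags + 1:]}

# Synthetic data: return depends on the last three days of spend, with decay.
rng = random.Random(0)
spend = [rng.uniform(20.0, 120.0) for _ in range(200)]
revenue = [0.0, 0.0, 0.0] + [
    0.5 * spend[t - 1] + 0.3 * spend[t - 2] + 0.1 * spend[t - 3]
    for t in range(3, len(spend))
]
coefs = fit_return_equation(spend, revenue, lags=3)
```

In this noise-free example the spend coefficients come back as a decaying sequence, which is the shape one would compare against the fitted adstock curve. On real data the coefficients are noisier, and only the overall shape should be read.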
Conclusion
In this blog post, we exposed a paradox in time series modeling. Although our data is represented in time and we intuitively know that time effects are present in marketing, we find that in some situations, time-based features cannot be used to improve modeling. Even when we are not using time as a feature, we can still study the time effects expressed in the data. Using cross-correlation and simple models, we can uncover how our expected revenue is tied to a sustained marketing effort across many channels.