How to properly handle Contextual Effects (Trend, Seasonality, Holidays…) in Marketing Mix Modeling using Prophet?

Published in fifty-five | Data Science · 8 min read · Oct 3, 2023

A description of the subtle role of Contextual Effects in MMM, with a guided approach to optimally handle them, and access to the code.

Introduction

In a classical approach to Marketing Mix Modeling (MMM), the idea is to decompose a target signal (for instance, a company's turnover) into different regressors that, when added together, explain the target curve. We deal with two types of regressors:

Contextual regressors, which do not directly depend on the company’s actions, e.g. the natural trend and seasonality of the sales.
Marketing regressors, which are usually directly controlled by the company, e.g. its investments in a specific marketing touchpoint over time.

To properly handle all these variables, we usually work in two successive steps:

First, we gather contextual information from the decomposition of the target signal. Usually, this will be done by decomposing the signal into its trend, seasonalities, potential custom effects, and noise.
Then, we decompose the same target signal, using linear regression, onto all contextual regressors (not counting noise) and all appropriately constructed marketing regressors (typically obtained by applying adstock and saturation to the investment curves).
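To make the "adstock and saturation" part of the second step more concrete, here is a minimal sketch. It assumes a geometric adstock and a Hill-type saturation, which are common choices but not necessarily the ones used in our repository; the function names and parameter values are purely illustrative.

```python
def geometric_adstock(spend, decay):
    """Carry over a fraction `decay` of the previous adstocked value."""
    carried = 0.0
    adstocked = []
    for amount in spend:
        carried = amount + decay * carried
        adstocked.append(carried)
    return adstocked


def hill_saturation(value, half_saturation, shape=1.0):
    """Diminishing returns: 0 at 0, 0.5 at `half_saturation`, tends to 1."""
    if value <= 0:
        return 0.0
    return value**shape / (value**shape + half_saturation**shape)


# Hypothetical weekly investments on one touchpoint.
spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(spend, decay=0.5)
marketing_regressor = [hill_saturation(v, half_saturation=50.0) for v in adstocked]
print(adstocked)
print(marketing_regressor)
```

The resulting `marketing_regressor` is the kind of curve that would enter the final linear regression next to the contextual regressors.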

In this article, we will focus on the first step, and present all of the critical aspects of contextual analysis (which are often neglected) to ensure that the Marketing Mix Model will retrieve relevant results.

We will show examples of these theoretical concepts using the Prophet library (from Meta), and share the code to implement this methodology in our public repository.

What is at stake

Extracting contextual factors from the target signal has pros and cons. On the one hand, it is extremely effective, as these factors directly reflect how context impacts sales. On the other hand, it is hard to make sure that we attribute only the actual context to these contextual effects.

For example, let's say that a company's marketing strategy for TV commercials has always been to invest more during the high season (for instance the summer) and less during the low season (for instance the winter). In this case, it becomes much harder to differentiate (from a regression standpoint) the impact of TV marketing investments from the natural (contextual) seasonality of the curve. Consequently, the final regression will have highly correlated regressors, and there is a risk that our model will either underestimate or overestimate the impact of these marketing investments on the company's results.
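The TV example can be illustrated with a small simulation. This is only a sketch on synthetic data: the sinusoidal seasonality, the budget rule and the noise level are all assumptions made up for the illustration.

```python
import math
import random

random.seed(0)
weeks = range(156)  # three years of weekly data

# Synthetic yearly seasonality of the sales.
seasonality = [math.sin(2 * math.pi * t / 52) for t in weeks]
# Hypothetical TV budget that follows the season, plus a little noise.
tv_spend = [100 + 50 * s + random.gauss(0, 10) for s in seasonality]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


corr = pearson(seasonality, tv_spend)
print(f"correlation between seasonality and TV spend: {corr:.2f}")
```

With a correlation this close to 1, a regression has very little information to decide how much of the summer peak belongs to the season and how much to TV.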

Our objective is therefore to gather and isolate all contextual info, while ensuring that we “allow the marketing effects” to explain their actual impact during the final regression. In other words, we want to capture as much contextual information as possible, but not one bit more!

The path to success:

In an ideal world, past investments would vary enough over time to let us tell context and media effects apart, but this is rarely the case. However, a few tips can help work around this problem as efficiently as possible.

These tips can be separated into two parts, which we will dive into in the rest of this article:

First, we can help the Prophet decomposition by giving it as many insightful particular effects as possible, described as precisely as possible, so that the trend and seasonality signals only have to capture what actually belongs to them.
Then, during decomposition, we will make sure to give as much power as possible to all of the other effects, while restraining the variability of both trend and seasonalities.

Particular effects

We will consider all contextual factors outside of the trend and the different seasonalities as particular effects. We could have, for instance,

Some recurring events that have an impact on the target (school holidays, special holidays, the football World Cup final or the Super Bowl, New Year's, Thanksgiving, etc.).
Some exceptional effects (Covid lockdown, post-lockdown boost, inflation, Olympic Games held in a country, economic crisis, riots, war, etc.).

In the Prophet library, it is possible to add most of these regressors very easily by using some of the class’s built-in functions. In fact, if you gave Prophet a series of dates that describe an event’s occurrences throughout the years, it would automatically generate a regressor taking the value 1 when the event occurs, and 0 otherwise. Unfortunately, this can oversimplify the reality of the impact of these events:

By default, the impact of an event appears only at the time step of the event itself. However, in many cases, an event can impact a range of time steps. For instance, for a company that sells potential Christmas presents, the impact of the event "Christmas" could be felt during the two months preceding Christmas Day.
The impact itself is described by a regressor that can only take 0 or 1 as its values. If we keep the Christmas example, we can imagine that the impact of Christmas on sales could actually vary between the week of Christmas (very high), two weeks before (high) or two months before (low).
If we work with one data point per week (which is very frequent for MMM projects), the event impact will be handled the same way whether the event falls on a Monday or on a Sunday. However, in reality, we can often presume that the sales peak caused by an event will happen X days before the event, with X being stable throughout the years. So, in the case of X=1 (and weeks starting on Monday), the peak impact of an event happening on a Monday would actually fall in the week before.
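The last point is easy to check with a few lines of code. The sketch below assumes weekly time steps that start on Monday (an assumption, your data may use another convention) and uses two helper functions that are hypothetical names, not Prophet functions.

```python
from datetime import date, timedelta


def week_start(day):
    """Monday of the week containing `day` (weeks assumed to start on Monday)."""
    return day - timedelta(days=day.weekday())


def peak_week(event_date, delta_days):
    """Week containing the peak, `delta_days` away from the event date."""
    return week_start(event_date + timedelta(days=delta_days))


christmas_2022 = date(2022, 12, 25)  # a Sunday
christmas_2023 = date(2023, 12, 25)  # a Monday

for event in (christmas_2022, christmas_2023):
    print(
        event,
        "| event week:", week_start(event),
        "| peak week (one day before):", peak_week(event, -1),
    )
```

For the Sunday occurrence, the event and its peak fall in the same week; for the Monday occurrence, the peak lands one week earlier than the event itself, which a plain 0/1 regressor on the event week would miss.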

In order to handle all these issues, we have come up with the following methodology that can be applied to almost every event. Let’s describe the pseudo algorithm:

— — — — — — — — — —

a. We store all events, with their names and all the precise dates at which (or around which) they happen throughout the years in a file (see Image 1).

b. For all of the different event names:

— b.1. We assign to each event name the following hyperparameters:

delta_days_peak_effect = the number of days between the actual date of the event and the date of its peak effect (negative values mean that the peak arrives before)
delta_time_steps_before_peak = the duration (in time steps) of the effect of this event, strictly before the peak
delta_time_steps_after_peak = the duration (in time steps) of the effect of this event, strictly after the peak
curve_std_idx = index translating the shape of the curve around the peak: -1 for a pointy shape, 0 for a triangle, 1 for a bell shape.

— b.2. Given a set of hyperparameters, we generate a single curve covering all occurrences of the event over the years, remembering to account for potential cross effects from the year before and the year after the date range we are working with (see examples).

c. We give all these curves as custom regressors to the Prophet model.

d. Afterwards, we will be able to cross-validate the potential values of all hyperparameters to look for the best combination.

— — — — — — — — — —

*Image 1: Example of a table to give to the code, with the events’ names mentioned for all specific dates.*
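As an illustration of step a, here is one way such a table could be loaded. The two-column layout (`date`, `event`) and the sample rows are assumptions made for this sketch; the actual file used by the repository may be organized differently.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical file content; the real table may use other column names.
EVENTS_CSV = """date,event
2020-12-25,Christmas
2021-12-25,Christmas
2020-11-26,Thanksgiving
2021-11-25,Thanksgiving
"""


def load_event_dates(csv_text):
    """Return a dict mapping each event name to its sorted list of dates."""
    dates_by_event = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        dates_by_event[row["event"]].append(date.fromisoformat(row["date"]))
    return {name: sorted(dates) for name, dates in dates_by_event.items()}


event_dates = load_event_dates(EVENTS_CSV)
for name, dates in event_dates.items():
    print(name, [d.isoformat() for d in dates])
```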

Now, let’s see the impact of some of the hyperparameters previously mentioned (Images 2 and 3). We are looking at the curve generated by the event “Christmas” over 2 years, considering one data point per week, with the base hyperparameters at

delta_days_peak_effect = 0
delta_time_steps_before_peak = 6
delta_time_steps_after_peak = 2
curve_std_idx = 0

*Image 2: Impact of delta_days_peak_effect and delta_time_steps_before_peak on the regressor. We can see how delta_time_steps_before_peak changes the number of time steps affected before the peak. We can also see that shifting delta_days_peak_effect does not move the peak identically in both years: the peak week stays the same in 2021, but moves by one week in 2020.*

*Image 3: Impact of curve_std_idx over the regressor. We can see that the higher curve_std_idx is, the wider the curve will open up around the peak.*

We can see that just by calibrating these 4 hyperparameters, we can completely control how an event will impact the target signal. By using this methodology, we now have a very precise way to describe the impact of all of the particular events that can influence the context.
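To give an idea of how such a curve can be generated, here is a simplified sketch for weekly data. Several choices are assumptions made for the illustration and may differ from the repository: weeks start on Monday, the shape is a power transform of a triangle (exponent 2 for curve_std_idx = -1, 1 for 0, 0.5 for 1, so the last one is wider rather than a true bell), and overlapping occurrences are combined with a maximum.

```python
from datetime import date, timedelta


def week_start(day):
    """Monday of the week containing `day` (weeks assumed to start on Monday)."""
    return day - timedelta(days=day.weekday())


def side_weight(distance, n_steps, curve_std_idx):
    """Weight at `distance` steps from the peak, on a side lasting `n_steps`."""
    linear = max(0.0, 1.0 - distance / (n_steps + 1))  # triangle profile
    exponent = 2.0 ** (-curve_std_idx)  # -1 -> 2 (pointy), 0 -> 1, 1 -> 0.5 (wide)
    return linear**exponent


def event_curve(time_steps, event_dates, delta_days_peak_effect,
                delta_time_steps_before_peak, delta_time_steps_after_peak,
                curve_std_idx):
    """Build one regressor (values in [0, 1]) for all occurrences of an event."""
    index_of = {step: i for i, step in enumerate(time_steps)}
    curve = [0.0] * len(time_steps)
    for event_date in event_dates:
        peak = week_start(event_date + timedelta(days=delta_days_peak_effect))
        for offset in range(-delta_time_steps_before_peak,
                            delta_time_steps_after_peak + 1):
            step = peak + timedelta(weeks=offset)
            if step not in index_of:
                continue  # this part of the effect is outside the modeled range
            if offset == 0:
                weight = 1.0
            elif offset < 0:
                weight = side_weight(-offset, delta_time_steps_before_peak, curve_std_idx)
            else:
                weight = side_weight(offset, delta_time_steps_after_peak, curve_std_idx)
            curve[index_of[step]] = max(curve[index_of[step]], weight)
    return curve


# Twelve weekly steps starting on Monday 2020-11-02.
weeks = [date(2020, 11, 2) + timedelta(weeks=i) for i in range(12)]
# Neighboring years are included so that their spill-over is not forgotten.
christmas_dates = [date(2019, 12, 25), date(2020, 12, 25), date(2021, 12, 25)]
christmas = event_curve(weeks, christmas_dates, 0, 6, 2, 0)
for step, value in zip(weeks, christmas):
    print(step, round(value, 2))
```

Passing the occurrences of the previous and next years, as done with `christmas_dates`, is what lets the cross effects mentioned in step b.2 show up at the edges of the date range.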

PS: for the regressors that are unrelated to event dates, we would have to deal with them differently. For instance, if we want inflation to be a regressor, we could directly give the inflation curve to Prophet as a custom regressor.

Trend and Seasonality

Now that we have created all of the custom regressors, which can precisely describe the contextual impacts of all holidays & special events through time, we simply need to run Prophet using these regressors to get the decomposition of the curve into trend, seasonalities, and custom effects.
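In practice, this step could look like the sketch below. It assumes a DataFrame `df` with Prophet's usual `ds` and `y` columns plus one column per event curve; the helper names are hypothetical, while `add_regressor`, `fit` and `predict` are regular Prophet methods. Prophet is imported inside the function so that the first helper can be read and reused on its own.

```python
def add_event_regressors(model, event_columns, prior_scale=10.0):
    """Register each event curve as an additive extra regressor."""
    for name in event_columns:
        # standardize=False keeps the 0-to-1 scale of the generated curves.
        model.add_regressor(
            name, prior_scale=prior_scale, standardize=False, mode="additive"
        )
    return model


def decompose(df, event_columns):
    """Fit Prophet on `df` (columns: ds, y, and one column per event curve)."""
    from prophet import Prophet  # requires the `prophet` package

    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=False,  # not meaningful with one point per week
        daily_seasonality=False,
    )
    add_event_regressors(model, event_columns)
    model.fit(df)
    components = model.predict(df)
    # One column per component: trend, yearly seasonality, and each event curve.
    return components[["ds", "trend", "yearly", *event_columns]]
```

A call such as `decompose(df, ["christmas", "thanksgiving"])` (hypothetical column names) would then return the fitted trend, yearly seasonality and event contributions for each time step.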

Since we have dedicated regressors for all the particular effects that could impact the signal (except marketing-related ones), what is left to capture in the signal should be a smooth trend and some smooth seasonalities, reflecting the natural evolution of sales. By "smooth", we mean that these signals should have low-intensity variations; in other words, their accelerations should stay small in absolute value.
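One simple way to put a number on this notion of smoothness is to look at second differences, which play the role of a discrete acceleration. The metric and the two toy trends below are our own illustration, not something computed by Prophet.

```python
def max_abs_acceleration(signal):
    """Largest absolute second difference of a regularly sampled signal."""
    return max(
        abs(signal[i + 1] - 2 * signal[i] + signal[i - 1])
        for i in range(1, len(signal) - 1)
    )


# A perfectly linear trend, and a trend whose slope jumps from 2 to 10.
linear_trend = [100 + 2 * t for t in range(10)]
kinked_trend = [100 + 2 * t if t < 5 else 110 + 10 * (t - 5) for t in range(10)]

print(max_abs_acceleration(linear_trend))
print(max_abs_acceleration(kinked_trend))
```

The linear trend has zero acceleration everywhere, while the sudden change of slope shows up as a large value. Comparing this quantity across decompositions is one possible sanity check that the trend and seasonalities are not absorbing sharp, marketing-like variations.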

We now want to impose this smoothness to ensure that we leave all intense variations to the marketing effects. To achieve this as best as possible, here is a table that summarizes how to deal with Prophet hyperparameters optimally:

*Image 4: Indications for all the relevant Prophet hyperparameters.*

PS: see here for more detailed information on Prophet parameters.
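As a rough illustration of what "restraining" Prophet can look like in code, here is a sketch of a constrained configuration. The values are placeholders chosen for the example and are not taken from the table in Image 4; only the parameter names and the defaults quoted in the comments come from Prophet itself.

```python
# Illustrative values only; Prophet's defaults are given in the comments.
SMOOTH_PROPHET_KWARGS = {
    "growth": "linear",
    "n_changepoints": 5,              # default 25: fewer places where the trend can bend
    "changepoint_prior_scale": 0.01,  # default 0.05: stronger penalty on trend changes
    "seasonality_prior_scale": 1.0,   # default 10.0: shrinks seasonal amplitudes
    "seasonality_mode": "additive",
    "yearly_seasonality": 3,          # Fourier order (10 by default when enabled)
    "weekly_seasonality": False,      # not meaningful with one point per week
    "daily_seasonality": False,
}

# Looser prior for the event curves, so that they keep their explanatory power.
EVENT_REGRESSOR_PRIOR_SCALE = 10.0


def build_smooth_model(event_columns):
    """Create a Prophet model with a restrained trend and seasonality."""
    from prophet import Prophet  # requires the `prophet` package

    model = Prophet(**SMOOTH_PROPHET_KWARGS)
    for name in event_columns:
        model.add_regressor(
            name, prior_scale=EVENT_REGRESSOR_PRIOR_SCALE, standardize=False
        )
    return model
```

The idea is simply that low prior scales and few changepoints restrain the trend and seasonality, while the event regressors keep a comparatively loose prior.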

Example

Here’s an example of decomposition using the previously described algorithm (see the repository to have access to all of the code).

We did a cross-validation over the ranges of values considered “plausible” for the hyperparameters and retrieved the best combination in terms of RMSE.
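The search itself can be sketched as a plain grid search. The helper names below are hypothetical and the scoring function is injected, so the same loop works with any evaluation; `prophet_cv_rmse` shows one possible scorer based on Prophet's `cross_validation` utility, with a horizon chosen arbitrarily for the example.

```python
import itertools


def grid(param_ranges):
    """Yield every combination of the given hyperparameter values."""
    names = list(param_ranges)
    for values in itertools.product(*(param_ranges[name] for name in names)):
        yield dict(zip(names, values))


def best_combination(param_ranges, score):
    """Return the combination with the lowest score (e.g. the lowest RMSE)."""
    return min(grid(param_ranges), key=score)


def prophet_cv_rmse(df, prophet_kwargs, horizon="91 days"):
    """RMSE of Prophet's time-series cross-validation for one configuration."""
    from prophet import Prophet  # requires the `prophet` package
    from prophet.diagnostics import cross_validation

    model = Prophet(**prophet_kwargs).fit(df)
    df_cv = cross_validation(model, horizon=horizon)
    return float(((df_cv["y"] - df_cv["yhat"]) ** 2).mean() ** 0.5)


# Toy usage with a made-up scoring function standing in for the RMSE.
toy_ranges = {"changepoint_prior_scale": [0.001, 0.01, 0.05], "n_changepoints": [3, 5]}
toy_best = best_combination(
    toy_ranges,
    score=lambda p: abs(p["changepoint_prior_scale"] - 0.01) + abs(p["n_changepoints"] - 5),
)
print(toy_best)
```

With real data, the score would be something like `lambda p: prophet_cv_rmse(df, p)`. The event hyperparameters (delta_days_peak_effect and so on) can be searched in the same grid, provided the scoring function regenerates the event curves before fitting.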

*Image 5: Example of decomposition of a sales signal. By giving all events that could impact the signal as particular effects, and restraining the Prophet hyperparameters, we were able to get a satisfactory decomposition.*

Conclusion

By giving the decomposition algorithm as many particular effects as possible and constraining the trend and seasonality to be as smooth as possible, we have developed a practical solution for separating context from marketing.

If you want to use this algorithm to gather contextual effects from your target signal, you may have to adapt it to your use case. But if you understood the reasoning behind this article, you should be able to get where you want pretty quickly.

If you have any questions, please let us know in the comments 🙂