What is the best MMM tool for your company? An evaluation of LightweightMMM, Orbit, and Robyn

DP6 Team · Published in DP6 US · Aug 2, 2023

Marketing mix models (MMM) rely on statistical techniques, such as multiple linear regression, to analyze historical time series data and identify causal relationships between a company's marketing mix (the set of strategies and channels used to drive its KPIs) and its business results (sales, subscriptions, etc.).

The main objective of an MMM is to estimate the marginal effect of the variables that make up the marketing mix. The KPI forecast, therefore, is only a secondary result of the model.

The advantages of MMM over other techniques with similar objectives (privacy-friendliness, the ability to cover offline channels, flexibility, etc.), together with the existence of several ready-made services and tools on the market, many of them free, make implementation tempting for any marketing team, if only to discover how many of its promises can be fulfilled in practice.

The crucial question in relation to implementing a marketing mix model is: what is the best tool to use?

The answer to this question depends not only on the available tools, but also on the specifics of each company (the technical team's expertise, the time available to implement the model, budget, etc.). In this post we're going to talk about three tools, provided by some of the main technology companies in the market, all of which are free to use. We hope that by the time you finish the article, their similarities and differences will be clear, and you will know which questions to ask when choosing one over the others.

LightweightMMM (Google):

LightweightMMM is a Python MMM library authored by Google. It is an open-source library available on GitHub and is easy to install. The equation governing the model is in the form:
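In simplified notation (a sketch following the library's documentation, where $f_{\text{media}}$ denotes the chosen media transformation, discussed below):

$$\text{KPI}_t = \alpha + \text{trend}_t + \text{seasonality}_t + \sum_{m=1}^{M} \beta_m \, f_{\text{media}}(x_{t,m}) + \sum_{c=1}^{C} \gamma_c \, z_{t,c} + \varepsilon_t$$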

In this model, you can get separate estimates for the effects of the baseline (alpha), trend, seasonality, a variety of media channels, and other external factors relevant to the business (macroeconomic data, holidays, brand studies, competitors, etc.). The technical details of how the model works and how it is used are well described in its documentation and in the article by Jin, Y. et al. (2017); links to both are provided at the end of this post.

LightweightMMM uses Bayesian statistics in model building. Basically, Bayesian models allow you to introduce to the model any initial knowledge about the probability that something will happen (in the context of marketing, an example would be the probability that a channel is responsible for a certain portion of a result observed in a KPI in the period considered).

Imagine that the effectiveness of specific channels has been estimated using experiments, or that your team has assumptions about the effect of some variable on the KPI based on experience (e.g. you expect an increase of 10% in the number of socks sold in the week leading up to Father’s Day). This information can be passed to the model, and even better, it can include the degree of certainty we have in each situation.

Formally, we call these assumptions priors, and they translate mathematically as probability distributions. The model relates the priors to the probability that the observed data occurred considering these priors (a concept called likelihood) and then updates the value of the priors. The probability distributions obtained at the end of the process are called posteriors. The model parameters are considered random variables, and are estimated through posteriors (in frequentist statistics, parameter values are fixed).
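To make the prior-to-posterior mechanics concrete, here is a minimal, library-agnostic sketch using a conjugate Beta-Binomial update (the numbers are invented for illustration; this is not LightweightMMM code, which performs its inference internally via MCMC):

```python
from scipy import stats

# Prior: we believe a channel's conversion rate is around 5%.
prior = stats.beta(a=5, b=95)  # mean = 5 / (5 + 95) = 0.05

# Observed data: 30 conversions out of 400 clicks.
conversions, trials = 30, 400

# Beta-Binomial conjugacy: posterior = Beta(a + successes, b + failures).
posterior = stats.beta(a=5 + conversions, b=95 + trials - conversions)

print(f"prior mean:     {prior.mean():.3f}")      # 0.050
print(f"posterior mean: {posterior.mean():.3f}")  # 0.070, pulled toward the data
```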

In summary:

  • Bayesian models allow the assimilation of prior knowledge about the model parameters.
  • The prior knowledge provided to the model is adjusted according to what is observed in the available data (sales, impressions, costs etc.).
  • The reliability of parameter estimates is represented by probability distributions.

As a library built specifically for marketing mix models, LightweightMMM has features particular to this area. It incorporates media transformations related to carryover, adstock and saturation (the mathematical equations behind these transformations are explained in the article cited at the end of the post), which allow the model to infer the delay, lag-weight and saturation of each of the informed media channels/campaigns. In fact, the main difference between the terms 'media channels' and 'other factors' in the LightweightMMM equation is that all data identified as media data undergoes the media transformation defined at model initialization (you can choose the most appropriate of the three available options: 'adstock', 'hill_adstock' and 'carryover'), while data indicated as 'other factors' is not transformed.
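To illustrate what these transformations do, here is a generic sketch in the spirit of the equations in Jin et al. (2017) (the functional forms and parameter values are illustrative, not the library's internal code):

```python
import numpy as np

def geometric_adstock(x: np.ndarray, decay: float) -> np.ndarray:
    """Carry a fraction of each period's media effect into later periods."""
    out = np.empty_like(x, dtype=float)
    carry = 0.0
    for t, spend in enumerate(x):
        carry = spend + decay * carry
        out[t] = carry
    return out

def hill_saturation(x: np.ndarray, half_sat: float, slope: float) -> np.ndarray:
    """Diminishing returns: the response flattens as media input grows."""
    return 1.0 / (1.0 + (x / half_sat) ** (-slope))

spend = np.array([100.0, 0.0, 0.0, 50.0, 0.0])
# Effects linger after spend stops (adstock), then saturate (Hill):
transformed = hill_saturation(geometric_adstock(spend, decay=0.5),
                              half_sat=80.0, slope=1.5)
```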

The library also includes ready-made plots of the metrics of interest, including the following (a sketch of the corresponding plotting calls appears after the list):

  • A graph relating the actual KPI to what the model predicted during training.
  • A visualization of the assumed initial distributions for each of the model parameters (priors) and the inferred distributions at the end of training (posteriors).
  • The estimated percentage contribution of each media channel to the KPI (the same graph can also be plotted with the ROI calculation).
  • The contribution of the channels over time.
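A minimal sketch of how these plots are produced (here `mmm` is a trained `lightweight_mmm.LightweightMMM` instance, and `target_scaler`/`cost_scaler` are the preprocessing scalers used before fitting; the calls follow the library's demo notebook and may differ slightly between versions):

```python
from lightweight_mmm import plot

# Actual KPI vs. model prediction over the training period.
plot.plot_model_fit(mmm, target_scaler=target_scaler)

# Prior vs. posterior distributions of the media parameters.
plot.plot_media_channel_posteriors(mmm)

# Estimated contribution percentage per media channel (use roi_hat for ROI).
media_contribution, roi_hat = mmm.get_posterior_metrics(
    target_scaler=target_scaler, cost_scaler=cost_scaler)
plot.plot_bars_media_metrics(metric=media_contribution,
                             metric_name="Media contribution percentage")

# Channel contributions over time, stacked on top of the baseline.
plot.plot_media_baseline_contribution_area_plot(
    media_mix_model=mmm, target_scaler=target_scaler)
```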

LightweightMMM does not explicitly show the contribution of some parameters of its base equation to the KPI, such as the contribution of the so-called 'other factors' or of seasonality. However, these contributions can be calculated using a model attribute called 'trace', which is created after training. The trace is a dictionary containing the posterior distributions of the model parameters.
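For example (a short sketch; the exact key names in the trace depend on the library version and the model configuration, so `"coef_extra_features"` below is illustrative):

```python
import numpy as np

# The trace maps parameter names to arrays of posterior samples.
print(mmm.trace.keys())

# Hypothetical example: mean posterior coefficient of each extra feature.
samples = mmm.trace["coef_extra_features"]  # shape: (n_samples, n_features)
print(np.mean(samples, axis=0))
```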

In addition, this library comes with budget optimization functionality already implemented. Given a total budget, the 'find_optimal_budgets' method determines the ideal investment for each channel, and you can compare the percentage of the budget allocated to each channel before and after optimization.
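In code, roughly (a sketch following the library's demo; `prices` is the cost per media unit for each channel, and the argument names and return values may vary by version):

```python
from lightweight_mmm import optimize_media

solution, kpi_without_optim, previous_media_allocation = (
    optimize_media.find_optimal_budgets(
        n_time_periods=8,        # horizon (in periods) to optimize over
        media_mix_model=mmm,     # the trained model
        budget=1_000_000,        # total budget to allocate
        prices=prices))          # cost per media unit, one entry per channel
```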

Another visualization of interest available in LightweightMMM is the channel response curve. With this you can verify not only the investment vs return ratio per channel, but also the average investment (considering historical data), and the optimal investment according to a budget (this is determined by using the ‘find_optimal_budgets’ method).
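In code (same assumptions as above):

```python
# Response curves per channel; the historical average investment appears on
# each curve, and the optimal point can be overlaid after optimization.
plot.plot_response_curves(media_mix_model=mmm, target_scaler=target_scaler)
```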

One final point of interest is that LightweightMMM was built from the ground up to handle hierarchical data, a feature that is especially useful in an MMM context, where data can be broken down by region.

Robyn (Meta):

Robyn is authored by the Meta team and written in R. It is also an open-source library that was designed specifically for media mix modeling applications. In Robyn, frequentist statistics are used in the process of adjusting the model’s parameters. Robyn’s base equation is:
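In simplified notation (a sketch of the structure described in the documentation, with Adstock and Hill denoting Robyn's decay and saturation transformations):

$$y_t = b_0 + \text{trend}_t + \text{season}_t + \text{holiday}_t + \sum_{m=1}^{M} \beta_m \, \text{Hill}(\text{Adstock}(x_{t,m})) + \sum_{c=1}^{C} \gamma_c \, z_{t,c} + \varepsilon_t$$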

As can be seen above, and as described in the official documentation, Robyn also breaks the variable of interest (the KPI) down into intercept (or baseline), seasonality and trend, transformed media data, and extra factors. Although it uses different equations, Robyn's media transformations serve the same function as LightweightMMM's, i.e. to model the decay and saturation of media impacts.

Similar to LightweightMMM, during model initialization you must indicate which data belongs to which category (media, holidays, organic variables, paid variables, costs, etc.). While LightweightMMM has specific terms for modeling the trend and seasonality of the data, Robyn makes use of Prophet (Meta's forecasting library) to calculate these time series components. Prophet is also capable of handling holidays and other one-off events within the data period.

During model initialization and training you can also pass the expected value of some parameters, such as those related to media transformations, and data related to experiments. The difference, as previously mentioned, is that in frequentist statistics the parameter values are fixed, which means it's not possible to inform the model of a 'confidence level' for the information provided. In practice, if experimental results are passed to the model, they are treated as absolute truths and become part of the multi-objective optimization (we will talk about this in more detail later on).

After installing Robyn, you can run a demo script that uses data provided by Meta for demonstration purposes. The outputs and functionalities available in Robyn will be used as examples below and were generated from this script.

During the modeling process, Robyn generates multiple initial models and evaluates them on two success metrics: NRMSE and DECOMP.RSSD. NRMSE measures the error in the KPI prediction (actual vs predicted), while DECOMP.RSSD measures the distance between each channel's share of investment and the share of the KPI the model attributes to it. It can be translated as: "if 50% of my investment is in channel A, it is more likely that channel A is responsible for 45% of my KPI than for 3%". In the documentation these metrics are referred to as 'Model fit' and 'Business fit' respectively.
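The two metrics look roughly like this (Python used here for illustration, since Robyn itself is written in R; the definitions follow its documentation):

```python
import numpy as np

def nrmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Model fit: RMSE normalized by the range of the actual KPI."""
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return rmse / (actual.max() - actual.min())

def decomp_rssd(spend_share: np.ndarray, effect_share: np.ndarray) -> float:
    """Business fit: distance between each channel's share of spend and the
    share of the KPI decomposition the model attributes to it."""
    return np.sqrt(np.sum((effect_share - spend_share) ** 2))
```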

If experimental data is provided for model calibration, MAPE is also included as a third metric to be minimized.

In the image below, the red line represents the so-called “Pareto front”. The points that make up this front are those considered optimal by the Pareto rule: those in which it is not possible to optimize one of the metrics (in the example, to minimize the NRMSE or the DECOMP.RSSD even further) without worsening the other.
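Conceptually, the front can be extracted like this (an illustrative Python sketch, not Robyn code), where each candidate model is represented by its pair of metrics to be minimized:

```python
def pareto_front(models: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep the models that no other model beats on both metrics at once."""
    def dominates(q, p):
        # q dominates p: at least as good on both metrics, strictly better on one.
        return q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
    return [p for p in models if not any(dominates(q, p) for q in models)]

# (NRMSE, DECOMP.RSSD) pairs for four candidate models:
candidates = [(0.10, 0.30), (0.12, 0.20), (0.15, 0.15), (0.16, 0.35)]
print(pareto_front(candidates))  # the last candidate is dominated by the second
```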

Robyn shows the best models, based on these two metrics and Pareto optimization, but it is up to the analyst to select the best model based on business knowledge.

To help you select the most suitable model, Robyn groups the optimal models into clusters and selects a model within each cluster to analyze. In addition to the graph above, some other graphs are generated for comparison between the models. The main one is a graph with the estimated contribution of each of the regressors in the equation (unlike LightweightMMM, the trend, seasonality and extra factors are already included here).

A graph of the KPI predicted by the model versus the actual KPI (which LightweightMMM also provides).

Channel response curves, which help you look at channel saturation and the average level of current investment (similar to LightweightMMM). One advantage over LightweightMMM is shown by the shaded areas in the image below: they show the current level of adstock per channel, i.e. the influence that investments already made will carry into the next period.

Robyn also includes an extremely informative graph, which in a single image incorporates the estimated ROI for paid channels and the comparison between the investment percentage and the channel contribution percentage to the KPI (similar to DECOMP.RSSD).

In addition, it generates graphs related to media effects over time (adstock/carryover), model residuals (which help to determine whether the model systematically returns errors for a specific value range), and the clusters generated from the optimal models. Analyzing each of these graphs for each model allows you to select the one most suitable for your business. For example, a model that fits the KPI better may still be unsuitable because it attributes an unrealistically high ROI to a specific channel.

After choosing the model, you can save it and optimize investments. Optimization can be done by defining a specific budget, by specifying an expected ROAS, or by using the same budget entered during the model training stage. The image generated at the end of this optimization contains information such as the return on investment with and without optimization, the initial and optimal investment percentages per channel, and the channel response curves, which now include, in addition to the carryover and the initial investment (cited above), the optimal investment point:

Orbit (Uber):

Like LightweightMMM, Orbit is written in Python. It is an open-source library and is also available on GitHub. However, unlike the two libraries mentioned so far, Orbit is not a package specifically for media mix modeling. It was created by Uber mainly as a general forecasting tool, but you can still use it to build MMMs (remember, MMMs are nothing more than statistical models that relate media efforts to a KPI).

The implications of not being an MMM-specific package are that Orbit does not treat media data differently from any other regressor used to predict the dependent variable (i.e. there are no media transformations), and that Orbit's outputs are the standard outputs of time series forecasting packages rather than the outputs we see in MMM tools.

So, what is the advantage of using Orbit? To answer this question, we need to look at one of the models that Orbit uses to forecast time series.

Orbit natively includes four models for time series forecasting (each defined by a base equation analogous to those of LightweightMMM and Robyn). These models are: Damped Local Trend (DLT), Exponential Smoothing (ETS), Local Global Trend (LGT), and Kernel-based Time-varying Regression (KTR). Our interest is in the fourth model on this list, KTR. It was added to Orbit after the library's launch, and its great advantage is that it is built with time-varying coefficients. The basic KTR equation is:
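In simplified notation (a sketch based on the KTR paper cited at the end of this section, with $g_t$ the trend, $s_t$ the seasonality and $x_{t,p}$ the regressors):

$$y_t = g_t \cdot s_t \cdot \prod_{p=1}^{P} x_{t,p}^{\beta_{t,p}} \cdot e^{\varepsilon_t}$$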

The first two terms on the right-hand side of this equation represent the trend and seasonality of the time series, while the terms within the product are the series’ regressors: P is the total number of regressors, with no difference between media and non-media. With a logarithmic transformation, the similarity of this equation with the LightweightMMM and Robyn equations becomes even more apparent:
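Assuming multiplicative noise as above, taking logs gives:

$$\log y_t = \log g_t + \log s_t + \sum_{p=1}^{P} \beta_{t,p} \log x_{t,p} + \varepsilon_t$$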

The key point is the determination of 𝛽, which varies over time according to the formula:
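Following the kernel-smoothing construction described in the KTR paper, with latent values $b_{j,p}$ at knot times $t_j$ and a time-based kernel weight $w$:

$$\beta_{t,p} = \frac{\sum_{j=1}^{J} w(t, t_j)\, b_{j,p}}{\sum_{j=1}^{J} w(t, t_j)}$$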

To understand the formula, imagine a time series with 100 datapoints, and suppose that for a given regressor p the corresponding coefficient is anchored at specific points in the series, say t = 0, t = 10, t = 20, and so on. These anchor values correspond to the b_{j,p}, and the calculation of 𝛽 at time t follows the idea of weighting by the distance between t and each knot time t_j. That is, if we know that b_{j,p} is 0.8 at the knot at t = 0 and 2 at the knot at t = 10, then the value of 𝛽 at datapoint t = 3 will be closer to 0.8 than to 2. The transition of the value of 𝛽 as we walk through the time series is governed by w, a time-based weight function defined by a kernel (a suitably chosen weighting function). The values of b_{j,p} (with j ranging from 1 to J) are determined by the model in much the same way as fixed coefficients are determined in a standard model. More details about KTR can be found in the official Orbit documentation and in the article "Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling".

The title of the article hints at one more detail about Orbit: it also makes use of Bayesian statistics. It is, therefore, a model with coefficients that vary over time, with which you can use priors. Unlike LightweightMMM, in Orbit each channel can receive more than one prior, each anchored to a point in time. If we run an experiment over a specific time period, we can tell the model that "the expected contribution of channel X between dates D1 and D2 is n%", and not just "the expected contribution of channel X is n%".

In addition, Orbit has three main classes: Forecaster, Estimator and Model. This structure was designed to facilitate the construction of custom models, where the first two classes would be reused and the user could create their own model.

During the initialization of the KTR model, you can set the number of knots in the model, the distance between them, or their specific positions in the series. The total number of knots corresponds to J, i.e. the number of latent variables b_{j,p} per regressor, with j ranging from 1 to J.
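A minimal sketch of initializing and fitting KTR (the column names and the weekly DataFrame `train_df` are assumed here for illustration; argument names follow Orbit's documentation at the time of writing and may differ between versions):

```python
from orbit.models import KTR

# train_df: weekly pandas DataFrame with a date column, the KPI and regressors.
ktr = KTR(
    response_col="sales",
    date_col="week",
    regressor_col=["tv_spend", "search_spend"],
    seasonality=[52],        # yearly seasonality on weekly data
    regression_segments=10,  # controls the number of latent knots per regressor
    # regression_knot_distance=... or regression_knot_dates=... can be used
    # instead to control knot spacing or exact knot positions.
)
ktr.fit(df=train_df)

# Estimated time-varying coefficients, one series per regressor.
coefs = ktr.get_regression_coefs()
ktr.plot_regression_coefs()
```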

Orbit also generates an actual-versus-predicted graph using the training data. And since it is a forecasting library, it also allows you to plot a graph with the forecast n steps ahead (not shown here).

Another interesting feature is the ability to plot the estimated values of the covariates' coefficients over time:

In the example above, highly informative priors (with a high level of certainty) were passed for each of the regressors. It is clear where the priors were incorporated for each of the independent variables (the periods where the credibility intervals are narrower).

Orbit is also able to split the time series into a predetermined number of folds and train the model on each split, a useful feature for validating the model's hyperparameters.
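A sketch using Orbit's backtesting utility (assuming the `ktr` model and weekly data from before; argument names may vary by version):

```python
from orbit.diagnostics.backtest import BackTester

bt = BackTester(
    model=ktr,
    df=train_df,
    min_train_len=104,    # start with two years of weekly data
    incremental_len=13,   # grow the training window by one quarter per fold
    forecast_len=13,      # validate on the following quarter
    window_type="expanding",
)
bt.fit_predict()
print(bt.score())         # aggregated error metrics across the folds
```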

For the other Orbit models, whose coefficients do not vary over time, you can also generate a series of plots for analyzing the relationship between the prior and posterior distributions of the model parameters. Generating these outputs for KTR is not as straightforward, since the distributions vary over time, but it is possible with some modifications to the scripts found in the documentation.

Considering its functionalities, Orbit is clearly capable of creating media mix models, a use that is even mentioned in the article explaining the mathematical implementation of the KTR model. However, you would need to implement MMM-specific features, like those found in LightweightMMM and Robyn, yourself.

To summarize the main characteristics of the three tools we analyzed:

  • Author and language: LightweightMMM (Google, Python); Robyn (Meta, R); Orbit (Uber, Python).
  • Statistical approach: LightweightMMM and Orbit are Bayesian and accept priors; Robyn is frequentist and does not.
  • MMM focus: LightweightMMM and Robyn are built for MMM, with media transformations, contribution/ROI outputs and budget optimization; Orbit is a general forecasting library.
  • Time-varying coefficients: only Orbit's KTR model, which also allows priors anchored to specific periods.

Conclusions

When analyzing the outputs, both in terms of quantity and quality of information, Robyn appears to be the most robust of the three MMM tools. Its outputs are clear and visually polished, despite the complexity of the information they contain. One of Robyn's main disadvantages is that it is not possible to use priors. However, this is a minor problem when the reliability of the information provided to the model is high.

As for the language, the fact that Robyn was written in R could be considered a disadvantage by some. Python, the language used in LightweightMMM and Orbit, has a clearer syntax and a larger community of users, which means that more libraries are available.

One final consideration about Robyn is that it generates multiple models at the end of training. Although you can pre-select the best models and it provides a series of visualizations for comparison, you still have to choose a model that is more aligned with your business. This requires a greater degree of knowledge from the analysts, who need to make the right interpretation of the provided outputs.

The use of Bayesian statistics and the ability to use priors are great advantages of LightweightMMM. Another benefit of priors is the option to include more channels (as long as sufficiently informative priors exist for those channels) without needing a large number of new datapoints.

For those who are just starting to work with Bayesian statistics, LightweightMMM also provides simpler documentation, which makes it easier to understand the mathematical models behind its operation. In addition, its code is well-structured and straightforward, and can easily be modified to implement new features if necessary.

Like Robyn, LightweightMMM is a specific library for creating media mix models. However, its outputs are more basic, and it requires several new features to reach the level found in Robyn.

In relation to Bayesian statistics, an additional precaution that analysts must consider is sensitivity to priors. Models fed with very informative priors, or with an insufficient amount of data, will be more susceptible to the information present in the priors, which can result in biased models.

Finally, Orbit includes very advanced functionality in its classes, but it is geared more toward forecasting and Bayesian statistics than toward MMM. To use Orbit as an MMM tool, you need to implement some MMM-specific functions, which requires more time and a technical team with a higher degree of knowledge.

There are, however, four different models already built into the library, and it is easy to create new ones.

Orbit’s biggest attraction is the KTR model, which allows you to include time-varying parameters, as well as priors in specific periods, corresponding to specific knowledge acquired by the business.

As stated at the beginning of this post, the objective here is not to settle the discussion about which MMM tool is best. I hope it has become clear over the course of the article that each of these libraries has advantages and disadvantages, from their complexity to the functionalities they include, and even the time needed to build a model and make the necessary modifications. The main objective of the information presented is to help you choose the most appropriate tool for your company's current scenario, your strategies and your team.

Articles and documentation for reference:

LightweightMMM: https://github.com/google/lightweight_mmm (repository and documentation); Jin, Y. et al. (2017), "Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects".

Orbit: https://github.com/uber/orbit (repository); https://orbit-ml.readthedocs.io (documentation); Ng, E., Wang, Z. and Dai, A. (2021), "Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling".

Robyn: https://github.com/facebookexperimental/Robyn (repository and documentation).

Profile of the author: Lucas Suplino | With a Master's in Mechanical Engineering from USP, I've been working as a data scientist at DP6 for a year and a half, and I am passionate about ML.
