Marketing Mix Model 2022: Why the end of third-party cookies calls for more data science!

Florian Grüning
Jun 20, 2022 · 6 min read

Performance marketing is a key driver of sales and brand awareness for digital and physical businesses. To track campaign performance and return on ad spend (ROAS) in a data-driven way, two measurement methods have become established: a) attribution models and b) marketing mix models.

What is an Attribution Model and why has it lost relevance?

An attribution model evaluates the metadata left behind by customers on the Internet. This consists of first-party cookies (usage data on the company’s own website and/or service) and third-party cookies (historical usage data of the user on another website and/or service). With the help of this metadata, a user journey can be traced across different advertising channels. For example, one can analyze on which channels and when a user was exposed to advertising, and how the user reacted (e.g. with a click).

Depending on the model, the channels are given a different weighting or relevance for a conversion event (e.g. a product purchase): First Touchpoint (100% weighting for the very first advertising contact), Last Touchpoint (100% weighting for the last advertising contact), Linear (each advertising channel receives an equal weighting), and Time-based (the channel contact closest in time to the conversion receives the greatest weighting).

The different ways of calculating attribution show that the same data can lead to fundamentally different ROAS results, for example if the last touchpoint always receives 100% of the conversion. So keep one thing in mind: there is no single “real” ROAS figure. It always depends on which model was used to calculate it.
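
To make the four weighting schemes tangible, here is a minimal Python sketch that splits one conversion across the channels of a single user journey. The function name `attribute`, the journey format, and the exponential decay with a 7-day half-life for the time-based model are assumptions made for illustration; real attribution tools define these details in their own way.

```python
from collections import defaultdict


def attribute(journey, model="linear", half_life_days=7.0):
    """Split one conversion across the channels of a user journey.

    journey: list of (channel, days_before_conversion) tuples, ordered
    from first to last contact. Returns {channel: share}; shares sum to 1.
    """
    if not journey:
        raise ValueError("journey must contain at least one touchpoint")
    n = len(journey)
    if model == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        weights = [1.0] * n
    elif model == "time_decay":
        # Assumption: exponential decay with a hypothetical half-life.
        weights = [0.5 ** (days / half_life_days) for _, days in journey]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    credit = defaultdict(float)
    for (channel, _), weight in zip(journey, weights):
        credit[channel] += weight / total
    return dict(credit)


# Hypothetical journey: TV contact 14 days before the purchase,
# social media 7 days before, paid search on the day of purchase.
journey = [("tv", 14), ("social", 7), ("search", 0)]
for model in ("first_touch", "last_touch", "linear", "time_decay"):
    print(model, attribute(journey, model))
```

Multiplying these shares by the revenue of the conversion and dividing by the spend per channel gives a ROAS per channel, and that figure changes with every model even though the journey stays the same.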

Due to regulatory and platform changes, tracking and analyzing third-party cookies is becoming increasingly difficult and unreliable. For example, Apple has heavily restricted third-party tracking, the European GDPR makes it hard to analyze the metadata without clear consent from the user, and Google has announced that it will limit third-party cookies as well.

The Renaissance of the Marketing Mix Model and “real” Data Science

Due to the limited possibilities of attribution models, statistical methods are becoming relevant again for understanding advertising impact holistically. Marketing mix models put advertising measures into a statistical context via a regression model, whereas an attribution model relies on advanced tracking but is ultimately just a descriptive analysis of data. A statistical marketing mix model makes it possible to include any number of other parameters in the cause-and-effect relationship of advertising, such as economic indicators (e.g. the purchasing power of a country, inflation indices), competitive activities (e.g. competitor prices) and brand effects (e.g. via searches for your brand name or industry on Google Trends).

An exciting off-the-shelf model is Robyn. Robyn is a freely available open-source R package developed by Meta. It combines best practices from time series analysis and makes them easy to implement. The model has one dependent variable, on which the influence of the advertising channels is modeled. Sales, revenue, or relevant conversion events (such as sign-ups) are the preferred dependent variables.

Modeling with infinite possibilities to represent the business reality

The independent variables can be divided into paid media (with information on impressions and spend), organic variables (such as newsletter campaigns or searches for a brand on Google), and context variables (such as competitor prices, inflation indices, or special events). The key advantage of a regression model is that we can combine different metrics, measures, and sources in one model, since what matters is the variance in the data rather than the absolute values (as in an attribution model, for example).

Figure: Example from the Robyn GitHub repository

Digital paid media also benefits from being able to derive return on investment from, for example, impressions and spend. Offline paid media has no detailed impression measure, but its reach is calculated in GRPs (Gross Rating Points), which Robyn also takes into account. In summary, a marketing mix model helps to map the effects on e.g. revenue in a holistic way. There are no limits to creativity; here are a few interesting possibilities for independent variables:

  • Prophet seasonality decomposition for time series analysis to derive the effects of seasonality, trend and holidays (directly integrated in Robyn).
  • Social media influencer costs (fixed costs) and subsidies (the discounted value of a voucher promotion as indirect costs).
  • Google Trends or the absolute number of searches for the brand, to integrate a branding effect into the model. You can use the Kuwala connector to easily build out search terms for Google Trends and store them in a PostgreSQL database.
  • Inflation rate, to see how strong the influence of decreased purchasing power is.
  • Event data as a categorical (factor) time series. Your brand might be sponsoring bigger events like a music festival.

Ridge Regression and Variable Transformations with Adstock Effect

For each independent (media) variable, one can set different hyperparameters. This is necessary because the advertising adstock effect is also important for adjusting ROAS and improving the predictive power for the dependent variable. For example, a TV advertisement may not have a buying effect immediately when it is aired. In addition, TV advertising is known to have a strong influence on brand perception, which also needs to be reflected. A person might have seen a commercial that raised awareness of the brand but did not convert into revenue on the same day. It is more likely that the potential customer is now primed for your brand name and message. A couple of days later, that person sees an influencer they follow on social media using the product, and that might be the ultimate trigger to buy it.

For that reason you can use a geometric function, which carries over a part of your marketing spend to the next day. Robyn has already prepared some hyperparameters for different channels, and besides the geometric function it also offers a more advanced Weibull function.
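
The geometric carry-over can be sketched in a few lines of Python. This is an illustration of the idea and not Robyn's own code: the function name `geometric_adstock` and the example spend values are made up, and the decay rate is called `theta` here. The Weibull variant is not covered.

```python
def geometric_adstock(spend, theta):
    """Carry a fraction `theta` of the previous period's adstock forward.

    adstock[t] = spend[t] + theta * adstock[t - 1]
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    adstocked = []
    carry = 0.0
    for x in spend:
        carry = x + theta * carry
        adstocked.append(carry)
    return adstocked


# Hypothetical daily TV spend: one flight on day 1, another on day 4.
daily_tv_spend = [100.0, 0.0, 0.0, 50.0]
print(geometric_adstock(daily_tv_spend, theta=0.5))
# → [100.0, 50.0, 25.0, 62.5]
```

With `theta=0.5`, half of each day's effect is still present on the following day, so the TV flight from day 1 keeps contributing on days without any spend.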

Ridge regression is used in modeling to limit the effects of multicollinearity among variables: it shrinks the coefficients, so that variables adding little value to the model receive little weight. It is often used when independent variables are highly correlated, although, unlike lasso, it does not remove variables from the model entirely. It is important to actively address multicollinearity, since in a business context multiple advertising channels are often active at the same time and are similar in terms of variance and revenue response.
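
To see why the penalty helps with correlated channels, here is a small pure-Python sketch of a closed-form ridge fit for just two predictors. It assumes the inputs are already centered and uses no intercept; the function name `ridge_two_channels`, the penalty name `lam`, and the toy numbers are illustrative and not taken from Robyn.

```python
def ridge_two_channels(x1, x2, y, lam):
    """Closed-form ridge fit for two predictors without an intercept.

    Solves (X'X + lam * I) beta = X'y for a two-column X.
    Assumption: x1, x2 and y are already centered.
    """
    a = sum(v * v for v in x1) + lam
    d = sum(v * v for v in x2) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * v for u, v in zip(x1, y))
    c2 = sum(u * v for u, v in zip(x2, y))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("system is singular; increase lam")
    return (d * c1 - b * c2) / det, (a * c2 - b * c1) / det


# Toy data: two channels that always run at the same time.
tv = [1.0, 2.0, 3.0]
social = [1.0, 2.0, 3.0]  # perfectly collinear with tv
revenue = [2.0, 4.0, 6.0]

# With lam > 0 the effect is split evenly between the two channels.
print(ridge_two_channels(tv, social, revenue, lam=1.0))
```

With `lam=0` (ordinary least squares) this toy system has no unique solution and the function raises an error. With a positive penalty, both channels get the same coefficient of about 0.966, slightly below 1 because ridge shrinks coefficients toward zero.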

Since hyperparameters, saturation curves, coefficients and weights add up to a large number of potential models and iterations, Robyn also comes with a model selection algorithm. The Nevergrad library is used to find the fittest candidates among the thousands of models that Robyn generates automatically.

Lastly, Robyn offers practical features on top of the model that help you with…

  • Model selection → print a report for the top 3–5 models and assess them as a team
  • Budget allocator → calculate the optimized marketing mix for a given budget
  • Model calibration → integrate experimental results (e.g. from lift tests) to calibrate the model even better
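
The idea behind a budget allocator can be illustrated with a simple greedy sketch in Python. It is not Robyn's allocator: it assumes every channel follows a simplified concave saturation curve with two hypothetical parameters (`beta` as the maximum response, `gamma` as the spend at which half of it is reached), and it hands out the budget in small steps to whichever channel currently gains the most.

```python
def hill_response(spend, beta, gamma):
    """Saturating response: approaches `beta`; equals beta / 2 at spend == gamma."""
    return beta * spend / (spend + gamma)


def marginal_gain(current, step, beta, gamma):
    """Extra response from spending `step` more on a channel."""
    return hill_response(current + step, beta, gamma) - hill_response(current, beta, gamma)


def allocate_budget(channels, budget, step=1.0):
    """Greedy allocation; only a reasonable heuristic for concave curves.

    channels: {name: (beta, gamma)} with gamma > 0.
    Returns {name: allocated spend}.
    """
    spend = {name: 0.0 for name in channels}
    for _ in range(int(budget // step)):
        best = max(
            channels,
            key=lambda name: marginal_gain(spend[name], step, *channels[name]),
        )
        spend[best] += step
    return spend


# Hypothetical curve parameters; in practice they would come from a fitted model.
channels = {"tv": (120.0, 40.0), "search": (80.0, 10.0), "social": (60.0, 20.0)}
print(allocate_budget(channels, budget=100.0, step=1.0))
```

Because each curve flattens as spend grows, the allocator naturally spreads the budget across channels instead of putting everything into the one with the highest initial return.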

Ready, Set, Go! Start with your first marketing mix model today

We integrate the Robyn model into Kuwala. While Robyn helps you build a model on clean data, you can use Kuwala to build stable data pipelines that feed data into the Robyn model. If you would like to discuss your marketing science approaches with us, please feel free to reach out or browse our GitHub profile here.

Figure: Building a pipeline from a data warehouse that fits into the Robyn model

In conclusion, Robyn is simple for data scientists to implement, as all the processes described happen in the background. Off-the-shelf models have the drawback that it is ultimately difficult to go deeper into the underlying processes, but the Robyn team offers active support via email and GitHub issues, rich documentation, and a Facebook group for users.


