Marketing Mix Modelling with Robyn on Vertex AI

Lukasz Olejniczak · Google Cloud - Community · Sep 7, 2022 · 15 min read

The digital marketing landscape is going through a tectonic shift. The move to a cookieless world will disrupt how we measure and understand the effectiveness of digital advertising investments. That's why more marketers and analysts are turning towards statistical modeling, which doesn't require user-level tracking. One of the popular examples is Marketing Mix Modelling (MMM).

MMM is a technique that allows marketers to measure the impact of their marketing and advertising campaigns and determine how various channels contribute to their goal (e.g. sales revenue).

One of the open-source libraries that can help with MMM analysis is Robyn (https://facebookexperimental.github.io/Robyn/). Robyn is an R library, and one of the goals of this article is to show how to run MMM analysis with Robyn on Vertex AI: Google’s Machine Learning platform available as a service on Google Cloud.

Robyn's installation procedure is well documented, but given that the library is still in the experimental stage, you will easily find posts and comments from people who ran into issues at this step. To help you get through it, we will also show how to build a Dockerfile definition that includes all the dependencies needed to run Robyn-based R scripts. We will use that Dockerfile to instantiate Vertex AI Workbench for interactive experimentation with Notebooks and then to schedule automated MMM jobs on Vertex AI serverless ML infrastructure.

Why should you care about Marketing Mix Modelling?

MMM is a well-established approach, proven over many decades by media houses long before the digital age. A naïve view of the impact of advertising spend would be a simple linear model, e.g., sales = weight1 * channel1 + weight2 * channel2 + base.

But reality is more complex than that. There is a channel saturation (diminishing returns) effect, which is unique to each channel. Channels differ in how quickly or slowly they drive customer actions (an effect known as adstock). Finally, what you want as an outcome is an optimized distribution of your marketing budget across channels.

All of these aspects are a core part of the MMM technique. We will show you in this article how Robyn addresses these mechanisms and how you can run and automate the process on GCP with no or minimal background knowledge in statistical modeling.

Why should you run an MMM analysis on Vertex AI?

When it comes to Machine Learning, Google is perceived as the gold standard, with world-class research groups like Google Brain, Google Research and DeepMind, successful deployments of ML at scale with Google Search and Google Translate, and multiple contributions like TensorFlow, Kubeflow, Kaggle, Colab, TPUs, BERT, T5, ImageNet, Parti, LaMDA and PaLM, to name just a few.

Given all the above, Google's Machine Learning platform, Vertex AI, available as a service on Google Cloud, seems at least worth checking out.

One thing makes Vertex AI different from other similar platforms: it is fully serverless — you get access to Google’s ML Infrastructure with all kinds of accelerators like GPUs and TPUs with NO NEED to manage servers, virtual machines, Kubernetes clusters and NO NEED to install and upgrade any software.

Source: https://cloud.google.com/blog/topics/developers-practitioners/new-ml-learning-path-vertex-ai

The second thing worth mentioning is that Vertex AI is an end-to-end MLOps platform. What does that mean? It means it is a single place where you can manage features, label training samples, run training using your favourite ML frameworks and languages (Python, R), execute automated hyperparameter tuning, implement and execute MLOps pipelines, register your models in the Model Registry, run serverless batch prediction jobs, deploy your models as REST endpoints, and get support for model monitoring and explainability of predictions.

The last thing is that Vertex AI will allow you to build fully custom models but will also give you access to many state-of-the-art architectures designed by Google and available as AutoML training. In our next article, we will show how to build an MMM pipeline using Vertex AI AutoML capabilities, but in this article, we will show that Vertex AI can work with any third-party library as well.

Set up Experimentation Environment

Vertex AI Workbench allows you to run interactive analyses with Jupyter Notebooks. It comes with several pre-built Docker images packaged with the necessary libraries for distinct Jupyter environments, like Python 3 optimized for NVIDIA GPUs, TensorFlow (including TensorFlow Enterprise), PyTorch, R and many more.

Even though there is no pre-built image with Robyn, nothing stops us from building our custom image by extending the existing image for R and adding Robyn and its dependencies. And that is precisely our plan:

  • First, we must build a custom container image from the Dockerfile definition we prepared for this article.
  • Next, we will go to Vertex AI and create a new Vertex AI Notebook instance. We will instantiate it from our custom container image so we can run MMM experiments in an interactive environment.
  • Now, we have all that is needed to run some MMM experiments and learn the basics of Robyn. The ultimate goal of our experimentation should be a working script that can be executed automatically to compute new recommendations for budget allocations.
  • Once we are happy with our script, we will be able to use the same container image to run our MMM script periodically as a Vertex AI Custom Training Job, utilizing Vertex AI serverless ML infrastructure.

Step 1: Build a container image from Dockerfile

Below is a Dockerfile which is the result of many trial-and-error attempts to capture all the dependencies needed to install the Robyn library.

The best way to build container images on Google Cloud is to use Cloud Build. However, you don't need to have all DevOps practices in place to start using Vertex AI; therefore, we will use Cloud Shell, which is available for all users of Google Cloud Console.

You can get that Dockerfile from our git repository by executing the following command (Cloud Shell has a Git client already installed):

git clone https://gist.github.com/a63416348809d36b46c185fd728d8f6e.git

Set the PROJECT_ID variable, which will represent your GCP project ID:

export PROJECT_ID=$(gcloud config get-value project)

List the available folders. There should be just one subfolder. When you change into it, you should see a file named Dockerfile. Now run the docker build command. We will name our container image rrobyn and tag it latest (the build will take up to 30 minutes, but you only need to do it once):

docker build -t "gcr.io/${PROJECT_ID}/rrobyn:latest" .

Once the build process is finished, push the newly created image to GCP Container Registry:

docker push gcr.io/${PROJECT_ID}/rrobyn:latest

Step 2: Start Vertex AI Notebook from the custom docker image

Go to the Google Cloud Console and use the search bar to find the Vertex AI service. Click Workbench on the left and then the [+ New Notebook] button. You will be presented with a list of predefined Notebook environments. In this article, we will set up the Notebook environment from our custom image; therefore, select the [Customize …] option.

You will be asked for the Notebook name, region and zone. Google Cloud will use these details to create the corresponding compute resources in the requested geographic location. You will also have a chance here to specify that you want to use a custom image:

So let's specify our container image:

Click Create, and you should see a new Vertex AI Notebook instance on the list (you may need to wait a few minutes to have your instance ready):

Click the [Open JupyterLab] button, which will open a new tab with JupyterLab. Select R kernel:

Import the Robyn library and print its version to verify all works as expected.
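A quick sanity check in the first notebook cell, which simply loads the package and prints its version:

# Load Robyn and confirm the installed version
library(Robyn)
packageVersion("Robyn")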

Step 3: Experiment with Robyn

The Robyn library encapsulates many of the details of the statistical modeling behind MMM.

Let's take a closer look at the key steps needed when doing MMM with Robyn. We hope it will help you better grasp the basics of working with this library.

The typical flow is shown in the following diagram:

The key steps are:

  1. Data import
  2. Hyperparameter tuning
  3. Robyn modeling
  4. Modeling output interpretation
  5. Budget allocations

We will go through these steps following the demo script from the official documentation.

Data import

The Robyn library comes with a demo dataset. Of course, in your experiments you will use your own data, but if you are looking for other sources of simulated data, take a look at Google's Aggregate Marketing System Simulator (AMSS).

To import the built-in Robyn demo dataset, run the following instructions:
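Following Robyn's demo script, the import boils down to loading the simulated weekly dataset and the Prophet holidays table that ship with the package:

# Demo data bundled with Robyn
data("dt_simulated_weekly")   # simulated weekly media and revenue data
data("dt_prophet_holidays")   # holiday calendar used by the Prophet decomposition

head(dt_simulated_weekly)     # preview a few records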

The picture below shows a few sample records from that dataset:

Let’s have a closer look at this dataset and learn more about the format that is required by Robyn from our input data.

Column 1 — DATE: The date format must be yyyy-mm-dd. Depending on the data granularity (weeks, months, etc.), the value represents the first day of the cycle.

Column 2 — REVENUE: Here you put whatever you want to optimize your investments for, e.g., sales revenue or the number of conversions.

Columns 3–8 — channel and non-channel variables to correlate with: These are either spend (e.g., tv_S), impressions (e.g., facebook_I), clicks (e.g., search_clicks_P), context (e.g., competitor_sales_B, events) or organic (e.g., newsletter) variables. Include all the aspects of your media mix, organic and context, which you believe may impact your revenue.

Then map your input data into robyn_inputs fields:
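A sketch of this mapping, using the field names from Robyn's demo script (the modeling window dates and prophet_country are illustrative):

InputCollect <- robyn_inputs(
  dt_input = dt_simulated_weekly,
  dt_holidays = dt_prophet_holidays,
  date_var = "DATE",                        # date column, format yyyy-mm-dd
  dep_var = "revenue",                      # the variable we optimize for
  dep_var_type = "revenue",                 # "revenue" or "conversion"
  prophet_vars = c("trend", "season", "holiday"),
  prophet_country = "DE",                   # country used for the holiday calendar
  context_vars = c("competitor_sales_B", "events"),
  paid_media_spends = c("tv_S", "ooh_S", "print_S", "facebook_S", "search_S"),
  paid_media_vars = c("tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P"),
  organic_vars = c("newsletter"),
  window_start = "2016-11-23",              # modeling window (illustrative)
  window_end = "2018-08-22",
  adstock = "geometric"                     # adstock transformation, discussed below
)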

Choose hyperparameters for variable transformations

Robyn will model two characteristics of media spend data:

  1. Adstock (carryover effect)
  2. Diminishing returns (saturation effect)

The idea behind adstock is to reflect that advertising does not bring effects immediately; it usually takes some time, which is known as the carryover effect. Indeed, there is some time between watching a TV commercial and going to a retail store to purchase a product. Robyn gives us two methods to choose from: a geometric one and a more flexible Weibull. If you would like to learn more about these methods, please check the docs. In our example, we will use the geometric transformation.

Diminishing returns, on the other hand, account for the fact that there is no simple linear dependency between how much money we invest in a specific channel and how much additional sales revenue that investment is going to generate. At a certain level, the channel saturates, and additional money is not going to change much. To use some numbers: let's say that increasing the invested budget from $100K to $150K gives us 50K additional sales, but an increase from $150K to $200K brings only 30K more sales. These numbers are just an example, but you get the point. This is the saturation effect of a media channel, and Robyn uses an S-curve transformation to model it.
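To make these two transformations concrete, here is an illustrative R sketch (not Robyn's internal implementation) of a geometric adstock and a Hill-type saturation curve:

# Geometric adstock: each period carries over a fraction theta of the
# previous period's adstocked value
geometric_adstock <- function(spend, theta) {
  adstocked <- numeric(length(spend))
  adstocked[1] <- spend[1]
  for (t in 2:length(spend)) {
    adstocked[t] <- spend[t] + theta * adstocked[t - 1]
  }
  adstocked
}

# Hill-type saturation: the response flattens out as (normalized) spend grows;
# alpha controls the shape of the curve, gamma its inflection point
hill_saturation <- function(x, alpha, gamma) {
  x^alpha / (x^alpha + gamma^alpha)
}

spend <- c(100, 150, 200, 0, 0, 50)
geometric_adstock(spend, theta = 0.3)
hill_saturation(seq(0, 1, by = 0.1), alpha = 2, gamma = 0.5)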

The adstock and diminishing-returns transformations require configuration parameters (theta for adstock; alpha and gamma for diminishing returns). You can't really guess which parameter values would best fit your historical data. Fortunately, that is done automatically through hyperparameter tuning: Robyn uses the Nevergrad optimization library to find the best fit. What you can do at this stage is provide boundaries which define the search space where Nevergrad should look. More information on how to interpret these parameters is in the official docs.

Display the complete list of parameters:
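With the InputCollect object from the previous step, the expected hyperparameter names can be listed with hyper_names():

# List all hyperparameter names Robyn expects for our media and organic variables
hyper_names(adstock = InputCollect$adstock, all_media = InputCollect$all_media)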

Here is an example of how you can define boundaries for parameters corresponding to every individual field that is present in our demo dataset:
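A sketch of such boundaries, following the ranges suggested in Robyn's demo script (slower-decaying offline channels get higher theta ranges than digital ones), and attaching them to the input collection:

hyperparameters <- list(
  tv_S_alphas = c(0.5, 3),        # saturation shape
  tv_S_gammas = c(0.3, 1),        # saturation inflection
  tv_S_thetas = c(0.3, 0.8),      # TV adstock decays slowly
  ooh_S_alphas = c(0.5, 3),
  ooh_S_gammas = c(0.3, 1),
  ooh_S_thetas = c(0.1, 0.4),
  print_S_alphas = c(0.5, 3),
  print_S_gammas = c(0.3, 1),
  print_S_thetas = c(0.1, 0.4),
  facebook_S_alphas = c(0.5, 3),
  facebook_S_gammas = c(0.3, 1),
  facebook_S_thetas = c(0, 0.3),  # digital channels decay faster
  search_S_alphas = c(0.5, 3),
  search_S_gammas = c(0.3, 1),
  search_S_thetas = c(0, 0.3),
  newsletter_alphas = c(0.5, 3),
  newsletter_gammas = c(0.3, 1),
  newsletter_thetas = c(0.1, 0.4)
)

# Attach the hyperparameter boundaries to the input collection
InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)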

Run the model

At this stage, we have all that is needed to run Robyn MMM modeling:
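A sketch of the modeling step, following the structure of Robyn's demo script (iteration and trial counts, and the output folder, are illustrative):

# Fit thousands of candidate models with Nevergrad's multi-objective search
OutputModels <- robyn_run(
  InputCollect = InputCollect,
  iterations = 2000,   # iterations per trial
  trials = 5,
  outputs = FALSE      # return raw models; reports come from robyn_outputs() below
)

# Build the Pareto front, cluster similar models and export the one-pagers
OutputCollect <- robyn_outputs(
  InputCollect, OutputModels,
  pareto_fronts = 3,
  csv_out = "pareto",
  clusters = TRUE,
  plot_pareto = TRUE,
  plot_folder = "~/mmm_output"
)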

This step generates thousands of models. Some will be better than others. Which ones are better is examined along two dimensions, NRMSE (normalized root mean squared error, a metric of model error) and DECOMP.RSSD (decomposition root sum of squared distance, a metric of how unrealistic a given model's spend-versus-effect decomposition is), on the Pareto front (red line). The Pareto front is a concept that is quite popular in multi-objective optimization. It allows the designer to restrict attention to the set of efficient choices and to make tradeoffs within this set rather than considering the full range of every parameter.

Robyn will select several distinct models from the Pareto front and generate a dedicated report for each.

The next step is a manual examination of those few selected models. Why manual and not automated? Manual reviews are essential because you want to validate the results against your business knowledge and intuition. After all, these are just models.

Robyn will help here as much as it can and produce a quite informative report describing the selected model in business terms (“model one pager”):

Understand and interpret the outcomes

So you have a report produced by Robyn in front of you and are wondering how all those nice graphs can help you decide how to invest your money to optimize sales revenue or conversions. Let’s see how a better understanding of those graphs can help. Let’s assume our investment budget should be spent to optimize sales revenue. Then below is the list of concepts explained by generated charts:

  • Attribution: How much of the sales is attributed to a specific channel
  • Actual vs predicted: How good the model is at predicting the actual sales
  • Share of spend vs share of impact (total ROI of a channel): How much budget you spend on a channel vs how much of your sales is attributed to that channel. Here you can understand the ROI of the channel and whether you are overspending on ineffective channels.
  • Response curve (diminishing returns): To understand which channels have room to generate more sales
  • Fitted vs residual: To understand if we are missing any critical variations in the model
  • Adstock: To understand how long investment in a given channel impacts sales generated through that channel. This will also show which channels tend to convert sooner (It should be no surprise that digital channels typically convert faster).

You should study those metrics and use your domain knowledge to potentially reject some of the proposed models as not being realistic. Pay special attention to the diminishing returns, as mistakes here can have a big impact on the budget allocation. At the end of this review step, you should be confident about which model you want to continue using.

Budget allocator

The final step is to use the selected model to answer the following question, “where should I spend my advertising money”? This is where Robyn’s Budget Allocator comes in handy.

Robyn takes your best model and generates the optimal budget allocation across paid advertising channels. To generate recommendations, it uses a gradient-based non-linear solver.

Here is example code to create budget allocation recommendations:
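The sketch below follows Robyn's demo script; the selected model ID is illustrative and should come from your own review of the Pareto-front models:

select_model <- "1_92_12"   # pick one of the model IDs exported by robyn_outputs()

AllocatorCollect <- robyn_allocator(
  InputCollect = InputCollect,
  OutputCollect = OutputCollect,
  select_model = select_model,
  scenario = "max_historical_response",           # same total budget as the previous cycle
  channel_constr_low = 0.7,                       # spend at least 70% of historical spend per channel
  channel_constr_up = c(1.2, 1.5, 1.5, 1.5, 1.5)  # per-channel caps vs historical spend
)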

We used here the “max_historical_response” scenario, which creates an optimized media mix for the same budget size as in the previous cycle:

Robyn enables us to set constraints which the solver (optimizer) should respect when calculating allocation recommendations. In our case, we used the following constraints:

  • channel_constr_low = 0.7, which instructs the optimizer that we do not want to spend less than 70% of the historical spend per channel.
  • channel_constr_up = c(1.2, 1.5, 1.5, 1.5, 1.5), which instructs the optimizer about the maximum spend increase for each channel. The order is consistent with the order of the input parameters representing our channels. So in this example, "tv_S" is constrained to a maximum budget of 120% of its historical spend, and the rest of the paid media channels up to 150%.

The budget allocator produces a visual report (below) and a tabular CSV file with the new, optimized media channel spend mix.

The resulting graphs represent the following:

  1. Initial vs Optimized Budget Allocation — shows optimized spend as a percentage share of your total media budget.
  2. Initial vs Optimized Mean Response — predicts how many sales (conversions) will be driven by each channel.
  3. Response Curve — How optimized spend (triangle point) is positioned on the saturation curve of each channel.

Step 4: Automate your MMM analysis

Vertex AI Notebooks are great for experimentation, but once our script for shortlisting models, and then for using the selected model to generate budget allocation recommendations for the next period, is ready, we would like to run it regularly. However, we don't want to manually go through the process of starting the Vertex AI Notebook instance, opening our script and executing all its code cells every time we need new recommendations.

Instead, we want to schedule the execution of that script and ask Google Cloud to run it for us, e.g., every Monday at 5 p.m., so that we have the new report in our e-mail when we start our first coffee on Tuesday morning.

We need two components:

  • A scheduler, which will trigger the execution of our script on a recurring schedule
  • Compute on which we can execute our script

Scheduling will be handled by Cloud Scheduler: a fully managed, enterprise-grade cron job scheduler which acts as a single pane of glass, allowing you to manage all your automation tasks from one place.

Vertex AI will handle the compute, but this time we will not use Vertex AI Notebooks. Instead, we will utilize its serverless training capabilities. Serverless means we do not care about the underlying infrastructure; we simply delegate execution: here is our script, Vertex AI, please execute it.

To communicate with Vertex AI, we will use its Python SDK. We therefore need a few lines of code which will communicate with Vertex AI and instruct it on what we want to execute:
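The snippet below is a minimal sketch of such a submission script using the google-cloud-aiplatform Python SDK; the project, region, staging bucket, display name and machine type are placeholders, and invoking the script with Rscript inside our rrobyn image is an assumption:

from google.cloud import aiplatform

PROJECT_ID = "your-project-id"           # placeholder: your GCP project
REGION = "us-central1"                   # placeholder: your region
BUCKET_URI = "gs://your-staging-bucket"  # placeholder: staging bucket for Vertex AI

# Initiate a new session (communication context) with Vertex AI
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

# URL of the custom Robyn image we built and pushed earlier
IMAGE_URI = f"gcr.io/{PROJECT_ID}/rrobyn:latest"

# Define the job: run our MMM script inside the custom container
job = aiplatform.CustomContainerTrainingJob(
    display_name="mmm-robyn-job",
    container_uri=IMAGE_URI,
    command=["Rscript", "mmmAnalysis.R"],
)

# Submit the job: number of workers and VM type for those workers
job.run(replica_count=1, machine_type="n1-standard-8")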

Walking through this code: first, we initiate a new session (set up the communication context) with Vertex AI. Then we set a local variable with the URL of the custom Docker image we built earlier for Robyn, and define the job itself, instructing Vertex AI what we want to execute on its infrastructure (here, our script is named mmmAnalysis.R).

Vertex AI will also need to know how much computing power we think may be required for this job: we provide the number of workers and the type of VM instance for those workers.

Finally, we submit the job and expect Vertex AI to start executing it.

That piece of code also needs somewhere to run. In this case, we can use Cloud Functions, which is a serverless execution environment for exactly such single-purpose functions.

Summary

We showed in this article how to use Vertex AI, Google's Machine Learning platform, to experiment with the Robyn library for Marketing Mix Modelling. First, we shared a custom Dockerfile which includes all the dependencies necessary to successfully use Robyn. Then we used the built Docker image with Vertex AI Workbench to set up an interactive environment, run some experiments and learn how media channels and other variables influence our sales revenue. We were also able to automate a large part of the process with Vertex AI Custom Training Jobs, running our script on Vertex AI serverless ML infrastructure. The outcome of the automation is the media budget allocator, which can run periodically to generate recommendations for budget allocations for the next cycle.

Your next step is to act on the produced insights and recommendations. Automation can help here as well, but you may also want to take more control of this last step and put humans in the loop. In that case, you can publish the produced insights and recommendations on interactive dashboards in Google's Data Studio or Looker, which non-technical decision makers can then use in media strategy meetings at your company.

This article is authored by Lukasz Olejniczak — Customer Engineer at Google Cloud. The views expressed are those of the authors and don’t necessarily reflect those of Google.

Please clap for this article if you enjoyed reading it. For more about Google Cloud, data science, data engineering, and AI/ML, follow me on LinkedIn.
