Applying Marketing Mix Capabilities to the Healthcare Business to Maximize Growth

Piero Ferrante
CVS Health Tech Blog
May 1, 2023 · 11 min read

By: Darren Malysa, Bingcan Chen, Alexandra Von Gerichten, Matthew Rabito (from the Analytics & Behavior Change department at CVS Health®), and ZS Associates (a consulting firm for data science and analytics)

The need for enterprise-level scaling of analytics capabilities

In today’s world, healthcare and data-driven decisions go hand-in-hand. At CVS Health, we are constantly building innovative solutions to answer the most pressing, challenging business questions. In this blog post, we will demonstrate how we are scaling our Marketing Mix Modeling capability across key business areas, leading to better operational efficiencies and a more data-driven organizational culture with real-time intelligence.

What is Marketing Mix Modeling & what are the common challenges?

Imagine you were asked to answer questions like these: What’s the impact of an email open on driving a diabetic patient toward a healthier diet? What’s the impact of watching a YouTube ad on that individual eventually refilling prescriptions on time? And what’s the impact of COVID-19 on top of everything your marketing is driving? All of these are difficult but important questions for businesses to understand before marketing can be properly measured and scaled. Marketing Mix Modeling (MMM) is specifically designed to address these types of questions. Keep in mind that:

1. Marketing always has a delayed impact

2. Marketing always has diminishing returns on continued investment

3. Marketing always happens in conjunction with other key growth activities (e.g., product redesigns, new competitor entries)

Due to these dynamics, Marketing Mix Modeling combines marketing and non-marketing factors into a multivariate statistical model with a time-series setup, so that we can isolate marketing effects while controlling for other important factors. Key considerations for a successful MMM are generalizability and scalability. In most scenarios, model designs are rather complex, and outcomes can be heavily biased by the idiosyncrasies of the business and the evaluation cycle. Through our recent work at CVS Health, we have taken steps toward developing an enterprise-level Marketing Mix Modeling capability for the healthcare business that addresses these complexities to provide faster and clearer insights.

Establishing a successful model construct based on business nature

We started our journey of Marketing Mix Modeling for the healthcare business in the Medicare growth space. Medicare is unique because it’s highly seasonal: most go-to-market activities happen only during the Annual Enrollment Period (AEP), an 8-week window from October 15 to December 7 when eligible Medicare beneficiaries can enroll in or switch between insurance plans offered by different providers. As an analytics organization, we are often asked what’s working and how we should optimize our marketing budget in that 8-week period. What makes this particularly challenging is that intense marketing activities happen in such a short window of time, with multiple campaigns running concurrently across target geographies. To tackle these challenges, we made sure to evaluate our MMM from many angles. We detail some of the key considerations and decisions below.

Legacy approaches to MMM and common pitfalls

The classic approach for MMM, dating back decades, is to establish a relationship between the core marketing activities and the KPI of interest (sales, revenue, etc.) through statistical regression. Typically (in contrast to more prediction-focused models), the regression results will consist of a handful of parameters, or just a single parameter, to describe the relationship between a particular marketing activity and the corresponding effect on the outcome KPI. For example, in a very simple linear regression construct, we might obtain a positive coefficient for TV advertising of 1.5, indicating that for every $1M invested, we attribute an incremental $1.5M in revenue. However, given the unique nature of marketing (e.g., delayed impacts and diminishing returns), there are some pitfalls to be aware of when developing these explanatory models.
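To make this concrete, here is a minimal sketch of the classic construct on simulated data; the column names and effect sizes are hypothetical, chosen to mirror the TV example above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n_weeks = 104
df = pd.DataFrame({
    "tv_spend": rng.uniform(0.5, 2.0, n_weeks),      # $M per week
    "search_spend": rng.uniform(0.2, 1.0, n_weeks),  # $M per week
})
# Simulate a KPI with a known TV effect of 1.5 plus noise.
df["revenue"] = (3.0 + 1.5 * df["tv_spend"] + 0.8 * df["search_spend"]
                 + rng.normal(0, 0.3, n_weeks))

# Regress the KPI on marketing activity; the fitted coefficients are the
# effectiveness estimates described above.
fit = sm.OLS(df["revenue"], sm.add_constant(df[["tv_spend", "search_spend"]])).fit()
print(fit.params)  # tv_spend ~1.5: every $1M of TV is credited with ~$1.5M revenue
```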

Two of the best-known marketing-specific dynamics that we considered, “diminishing returns” and the “carry-over impact of marketing,” could each warrant their own blog post; for brevity, we will focus on the former (a brief sketch of carry-over follows this paragraph). Diminishing returns is a critical, non-negotiable component of any MMM. In our example, if we fit a linear model on the actual (non-transformed) marketing activity and KPI data, our 1.5 TV coefficient implies a constant return on investment, no matter the spend level. While this may be permissible within very small ranges of possible budget reallocation, it is generally overly optimistic about the true impact of marketing as spend increases.
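Although a full treatment of carry-over is out of scope here, a common way to encode it is a geometric adstock transform applied to each spend series before modeling; this is a minimal sketch, and the decay rate is purely illustrative.

```python
import numpy as np

def geometric_adstock(spend, decay=0.5):
    """Carry a fraction of each period's marketing effect into later periods."""
    adstocked = np.zeros_like(spend, dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry  # today's spend plus the decayed tail of the past
        adstocked[t] = carry
    return adstocked

print(geometric_adstock(np.array([100.0, 0.0, 0.0, 0.0])))
# [100.   50.   25.   12.5] -- one burst of spend keeps working in later weeks
```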

Returning to diminishing returns: a common solution involves applying a non-linear transformation to the marketing variables prior to running the regression. Each marketing activity may have its own transformation, with its own parameters. A typical example is the logarithmic transformation, but the possibilities are vast. Certain marketing activities can be more accurately represented by an ‘S-shaped’ function, where small levels of marketing have a minuscule effect until a given threshold is surpassed. Given how much our marketing activities coincided with one another in the AEP window, finding the right representation of diminishing returns was critical to our MMM’s success.
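As a sketch of the two transformations just mentioned, the following applies a logarithmic curve and a Hill-type S-curve to a spend series; the parameters are hypothetical, and in practice each channel’s transform would be fit or calibrated.

```python
import numpy as np

def log_saturation(spend):
    """Logarithmic diminishing returns: every extra dollar adds a little less."""
    return np.log1p(spend)

def hill_saturation(spend, half_sat=1.0, shape=2.0):
    """S-shaped (Hill) response: negligible below a threshold, saturating above.
    half_sat and shape are illustrative parameters."""
    return spend**shape / (spend**shape + half_sat**shape)

spend = np.array([0.1, 0.5, 1.0, 2.0, 5.0])  # $M
print(log_saturation(spend))   # concave everywhere
print(hill_saturation(spend))  # near zero at low spend, then rises, then flattens
```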

Beyond accounting for diminishing returns and carry-over, there are a variety of additional considerations that can improve the performance of an MMM. For instance, an issue can arise when two or more marketing activities are highly correlated with one another over time. An unconstrained regression that includes all of these correlated activities can tend to ‘lock on’ to just one of them, and consequently output negligible, or even negative, effectiveness metrics for the others. Of course, domain knowledge and last-touch attribution often disprove such results, and the model must then be adjusted to correctly capture the individual marketing effects. Strategies for mitigation might include Bayesian modeling, experiment calibration, redefinition of the objective function, or higher-level aggregation of correlated marketing (a small demonstration follows).
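The lock-on behavior is easy to reproduce on synthetic data. In this sketch, two simulated channels are almost perfectly correlated and both truly contribute; unconstrained OLS can split the credit erratically, while ridge regularization (the frequentist cousin of the Bayesian priors mentioned above) spreads it more sensibly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 104
tv = rng.uniform(0, 1, n)
radio = tv + rng.normal(0, 0.02, n)                  # radio flights mirror TV
y = 1.0 * tv + 1.0 * radio + rng.normal(0, 0.5, n)   # both truly contribute

X = np.c_[tv, radio]
print("OLS:  ", LinearRegression().fit(X, y).coef_)  # unstable: one channel can
                                                     # absorb the other's credit
print("Ridge:", Ridge(alpha=1.0).fit(X, y).coef_)    # shrinkage shares the credit
```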

It is also worth noting the risk of confounding variables, which can be tricky to deal with; that said, there are ways to attenuate the consequences. In an MMM, an example might involve an underlying connection between online search exposures and the outcome metric (e.g., sales). To demonstrate a worst-case scenario, imagine a user was served a faulty, blank ad that could in no way influence their buying decision. The same high purchase propensity that caused the ad to be served in the first place then drove the purchase, entirely independent of the ad. In this case, the regression may incorrectly attribute the blank exposure/spend to the sale. Each type of marketing activity should therefore be carefully examined for such potential biases. We find that the best way to mitigate these consequences is often to supplement the model with additional analyses and domain knowledge, such as last-touch attribution or individual prospect journey analysis.

From model to business actions: addressing key challenges to scalability and real-time intelligence

Now, let’s circle back to some of the more distinctive aspects of our MMM. The main objective is to provide results at a granular, regional level rather than at a national level. This is necessitated by multiple factors, including product diversity, contrasts in the competitive landscape, and differing consumer profiles across geographical regions. By utilizing a hierarchical structure that’s aware of these regional specifics, we can use our robust datasets to appropriately capture the nuances of marketing effects at the most granular level.
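As a minimal sketch of the idea (not our production model), a mixed-effects regression gives each region its own deviation from a shared national effect through partial pooling; the regions, columns, and effect sizes below are synthetic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
rows = []
for region in ["northeast", "south", "midwest", "west"]:
    slope = 1.5 + rng.normal(0, 0.3)  # each region responds differently to TV
    tv = rng.uniform(0.5, 2.0, 52)
    rows.append(pd.DataFrame({
        "region": region,
        "tv_spend": tv,
        "enrollments": 2.0 + slope * tv + rng.normal(0, 0.2, 52),
    }))
df = pd.concat(rows, ignore_index=True)

# Shared national intercept/slope, with partially pooled per-region deviations.
fit = smf.mixedlm("enrollments ~ tv_spend", df, groups=df["region"],
                  re_formula="~tv_spend").fit()
print(fit.fe_params)       # the pooled (national) TV effect
print(fit.random_effects)  # each region's deviation from the national effect
```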

While such modeling steps certainly add a non-trivial amount of complexity and development time to the modeling process, the rewards, as we’ll see, are well worth it. Ultimately, this provides marketing coordinators with insights about specific regions that may previously have been blurry at the national level. In our use case, marketing’s overall measured impact was 2.5X higher when evaluated holistically, and we were able to successfully reshuffle our budget mix toward the more efficient channels at the local level.

Once we are satisfied with the performance of our model, the next step is activating it. The core capability of any MMM is re-allocating spend across all potential marketing activities in an optimal manner. What if we spend $10M more in search? What if we pull back investment in certain areas, assuming market disruption? How much more can we invest before hitting saturation? Ideally, this budget allocation process can be completed in real time, yielding optimal results after modifying the desired constraints. This kind of real-time computation, in conjunction with a UI, allows an MMM to shift from an interesting analytical toy to an integrated marketing-strategy decision support tool.
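Under the hood, a what-if engine like this can be as simple as a constrained optimizer over the fitted response curves. The sketch below assumes an already-fitted additive model with log saturation; the channels, coefficients, and budget are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

betas = np.array([1.5, 0.9, 0.6])  # illustrative fitted effects: TV, search, mail

def expected_outcome(spend):
    return float(np.sum(betas * np.log1p(spend)))  # diminishing returns per channel

budget = 10.0  # $M to allocate
result = minimize(
    lambda s: -expected_outcome(s),              # maximize via the negative
    x0=np.full(3, budget / 3),
    bounds=[(0.0, budget)] * 3,
    constraints=[{"type": "eq", "fun": lambda s: s.sum() - budget}],
)
print(result.x)  # optimal split; a "$10M more in search" scenario just re-runs
                 # this with updated bounds or budget
```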

In the process of standing up such a platform, a key insight that influenced the budget optimization process was the improvement we observed from utilizing a multiplicative model structure over an additive one. In an additive regression model, covariates (that are not involved in interactions) act independently by construction: changing the marketing spend for two activities simultaneously results in the same net change in the outcome as changing the two activities one at a time and summing the outcome changes.

In contrast, a multiplicative model (as the name implies) instead multiplies all the covariates together. This formulation inherently induces a synergistic effect between the marketing activities and other variables. It cannot be said that a multiplicative model outperforms an additive model in general; however, in our use case, we found the difference to be substantial, likely driven by the strong interaction between the upper- and lower-funnel marketing tactics employed.
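A quick numerical illustration of the difference, using hypothetical two-channel models: under the additive form, changing both channels at once equals the sum of changing them one at a time, while under the multiplicative form it does not; the gap is the synergy described above.

```python
def additive(tv, search):
    return 2.0 + 1.5 * tv + 0.8 * search  # channel effects simply sum

def multiplicative(tv, search):
    return 2.0 * tv**0.4 * search**0.2    # channels scale one another

for f in (additive, multiplicative):
    base = f(1.0, 1.0)
    both_at_once = f(2.0, 2.0) - base
    one_at_a_time = (f(2.0, 1.0) - base) + (f(1.0, 2.0) - base)
    print(f.__name__, round(both_at_once, 3), round(one_at_a_time, 3))
# additive: 2.3 == 2.3; multiplicative: ~1.031 vs ~0.936 -- the synergy gap
```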

Getting back to the additive model, a major benefit is the ease with which what-if scenarios and budget optimizations can be performed. Depending on the transformations, optimizing the marketing activities for a simple-to-moderately complex additive model can often take on the order of seconds. The multiplicative model’s optimization, on the other hand, is not as straightforward: the search space for a multiplicative model of the same complexity can be much larger, requiring specialized non-linear optimization techniques. With proper feature engineering, transformation, and non-linear optimization, the multiplicative model’s optimization can be accelerated, but normally not to the order of a few seconds, making it unsuitable for real-time scenario planning. Addressing these challenges while retaining the robustness of our model required significant mathematical resourcefulness. This is where we worked closely with ZS Associates to find an innovative solution that preserves the rigor and complexity of the multiplicative model while enabling near real-time simulation (i.e., seconds).

By applying partial differentiation principles, we eventually converted our multiplicative model to a family of nested, additive models in which each marketing tactic has an independent pseudo-response curve. The additive formulation allowed independent optimization of each marketing tactic, which significantly reduced complexity and runtime. Furthermore, the nested models allowed us to continue capturing the interdependence between tactics across marketing funnels while retaining the robustness of the original model. As a result, the simulation results from the multiplicative model and the pseudo-nested additive models ended up being comparable from a local-level optimization perspective, while simulation time was reduced by more than 100X.
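The exact construction is beyond the scope of this post, but a simplified analogue conveys the flavor: for a purely multiplicative model, moving to log space turns the product into a sum of independent pseudo-response curves, one per tactic, that can be optimized separately. Our actual approach handled richer interactions through partial differentiation and nesting; the model below is a toy.

```python
import numpy as np

# Toy multiplicative model: y = b0 * f1(x1) * f2(x2).
def multiplicative(x1, x2):
    return 2.0 * (1.0 + x1)**0.4 * (1.0 + x2)**0.2

# In log space the product decomposes into independent additive curves:
# log y = log b0 + 0.4*log(1+x1) + 0.2*log(1+x2).
def pseudo_curve_1(x1):
    return 0.4 * np.log1p(x1)

def pseudo_curve_2(x2):
    return 0.2 * np.log1p(x2)

x1, x2 = 3.0, 1.5
lhs = np.log(multiplicative(x1, x2))
rhs = np.log(2.0) + pseudo_curve_1(x1) + pseudo_curve_2(x2)
print(np.isclose(lhs, rhs))  # True: the additive form reproduces the model exactly
```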

To illustrate the optimization challenge itself, imagine a simplified case with only two features. Geometrically, the model describes a 3-D surface, f(x, y), where x and y are, say, TV and radio spend, and the height z represents total sales. A naïve optimization approach might re-solve the problem from scratch each time we change the total budget, x + y. In other words, even for a small perturbation in total spend, a classical implementation of the algorithm would completely re-compute the response and perform gradient descent on the surface starting at $0. Under this approach, we could either get accuracy, by working our way up in budget in $1 increments, or we could get performance, by using a larger step size (e.g., $1M), but never both.

The important realization was that our surface is continuous and monotonic, as dictated by the diminishing-returns transformation functions: more spend always results in more sales, even if the gain is marginal. Additionally, we never spend negative money, so we only consider the quadrant where x and y are non-negative, where the surface has no discontinuities. Consequently, the surface has no interior maxima, and for any fixed budget the optimum lies on the boundary defined by the budget constraint. This means we can discretize the computation and save time by caching the optimal allocation at pre-computed increments, allowing us to get both accuracy and performance as we narrow the range of values that x and y can take.

Given a budget, the different ways to spend that money correspond to a collection of points on the surface. For a budget of $5M, for example, we need only consider values of x and y such that x + y = 5M. Geometrically, our search space for possible x and y values is exactly the intersection of the plane, x+y=5M, and the surface defined by our model equation, f(x,y). So, let’s say we compute the $5M case from scratch, and then we want to know about the $5.1M case. We first define a new search space by shifting our plane “up” to x + y = 5.1M. The magical part is that since we know the optimal values of x and y from the 5M case, and because our surface is continuous and monotonic, we do not need to search the entire 5.1M search space, nor do we have to perform gradient descent from $0. In fact, the optimal allocation for $5.1M is very close to the optimal allocation for $5M. Now, we can confidently initialize our optimization using the winning values from the $5M case and perform granular gradient descent between 5M and 5.1M. In this manner, we pre-compute some “checkpoints” for the optimization algorithm up-front and save substantial computation at runtime.
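A sketch of this checkpointing idea, again with illustrative log-saturation response curves: we pre-compute optimal splits at coarse budget increments once, then warm-start any runtime query from the nearest cached optimum instead of starting at $0.

```python
import numpy as np
from scipy.optimize import minimize

betas = np.array([1.5, 0.8])  # illustrative TV and radio effects

def sales(spend):
    return float(np.sum(betas * np.log1p(spend)))  # monotone, diminishing returns

def optimize(budget, x0):
    res = minimize(lambda s: -sales(s), x0=x0, bounds=[(0.0, budget)] * 2,
                   constraints=[{"type": "eq", "fun": lambda s: s.sum() - budget}])
    return res.x

# Pre-compute checkpoints at $1M increments, warm-starting each from the last.
checkpoints, x = {}, np.array([0.5, 0.5])
for b in range(1, 11):
    x = optimize(float(b), x)
    checkpoints[b] = x

# Runtime query for $5.1M: start from the cached $5M optimum.
print(checkpoints[5], optimize(5.1, checkpoints[5]))  # nearby optima, fast refine
```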

Wrapping up

This is just the beginning of our MMM journey. While Marketing Mix Modeling has been around for decades, with the deprecation of third-party cookies it will become more strategically important than ever. We are happy to have found an innovative way to activate more advanced MMM approaches for real-time use cases, and we are now in the process of expanding MMM to other lines of business.


© 2023 CVS Health and/or one of its affiliates. All rights reserved.
