Analytics Roadmap to Personalization

Published in

GAMMA — Part of BCG X

21 min read · Apr 25, 2019

Personalization opens up new frontiers in data-driven marketing, enabling organizations to capitalize on a first-mover advantage in the form of increased engagement, deeper loyalty, and a measurable uptick in returns. By establishing processes (such as experimentation) and methodologies (from machine learning to reinforcement learning) to optimize their interactions with the customer, organizations can use analytics to achieve sustainable, long-term value creation.

However, full algorithmic optimization (press the button, receive an answer) does not happen overnight; it requires a layered development of ever-increasing complexity. Nor is Machine Learning (ML) a panacea, but rather one of the many building blocks of a broader ecosystem. Indeed, using advanced analytics is a process of feeding the marketing organization with insights that effectively never ends.

This article will try to provide a business perspective to the AI/ML roadmap to incentive-based personalization: how to start, how to evolve and how to plan processes around it.

What personalization is — and is not

Personalization is a tailored, uniquely constructed interaction — or touchpoint — with a customer. And not just “any” differentiated touchpoint, but the “optimal” differentiated touchpoint for each customer. The goal of personalization is to communicate with an individual customer in ways that optimize for a desired return from that customer, such as reduced friction, higher spend, continued loyalty, etc.

Whatever the nature of that expected return, there are three dimensions to personalized marketing:

1. Measurability. The notion that we can stack-rank different actions based on quantifiable metrics. In other words, the best we can do based on (statistically significant) data, not instinct.

2. Context. The right interaction in the right context, via the right channel, at the right time. In other words, the best we can do at a particular point in time. If the context changes, the optimal interaction might change as well.

3. Depth or Longitudinal Path. The right interaction in light of the sequence of prior interactions with the individual customer. In other words, the best we can do over a period of time, not just now.

So, what exactly is “the best we can do”? In marketing terms, it means influencing a customer’s behavior. To that end, marketers define the type of behavioral changes they want to elicit. Examples include getting a customer to visit stores vs. just visiting online, increasing the frequency of a customer’s visits, incenting a customer to use an app or a digital assistant, and deepening habitual buyers’ exploration of online menu options.

Changing behavior is hard to accomplish, and entails multiple orchestrated, often sequential touchpoints across different service lines, which we call “curricula.” The same curriculum can deliver different experiences to different customers, because timing, context, and individual triggered actions — or the tactics — are personalized. Incentives are a typical, but not exhaustive, type of tactic, for example with offers such as “Visit the store this month and get 10% off,” “Get double points for shopping online,” “Buy one, get one free,” and so on.

Meanwhile, orchestration, such as a strategic decision around what behavior to push, happens at the curriculum level, whereas optimization — for example, how to drive the change in behavior — largely takes place at the tactical level. So, advanced analytics solutions should start at a tactical level before being pushed towards orchestration.

Figure 1 — Example of tactics and curricula driving a customer journey

Finally, personalization is not just about incentives, even when artificial intelligence (AI) is involved. The classic carousel à la Amazon that shows what “you might also be interested in” doesn’t require any discount or reward to be effective. Other levers of personalization are:

· Relevance (e.g., I’m a vegetarian; don’t talk to me about bacon)

· Frictionless experience (e.g., Make my life easier; reduce the number of times I need to click on something)

· Engagement (e.g., Give me a reason to come back and use your app)

Bottom line: Optimizing rewards is a complex analytical challenge.

How traditional marketers and data scientists think about personalized experiences

Traditional marketing is often “calendar-based.” A calendar-based approach works as follows:

1. Set a campaign’s start and end date (“This weekend only, get 20% off pants.”)

2. Define eligibility criteria, or who receives the message (“Our VIP loyalty members are eligible for this exclusive offer.”)

3. Sub-segment within the eligible group (“Women’s yoga pants and jeans, plus men’s dress pants, are on sale.”)

Traditional marketers start with the campaign — its nature, timing, and target audience — and then define the details. Some campaigns are very simple, such as a blanket offer where all jeans are 30% off, whereas some are segmented and somewhat personalized. In an ideal scenario, different business verticals coordinate the marketing calendar towards a harmonized sequence and volume of campaigns across the organization.

Figure 2 — from Calendar-driven to Customer-centric marketing

A data scientist, on the other hand, would approach the marketing calendar as an optimization problem. To solve it, she would choose to maximize an objective function — for example, incremental revenue from sales, net of discounts — by algorithmically selecting the right sequence and set of parameters within a universe of options. Those options include:

· Type of tactic

· Timing and duration of the tactic

· Degree of difficulty of subsequent reward related to the tactic

· Channel(s) used to deliver the tactic

· Creative assets used to promote the tactic

· Verbiage used to sell the tactic

· Etc.

This universe of options easily becomes too vast to be optimized via advanced analytics alone. Indeed, that is the issue most organizations need to solve for: selecting the right sequence and set of parameters.
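To get a feel for how quickly this universe grows, here is a minimal sketch that counts the configurations for a single touchpoint. The dimensions come from the list above; the number of choices per dimension (and their names) are hypothetical, since the article does not specify them.

```python
from itertools import product
from math import prod

# Hypothetical choices per dimension; only the dimension names come from the text.
OPTIONS = {
    "tactic": ["bundle", "double_points", "bogo", "store_visit", "free_shipping"],
    "duration_days": [3, 7, 14],
    "difficulty": ["easy", "medium", "hard"],
    "channel": ["email", "app", "web", "sms"],
    "creative": ["A", "B", "C"],
    "copy": ["A", "B", "C"],
}


def universe_size(options):
    """Number of distinct configurations for ONE touchpoint."""
    return prod(len(values) for values in options.values())


def enumerate_configs(options):
    """Lazily yield every configuration as a dict (avoid materializing at scale)."""
    keys = list(options)
    for combo in product(*(options[k] for k in keys)):
        yield dict(zip(keys, combo))


single = universe_size(OPTIONS)   # 5 * 3 * 3 * 4 * 3 * 3 = 1620
sequence_of_five = single ** 5    # five touchpoints in a row, chosen independently
```

Even with these modest assumed counts, a single touchpoint has 1,620 variants, and a sequence of five touchpoints multiplies that by itself five times, which is why simplifying assumptions are needed.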

Finding an approach that solves for such an extremely broad and multidimensional problem is a core challenge. Twenty years ago, we simply would have deemed the problem unsolvable. Today, we potentially have the right toolkit and computing power, but available AI methods still depend on “having seen what good looks like.” And that means we need a considerable amount of data to get to a viable answer, ideally from controlled experiments.

The hunger for experimental data is pervasive. In computer vision, neural networks require thousands of (labeled) cat pictures to accurately recognize another picture of a cat. In pricing, groups of homogenous customers need to be presented with different prices in order to derive the optimal level of price discrimination (creating what’s known as “elasticity curves”). And in marketing campaigns, different journeys have to be deployed in order to figure out which journey will work best going forward. As a result, if an organization has only launched the same A-B-C-D-E sequence of campaigns over time, it will never know if D-A-E-C-B is actually a better option, because it has never seen it before.

So, organizations test, via controlled experiments. The limitations of testing are the same regardless of what’s being tested: time and cost. Going back to our A-B-C-D-E example, it takes time to test the best permutations of five different tactics in the market. Even with weekly campaigns, 120 permutations (5 x 4 x 3 x 2 x 1) would entail months — or more likely, years — of testing, depending on the level of parallelization. There’s also an opportunity cost of not nailing the optimal answer right away. Beyond the sequence, organizations might also want to understand the optimal parameters at a tactical level (e.g., What’s the best version of Tactic A for each customer?), which requires further testing — and with it, more time and more cost.
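The arithmetic behind "months, or more likely years" can be sketched as follows. The function name and the way parallelization is modeled (a fixed number of customer groups, each running one ordering at a time) are assumptions made for illustration.

```python
from math import ceil, factorial


def sequencing_test_weeks(n_tactics, weeks_per_campaign=1, parallel_arms=1):
    """Weeks needed to run every ordering of n_tactics once.

    Each ordering takes n_tactics * weeks_per_campaign weeks to play out;
    parallel_arms orderings run side by side on separate customer groups.
    """
    orderings = factorial(n_tactics)
    waves = ceil(orderings / parallel_arms)
    return orderings, waves * n_tactics * weeks_per_campaign


orderings, weeks_serial = sequencing_test_weeks(5)             # 120 orderings, 600 weeks
_, weeks_10_arms = sequencing_test_weeks(5, parallel_arms=10)  # 12 waves * 5 weeks = 60 weeks
```

Under these assumptions, running the 120 orderings one after another takes 600 weeks, and even ten parallel arms still need more than a year, before any repeat runs for consistency.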

True AI-enabled optimization is possible, and it already exists in the market. It requires rigor, structure, and persistence, mixed with a hefty dose of simplifying assumptions.

Indeed, the goal is to simplify, and to stratify the learning over time.

The ideal optimization sequence in personalization

If personalized marketing can be treated as an optimization problem, the key is not to solve everything at once. From a marketer’s perspective, the problem is broken down into three steps:

1. Optimize the sequence of tactics over time: Should it be A-B-C-D-E or D-A-E-C-B?

2. Optimize the tactic today: Regardless of what happened in the past, should Jane receive Tactic A or Tactic B today?

3. Optimize within the tactic: Once we’ve selected Tactic A for Jane today, what’s the right combination of reward and difficulty within Tactic A that we need to use?

Our recommendation is to invert the problem-solving order by leaving the sequencing problem for last.

In other words:

1. Determine the right combination of reward and difficulty required to optimize each tactic for each customer/subset of customers.

2. Define the right tactic today, and maybe add interactions of channels or other contextual tactics to the testing strategy over time.

3. Ramp up sequencing tests, starting with small, tenable problems (A-B-C vs. B-A-C).

1) Optimize within the tactic

Let’s assume that Tactic A is a discount for buying a bundle of products. The discount is the reward, while the bundle of products is the difficulty, or the task required to unlock that reward. For example: Buy jeans and tops (the difficulty of the task) and get 5% off (level of reward).

Both the difficulty and the reward in Tactic A could be hyper-personalized, because customers have individual tastes (e.g., Jane reacts to a bundle of jeans and tops, while Amy reacts to a bundle of jeans and jackets) and triggers (e.g., Jane needs a 10% discount, while Amy needs a 20% discount).

Since there are infinite permutations to be tested at the individual level, we simplify, by testing a finite number of reward/difficulty combinations at the segment level, and by hyper-personalizing the so-called “last mile.”

In the world of ML, deep learning, and AI, segmentation is still our friend. Why? Time and cost. Most businesses do not report having more than two interactions a year outside of their top 1% customers. Apparel, luxury, airline, entertainment, and hospitality are all examples of industries where the median interaction is around 1.6 per customer per year. That’s not enough volume to learn and optimize for each individual. But we can learn and optimize across segments, and define — in broad strokes — the best combinations of reward and difficulty.

Optimize the combination of reward and difficulty

An experiment grid breaking down reward and difficulty could look like this:

We segment the population, each segment gets exposure to different variants of a single tactic, and we learn what sticks. The process never ends.

After two or three experiments we understand whether high-spend/high-engagement customers react better to a 10% discount or a 15% discount, and so on. Experience, data analysis, and rigorously designed experiments will help us define

· The right matrix volume: e.g., Should we use 3x3 or 2x2?

· The right axes: e.g., Should we focus on engagement or tenure? Spend or profit?

· The overarching philosophy: e.g., Should we be optimizing for spend, margin, or response?
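As a rough illustration of such a grid, the sketch below crosses a 2x2 segmentation with two reward/difficulty variants and randomly assigns customers to a variant or to a control group. The axes, the variants (borrowed from the Jane example later in this article), and the 10% control share are assumptions, not a recommended design.

```python
import random
from itertools import product

SEGMENT_AXES = {"spend": ["low", "high"], "engagement": ["low", "high"]}  # assumed 2x2
VARIANTS = [("medium", 0.05), ("hard", 0.10)]  # (difficulty, discount)
CONTROL_SHARE = 0.10                           # assumed holdout share


def build_grid(axes, variants):
    """One experiment cell per (segment, variant) pair."""
    segments = list(product(*axes.values()))
    return [(segment, variant) for segment in segments for variant in variants]


def assign(customer_ids, variants, control_share, seed=0):
    """Randomly assign each customer in a segment to control or to one variant."""
    rng = random.Random(seed)
    assignment = {}
    for cid in customer_ids:
        if rng.random() < control_share:
            assignment[cid] = "control"
        else:
            assignment[cid] = rng.choice(variants)
    return assignment


grid = build_grid(SEGMENT_AXES, VARIANTS)  # 4 segments x 2 variants = 8 cells
cells = assign(range(1000), VARIANTS, CONTROL_SHARE)
```

Moving from 2x2 to 3x3 only means adding a level to each axis, but every extra cell needs enough customers to produce a readable result.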

Personalize the last mile

So, what exactly are we personalizing? We’re personalizing the last mile, or what the combination of reward and difficulty means to each individual customer.

Last-mile personalization is where ML comes to the rescue. For example, being asked to buy a pair of jeans and a top might be a medium effort for Jane, but a hard effort for Amy, for considerations other than price. The last mile is about translating what a “medium difficulty” bundle of products means for Jane, and for Jane specifically.

ML gives us a leg up by producing a vast range of scores that can inform us of an individual customer’s propensity to:

· Buy a product again

· Buy a category of products again

· Buy a product/category they’ve never purchased before

· Increase/decrease their spend level

· Increase/decrease their engagement (e.g., browsing, clicking, opening emails, etc.)

· Increase/decrease their share of wallet

· Fade or churn

· Etc.

Routinely deployed ML approaches are often ensembles of decision-tree models that produce a propensity score answering a binary question (e.g., Will Jane buy product X in the next two weeks?). Each propensity score is (more or less) a probability, so if Jane has a propensity score of 0.72 (or 72%) to buy jeans in the next two weeks, we loosely read it as a high probability of her buying jeans in that window. The closer the score is to 1.0 (or 100%), the higher our confidence that she will actually buy the product.

From propensity score to personalized product offer

Once we have our propensity scores figured out, an interpretation of the difficulty matrix could look like this:

Figure 4 — from ML scores to bundle details, illustrative only

An easy bundle is composed of a product with high propensity (more than 70%) and a product with moderate propensity (less than 40%). Since Jane’s product bundle with more than 70% propensity is likely to be different from Amy’s, we are ultimately creating different recommendations within the same spend/engagement segment of the population. Jane’s product bundle is, in other words, personalized.
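The translation from scores to an easy bundle can be sketched in a few lines. The two cut-offs are the ones quoted above; the tie-breaking rule (take the highest-scoring product on each side of the cut-offs) and all scores other than Jane's 0.72 for jeans are hypothetical.

```python
HIGH = 0.70      # "high propensity" cut-off quoted in the text
MODERATE = 0.40  # "moderate propensity" upper bound quoted in the text


def easy_bundle(scores):
    """Pair the top high-propensity product with the best product below the
    moderate cut-off. Returns None if the customer lacks either kind."""
    high = [p for p, s in scores.items() if s > HIGH]
    moderate = [p for p, s in scores.items() if s < MODERATE]
    if not high or not moderate:
        return None
    return (max(high, key=scores.get), max(moderate, key=scores.get))


# Illustrative propensity scores per product for two customers in the same segment
jane = {"jeans": 0.72, "tops": 0.35, "jackets": 0.10, "shirts": 0.55}
amy = {"shirts": 0.81, "blouses": 0.38, "jeans": 0.20}

jane_offer = easy_bundle(jane)  # ("jeans", "tops")
amy_offer = easy_bundle(amy)    # ("shirts", "blouses")
```

Both customers receive an "easy" bundle, yet the products differ because the scores differ, which is the last-mile personalization described above.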

Putting it all together

The beauty of this approach is that we are experimentally personalizing an incentive in a timely and cost-effective manner. Which means Jane will receive a message that is truly 1:1.

Let’s revisit the logic. Jane is a high-spend, high-engagement customer. We run two consecutive pilot experiments (for consistency of results) on a subset of the population, where all customers like Jane receive either a medium difficulty bundle with a 5% discount or a hard difficulty bundle with a 10% discount. By measuring against a randomly drawn control group, we conclude that the medium difficulty bundle with a 5% discount generates a higher lift. So, we launch a medium difficulty bundle offer for a 5% discount to all customers like Jane. While the discount (reward) is the same for every Jane-lookalike, each actual bundle will be highly personalized. Jane will be asked to buy jeans and tops, whereas Amy will be asked to buy shirts and blouses. Why? Because that’s what ML “propensities” suggest: Jane has a high propensity for jeans and tops; Amy, for shirts and blouses.

So, while segment-level experimentation helps us learn the right reward, ML helps us translate the concept of “difficulty” into actions. For an overview of the entire Personalization Stack, refer to my article here: How to Build a Personalization Stack
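The pilot read-out described above (each treated arm compared against a randomly drawn control group) can be sketched with a standard two-proportion z-test. The counts below are hypothetical, and conversion rate stands in for whatever objective the program actually optimizes.

```python
from math import erf, sqrt


def lift_readout(treated_n, treated_buyers, control_n, control_buyers):
    """Relative lift of treated over control, with a two-sided z-test p-value."""
    p_t = treated_buyers / treated_n
    p_c = control_buyers / control_n
    if p_c == 0:
        raise ValueError("control conversion is zero; relative lift is undefined")
    pooled = (treated_buyers + control_buyers) / (treated_n + control_n)
    se = sqrt(pooled * (1 - pooled) * (1 / treated_n + 1 / control_n))
    z = (p_t - p_c) / se
    # Normal CDF via erf; two-sided p-value
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return {"lift": (p_t - p_c) / p_c, "z": z, "p_value": p_value}


# Hypothetical pilot counts for "customers like Jane"
control = (10_000, 500)                         # 5.0% buy with no offer
medium_5 = lift_readout(10_000, 600, *control)  # medium bundle, 5% discount
hard_10 = lift_readout(10_000, 540, *control)   # hard bundle, 10% discount
```

In this made-up example the medium/5% arm shows a 20% relative lift that clears a conventional significance threshold, while the hard/10% arm shows a smaller lift that does not, so only the first would be rolled out.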

Experimentation, some caveats to keep in mind

Simplification also means we are imposing an opinion on what to test, intentionally reducing the universe of options for the sake of time and cost. Other question marks remain: Should a “hard difficulty bundle” be more difficult? Should both products in a bundle be below a 20% propensity score? And is 10% the right target?

Just like our algorithms, marketers learn over time, and adjust their experiments based on prior results. These are important elements to keep in mind:

Experimentation never ends; we can always learn more, add more complexity, reinforce prior findings. For example, we can test three levels of permutations instead of two, add different channels to the mix, add exogenous factors, create parallel campaigns, etc.

Experimentation is not free; we commonly hear that “experimentation is expensive,” but that’s not the right lens, either. Running controlled experiments is the only path to data-driven personalization, the premise being that we’re spending money now in order to get a higher return later. That money is “spent” in two ways:

· Opportunity cost of keeping customers outside of marketing (control groups receive no marketing and tend to spend less)

· Cost of discounts/incentives required to learn the optimal levels

Experimentation needs to produce measurable results; proper experimentation is designed with statistical significance in mind so that it leads to irrefutable answers — a campaign either generates a return, or it does not. If it generates a return, discounts and allowances are not “costs,” but merely investments with an expected multiplier. The typical return on a marketing dollar using personalization is around 2–3x. Not a bad deal.

Since experimentation is measurable, organizations should consider the cost of experimentation as a predictable investment with an expected and predictable return.

Experimentation to predict “lift” can become extremely complex; while ML has become democratized, doing it right is still challenging. Typical model implementations predict behavior with all else being equal. If Jane never waits for a promotion to buy a new arrival product, her propensity describes her typical behavior without a treatment (incentive). But her propensity might change when the discount is introduced. In an ideal world, we would have not propensity models but “lift models” that predict behavior in relation to receiving a particular treatment. For example, if Jane has a 70% propensity to buy jeans, but a 95% propensity to buy jeans when they are on sale, our take on the bundle difficulty might change as we would find the incentives on jeans to be overly cannibalistic for Jane.

ML is as informative as the experiments through which the models learn. While prior purchase history is a great first step to train a model, more sophistication is required over time to build response models that factor in context and incentives.
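The difference between a propensity and a lift can be made concrete with Jane's numbers from the example above (70% untreated, 95% on sale). The price and discount levels are hypothetical, and the sketch looks at revenue only, ignoring margin.

```python
def uplift_economics(p_untreated, p_treated, price, discount):
    """Expected per-customer revenue effect of sending an incentive."""
    uplift = p_treated - p_untreated                    # incremental purchase probability
    revenue_no_offer = p_untreated * price
    revenue_offer = p_treated * price * (1 - discount)
    # Discount handed out on purchases that would have happened anyway
    cannibalized = p_untreated * price * discount
    return {
        "uplift": uplift,
        "net_incremental_revenue": revenue_offer - revenue_no_offer,
        "cannibalized_discount": cannibalized,
    }


# Jane: 70% propensity untreated, 95% when jeans are on sale (from the text);
# the 100.0 price and the two discount levels are assumptions.
shallow = uplift_economics(0.70, 0.95, price=100.0, discount=0.10)
deep = uplift_economics(0.70, 0.95, price=100.0, discount=0.30)
```

With these assumptions, the 10% discount still nets about 15.5 in expected revenue per customer, whereas the 30% discount turns slightly negative (about -3.5): most of the discount subsidizes purchases Jane would have made anyway. A plain propensity score cannot reveal this; a lift model can.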

2) Optimize the tactic

Defining the right combination of reward and difficulty of Tactic A for Jane is the first step in the personalization journey, and would typically take anywhere from four to eight months. The second step is to understand whether today Jane should receive a bundle offer at all. We know the best Tactic A for Jane, but is Tactic A the right treatment for Jane today? Having an answer implies we already know the impact of the other tactics on Jane (i.e., we have experimentally tested Tactics B, C, D, and E and we have good directional insight into what works best).

Depending on the size of the target market, this stage comes nine to 12 months into a personalization program, for a number of reasons:

· It takes time to get an accurate read on experiments, so we often need to sequence them instead of running them in parallel

· The more tactics at our disposal, the longer it takes to cycle through all of them

· We also want to include seasonal patterns, as winter incentives might require a different setup vs. summer incentives, for example

Provided we are comfortable with the quality/quantity of experimental data for all the tactics, there are other, practical considerations beyond the data science realm to be aware of. One is that organizations typically cannot execute different tactics for different customers at the same time. For example, our data shows that today Jane should receive Tactic A and Amy should receive Tactic B, but the marketing execution platform is not organized to launch two distinct campaigns at the same time. Not surprisingly, it takes longer for an organization to update processes and platforms than to learn how to optimize tactic selection via controlled experiments.

Another consideration is that a personalization program should start with a forward-looking diagnosis of all related technology and analytical solutions and processes to ensure that resulting insights are actionable, and not stymied by issues of organizational or platform readiness. Provided the organization, technology, and processes are in place to personalize the timing of executing a tactic at an individual level, the optimization logic is a combination of the following ingredients:

Prior experimental data by segment. For example, ranking the effectiveness of a tactic for all customers that look like Jane. At this stage, we have all the input we need, as we have cycled through all the tactics two or three times and we can rank them by objective function (e.g., lift, response, sales, etc.).

New experimental data on interactions related to a particular tactic: The question we try to answer is whether customers react better or worse to concurrent offers. Marketing departments have several different options at their disposal; they can push certain communication, use incentives for other communication, overlap in-store and online experiences, etc. For example, learning that customers like Jane do not usually respond to push marketing (e.g., for a bundle offer) after being treated with a triggered offer (e.g., a post-purchase incentive that only kicks in after a transaction) might impact our decision of whether to trigger Tactic A or to suppress it altogether because it’s ineffective or irrelevant.

Business rules and a brand-specific marketing philosophy. This is the most impactful element when it comes to selecting a campaign. We obviously don’t want to always send Tactic A to Jane, nor do we want to spam her, forget about her, or send her irrelevant messages. To that end, organizations develop a rich set of rules that typically fall into these four buckets:

· Eligibility

· Anti-repetition

· Frequency and volume

· Discount threshold
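A minimal sketch of how the ingredients above might combine: tactics are ranked by their experimentally measured segment lift, and the four rule buckets filter the candidates before the top one is chosen. All class names, field names, thresholds, and lift values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Tactic:
    name: str
    segment_lift: float        # measured lift for the customer's segment
    discount: float
    loyalty_only: bool = False


@dataclass
class Customer:
    is_loyalty_member: bool
    history: list              # [(date_sent, tactic_name)], most recent last


def pick_tactic(customer, tactics, today, *, max_per_30_days=4,
                max_discount=0.15, no_repeat_last=1):
    """Return the highest-lift tactic that passes every business rule, or None."""
    recent = [t for d, t in customer.history if today - d <= timedelta(days=30)]
    if len(recent) >= max_per_30_days:                      # frequency and volume
        return None
    tail = customer.history[-no_repeat_last:] if no_repeat_last else []
    last_sent = {name for _, name in tail}
    candidates = [
        t for t in tactics
        if (customer.is_loyalty_member or not t.loyalty_only)  # eligibility
        and t.name not in last_sent                            # anti-repetition
        and t.discount <= max_discount                         # discount threshold
    ]
    return max(candidates, key=lambda t: t.segment_lift, default=None)


today = date(2019, 4, 25)
tactics = [
    Tactic("A", 0.12, 0.05),
    Tactic("B", 0.09, 0.10),
    Tactic("C", 0.15, 0.20),
    Tactic("D", 0.10, 0.10, loyalty_only=True),
]
jane = Customer(True, [(today - timedelta(days=7), "A")])
choice = pick_tactic(jane, tactics, today)
```

Here Tactic C has the highest lift but breaches the discount threshold, and Tactic A was sent last week, so the rules select Tactic D. The data ranks the options, but the rules decide most of the outcome.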

When we put it all together, our experience leads us to conclude that simplification is the right approach. Technical hurdles, business guardrails, context, and budget limitations for experimentation suggest that business rules typically dictate 80% of the final sequence. The most common reason is that organizations only have so many tactics at their disposal, so anti-repetition and suppression logic shape most of the answers — sometimes simply cycling existing offers is good enough. More importantly, return on investment is often elsewhere. Completing the first step (reward/difficulty matrix) is paramount; beyond that, if faced with the choice of expanding to a new channel (e.g., from email to app/web), adding a new tactic, or hyper-personalizing tactic selection, the last option will hardly ever provide the best return on investment.

Organizations that have successfully completed the second step have relied on complex experimental frameworks coupled with the development of response models and lift models. While deep learning approaches such as long short-term memory (LSTM) have moved the needle in this space, the real shift is around transitioning from simple propensity scores to conditional propensities that factor in context (e.g., customer-specific information, habits or explicit feedback gleaned over time) and engagement. This is more a DOE (Design of Experiments) exercise than a simple algorithmic upgrade: proven ML techniques paired with properly designed tests are often a compelling first foray in this space.

3) Optimize the sequence of tactics

In the first two steps, we blended learning at the segment level with hyper-personalization at the individual level (the ML layer). We also saw how processes and marketing execution can pose technical challenges. Finally, we discussed that tactic selection at a point in time is mostly about inference on incremental behavior (e.g., treated customers spend more than untreated customers, all else being equal).

Our desired end state is the ability to understand the right sequence of touchpoints, incentive-based or not, that maximizes Jane’s lifetime value with the brand. Lifetime value is the only factor that we’re looking to maximize over the long term. And it is possible, as techniques are being increasingly solidified in both academia and industry. Again, there’s a need for simplification, but also for a philosophical shift in how we allow our algorithms “to learn.”

The data science background

Traditional ML and DL (deep learning) applications fall under a category of techniques called “supervised learning,” which require “labeled data” to learn over time. But what does that even mean?

It means we are “learning” how to minimize an error, in this case, the error on a prediction. For example, I present a picture of a cat but the model concludes it’s not a cat. We know it’s wrong, and we tell the model so, by providing a number of cat pictures that it can learn from. That’s the “labeled data” — prior information we already identified as correct. The more labeled data we feed a model, the more the model has room to improve and reduce the chance it will make an error the next time it’s presented with a picture of a cat. These techniques work, but have two limitations. They require labeled data, and they require a significant amount of it — more than people realize.

The parallel in marketing is that we have multiple interactions with Jane where we send her (experimentally designed) offers over time; she responds to those offers, either by making a purchase or by ignoring them; and the model keeps learning until it cannot reduce the error rate any further. But we do not have that luxury; statistically, Jane will churn or unsubscribe before the model has even begun to have an opinion. That’s where we introduce a different approach altogether, called Reinforcement Learning (RL).

RL starts with a blanket opinion about the outcome of different options; for example, our A-B-C-D-E tactics. The opinion could be that it assigns each of them a 20% conversion rate, then starts testing what works (i.e., converts) and what doesn’t. The testing is optimized via a sampling mechanism that tries to converge as much as possible towards the tactics that garner the highest response. So, it will iteratively try the tactics until one of them (say, Tactic A) delivers a conversion. At that point, it will start to sample Tactic A more often than the other tactics because it empirically knows it to be (more) successful. After iterating several times, it produces a ranking showing which tactics convert often, which convert rarely, and which never convert at all. Add business rules on top and, voilà, you’ve solved the sequencing problem in marketing personalization.

Kind of.
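One common way to implement the sampling mechanism described above is Thompson sampling on a multi-armed bandit. The article does not name a specific algorithm, so that choice is an assumption, as are the simulated "true" conversion rates. The blanket 20% opinion is encoded as a Beta(1, 4) prior, whose mean is 0.20.

```python
import random


def thompson_sampling(true_rates, rounds=5000, prior=(1.0, 4.0), seed=0):
    """Simulate Thompson sampling over tactics with Beta-Bernoulli posteriors.

    true_rates: hypothetical conversion rate per tactic (unknown to the sampler).
    prior: Beta(alpha, beta) pseudo-counts; (1, 4) has mean 0.20.
    """
    rng = random.Random(seed)
    alpha = {t: prior[0] for t in true_rates}
    beta = {t: prior[1] for t in true_rates}
    pulls = {t: 0 for t in true_rates}
    for _ in range(rounds):
        # Draw a plausible conversion rate per tactic and play the best draw
        draws = {t: rng.betavariate(alpha[t], beta[t]) for t in true_rates}
        choice = max(draws, key=draws.get)
        converted = rng.random() < true_rates[choice]
        alpha[choice] += converted        # one more success if converted
        beta[choice] += not converted     # one more failure otherwise
        pulls[choice] += 1
    posterior_mean = {t: alpha[t] / (alpha[t] + beta[t]) for t in true_rates}
    ranking = sorted(true_rates, key=posterior_mean.get, reverse=True)
    return pulls, ranking


pulls, ranking = thompson_sampling(
    {"A": 0.30, "B": 0.10, "C": 0.08, "D": 0.05, "E": 0.02}
)
```

In this simulation the sampler sends Tactic A far more often than the others once conversions start to accumulate, while still occasionally revisiting the weaker tactics. A real deployment would observe customer responses instead of simulated ones, and would need the business rules layered on top.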

Practical considerations

The point of this article is not to explain RL, but to lay out the business implications of when and how to employ such techniques to maximize the return from personalized marketing.

Optimizing a sequence of actions can be and is solved routinely with traditional ML techniques and proper design of experiments. However, the beauty of RL is that, for the most part, it does not require labeled data, much less a trove of prior data. So why do we even bother with the first step, which does require labeled data for ML? A number of reasons:

· Order of operations. We cannot optimize a sequence until we have optimized the individual components. Step 1 allowed us to create “the best Tactic A for Jane,” so that should precede any decisions around whether Jane should receive Tactic A at all.

· Talent. RL is a relatively novel approach, whereas, from a talent perspective, ML competency is more pervasive. The average data science team — provided it even exists — will often be comfortable developing an ML layer but will have little to no experience with RL techniques.

· Time and cost conundrum. A more technical consideration is that RL still requires a considerable number of iterations to converge to a viable answer. As such, its first applications were developed online, where traffic is high. There are different approaches we can take to mitigate this requirement, such as employing different sampling techniques or reducing the universe of options to optimize for (e.g., optimize some campaigns but not all of them), but the best approach is to use ML to fast-track the initial implementation, which is the irony of the process.

Let’s go back to our example. RL started with a blanket opinion as to which tactic to deploy, assigning 20% to each of the A-B-C-D-E tactics. But we know, through experiments, that Jane responds more to Tactic A, and through ML we know that the right Tactic A is medium difficulty, 5% discount. This experimental knowledge means we can significantly accelerate the initial opinion-generation task by leveraging the learning of our first step. RL will continue to sample and cycle through the other tactics, but will converge much faster than if we had never experimented before. That’s why it’s Step 1 in our analytical roadmap, because we’re better off starting there.

The conclusion is that while it’s true that RL learns over time and adapts to change, we get much faster results by completing Step 1 before Step 3 vs. trying to solve for two problems at once.
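The head start that prior experiments provide can be illustrated by converting pilot results into informed priors and comparing how often Tactic A would be the very first pick. This is a sketch under a Beta-Bernoulli assumption; the pilot counts and the `weight` parameter are hypothetical.

```python
import random


def beta_prior_from_experiment(conversions, exposures, weight=1.0):
    """Turn past experimental counts into Beta(alpha, beta) pseudo-counts.

    weight < 1 discounts old data so the sampler can still change its mind.
    """
    return (1.0 + weight * conversions, 1.0 + weight * (exposures - conversions))


def first_pick_share(priors, target, draws=10_000, seed=0):
    """Share of first-round Thompson draws in which `target` is selected."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        sample = {t: rng.betavariate(a, b) for t, (a, b) in priors.items()}
        wins += max(sample, key=sample.get) == target
    return wins / draws


# Blanket opinion: every tactic starts at Beta(1, 4), i.e. a 20% mean
flat = {t: (1.0, 4.0) for t in "ABCDE"}

# Hypothetical pilot results for customers like Jane: (conversions, exposures)
pilot = {"A": (60, 200), "B": (20, 200), "C": (16, 200), "D": (10, 200), "E": (4, 200)}
informed = {t: beta_prior_from_experiment(c, n) for t, (c, n) in pilot.items()}

share_flat = first_pick_share(flat, "A")          # roughly one in five
share_informed = first_pick_share(informed, "A")  # close to 1.0
```

With the flat opinion, Tactic A is chosen first about one time in five; with priors built from the pilot, it is chosen first almost every time, so far fewer interactions are spent rediscovering what the experiments already showed.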

Business considerations

AI-driven personalized marketing is already a reality. Companies approaching it for the first time should view it as a unique journey that requires the following components to be successful:

Discipline. Building intelligence requires a sequential roadmap with limited room for shortcuts because currently cognitive science still assumes “the brain must have seen what good looks like.” So, embrace experimentation and follow the proper order: Optimize within the tactic, then optimize the tactic, then optimize the sequence of tactics.

Patience. You can’t learn multiplication before you’ve learned times-tables, and if learning times-tables takes 12 months, invest the time; bear with it. Personalization programs deliver ongoing improvements but expect a two-to-three-year journey to exploit their full potential.

Cross-functional talent. Marketers conceive the curricula and the tactic, data scientists translate them into experiments and intelligence, developers adjust the channels and construct the journeys, data engineers build/maintain the data pipes, finance instills rigor to fund the journey, and legal sets the boundaries of what’s appropriate. So before seeking the best-known AI guru in Silicon Valley, build a balanced team that can bring together different parts of the organization, with a focus on an aptitude for change management.

The benefits of using advanced analytics to power personalization

Increased engagement, deeper loyalty, and a measurable uptick in return on their marketing efforts are some of the core reasons marketing organizations are employing advanced analytics to power their personalization efforts. With them come additional benefits:

Unique data. Investing in experiments builds a unique, invaluable asset that cannot be found elsewhere on the market. While run-of-the-mill ML is getting democratized (not so the advanced applications), the curated data powering those engines cannot be purchased or borrowed; they can only be created internally.

Ability to fund the journey. Outsize returns materialize in at least 18–24 months, but personalization programs can (and do) lend themselves to quick wins and fast turnaround times. We have empirically measured jumps of 30–40% in net incremental revenue from hyper-personalizing a subset of existing segmented tactics in 3-month mini-programs.

First-mover advantage. The sooner organizations start using advanced analytics, the sooner they start digging the moat that separates them from the competition.

Indeed, using advanced analytics to drive personalization is still a relatively new phenomenon. But it will become standard operating procedure. Organizations that manage to make it a core part of their marketing efforts will be able to reap the subsequent rewards.