Dispelling the Predictive Myth

Joshua Neckes
Simon Data
4 min read · Oct 4, 2016

The intuition and understanding of an expert marketer are still the most valuable tools in the marketing “shed”. In the world of modern marketing tools, the right way forward is to enable marketers to validate their intuitions more quickly, not replace that intuition with algorithms. Only you know your market and your customers; your data doesn’t know anything.

The SaaS market is rapidly filling up with black-box predictive products that promise to replace marketers with algorithms. These solutions purport to “automagically” detect the segments of your customers who are most in need of contact, and deploy the ideal remediation to get those customers to come back, spend more, and tell their friends. Unfortunately, it’s not quite that simple.

The hype around data science and predictive marketing is just that: hype. If you’ve never heard a vendor pitch the “predictive dream”, as we’ve taken to calling it, it’s certainly something special. According to these “predictive dream” vendors, the right algorithm (often the one they’re selling, coincidentally enough) just sucks in all of your data, automatically boosting customer LTV while you sit back and relax.

Unfortunately, this predictive dream is only an illusion. Customer data today is simply too large, too complex, too multi-faceted, and too domain-specific for one-size-fits-all algorithms to meaningfully interpret.

Let’s talk about why that is.

Today’s marketers have access to significant customer-level data. Thanks to advances in data capture and transformation, this data is rich, annotated with a significant amount of metadata that provides meaningful, domain-specific context.

For example, a customer’s order in your database might be tagged not just with the items bought and price of the purchase, but also the subjective labels you put on those products, the version of your application that was running when they made the purchase, and the coupons they used to make the purchase.
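Concretely, such an annotated order record might look something like the following. Every field name here is an illustrative assumption, not any particular vendor's schema:

```python
# A hypothetical order record, annotated with domain-specific metadata.
# All field names are illustrative assumptions, not a real schema.
order = {
    "order_id": "A-10293",
    "items": ["wool-runner", "trail-sock"],
    "total_price": 148.00,
    # Subjective labels the business itself assigns to its products
    "product_labels": ["sustainable", "new-arrival"],
    # Context captured at purchase time
    "app_version": "4.2.1",
    "coupons_used": ["WELCOME10"],
}

# Rich metadata lets you ask questions the raw transaction alone can't
# answer, e.g. "do coupon-driven purchases of new arrivals behave differently?"
is_coupon_driven = bool(order["coupons_used"])
```

It is exactly this kind of annotation that makes the data valuable, and, as we'll see, that off-the-shelf models cannot consume.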

This data is also diverse, often stored in — and generated by — at least a few different sources, and more than likely encoded in several different schemas. For example, fields your web analytics software captures about a given customer’s session are distinct from the customer attributes stored in your CRM database, which are themselves separate from the attributes needed by the addressee list in your ESP software.

Richness and diversity are both good qualities in data — emphatically so. Rich data lets you ask interesting questions, while diverse forms of data allow you to extend your inquiries into every aspect of your business.

However, when it comes to deploying predictive solutions, it is precisely your data’s richness and diversity that make off-the-shelf algorithms ineffective.

Let’s take an example to understand why. A common off-the-shelf predictive offering is “churn prediction” — specifically answering the question, “Which of my customers are most likely to churn?”

A black-box predictive solution to this problem would typically ingest a basic set of standardized variables and spit out cohorts of “likely to churn customers”. These variables are often limited (e.g. total revenue, channel of acquisition). They must be formatted in a highly specific way, and they account for none of the richness and diversity of your data. They are unable to reason or ask questions about many of the nuances of your data that would meaningfully influence their output.
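To make the limitation concrete, here is a minimal sketch of what such a black-box scorer amounts to: a fixed formula over a small set of standardized inputs. The field names and weights are hypothetical, but the shape is the point — nothing domain-specific can enter the model:

```python
# Sketch of a black-box churn scorer over only "standardized" variables.
# The weights and field names are hypothetical; the point is that the
# inputs are fixed in advance, so domain-specific signals (content tags,
# parallel campaigns, seasonality) simply cannot enter the model.
STANDARD_FIELDS = ("total_revenue", "days_since_last_order", "acquired_via_paid")

def churn_score(customer: dict) -> float:
    """Return a 0-1 churn likelihood from the limited standard inputs."""
    revenue = customer["total_revenue"]
    recency = customer["days_since_last_order"]
    paid = 1.0 if customer["acquired_via_paid"] else 0.0
    # Arbitrary fixed weights, as an opaque vendor model would bake in.
    raw = 0.02 * recency - 0.001 * revenue + 0.3 * paid
    return max(0.0, min(1.0, raw))

customers = [
    {"total_revenue": 900.0, "days_since_last_order": 5, "acquired_via_paid": False},
    {"total_revenue": 40.0, "days_since_last_order": 60, "acquired_via_paid": True},
]
scores = [churn_score(c) for c in customers]
```

However sophisticated the math inside the box, a model whose input schema looks like `STANDARD_FIELDS` cannot reason about the annotated order record from earlier, because those fields never reach it.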

And so off-the-shelf predictive fails not because it asks the wrong questions, but because it is literally incapable of asking the correct questions. What specific interactions with our site, given our product, within the context of our distinct content types and tagging, will prevent churn? More importantly, which of our thousands of emails, ads, SMS, and push notifications should be deployed to prevent that churn? And which other interventions that we’re running in parallel could be confounding that experience?

This is all entirely domain-specific, and off-the-shelf predictive accounts for absolutely none of it. So, at the end of the day, because your data is rich and diverse, uniquely derived from your unique business, it has additional variables and confounding elements that these algorithms don’t understand. These variables render the outcomes of prefabricated predictive models directionally muddled at best, and damagingly inaccurate at worst.

The truth is that the most reliable way to improve outcomes at your business is much simpler than deploying off-the-shelf predictive tech, and it’s already core to every good marketer’s skillset: just ask those very specific questions of your data. Who is likely to churn, given all of the unique facets of your business? What noise may exist in your evaluation, based on other elements unique to your business (e.g. seasonality, competing offers)? Which of your campaigns improve metrics in those segments, and which ones don’t? Experiment, evaluate, iterate. Experiment, evaluate, iterate. And then, of course, automate.
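As a sketch of what “asking those very specific questions” can look like in practice (the segment names, campaign variants, and records below are all invented for illustration), comparing churn rates across a segment you defined and the campaigns you ran is a few lines of straightforward code:

```python
# Sketch: answer one specific, domain-aware question directly from the
# data, rather than delegating it to an opaque model. All records and
# labels below are invented for illustration.
records = [
    # (segment label you defined, campaign variant, churned within 90 days?)
    ("coupon-first-purchase", "winback_email_A", True),
    ("coupon-first-purchase", "winback_email_A", False),
    ("coupon-first-purchase", "winback_email_B", False),
    ("full-price-first-purchase", "winback_email_A", False),
    ("full-price-first-purchase", "winback_email_B", False),
    ("full-price-first-purchase", "winback_email_B", True),
]

def churn_rate(segment: str, variant: str) -> float:
    """Churn rate among customers in `segment` who received `variant`."""
    rows = [r for r in records if r[0] == segment and r[1] == variant]
    if not rows:
        return 0.0
    return sum(1 for r in rows if r[2]) / len(rows)

rate_a = churn_rate("coupon-first-purchase", "winback_email_A")
rate_b = churn_rate("coupon-first-purchase", "winback_email_B")
```

The experiment-evaluate-iterate loop is just this comparison, repeated, with your own segments and your own campaigns in the query.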

Nothing can or will replace your intuition about which experiments are interesting and which campaigns have a shot at engaging your customers — and nothing needs to. For most marketers who haven’t yet tested, iterated, and automated at scale, getting a predictive algorithm to take a #bigdata guess at which campaigns are going to be effective is just the wrong problem to be solving. The only investment most savvy businesses need is in making it easier for marketers to ask those questions and get clear answers. And this — in comparison to off-the-shelf predictive — is a technical challenge worth investing in.
