AI in the right places: A framework for powering data analytics products
Earlier this year, artificial intelligence yielded a practical insight: people like to drink coffee in the morning, so workplaces should find efficient ways to serve coffee. That raised a question that’s surprisingly deep — and can cost serious money to ignore:
"Is AI actually necessary for this problem?" is a question that remains largely unasked in Silicon Valley today. — @modestproposal1
We think it’s worth asking. To be sure, modern data products owe a lot of their success to artificial intelligence. Well-considered AI unlocks entirely new types of data-driven insights and cuts the time and money needed for manual data analysis. But ill-considered AI can fail — expensively. Time to market can be long and uncertain. Accuracy can fall well short of unrealistic expectations. End users may not find any special value in the final product.
That’s why, in this article, we take a step back to remember that deploying AI-based techniques is a choice — and we present a framework to help product feature teams decide when to use AI. After all, there are times when artificial intelligence makes sense, times when human intelligence makes sense, and times when a combination makes sense.
We are founders, early engineers, and product leaders from a startup that used AI-based technologies to create an award-winning product, was acquired by the industry leader in our space, and went on to lead much of the AI product roadmap for a multi-billion-dollar company. Our story shows how each of those scenarios calls for a different balance of AI and human expertise and, most importantly, that AI is great — when used in the right places.
But first: What AI really is (and isn’t)
It’s easy to embrace AI because it feels magical, or reject it because it feels mysterious. It is neither. A reasoned choice begins with the fact that, at the end of the day, AI is just the result of a machine following a set of human-written instructions. Here’s an example from our domain, legal information extraction. The instructions for the machine can be very simple rules, like:
If “motion to dismiss” and “denied” occur in the same sentence, classify the motion to dismiss as denied.
Or the instructions could be more complex, like:
Compute the probability that “denied” really is modifying “motion to dismiss” and not some other motion in the sentence, given the number of intervening words and the patterns seen in a few thousand pre-labeled training examples.
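To make the contrast concrete, here is a minimal sketch of what those two styles of instruction might look like in code (the function names, features, and classifier are illustrative assumptions, not a description of our production system):

```python
import re
from typing import List, Optional

def rule_based_outcome(sentence: str) -> Optional[str]:
    """Simple rule: if "motion to dismiss" and "denied" occur in the same
    sentence, classify the motion to dismiss as denied."""
    text = sentence.lower()
    if "motion to dismiss" in text and "denied" in text:
        return "denied"
    return None

def outcome_features(sentence: str) -> List[float]:
    """Features a probabilistic model might use instead: how many words
    separate "motion to dismiss" from "denied", and whether other motions
    compete for the word "denied" in the same sentence."""
    text = sentence.lower()
    motion_pos = text.find("motion to dismiss")
    denied_pos = text.find("denied")
    if motion_pos < 0 or denied_pos < 0:
        return [0.0, 0.0]  # phrases absent; a real system would handle this case separately
    span = text[min(motion_pos, denied_pos):max(motion_pos, denied_pos)]
    intervening_words = max(len(span.split()) - 1, 0)
    other_motions = len(re.findall(r"motion (?:to|for) (?!dismiss)", text))
    return [float(intervening_words), float(other_motions)]

# A classifier trained on a few thousand pre-labeled sentences (for example,
# a logistic regression) then turns those features into a probability that
# "denied" really modifies the motion to dismiss:
#   p_denied = model.predict_proba([outcome_features(sentence)])[0][1]
```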
No matter the mathematical complexity or degree of abstraction, these algorithms — from simple decision trees to deep learning — are all ultimately designed by humans. And like humans, they are vulnerable to failure modes such as replicating human biases. Understanding these limitations, along with the things machines do really well, is the basis of a well-informed choice to use — or not use — AI to power your product. But as we will now explain, technology is only part of the story. The combination of user value, business value, and technological feasibility is what determines the right places for AI. (We are indebted to Marty Cagan and his big risks framework for inspiring our thinking here.) By thinking through each of these factors before investing in AI, you can decide whether your product, problem, or solution is indeed the right place for AI.
A tale of three products
By way of introducing our framework, let us tell the story of three data analytics products we’ve built, each with circumstances leading to a different, reasoned choice between AI and human intelligence. We’ve consistently found that the best choice is at the confluence of our three criteria of user value, business value, and technical feasibility.
Our new startup: AI path
In 2012, we founded a startup called Ravel Law. Our vision was to discover and share data that show how American law really works — which nuggets of language are empirically most likely to persuade judges, which courts are statistically most likely to grant certain motion types — by digitizing and parsing all 40 million pages of written American caselaw for the first time. As a lean new startup, we needed to get through those 40 million pages and to market quickly. We calculated that it would take 788 person-years to manually annotate each motion outcome in the entire corpus — and realized that the task was systematic enough that an AI-based solution might handle it well. Equally important, our initial user base of tech-forward law firms understood that occasional incorrectly detected outcomes don't much diminish the value of these new, large-scale motion statistics. For Ravel, then, the AI path had all three key reasons for being: it was technically feasible for the systematic language tagging we needed, accurate enough to create value for our users, and ready quickly enough to start generating revenue for the business.
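The arithmetic behind a figure like that is straightforward; here is a rough sketch with illustrative rates (the assumed annotation speed and working hours below are stand-ins for this example, not the exact inputs we used):

```python
pages = 40_000_000        # the full corpus of written American caselaw
pages_per_hour = 25       # assumed pace for a trained annotator (illustrative)
hours_per_year = 2_000    # roughly one full-time working year

person_years = pages / (pages_per_hour * hours_per_year)
print(round(person_years))  # -> 800, the same order of magnitude as our 788 person-year estimate
```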
Our mass-market product: Hybrid path
Fast-forward five years: Ravel was acquired by the large, well-established leader in legal research, LexisNexis. Our mandate was to deliver our Ravel analytics platform to the far wider user base of LexisNexis in a new product called Context. This was a new challenge. Rather than primarily serving the enthusiastic technologists who tended to sign up for Ravel, we would be serving a mass market, with higher expectations of our accuracy and less opportunity for high-touch user training on correctly interpreting probabilistic, AI-generated analytics. (Unintended interpretations or modes of use are a real risk with mass-market AI: despite Google's warning that users should not replace human interpreters with its machine translation tool, users do just that.) Given these constraints, we predicted that our Context users would balk at the slightest error in our product, and that they would be unlikely to care whether the error came from a machine or a human. Doubling down on our pure AI solution for motion extraction seemed like a fool's errand under these circumstances, given the reality of irreducible modeling error and the diminishing returns usually seen in prolonged machine learning work. Instead, we devised a human-in-the-loop system: the vast majority of the data continued to be processed automatically, keeping the project timely and within budget, but edge cases — complex or unusual sentences for which our algorithm had low confidence — were automatically handed off to a small team of human experts. Our hybrid solution optimized the three criteria of technical feasibility (the technique works and yields high accuracy), user value (high accuracy satisfies mass-market users), and business value (costs are low because AI handles the bulk of the workload). The right strategy was not to plunge headfirst into AI, but to leverage the parts of it that worked best.
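A minimal sketch of the routing logic in such a human-in-the-loop pipeline might look like the following (the threshold, classifier interface, and review queue are hypothetical stand-ins, not the Context implementation):

```python
CONFIDENCE_THRESHOLD = 0.90  # hypothetical cutoff, tuned to the accuracy target

def route(sentence: str, model, review_queue: list) -> dict:
    """Accept high-confidence model output automatically; hand low-confidence
    edge cases off to human experts for review."""
    label, confidence = model.classify(sentence)  # assumed (label, probability) API
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"label": label, "source": "model", "confidence": confidence}
    review_queue.append(sentence)  # a human expert resolves these later
    return {"label": None, "source": "human_review", "confidence": confidence}
```

The confidence threshold becomes the dial that trades cost against accuracy: raise it and more work lands on the expert team; lower it and more of the output rests on the model alone.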
Our smallest, trickiest dataset: Human curation path
We’ve gone on to launch many more analytics features in our Context product at LexisNexis. One of them is an expert witness module that reports how often an expert’s testimony has been excluded from court, and why: poor qualifications, improper methodology, lack of relevance, or procedural issues. These reasons are not listed explicitly in our source data, so we must infer them from the highly varied prose of judicial opinions (being careful not to confuse them with litigants’ proposed reasons, which are woven throughout the same documents). These are subtle inferences for a machine learning system to make, especially when training on a small dataset. Moreover, when we computed how long it would take to annotate the entire dataset manually, we discovered that it would take a mere eight person-years — or a team of eight people for one year. Considering such a relatively small expense along with the uncertainty of solving this problem well with AI, we realized that in this case AI didn’t have much reason for being. We hired a small team of experts as a surer way to cover this small, specialized dataset, and we redeployed our AI talent to the next, better-suited project — and ultimately produced more value across the business by not building AI where it wasn’t warranted.
Putting AI in the right places: Where to go from here
As we’ve seen across our experiences at Ravel and LexisNexis, knowing whether AI was the right strategy came down to user value, business value, and feasibility.
To be clear, none of this suggests that software developers, engineers, or lawyers should avoid AI in their products — it just means we should all be thoughtful about the applications and use cases and make sure we are using it in the right places. For an end user, passing up tools that thoughtfully apply AI can be just as costly as an entrepreneur or developer never adopting the technology: when AI is used for the right reasons in the right places, the gains for the user in cost and power can be considerable.
To that end, we’ve designed a set of questions to help product feature teams think about different strategies — AI, human, or hybrid — and their consequences for our three criteria of user value, business value, and feasibility. We encourage teams to be able to answer many of these questions for each data strategy under consideration for their project, comparing and contrasting the answers for each strategy before investing in one. This exercise will require answers from members of the whole team, including product managers, user researchers, designers, engineers, and data scientists. (If you’re not sure whether your project needs a data scientist, this is a great way to find out! We encourage you to reach out to us or to data scientists in your organization to consult on this exercise before investing in a strategy.)
We’ve found that this exercise gives teams a clearer idea of what they are getting into with each potential data strategy. Teams emerge with more evidence about whether AI has a good reason for being — and a clear understanding of how their chosen data strategy adds to the value of each new product they build. We think that, even as technology hype cycles begin and end, this is the kind of knowledge that will remain a foundation of successful, well-considered data products for years to come.
Appendix: When is AI the right solution? Questions for product feature teams
We encourage product feature teams to be able to answer many of the following questions for each data strategy under consideration, comparing and contrasting the answers for AI, human curation, and combinations in between. We present three sets of questions, focusing in turn on user value, business value, and feasibility.
User value: Would users get anything special out of an AI solution?
First, make sure that the problem you’re proposing to solve for users is something they actually care about — and then ask: Would AI produce something that’s more valuable to users than humans could? We imagine four ways that it could: volume, velocity, variables, and virtuous cycles. Large data volumes and rapid velocity of new data to be processed seem to give an obvious edge to AI: for many tasks, machines can reduce centuries of manual data processing to minutes. The less obvious, but equally important question is: do enormous volumes of data really create special value for your user? This has a lot to do with the form of the insights you’re extracting: are there interesting variables that your solution — human or AI — uniquely surfaces for users (in the way that Ravel’s AI revealed judges’ motion-granting rates for the first time)? Finally, does your solution create any virtuous cycles that add user value as the product grows, such as machine learning models that get better with additional data and user feedback? If your answer to any of these questions is yes, you may have a good reason to use AI.
But beware the flip side of the user value question: Would AI produce something that’s less valuable to users than humans could? AI can cover large volumes of data but sometimes with lower accuracy than humans: do your users see value in that tradeoff? How likely are users to spot errors in the first place, given the design of the product and the nature of the questions it answers? And what are the stakes of getting something wrong? For applications that demand high accuracy or that make objectively falsifiable claims (e.g. that a judge denied a specific motion), going with the less reliable solution — be it AI or human curation — magnifies that risk to user value. And behind products making subjective, unfalsifiable judgments (e.g. recommendations) there lurks a more insidious risk: replicating human biases under a veneer of algorithmic impartiality. All of these considerations dictate the kind of user value (or harm) your product will deliver.
Questions to assess user value
- The problem. Who are your users, and what fundamental problem are you solving for them? What can they start or stop doing, or do better, with this new information? Describe, explain, predict, decide, control? What's the impact of solving this problem?
- Volume. Are you creating value through the sheer scale of data you are working with?
- Velocity. Are you creating value by keeping pace with new data?
- Variables (data model). Is your data model itself — the set of variables and how they are represented and connected — creating value? What is its relationship to the real world? Does it model data in familiar ways or new ways? What value do users get from having the world broken down this way? Does it leverage new paradigms, like combining multiple datasets or revealing entirely new kinds of insights?
- Virtuous cycles. Does your strategy get better as it grows: does it take advantage of scale to train more accurate models? Does it leverage a growing user base to learn and deliver more customized insights, or create network effects? Are you learning from errors that users spot?
- Mental model. Does your user get how they’re supposed to use this? What does your user think your analytics product is trying to do: describe, explain, predict, recommend? Make representations about the world, just the available data, or nothing at all? How will you train your users, your sales force, your marketing team? Is it important that your user understand how the analytics were produced? Should you warn your user of any human biases that might have affected model training? Are there any unintended uses or interpretations, and how will you deal with them?
- Product accuracy. How accurate or relevant is the information you deliver, on the metric(s) that users are likely to judge by? Have any human biases crept into your algorithm? What are the stakes of getting something wrong? What is the theoretical upper bound on accuracy, and if it is not 100%, how likely are users to appreciate that fact? What error rate will they tolerate? How will you handle error reports? Is there an alternative metric, such as some downstream success metric, that is worth emphasizing?
- Falsifiability. Does your product make claims about objective facts, exposing you to the risk of being objectively wrong? Conversely, does it make subjective judgments, potentially replicating human biases under a veneer of algorithmic impartiality?
- Alternative methodology. Is there an obvious alternative route to the same (or comparable) information that your product delivers, exposing you to the risk of being perceived as wrong?
- Answer key visibility. If your product is making claims about objective facts, are the correct (or perceived-as-correct) answers easily visible for your user to “score” your product, either within your design itself or somewhere in the user’s ecosystem?
Business value: What is your data strategy’s competitive edge?
Choice of data strategy can provide business value by giving a crucial competitive edge in the marketplace. We don't just mean that AI sells — peddling AI is not enough, especially as consumers increasingly scrutinize its value-add and ask whether the AI emperor is wearing any clothes. But there is a compelling set of new moats that can give AI products staying power — and a clear reason for being. For instance, an algorithm can learn about a user's preferences and give tailored content recommendations, effectively increasing its value over time and raising the cost of switching to a competing product — and thus protecting your business. It's hard to imagine achieving this effect at scale via human curation alone. On the other hand, some new moats actually favor substantial human curation: a highly specialized, richly annotated training dataset might power a product that a competitor could never replicate even with the fanciest machine learning. More generally, the more deeply a data strategy depends on outsourced intelligence (including data annotations and machine-learning algorithms), the shallower the moat.
At Ravel and LexisNexis, this motivates us to focus heavily on in-house data enrichment and machine learning, and on creating new value from difficult-to-imitate combinations of data — such as a company analytics product that jointly draws insight from the law, news, and finance. Consider whether your data strategy — AI or human curation — creates business value in these ways or others, such as creating synergies across the business or tactically deploying AI developers to the projects where they will have the highest impact.
Questions to assess business value
- Moats. What “moats” does your strategy dig? Combining datasets in ways that are hard for competitors to imitate? Replacing or creating new value-added workflows? Individual user-tailored algorithms? Switching costs? Data network effects? Does your training data require experts and how hard is it to source?
- Synergies. Will your AI solution power other systems across the business? Will your hand-curated data be useful elsewhere in the business?
- Strategy over time. Does your strategy become more or less valuable over time, given the rest of the business?
- Opportunity cost. What would your data scientist be working on if you went fully manual? What would your annotators be working on if you went fully automatic?
- AI sizzle. Does “AI” generate buzz and sales, or do buyers not particularly care what’s under the hood?
Feasibility: Do we have a team that can really build this?
Lastly, along with the user value and business value of the various data strategies, there is the critical question of feasibility. Among other things, if human curation is too slow or expensive, then AI might have a reason for being; if the dataset is too small or intricate for machines to learn from, then human curation may be needed.
Two critical factors often get less attention than they deserve. First, even after the most extensive due diligence on AI technical feasibility, AI comes with substantially more uncertainty in time, money, and accuracy than human curation. Machine learning models take time to build, give probabilities as output, and don’t promise high accuracy on their first iteration — or, for some scenarios, ever. Contrast that to a steady, human-curated march to fully annotating the entire dataset, where we can linearly predict how much sooner the job will be done with each new hire to the annotation team.
This leads to our second critical point: the kinds of uncertainty involved in building AI need to be managed with well-grounded data science talent — and without the right data scientist, AI paths are off the table. Take any of the feasibility questions we've listed in the box below, like "How much data (and time and money…) will we need to train this model?" The most responsible approach for most of these questions is to use a probabilistic technique that gives a probabilistic answer. In our experience, this is a good data scientist's wheelhouse: managing uncertainty effectively, by formulating and testing hypotheses, precisely communicating risk, and leading productive discussion of alternatives with non-experts. Just as important, given the kinds of problems our data scientists work on, they have runway and latitude to build feasibility prototypes, iterate on hard problems, and propose product design changes based on these learnings.
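As one concrete example, the training-data question can be attacked empirically with a learning curve on a labeled pilot dataset. A minimal sketch, using scikit-learn with a logistic regression purely as a placeholder model, might look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

def accuracy_by_training_size(X, y):
    """Measure held-out accuracy at increasing training-set sizes, so the
    answer to "how much data do we need?" carries its uncertainty rather
    than arriving as a single point estimate."""
    sizes, _, test_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    )
    return [
        (int(n), float(scores.mean()), float(scores.std()))
        for n, scores in zip(sizes, test_scores)
    ]
```

Extrapolating that trend, and its spread, lets the team say how likely a target accuracy is to be reachable with, say, double the labeling budget: a probabilistic answer to a probabilistic question.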
The bottom line: AI is not feasible without good access to highly-skilled data science talent and the ability to manage substantial uncertainty.
Questions to assess feasibility
- Time to market. Can you afford an upfront investment of time to try to train a successful machine learning model? Would you be safer with the slow, linear progress of human curation?
- Monetary cost. How much would you pay your annotation team, including supervisors? How much would you pay to use a machine learning compute platform, both for initial training and ongoing inference?
- Training requirements. How much training data will you need for a successful algorithm? How many annotators will you need to onboard and train, either to produce training data or curate the entire dataset?
- Data model. Is your data model a good fit for the manual or algorithmic solution you choose? Can it be populated easily, consistently, and coherently?
- System upkeep and extensibility. How will you keep models tuned or annotators current, add functionality, debug, and integration-test? Who will be responsible for ongoing devops work on the annotation tool or machine learning platform?
- Qualified and empowered talent. If you are building AI, is your data scientist a full collaborator on the product vision and strategy? Are they invested and empowered to propose and critically assess various algorithmic solutions? Do they have the communication skills and statistical background to devise appropriate metrics, formulate and test hypotheses, quantify and precisely communicate uncertainty, and lead productive discussion with non-experts of various algorithmic alternatives? Do they have latitude to build feasibility prototypes, iterate on hard problems, and propose product design changes based on these learnings?