No Silver Bullet: Managing Enterprise AI

integrate.ai
the integrate.ai blog
Apr 10, 2018
That’s me sharing some lessons learned from a project applying AI in an auditing firm.

Last week I had the privilege of giving the opening keynote at Rational AI: Demystifying Machine Intelligence in the Enterprise, hosted at the MaRS Discovery District in Toronto. Many thanks to my friends Steve O’Neil, Anna Tyndale, and Krista Jones for the invite! The event was designed to shift the conversation about AI from the what to the how and the so what. My fellow keynoter, the hilarious, punchy, and eloquent Adam Drake, the participants on the startup panel, and I all shared war stories of what it takes to make AI worthwhile in the enterprise.

The slides are here. Here are a few key insights:

The core cultural challenge enterprises face in applying AI is that they are not accustomed to working with probabilistic systems, developing processes geared to making decisions under uncertainty, and acting on probabilistic outputs. To illustrate how tricky this is, I started my talk with one of my favourite anecdotes: Obama’s frustration in interpreting his advisors’ varying estimates of the probability that bin Laden was indeed in Abbottabad, Pakistan in 2011.

In May 2016, the CIA posted a bizarre series of tweets narrating the bin Laden raid as if it were happening that day.

In their excellent paper, Jeffrey Friedman and Richard Zeckhauser show how, when presented with estimates of the probability that bin Laden was at the compound ranging from 30% to 95%, “President Obama reportedly complained that the discussion offered ‘not more certainty but more confusion’, and that his advisors were offering ‘probabilities that disguised uncertainty as opposed to actually providing you with more useful information’”. The authors go on to argue that it’s only useful to provide a range of uncertainty around some prediction if it’s possible to give context around what that uncertainty signifies and how it could be impacted or changed with additional information. This gets at the heart of why we care about explainability and interpretability in machine learning systems: it’s not that we need to understand why some output was produced, it’s not that we mistake correlation for cause (although we do that all the time), and it’s certainly not good enough to say that human decision makers retrospectively rationalize what was effectively brute intuition anyway, so it’s silly to hold software systems to different standards (risk management teams can’t base pragmatic decisions on rhetoric). What matters is the ability to enable a rational decision, to align the use of a system with an intended outcome or designed policy. In Obama’s case, the stakes of not acting when he had the chance were so high that he decided to act even though his confidence that bin Laden was indeed in Abbottabad was no stronger than a coin toss. He got lucky.

While enterprise executives can’t build their machine learning strategy on luck, they can recognize that many of the processes built for the deterministic software paradigm need to be tweaked to answer questions raised by AI.

How can you scope an initial experiment and business opportunity to ensure you don’t waste too much time and money if a prototype model doesn’t work? (For more, see my colleague Megan Anderson’s excellent posts.)

Machine learning systems require way more than just algorithms. It’s useful to break the process into pieces to manage the risk of any component not delivering impact.

Do the subject matter experts in your business process require a subjective sense of certainty to carry out their task? If so, you will face resistance to adopting a machine learning model that outputs probabilities.
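
To make that concrete, here is a minimal, hypothetical sketch (the cost figures and function name are invented for illustration) of one way a process can act on a probability rather than demand a verdict: compare the expected cost of ignoring a case against the expected cost of acting on it.

```python
# Minimal sketch: acting on a probabilistic output by comparing expected costs,
# rather than demanding certainty. The cost figures are hypothetical.
def decide(p_positive: float,
           cost_of_missing: float = 50.0,
           cost_of_acting_needlessly: float = 5.0) -> str:
    """Recommend an action given a model's predicted probability."""
    expected_cost_if_ignored = p_positive * cost_of_missing
    expected_cost_if_acted_on = (1 - p_positive) * cost_of_acting_needlessly
    return "act" if expected_cost_if_ignored > expected_cost_if_acted_on else "ignore"

# Even a 15% probability can justify acting when a miss is ten times as costly.
print(decide(0.15))  # -> "act"
```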

Does your data set have the structure required to solve a machine learning problem? Most enterprises have what the data science community calls an “imbalanced class” problem: tons of examples of one thing you want to classify and very few examples of the other. Consider optimizing conversions in a marketing funnel: tons of people come to a website and browse, and very few people actually buy. That means data scientists will have very few examples of the patterns of behaviour that align with the desired outcome, and they have to use techniques such as resampling or reweighting to manage the imbalance. Data is never ideal.
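
As a rough sketch of what handling that imbalance can look like in practice, here is one common approach on synthetic funnel-style data (the features, the roughly 1% conversion rate, and the whole setup are hypothetical): weight the rare class more heavily so it isn’t drowned out.

```python
# Minimal sketch: fitting a model on a heavily imbalanced funnel, where roughly
# 1% of visitors convert. The data is synthetic and the setup hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=(n, 5))                        # stand-in behavioural features
score = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)
y = (score > 4.6).astype(int)                      # ~1% positive ("converted") class

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights examples inversely to class frequency so the
# rare converters are not drowned out; resampling is another common approach.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), digits=3))
```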

Often the best technical solution to a business problem doesn’t require machine learning at all, but rather discerning structure that can lead to efficiencies using simple rules or decision trees.
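
A shallow decision tree is one quick way to test whether a handful of rules captures most of the structure before anyone reaches for something more complex. The sketch below uses synthetic data and placeholder feature names, so it’s only an illustration of the idea.

```python
# Minimal sketch: a depth-2 decision tree as a rules-style baseline on synthetic
# data. Feature names are hypothetical placeholders.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=5_000, n_features=4, n_informative=2,
                           random_state=0)

baseline = DecisionTreeClassifier(max_depth=2, random_state=0)
baseline.fit(X, y)

# The fitted tree prints as a handful of human-readable rules that a domain
# expert can inspect, vet, and often implement without any ML infrastructure.
print(export_text(baseline, feature_names=[f"feature_{i}" for i in range(4)]))
print(f"training accuracy: {baseline.score(X, y):.2f}")
```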

Does your model actually need to be explainable? There are debates afoot in the AI community about the need for system explainability and transparency. While explainability is key for certain use cases (e.g., whether to grant a consumer a loan or even how long a jail sentence should be to reduce recidivism risk), in other use cases it’s not as important. Often a machine learning model or product is only one little part of a much larger business process. For example, we help our partner Kanetix increase conversion rates for prospective customers using machine learning. We don’t power the underwriting models that lead to insurance quote prices. The second part of the overall process is much more sensitive, and would require greater algorithmic accountability than our use case. It’s important to map out the entire process and understand just what you’re predicting.

What is your proxy metric actually measuring and optimizing for, and what risks does that pose? Yonatan Zunger has a must-read article about asking the right questions about AI. It includes an illuminating analysis of the essence of what went wrong with the controversial COMPAS algorithm: “But what the model was being trained for wasn’t what the model was being used for. It was trained to answer, ‘who is more likely to be convicted,’ and then asked ‘who is more likely to commit a crime,’ without anyone paying attention to the fact that these are two entirely different questions.”

Yonatan Zunger shows how the devil lies in the details, and why it’s crucially important to know what questions proxy metrics help answer.

There is always a gap between the qualitative, strategic outcome a business wants to achieve and the quantitative metrics we design machine learning systems to optimize for. Many of the risks in enterprise AI lie in the fault line between the complex world and the relatively simple proxies we use to model it (even with the richer representations of data afforded by deep learning and the automated machine learning approaches we discussed in our latest In Context podcast). Managers need clarity on what this gap amounts to in order to manage the risks.
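
One lightweight way to start getting that clarity, sketched below with a hypothetical proxy (“clicked”) and a hypothetical business outcome (“retained_90d”), is simply to measure how well the label a model will be trained on agrees with the outcome the business actually cares about, on whatever slice of data carries both.

```python
# Minimal sketch: before optimizing a proxy metric, measure how well it tracks
# the outcome you actually care about on whatever jointly-labelled data exists.
# The dataframe and column names ("clicked" as the proxy, "retained_90d" as the
# strategic outcome) are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "clicked":      [1, 1, 0, 1, 0, 0, 1, 0],
    "retained_90d": [1, 0, 0, 1, 0, 1, 0, 0],
})

# Agreement and correlation give a first, crude read on the size of the gap
# between the proxy and the strategic outcome.
agreement = (df["clicked"] == df["retained_90d"]).mean()
correlation = df["clicked"].corr(df["retained_90d"])
print(f"proxy/outcome agreement: {agreement:.2f}, correlation: {correlation:.2f}")
```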

There’s a lot more to say on this topic, and few great resources are available to frame a pragmatic, outcomes-oriented approach to managing the risks and rewards of applied AI projects. I’ll share further ideas on May 1 at the O’Reilly AI Conference in New York. And stay tuned for the release of our trustworthy AI framework!

Kathryn Hume is integrate.ai’s Vice President Product & Strategy. As the former Director of Sales and Marketing at Fast Forward Labs (Cloudera), Kathryn helped Fortune 500 companies accelerate their machine learning and data science capabilities. Prior to that, she was a Principal Consultant in Intapp’s Risk Practice, focused on data privacy, security, and compliance. She has given lectures and taught courses on the intersections of technology, ethics, law, and society at Harvard Business School, Stanford, the MIT Media Lab, and the University of Calgary Faculty of Law.

