Trusting AI with important decisions: capabilities and challenges

Artificial Intelligence has become increasingly present in our lives in the form of tools like smartphone apps. It can also be found in high-stakes autonomous systems where it makes decisions that involve human lives, such as Autonomous Vehicles (e.g. the “Google Car”), or large amounts of money, such as automated investment systems. AI can increase our productivity and creativity, or it can replace human intervention altogether by making better decisions, both in everyday life and in business. There is strong potential in AI-powered automation, but there are also important issues to address, such as control, morality, and market uptake. Let’s dive in…

Note: this article is based on a talk I first gave on December 9, 2015 at APIdays Global (see slides).

Artificial Intelligence today

In everyday life

If you’re looking for examples of AI in today’s world, look no further than your smartphone. Even if you haven’t installed fancy apps, there is AI built into your phone’s operating system. On iOS, for example, you’ll get smart suggestions of applications to use or people to contact, based on the current context.

Hopper predicting the price evolution of PAR-BUE return flights

AI is also used as a differentiating factor in everyday apps such as Google Inbox for emails or Evernote for taking notes. Inbox can automatically suggest reminders by analysing your emails’ content (e.g. when people write to you “Please send me this”), or it can suggest appropriate short responses (e.g. “You’re very welcome” or “Will do”). Evernote’s Context can display notes and web articles that are related to the note you are currently viewing. In other apps, AI is at the core of the app’s value proposition. For instance, Hopper is an app that predicts how flight prices will evolve over time (so you can book at the optimal time).

“More than half of the apps on a typical iPhone screen are predictive applications.” — Lars Trieloff

In business

AI is used to unlock great value from organizations’ data by predicting…

  • customers who may not renew their subscriptions (so we can reduce attrition)
  • products that will be of interest to a customer (so we can better cross-sell products)
  • demand for a given product, at a given price, in a given context (so we can optimise pricing and replenishment)
  • etc.

For that, data is analysed by Machine Learning techniques that build predictive models. In general, these models can also be used to automate all sorts of classification tasks, which turn out to be very common at work: text or visual document tagging, priority filtering, customer support message routing, etc. The models are simply used to predict the class corresponding to given objects to be classified (documents, messages, etc.).
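To make the classification idea concrete, here is a minimal sketch of a model that routes customer support messages. It is only an illustration: the training messages, the class names ("billing", "technical") and the helper functions `train` and `predict` are all made up for this example, and the technique used (naive Bayes on word counts) is just one simple option among many.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per class from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def predict(model, text):
    """Return the most probable class for a text (naive Bayes, add-one smoothing)."""
    word_counts, class_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        # Start from the log prior of the class, then add one log likelihood per word.
        score = math.log(class_counts[label] / total_examples)
        for word in text.lower().split():
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical, hand-written training messages for a support desk.
examples = [
    ("i was charged twice on my invoice", "billing"),
    ("please refund my last payment", "billing"),
    ("the app crashes when i open it", "technical"),
    ("i get an error message on login", "technical"),
]
model = train(examples)
route = predict(model, "refund for a wrong invoice")
print(route)
```

In practice you would learn from thousands of real messages rather than four invented ones, but the principle is the same: the model is given an object and predicts its class.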

Sometimes, they empower new applications that were not thinkable before, such as sentiment analysis on hundreds of thousands of pieces of text (e.g. tweets mentioning your brand), or social media filtering where tweets are classified into actionable or not.

Katherine Barr of VC-firm MDV predicts that “Pairing human workers with machine learning and automation will transform knowledge work and unleash new levels of human productivity and creativity.”

Types of AI: “weak” vs “strong” — or “tool” vs “high-stakes”

Not all AIs are equal. The usual distinction is between “weak” AI (a.k.a. “narrow” or “applied” AI) and “strong” AI (a.k.a. “general” AI): the former is specialised on a particular task (e.g. one of the classification tasks mentioned previously) and cannot be used for anything else; the latter is not specialised, and may thus be more similar to our own intelligence. All the AIs that exist today are weak, but there’s a lot of work on creating strong AI. Some people in the field see it as the only noble goal… However, there’s already huge value to be created from weak AI.

Machine Learning is the subfield of AI which is getting the most attention today, due to recent advances, real-world success, and the fact that it feeds on all this Big Data we’ve heard so much about. There is also a lot of hype around Deep Learning, which is itself a subfield of Machine Learning.

Instead of distinguishing AI based on how it’s built and on its “strength”, Prof. Tom Dietterich makes a more pragmatic distinction, based on the stakes of the decisions we let AI take automatically. We can speak of “tool AI” when these decisions are somewhat inconsequential (think about a spam filter that decides that an incoming email should skip the inbox). When decisions are taken by a lone machine and involve large amounts of money, or even human lives, we can speak of “high-stakes autonomous AI”.

Why bother with giving so much power to machines, you may ask? It turns out that AI can help us make significantly better decisions. One example is Autonomous Vehicles, which could greatly reduce the number of traffic accidents if they become widespread — more on that in a little bit.

AI vs. Human vs. Human+AI

Chess is perhaps the best known example of an area where Artificial Intelligence is superior to Human Intelligence. This is not the case with all board games though: in the Asian game of Go, the number of possible moves at each point in the game is of the order of 10 times that in Chess, and AI is far from the level of Human experts.

Fairly recently, people started playing the game of “Centaur Chess”, which consists in playing Chess with the help of an AI. It turns out that in Chess, Human+AI beats AI alone.

Icons by Filipe de Carvalho — Star Wars: The Flat Awakens

There are cases, however, where an AI alone takes better decisions than Human+AI. This is what the folks at Blue Yonder discovered when trying to use data to improve the way replenishment decisions are taken in the food industry.

Beyond “prescriptive analytics”: automating decisions

Big Data professionals make a distinction between 3 phases of data analysis: descriptive, predictive, and prescriptive. Let’s illustrate with the replenishment use case:

  • In the descriptive phase, we’d want to show past demand for products against a calendar, or to plot it in various ways, in order to get insights into how demand evolves based on the context (time, location, etc.).
  • The predictive phase would typically be about learning a model from data that would predict demand in the next 2–3 days for a given product at a given store at a given time.
  • The prescriptive phase would consist in suggesting what quantity of goods to ship to stores. This should be based on the previous predictions but also on various domain-specific elements such as constraints on order size, truck volume, capacity of the people stocking the shelves, trade-off between cost of storage and risk of lost sales, etc.

An Amazon warehouse (photo by Drew Kelly for Wired)
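The three phases above can be sketched in a few lines of code. This is a toy illustration under strong assumptions, not how Blue Yonder or anyone else actually does it: the sales figures are invented, a 3-day moving average stands in for a learned predictive model, and the function `prescribe_order` and its parameters (`safety_stock`, `case_size`, `truck_capacity`) are hypothetical stand-ins for real business rules.

```python
from statistics import mean

# Hypothetical daily unit sales of one product at one store (made-up numbers).
past_demand = [40, 42, 38, 45, 50, 47, 44]

# Descriptive: summarise what happened.
average_demand = mean(past_demand)
peak_demand = max(past_demand)

# Predictive: forecast tomorrow's demand. A 3-day moving average is only a
# stand-in here for a model learned from data.
forecast = mean(past_demand[-3:])


def prescribe_order(forecast, stock_on_hand, safety_stock, case_size, truck_capacity):
    """Suggest an order quantity from a forecast and simple business constraints."""
    needed = max(0, forecast + safety_stock - stock_on_hand)
    cases = -(-needed // case_size)  # round up to whole cases
    return int(min(cases * case_size, truck_capacity))


# Prescriptive: turn the forecast into a suggested shipment.
order = prescribe_order(forecast, stock_on_hand=20, safety_stock=5,
                        case_size=12, truck_capacity=120)
print(order)
```

Automating the decision simply means sending `order` straight to the ordering system instead of showing it to a person as a suggestion.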

Blue Yonder showed some improvement in the quality of decisions that were made when using the prescriptions made by the machine. But the biggest improvement came when they took humans out of the loop and turned remaining business rules into computer programs: completely automated decisions (i.e. made by an AI alone) significantly outperformed those made by humans based on the machine’s prescriptions (i.e. made by Human+AI).

This suggests that our cognitive biases can make machines more trustworthy than us humans for certain types of decisions (see “Thinking, Fast and Slow” by Nobel prize-winner Daniel Kahneman for a review of cognitive biases). Besides, AI-powered decisions are faster and cheaper than those made by humans…

Predictive analytics are reaching a nice level of maturity in the industry, and we are now starting to hear about prescriptive analytics. Automating decisions is the next step, and we’re already seeing technology solutions that go in that direction. For instance, some churn-prediction tools go beyond simple churn analysis and make it possible to automate the actions to take on customers detected as likely to churn, by plugging into automation platforms like Intercom (email marketing) and Zapier.

I’m not sure it will make sense to automate all decisions in all domains but we’ll certainly see more of that in the very near future.

AI for Automated Investing

Another area where decisions are strongly biased is investment — see for instance Airbnb’s 7 rejections, which prompted Arlo Gilbert to write about Silicon Valley’s Dirty Secret. Gilbert points out that Silicon Valley venture capitalists have had a big hand in funding startups that are disrupting all types of industries with machine learning-powered automation, but strangely enough, automated decision making in early stage investing itself has remained absent.

“The ability to get funded […] is largely about who you know and how well you present. This is the problem. What is the solution? Automated early stage investing.”

Deep Knowledge Ventures is a company which gained attention in 2014 for having an AI on its board to recommend — or to veto — investment decisions. Its AI can “automate due diligence and use historical data-sets to uncover trends that are not immediately obvious to humans”. This year, Telefonica Open Future (the telco giant’s startup investment arm) is going one step further by replacing the whole board with an AI. The objective is to “reduce the bias that subjectivity and intuition can add to decisions while improving the chances of predicting a venture’s success”. (Note that the definition of success can change depending on who’s funding: future valuation in X years, or number of jobs that will be created, or something else…)

The World’s First Automated Early Stage Investing Platform

These automatic early stage investing decisions will materialise in startup battles hosted at the PAPIs conferences throughout the world, where the AI of Telefonica Open Future (TOF) acts as the jury that selects which startups get to pitch and which ones win. The first conference to host an AI Startup Battle will take place this March in Valencia, and world finals will take place in October in Boston.

[Disclosure: I am the General Chair of PAPIs]


Constraining AI, and why a high accuracy percentage can be dangerous

“If a machine is expected to be infallible, it cannot also be intelligent” — Alan Turing

As we’ve just seen, decisions taken by an AI are based on predictions. For the decisions to be good, the predictions have to be somewhat accurate. It is tempting to think that a system which is accurate 95% of the time, or even 99%, is a good system. For “tool AI” this would probably be the case. For “high-stakes AI”, this requires more thinking: what could happen when the predictions are wrong?
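A quick back-of-the-envelope calculation shows why the same accuracy can be fine for one system and alarming for another. The sketch below assumes, for simplicity, that errors are independent and that every error costs the same; the function `expected_error_cost` and all the numbers are hypothetical.

```python
def expected_error_cost(accuracy, decisions, cost_per_error):
    """Expected total cost of wrong decisions, assuming independent errors of equal cost."""
    return (1 - accuracy) * decisions * cost_per_error


# Hypothetical "tool AI": a spam filter handling 1,000 emails, where each
# mistake costs the user a few seconds (valued here at 0.01).
tool_cost = expected_error_cost(accuracy=0.99, decisions=1000, cost_per_error=0.01)

# Hypothetical "high-stakes AI": same accuracy and volume, but each mistake
# costs 10,000.
high_stakes_cost = expected_error_cost(accuracy=0.99, decisions=1000,
                                       cost_per_error=10_000)

print(tool_cost, high_stakes_cost)
```

Both systems are "99% accurate", yet the expected cost of their mistakes differs by a factor of a million. And when an error can cost a human life, putting a number on `cost_per_error` is no longer a simple accounting exercise.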

It may be a little extreme, but consider this example: a system analyses images to understand what they represent, and is used by robot police officers on images captured by their camera-eyes. Actually, such a system has been built at Stanford and works great on most of the images thrown at it. It only has a few quirks:

“a young boy is holding a baseball bat.”

Imagine that instead it would say “a young boy is holding a weapon”. How do you think the robot-cop would use this piece of information?

Ed 209 in Robocop

When thinking about automating important decisions and giving high-stakes autonomy to machines, we should pay particular attention to constraining their behaviour by defining what is desired, but also what is acceptable and what is not acceptable. This is what the Three Laws of Robotics of science-fiction writer Isaac Asimov do:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings, except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

By the way, do you know what would be a simple answer to ending poverty and cancer? Ending humans. Fortunately, this is forbidden by the First Law, which specifies what behaviour is not acceptable, while the Second and Third Laws specify what behaviour is desired.

Besides, when important decisions are to be made by an AI, one could implement mechanisms that ask for human intervention when unusual situations are detected, or when the computed uncertainty in predictions/decisions is above a certain threshold.
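Such a mechanism can be sketched as follows. This is a minimal illustration, assuming the model outputs a probability for each candidate action; the function `decide`, the action names and the 0.9 threshold are all hypothetical. Requiring confidence above a threshold is the same thing as requiring uncertainty below one.

```python
def decide(probabilities, confidence_threshold=0.9):
    """Return the predicted action, or defer to a human when confidence is low.

    `probabilities` maps each candidate action to the model's estimated probability.
    """
    action, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < confidence_threshold:
        return "ask_human"
    return action


# The model is confident: the decision is automated.
automatic = decide({"approve": 0.97, "reject": 0.03})

# The model is unsure: the case is escalated to a person.
escalated = decide({"approve": 0.55, "reject": 0.45})

print(automatic, escalated)
```

The threshold is a design choice: the higher the stakes, the higher you would probably set it, at the price of asking humans more often.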

AI in Autonomous Vehicles: morality vs. market uptake

Obviously, morality comes into play in AI decisions where human lives are at stake, as is the case with Autonomous Vehicles. These vehicles have been shown to be very safe and it is expected that wide adoption would significantly reduce the number of traffic accidents. But the first AVs to appear on the streets won’t be putting an immediate end to accidents, and one can imagine that AVs could be involved in some fatal ones (even if few). When that happens (or rather a few (milli)seconds before that happens), the AI driving the car will have to make a decision that is predicted to result in the loss of life. This can be the life of a pedestrian, or the life of the car’s owner/passengers.

A variant of the trolley problem with Autonomous Vehicles (taken from Bonnefon et al.)

A recent study by Bonnefon et al. showed strong consensus among people on a very simple principle: minimising the loss of life. One may argue that things are not that simple since we should also account for the age of individuals at risk in an accident (e.g. baby vs. older person) and for the probabilities of survival; but let’s put that aside for a moment. The study also showed something else: “People are in favour of cars that sacrifice the occupant to save other lives — as long they don’t have to drive one themselves.” (Why Self-Driving Cars Must Be Programmed to Kill)

AVs can bring huge benefits to society as they can save thousands of lives, but for this technology to reach its full potential and save the most lives, it needs to be widely adopted. That is only going to happen if people want to buy AVs, and so far, it’s not that clear that they will. According to Bonnefon et al., “The public is much more likely to go along with a scenario that aligns with their own views”. The authors’ work consists in applying experimental ethics to find out what this scenario is.

Figuring out what people will tolerate

AI for scaling transportation infrastructures

Tranquilien predicts how crowded trains will be in the Paris suburbs

I think we will see similar issues in other AI use cases where value can be created by having AI make decisions for us, but where that value is conditional on adoption. One example I have in mind is going to work, back home, or on holidays at the least busy times. Today, there are several predictive apps that tell you how busy public transport will be at any given time in the near future.

It seems to be fine for now, because users of these apps are a very small percentage of all users of transportation systems. But how would these apps use the power of AI if they had a much bigger user base? They shouldn’t just show “raw” predictions since everyone would react to these predictions in more or less the same way. Maybe they should instead tell each user what he or she should do — but would that be acceptable to them?

AI for Human Resources

Not sure how much you can trust this guy’s decisions…

You could have an AI decide who gets fired or hired or promoted in a company, in a way that’s beneficial to the company. But for this to be viable, the AI should also consider benefits and detriments at the individual level. Experimental ethics may also be applied in HR to figure out what employees are in favour of and what they deem to be “fair”.

As with any other technology, AI in itself isn’t good or bad. We’re just beginning to go past our initial, vague apprehensions, and to look concretely into how AI can benefit society while ensuring that it doesn’t hurt us. You can probably extend the examples and considerations above to any domain where decisions need to be made on how to allocate scarce or valuable resources. The promise is to improve our lives by better allocating these resources. The core technology is already here. We just have to figure out how to use it…

Thanks to Lars Trieloff, Jean-François Bonnefon and Francisco Martin for very insightful discussions.