The Black Box Problem — When AI Makes Decisions That No Human Can Explain

David Espindola
Intercepting Horizons
Nov 5, 2018


In the article Artificial Intelligence is Upon Us — Are We Ready? we discussed some of the ethical dilemmas we will face as we try to deal with the unprecedented scenario of another intelligent being sharing the planet with us. In this article, we narrow our focus on one of the many challenges we will encounter in a new world shared with AI: The consequences of AI decisions that no human can explain.

Even before AI achieves human-level intelligence, also known as strong AI, we will have to deal with the fact that decisions will be made by computer algorithms that no human being will be able to explain. AI will learn by itself and, at speeds beyond our reach, come up with answers and decisions that the human intellect could never conceive of and may never be able to comprehend.

Case in point: Google’s DeepMind and its AlphaZero algorithm, which taught itself to play chess, shogi and Go and went on to defeat the best computer programs, including ones that had been fed instructions from human experts. AlphaZero developed, on its own, a set of playing strategies that humans had never thought of before.

Deep Learning, Ethics, and Trust

There was a time when we could trace computer decisions to a specific logic specified in the software. However, complex machine-learning approaches could make automated decision-making altogether inscrutable. Deep learning, the most common of these approaches, represents a fundamentally different way to program computers. “It is a problem that is already relevant, and it’s going to be much more relevant in the future,” says Tommi Jaakkola, a professor at MIT who works on applications of machine learning. “Whether it’s an investment decision, a medical decision, or maybe a military decision, you don’t want to just rely on a ‘black box’ method.”

But relying on the black box method is exactly where we are heading. According to Selmer Bringsjord, computer scientist and chair of the Department of Cognitive Science at Rensselaer Polytechnic Institute: “We are heading into a black future, full of black boxes.”

In practical terms, relying on black box methods can result in very difficult ethical dilemmas. For example, in 2015, a research group at Mount Sinai Hospital in New York applied deep learning to the hospital’s vast database of 700,000 patient records — they called it Deep Patient. Without any expert instruction, Deep Patient was able to discover patterns hidden in the hospital data that appear to anticipate the onset of psychiatric disorders like schizophrenia surprisingly well. But since schizophrenia is notoriously difficult for physicians to predict, Joel Dudley, who leads the Mount Sinai team, wondered how this was possible. He still doesn’t know. The new tool offers no clue as to how it does this. So, how is a doctor supposed to tell his patient he is likely to develop a terrible disease, even though he has absolutely no idea why?

In another example of the practical predicaments of decision making based on black boxes, the military is grappling with the issue of soldiers relying on machines that tell them where and when to shoot. David Gunning, a program manager at the Defense Advanced Research Projects Agency, is overseeing a program appropriately named the Explainable Artificial Intelligence program. Gunning says automation is creeping into countless areas of the military. You can imagine the angst a soldier feels when making a life or death decision based on information provided by a machine that can produce false positives. “It’s often the nature of these machine-learning systems that they produce a lot of false alarms, so an intel analyst really needs extra help to understand why a recommendation was made,” Gunning says.

If humans are to work effectively in cooperation with robots and AI-driven systems, it is essential that trust be established. But how can trust exist if the AI makes decisions that humans do not understand? Ruslan Salakhutdinov, director of AI research at Apple and an associate professor at Carnegie Mellon University, sees explainability as the core of the evolving relationship between humans and intelligent machines. “It’s going to introduce trust,” he says. I believe what Salakhutdinov is alluding to is the fact that if we build explainability into AI designs (to the extent that it is possible), humans will be much more likely to trust these AI systems.

Legal Ramifications

Just as many aspects of human behavior are impossible to explain in detail, we may have to become comfortable with the fact that we won’t be able to explain all of AI’s behaviors. “Even if somebody can give you a reasonable-sounding explanation [for his or her actions], it probably is incomplete, and the same could very well be true for AI,” says Jeff Clune, of the University of Wyoming. “It might just be part of the nature of intelligence that only part of it is exposed to rational explanation. Some of it is just instinctual, or subconscious, or inscrutable.”

If it is inevitable that we must trust AI’s judgment without the ability to dissect the reason behind each decision, we need to be able to at least seed the AI with values that fit with our social norms. But who is held accountable if the AI does not conform to those norms? Should an AI be held accountable for its own decision? Should an AI be treated like a human being in our legal system?

Legislation is already underway without consideration of the realities of AI decisions that are made without human understanding. For example, there is an argument being made in Europe that it is a fundamental legal right to interrogate an AI system about how it reached its conclusions. Starting in 2018, the European Union may require that companies provide users with an explanation of how automated systems reach their decisions.

But how can such a law be enforced when computers have programmed themselves in ways we cannot understand? Even the engineers who build these AI-driven systems cannot fully explain their behavior. Should we expect AI to be able to explain itself in ways that humans could comprehend?

Explaining the Unexplainable

Henry Kissinger, in his article How the Enlightenment Ends, touches on the quandaries of AI making strategic judgments, an ability previously reserved for humans:

Driving a car requires judgments in multiple situations impossible to anticipate and hence to program in advance. What would happen, to use a well-known hypothetical example, if such a car were obliged by circumstance to choose between killing a grandparent and killing a child? Whom would it choose? Why? Which factors among its options would it attempt to optimize? And could it explain its rationale? Challenged, its truthful answer would likely be, were it able to communicate: “I don’t know (because I am following mathematical, not human, principles),” or “You would not understand (because I have been trained to act in a certain way but not to explain it).”

An AI response to human questioning along the lines of “there is no point in explaining it because you wouldn’t understand it” is not so farfetched. An AI that has been optimized to not waste computing cycles would not be inclined to explain something it knows a human could never comprehend.

AI’s Biased Decisions

Another plausible scenario is that AI makes a decision that, unbeknownst to humans, is biased.

The decisions made by an AI algorithm are highly influenced by the data that was used to train it. Therefore, AI may be just as biased as humans, if not more so.

In this case, our inability to understand the decision made by AI may not be caused by the nature of some logic that is incomprehensible to the human intellect, but rather by the simple fact that we don’t understand its biases.

The impact that the training data can have in influencing decisions made by AI is illustrated in an experiment by researchers from MIT Media Lab who trained “Norman”, an AI-powered psychopath. Norman was trained by exposing it to data from the dark corners of the web. Where a regular algorithm perceived a group of people standing around a window as just that, Norman saw them as potentially jumping out of the window.

This experiment shows how training data can be highly influential in establishing biases, resulting in potential AI decisions that we could not explain given our limited ability to understand how such biases were formed.

These biases are already present in algorithms that disproportionately flag people, based on their race, as being at risk of reoffending. A recent paper also found examples of racist and sexist language in a large and massively popular machine learning dataset. AI will inevitably be subject to all sorts of biases based on the training data that it is exposed to, just like we humans are subject to the biases of the cultures in which we were raised.
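To make the role of training data concrete, here is a minimal toy sketch in Python. It is not the method used for Norman or for any real system; the word-counting "model", the function names `train` and `predict`, and both tiny datasets are hypothetical and exist only for illustration. The same code is trained twice, once on neutral captions and once on captions skewed toward danger, and then asked to label the same scene.

```python
from collections import Counter

def train(examples):
    """Count how often each word appears under each label (toy 'model')."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def predict(counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)

# Hypothetical training set with everyday captions.
neutral_data = [
    ("people standing near a window", "calm"),
    ("friends looking out a window", "calm"),
    ("a person falling from a ladder", "danger"),
]

# Hypothetical training set skewed toward disturbing captions.
skewed_data = [
    ("a person jumping from a window", "danger"),
    ("people falling out of a window", "danger"),
    ("a quiet empty street", "calm"),
]

caption = "people standing around a window"
neutral_label = predict(train(neutral_data), caption)
skewed_label = predict(train(skewed_data), caption)
print(neutral_label, skewed_label)
```

The code is identical in both runs; only the data differs, yet the two models disagree about the same scene. In a model with millions of learned parameters, the same effect is far harder to trace back to its source.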

One of the challenges of establishing an AI code of ethics is determining whose ethics we should use. A study has shown that, in the example of an autonomous car having to make a decision as to whether to kill a child or a grandparent, the ethical decision would vary depending on where you are from. This implies that we humans can’t provide a congruent and logical explanation for our own ethical decisions, so how can we expect AI to do so?


In response to the black box problem — the fact that we will not always know how AI derives its conclusion — is it possible to limit the bias the AI is subject to by exposing it to a large variety of datasets? Would it be best to train AI with multinational and multicultural data to avoid our own human biases influencing its decisions? Is it possible to establish norms of how AI should be trained?

Daniel Dennett, a renowned philosopher and cognitive scientist, suggests that a natural part of the evolution of intelligence itself is the creation of systems capable of performing tasks their creators do not know how to do. “The question is, what accommodations do we have to make to do this wisely — what standards do we demand of them, and of ourselves?” asks Dennett.

There are indeed many unanswered questions. The sooner we start searching for answers, the higher the likelihood of a positive outcome.

Recommended Reading:

- The Dark Secret at the Heart of AI, by Will Knight

- When AI Goes Wrong, We Won’t Be Able to Ask It Why, by Jordan Pearson

Originally published at



David Espindola
Intercepting Horizons

Guiding organizations through technology-driven transformational changes. Author, "Soulful: You in the Future of Artificial Intelligence."