The Artificial Intelligence Black Box Problem & Ethics

Exploring the best-kept secret in the artificial intelligence community.

Recently, a friend of mine was writing a piece on the ethics of AI and asked me about the black box problem in artificial intelligence. Realizing there isn’t a lot of literature on it, I am going to try to explain this lesser-known problem in artificial intelligence.

The black box problem arises from the way we train our artificial intelligence systems. Most (if not all) successful AI systems are trained using back-propagation. The basics of continuous back-propagation were derived by Henry J. Kelley in 1960.


Within the context of artificial intelligence, back-propagation is a method for computing the gradient of a continuous function called the loss function, so that it can be minimized, typically by gradient descent. The loss function can vary, but it is generally a measure of the difference between the ground-truth output vector and the AI’s output: it represents how “wrong” the AI is for any prediction. By computing the gradient, we can “back-propagate” the error and make the AI “smarter”. This also introduces the constraint that every part of the neural network must be differentiable.

This leads to finding the best parameters (the weight matrices of the neural network) for which a function (here the loss function) is minimized.
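To make this concrete, here is a minimal sketch of the idea in NumPy, not code from any particular framework: a single linear layer is trained by repeatedly computing a mean-squared-error loss, back-propagating its gradient to the weights, and taking a gradient-descent step. The data, learning rate, and target weights are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))           # 32 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])     # ground-truth targets (made up)
W = np.zeros(3)                        # the parameters we want to learn

for _ in range(200):
    pred = X @ W
    loss = np.mean((pred - y) ** 2)            # the loss function
    grad = 2 * X.T @ (pred - y) / len(y)       # gradient of the loss w.r.t. W
    W -= 0.1 * grad                            # gradient-descent update

# After training, W is close to the target weights and the loss is near zero.
```

The gradient here is worked out by hand because the model is a single differentiable layer; in a deep network, back-propagation applies the chain rule to push the same error signal through every layer.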

Once back-propagation is done, the AI is “trained”, meaning that its internal weight matrices have been fine-tuned to perform a task: for example, recognizing objects in an image, translating text, or doing predictive analysis.

A trained Artificial Intelligence

The Black Box Problem

One problem arises from back-propagation: we cannot explain what the values inside the matrices actually represent.

We know the black box works because we can measure its accuracy on test data, but we can’t explain how it works. This is why artificial intelligences don’t explain their reasoning: we literally have no idea how they do what they do.

Moreover, for the same task, the matrices come out different at every training run. This uncertainty makes debugging AIs extremely difficult and makes monitoring the progress of an AI’s learning over time nearly impossible, which can lead to highly biased and “evil” AIs such as Microsoft’s Tay.
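A toy example, invented for illustration, shows this run-to-run ambiguity: the same two-layer model (whose prediction is w2 · w1 · x) is trained twice from different random initializations. Both runs learn the task equally well, yet the individual weights inside the “box” differ, so inspecting them tells us little about what was learned.

```python
import numpy as np

def train(seed):
    """Train pred = w2 * w1 * x to match y = 3 * x from a random init."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=32)
    y = 3.0 * x                           # the task: multiply by 3
    w1, w2 = rng.normal(), rng.normal()   # random initialization
    for _ in range(1000):
        err = w2 * w1 * x - y
        g = 2 * np.mean(err * x)          # shared gradient term
        w1, w2 = w1 - 0.02 * g * w2, w2 - 0.02 * g * w1
    loss = np.mean((w2 * w1 * x - y) ** 2)
    return w1, w2, loss

a1, b1, l1 = train(seed=1)
a2, b2, l2 = train(seed=2)
# Both runs reach near-zero loss, with w1 * w2 close to 3 in each case,
# but the individual weights (a1, b1) and (a2, b2) are not the same.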

Impact in Research

Until we figure this out, artificial intelligence research will be slowed down. It is like trying to make microprocessors without knowing how electricity works.

Some researchers have tried other ways of learning tasks, such as genetic algorithms (more on this later) and “Learning to learn by gradient descent by gradient descent”, which advocates learned, task-specific optimizers. While this doesn’t solve the black box problem, it tries to contain it.
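As a sketch of the gradient-free alternative, here is a toy genetic algorithm, invented for this example and not taken from the cited paper: good parameters are found by selection and mutation alone, so nothing needs to be differentiable, though the evolved parameters are no more interpretable than back-propagated ones.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([1.0, -2.0, 0.5])    # made-up parameters to recover

def loss(w):
    """Fitness: squared distance from the target parameters."""
    return np.sum((w - TARGET) ** 2)

pop = rng.normal(size=(50, 3))                     # random initial population
for _ in range(100):
    scores = np.array([loss(w) for w in pop])
    parents = pop[np.argsort(scores)[:10]]         # select the 10 fittest
    children = parents[rng.integers(0, 10, size=50)]        # clone parents
    pop = children + rng.normal(scale=0.1, size=(50, 3))    # mutate

best = pop[np.argmin([loss(w) for w in pop])]
# The best individual ends up close to TARGET, with no gradients computed.
```

Selection keeps the fittest candidates and mutation explores around them; this trades the differentiability constraint for far more loss evaluations, which is why gradient-based training still dominates in practice.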


One might think that the black box problem makes monitoring the “evilness” of an AI impossible. However, conceptually, humans suffer from the black box problem too. Our trillions of synapses and billions of neurons render any objective analysis of our brains’ structure impossible. Even if we could perform one, the chaotic firing of our neurons would make long-term predictions impossible.

Nevertheless, we approximate this analysis with heuristic methods in domains such as psychology. Effectively, this is the only way we have to reduce the uncertainty in measuring an individual’s “evilness”.

Recently, during a talk on the ethics of AI, a sentence grabbed my attention: “You don’t need to run an analysis on the synapses of an accountant to know if he is trustworthy, just look at the numbers.” This is the same idea we should apply to AIs. The world is non-deterministic; all we can do is increase the chances of certain outcomes.