Machine Learning, Trust, and the Whole Transparency Thing

Don’t think transparency will solve all your AI trust issues; we need more complex AI systems. — by Olaf T.A. Janssen and Gerard Schouten

Apr 23, 2020

The recent cry for transparent and explainable AI originates not only from our curious nature and a sense of fairness, but also from our fear of losing control over what we once thought we could control (algorithms) and from our need to blame someone for mistakes. Machine learning algorithms are inherently opaque, not only because of intentional corporate secrecy, but more importantly because of the technical illiteracy of most people and the limited mental capacity we all have to work with. We try to mitigate these issues with laws (such as the GDPR) for privacy and fairness, by deploying auditing systems, by creating better algorithms and defenses against attackers, and by reverse-engineering human-understandable explanations for machine learning decisions.

These solutions miss two things. First, deep learning algorithms are complex dynamic systems that by their very nature can be misled by attackers, exhibit catastrophic errors, and resist prediction, which limits the usefulness of current strategies and gives us a false sense of security. Second, in addition to improving current strategies, we should embed AIs in even more complex systems in which adaptive fail-safes, redundancies, and checks and balances stabilize most catastrophic machine learning failures, and in which humans stay involved. This caters far more to our human sense of trust than cold explanations alone.

Modern forms of democracy, the economic system and banks, and our relatively crime-free society are evidence that humans trust complex institutions even though they do not fully understand them and are aware that politicians may creatively bend the truth, bankers may be greedy, and humans may be deceitful. Once the fail-safes of these institutions crumble and trust disappears, the results are disastrous. Sadly, we see this in recent developments in some modern democracies and in our response to the recent corona pandemic. With AI-assisted institutions, we can do better.

Introduction

We inhabit a society where we not only encounter so-called smart thermostats but also smart toothbrushes, umbrellas, forks, and egg trays (examples here). After being notified of our dental state by our toothbrush, we turn to our smartphone and read updates from our friends that social media platforms have decided interest us. When we venture outside, we share the road with more and more autonomous vehicles that try to learn their way around traffic. Without our being aware of it, our online resumes are scraped and we are matched to jobs that, according to some machine, fit who we are.

To be fair, most serious applications are developed with the best intentions. Cities and governmental bodies may look to enhance the safety of citizens by predicting crime or to plan urban facilities (Goldsmith, 2014; Sisson, 2018). Science tries to discover patterns in a myriad of collected experimental data to unravel new mechanisms (CERN; Kleinberg, 2016). Banks may determine whether you are eligible for a loan (Jia, 2018) so that neither you nor the bank gets into financial problems. The police may decide where it would be best to patrol the streets. Insurance companies may want to determine a personal monthly rate based on your behavior. Self-driving cars should make decisions that ensure the safety of their passengers. The military should have control over its weapons systems to prevent collateral damage and to prevent inadvertently going to war.

Despite the best intentions, it is far from surprising that these developments elicit a whole range of well- and less-substantiated critical reactions. Some responses, fueled by fear, are about AIs taking over our jobs or worse: human extinction in an event ominously called “the singularity”. Some reactions understandably focus on privacy concerns. It does not sit well that companies eavesdrop on our actions and profit from them. Somehow, we should be in control of how machines get to see us. The most neutral reactions question the accuracy or validity of these algorithms, wonder what reasoning they use, and ask who is to blame if things go wrong. Is the decision fair or discriminatory? With what data was the model trained? Can it be hacked?

What does not quell the turmoil are statements that machine learning algorithms operate like black boxes that conceal their inner workings, or that marketers and self-acclaimed innovators try to brand anything as AI across a whole range of ‘intelligence’ levels. This makes it hard for us to differentiate between the admirable technology in complex systems such as a self-driving car and the less sophisticated AI in a smart fork that warns us we are eating too fast. To make matters worse, deep fakes and similar experiments show the average human that they should no longer trust their senses about what is real and what is fake. And most machine learning models can be attacked in such a manner that an attacker can produce any desired outcome.

A lot of recent developments therefore trigger very basic feelings of fear, anxiety, loss of control, lack of trust, and diminished autonomy. Being in such a state of mind often results in tunnel vision and simplified world views about how to deal with this new state of the world, which is reflected in the extreme opinions of media personalities and pundits. This has been the case for all technological breakthroughs, but for AI it seems even more prominent. Nonetheless, around the world serious initiatives are undertaken in response to the cries for more fairness, accuracy, confidentiality and transparency (FACT) (Dwork et al., 2011; VWData).

It all starts with our beliefs

The remainder of this text explores what concepts such as fairness, transparency and trust mean in the context of machine learning applications.

Dictionary definitions teach us that something opaque is literally something you cannot see through and figuratively something hard to understand. Conversely, something transparent is free from deceit, readily understood, and accessible. This means that trust and privacy play a role. Therefore, we see fairness, accuracy, and confidentiality as aspects of this broad definition of transparency.

In the above we have acknowledged that some underlying human desires are:

  • a general curiosity about how the world and its systems work and can be improved
  • a fear of losing control over our algorithms and losing a sense of control over our world
  • a sense of justice; we like to be treated fairly (by our own definition of fairness)
  • a need for accountability; we have a desire to be able to blame someone or something (preferably not ourselves)

In modern, connected societies these desires are almost treated as human rights, taking for granted the complex systems in place that try to keep us safe from injustice, illness, ill fortune, and natural disasters. In remote areas of the world, where nature rules, people are more accepting of the fact that they can die alone from a heart attack because a hospital is out of reach. Our false sense of control becomes painfully clear when we see part of the Western world react too late and in disbelief to a force such as the recent corona pandemic, something we will return to towards the end of this article.

Section 1 discusses three fundamental bumps on the road towards transparency. Section 2 focuses on the deceit aspect relating to vulnerability to attacks. Subsequently, in Section 3 the focus shifts to how humans make decisions and experience trust. That section links human and machine decision making and examines the relation between man and machine, a necessary background for coming up with solutions to make machine learning algorithms more explainable and trusted. In Section 4, existing and future solutions are described and compared. In Section 5, we question whether the current strategies are enough by exploring the nature of complex system dynamics, leaving our ideas for further research to Section 6.

1. What is the opacity issue?

Before making a case for more transparency of opaque machine learning systems, we should first establish what it exactly means for an algorithm to be opaque. We discuss three types of opacity (Burrell, 2016) that each have their own origin and consequently require different strategies if more transparency is wanted.

Intentional secrecy

The first origin of opacity is that of intentional secrecy. It can be corporate secrecy, where a company tries to keep a competitive advantage with its proprietary models and algorithms, or state secrecy for national intelligence agencies. Training a good model, tweaking its hyperparameters, and finding exotic (non-standard) cost functions takes effort and money, and others should not be able to take advantage of this prior work by stealing the model. The model is also trained with painstakingly obtained data. This training data is usually kept secret as well, both because it is the basis for training the model and because it often contains private user data that should be protected under privacy laws. However, the data could be severely biased. If the training set is intentionally kept secret, this is hard to judge and generalizations might be hard to trust.

We will briefly describe the different ways machine learning systems can be attacked in a later section. For now, it suffices to say that, as a general rule of thumb, intentional secrecy is fallible and we should always assume that all parts of the system are known to an attacker (Ateniese et al., 2013).

Technical illiteracy

The second type of opacity arises from the technical illiteracy of the public. A verdict by a judge can be understood from the judge’s explanation and verified by consulting related laws and legal precedents. Ambiguities and biases of the judge can be discussed. Reading legal documents already poses quite a high threshold for most people, since familiarity with legal jargon and procedures is required. If those same laws and rules are coded in programming source code, an even smaller group of people would be able to understand the decision that is made, because few people who are well versed in law can also read source code.

For complex systems, even programmers will not immediately be able to understand the full decision process. Statistical machine learning models and algorithms are written in source code, but the algorithms themselves require a certain level of mathematical understanding to appreciate the results. They also require a feeling for the limitations of the algorithm and for when results can be trusted or not. In general, the person who is affected by a machine-learned decision does not have the technical ability to understand how the decision was formed, even if there were no intentional secrecy and the classifier model, training data, and algorithm source code were made public. We are in the familiar position of needing to trust a system that we cannot understand technically. There is enough trust in cars, trains, airplanes, and software that we use them without understanding their exact functioning. However, a lack of understanding can easily feed conspiracy thinking about the designers of the product. In some cases we can even attribute magical or omniscient properties to systems whose boundaries are not understood.

While making the public more literate and aware of the systems currently in operation is a good initiative, it can never be expected that all people will understand all systems completely. It is, however, feasible to have a global understanding of such systems, similar to how we have a simplified mental model of how cars function and how they can be operated safely.

Insufficient mental capacity

The third type of opacity of machine learning systems arises from a fundamental characteristic of these algorithms. Often the decision model depends on a complex interplay between the input variables that humans just cannot fully grasp in a mental model, no matter their education or training. Also, the way machine learning algorithms treat data and relations does not match the human style and scale of reasoning and semantic interpretation. We can easily understand the decision to supply a loan based on the height of a person’s income. We can imagine how a factor expressing the risk of still having that income in a few years might weigh in. These factors are monotonic and more or less linear, meaning a higher income is always better. In practical machine learning algorithms, factors may be neither monotonic nor linear and might be impacted by tens or hundreds of additional factors. Such a model goes beyond our human understanding, even if the decision is otherwise made fully transparent to us.

To mitigate this type of opacity, the computer system should be able to reduce its decision-making process to bite-size steps and to concepts that humans can understand. For a good overview of such interpretable models, read Hall and Gill, 2019. We also touch on them in Section 4.

2. What does it mean for a learning machine to be attacked?

The three causes of opacity from the previous section are fundamental in the sense that we cannot solve them; we can only be mindful of them. In practice, the transparency issue deals mainly with trust in the workings of the system. Let us first assume that machine learning models are created with the best intentions and trained with the best data, giving very accurate results. Can we still trust them? How do we know whether a machine learning algorithm has been tampered with and hacked? Before turning to real-world examples, it is interesting to observe in what manner a machine learning system can be attacked.

A first type of attack, or intrusion, of a machine learning system is one that targets the computer system itself. One can try to break into a system and steal knowledge about it, ideally by obtaining the classifier model, the algorithm source code, and the training data. Another way to attack a system is to cause, for instance, an overload so that the system can no longer perform adequately. Such system-stress attacks are not specific to machine learning, so we will not discuss them further. Instead, we focus on two types of attacks specific to machine learning systems: exploratory and causative attacks (Barreno et al., 2006).

Exploratory attacks

In exploratory attacks, the attacker tries to gain knowledge of a system by exploring how it responds to given input, without trying to change the system itself. If an attacker can test whether a spam filter considers his message spam, he can keep altering the message until, finally, his spam message passes as regular mail.

A targeted attack would be one where a specific input, a single spam message in the last example, would pass the spam filter. An indiscriminate attack would be one where the attacker finds that a particular combination of words or headers in his message will ensure that he can pass the spam filter with any message of his choosing.
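
To make the probing loop concrete, here is a minimal Python sketch. The `is_spam` oracle below is a hypothetical stand-in for the attacked filter's end-user interface (a real attack would query the actual system), and the word-dilution strategy is just one illustrative perturbation.

```python
import random

SPAM_WORDS = {"free", "money", "winner", "claim"}
FILLER_WORDS = ["meeting", "agenda", "invoice", "regards", "schedule", "report"]

# Hypothetical stand-in for the attacked spam filter's public interface:
# flag a message as spam when too large a fraction of its words look spammy.
def is_spam(text: str) -> bool:
    words = [w.strip("!.,").lower() for w in text.split()]
    spam_fraction = sum(w in SPAM_WORDS for w in words) / max(len(words), 1)
    return spam_fraction > 0.2

def probe_until_accepted(spam_message: str, max_tries: int = 1000):
    """Exploratory attack: keep perturbing the message until the filter lets it through."""
    candidate = spam_message
    for _ in range(max_tries):
        if not is_spam(candidate):
            return candidate  # the spam message now passes as regular mail
        candidate += " " + random.choice(FILLER_WORDS)  # dilute with innocuous words
    return None  # attack failed within the query budget

print(probe_until_accepted("Claim your free money now, winner!"))
```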

Causative attacks

In causative attacks, the attacker tries to alter the system by influencing the training data. Suppose an attacker wants to bypass a spam filter with a spam message; he can flood the system with emails that are regular emails but that resemble the desired spam message. If the system uses this data to train the spam filter, it may consider the spam message to resemble the flood of regular mails, and the attacker has thus bypassed the spam filter. Conversely, the attacker can try to disrupt the spam filter by sending spam that resembles existing real messages, so that the system will learn to treat regular mail as spam. Again, this can only occur if the model is trained continuously, or at least retrained regularly on data that includes the attacker’s messages.

Systems that learn continuously, so-called online learners, are more susceptible to causative attacks. Usually, systems are trained once and then updated periodically with a curated training set, which makes causative attacks harder.
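
The following toy sketch illustrates a causative (poisoning) attack on a filter that is periodically retrained on incoming mail. The corpus, the naive Bayes filter, and the retraining step are all simplifications made up for this illustration, not a description of any real spam system.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def train_filter(mails, labels):
    """Retrain the spam filter from scratch on the current mail archive (1 = spam, 0 = regular)."""
    vectorizer = CountVectorizer()
    classifier = MultinomialNB().fit(vectorizer.fit_transform(mails), labels)
    return vectorizer, classifier

# Toy mail archive used for the initial training round.
mails = ["cheap pills online", "win free money now", "meeting agenda attached", "quarterly report draft"]
labels = [1, 1, 0, 0]

target_spam = "win free money with cheap pills"
vec, clf = train_filter(mails, labels)
print("before poisoning:", clf.predict(vec.transform([target_spam])))  # likely flagged as spam (1)

# Causative attack: flood the system with regular-looking mails that resemble the
# target spam, then wait for the periodic retraining to absorb them.
mails += ["win a free seat at the money management meeting"] * 20
labels += [0] * 20

vec, clf = train_filter(mails, labels)
print("after poisoning:", clf.predict(vec.transform([target_spam])))   # now far more likely to pass (0)
```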

Examples of adversarial interventions

Most recent examples of adversarial interventions make the machine learner recognize something utterly different from what a human observing the same input would recognize. Most of these are based on exploratory attacks.

Some attacks are so-called black-box attacks, in which the attacker knows nothing about the system other than the interface available to the end user. In so-called white-box attacks, the attacker has full knowledge of the machine learning system. White-box attacks are easier, and the level of trickery they achieve is higher than with black-box attacks. Tutorials to easily create adversarial examples are readily available (Athalye, 2017) for anyone to try.

If a machine learning system can be probed, even as a black-box system, a classifier can be trained using the output of the attacked system as input. It has been shown that, in this way, an attacker can learn about the training data of the original system (Ateniese et al., 2013). For instance, it was found that a particular voice recognition system was trained for 95% with English-speaking people with an Indian accent. More specific queries may even reveal whether an individual’s data is part of the training set or not, breaking differential privacy guarantees.

Voice-activated devices and voice assistants such as Apple’s Siri, Google Home, and Amazon Echo, are now commonplace. It is possible to hide voice commands in sounds to activate such systems, where a human does not recognize the voice command, but the device does (Song, 2017). Demonstrations can be found here.

Deep neural networks are successfully used in image recognition and classification systems. Often a system returns a series of labels of possibly detected objects, including a confidence score for each. Attackers can use this information to construct a gradient: how does a small perturbation of the input image affect the labeling and the confidence of this labeling? Alternatively, the attacker can use a guide image that the learner should detect instead of the original input image. The attacker then has to solve an optimization problem, in which he continuously perturbs the image in the direction of a different label than the correct one. While many optimization algorithms can be applied with mixed success, it is possible to create images that, with a minimal perturbation undetectable by humans, are consistently labeled by a learning system as a different object (Goodfellow, 2015). Because of the similarity between deep neural networks and classifiers trained for the same task, an adversarial example can sometimes fool even systems for which the example was not designed. This means that an adversarial image created to fool one neural network will often also be misclassified by other networks.
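
A minimal sketch of such a gradient-based perturbation is the fast gradient sign method of Goodfellow et al. (2015). The PyTorch snippet below assumes a differentiable image classifier `model`, an input `image` tensor scaled to [0, 1], and its correct `label`; these are placeholders, not part of any specific system discussed here.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, true_label, epsilon=0.02):
    """Fast gradient sign method: nudge the input in the direction that
    increases the loss for the correct label (Goodfellow et al., 2015)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), true_label)
    loss.backward()
    # One small step along the sign of the gradient, clamped back to a valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Hypothetical usage, with `model` any pretrained classifier, `image` a (1, C, H, W)
# tensor in [0, 1] and `label` the correct class index:
# adversarial = fgsm_attack(model, image, label)
# print(model(adversarial).argmax(dim=1))  # often differs from `label`
```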

As an example of how dangerous this might be, research has shown that with small alterations (stickers) a STOP road sign could be made to be interpreted as a speed limit sign (Evtimov et al., 2017). More recent examples are plentiful: fooling face recognition software into thinking you are someone else, and faking entire videos (deep fakes) that show people saying things they never said, with gestures they never made, in locations where they have never been.

3. Human and machine decision making

Bad actors may hack machine learning systems and this has a detrimental effect on the trust we have in such systems. But trust is about much more than that, so we should dive a bit into human-machine interaction.

Machine learning is part of the larger field of Artificial Intelligence (AI) that in its broadest definition deals with mimicking human intelligence using a machine. This means that while current AI techniques do not accurately simulate human intelligence, it may be illuminating to draw parallels between human and machine decision making. Also, the decisions of AI have an impact on humans, sometimes directly in human-machine interaction and sometimes indirectly.

Truth and decision-making in humans

We demand that machine learning algorithms make decisions given a particular situation, parametrized as a set of input parameters. We also demand that these decisions be accurate or, in more human terms, truthful.

We can discern three interpretations of truth: factual, logical, and ethical (Hume, 1739). In correspondence theory, a statement is true if external reality supports it. Factual truths rely on historical or sensory data and depend on the world we live in. Logical truth, instead, is true in all possible worlds. Using logical reasoning, new logical truths can be deduced. Yet other truths are neither factual nor logical, but are considered true in a socially constructed manner. These social or ethical truths are widely accepted within a culture but are essentially freely chosen. Examples of these are many and range from strictly local or cultural rules (traffic regulations and signs, table manners) to highly personal or ethical rules (religious doctrines, utilitarian ethics).

In our decision-making process, we make use of all interpretations of truth; we balance factual knowledge and previous experiences with logical reasoning, and we are sensitive to our ethical framework. Our brains have a fast and intuitive system for making decisions, based on how our brain is wired and trained by experience. But we also have a slower, rational system for making decisions that we can train by learning reasoning skills. Because this second system is slower and requires more energy, many of the decisions we take are made by the faster intuitive system, based on factual truths, rather than by the logical truths that the slower rational system can uncover (Kahneman, 2011). Moral (or ethical) truths can be part of both systems. As social and culturally constructed rules, they are embedded in the factual experiences of the world. At the same time, we can use moral truths as ground truths in reasoning about new problems.

More often than we would like to admit, our intuitive system makes us pick the wrong choice from a rational perspective. Such deviating ways of thinking are called cognitive biases. We overestimate small probabilities, prefer statements that align with our present beliefs, and so on. Other intuitive systems in our brain are also susceptible to being fooled, such as our visual system by optical illusions, although this occurs rarely in everyday life. More often, our logical thinking is flawed. Such errors in logical thinking are known from rhetoric as logical fallacies. These errors arise both from a lack of training in logic and from interference of the intuitive system with our logical thinking.

Untangling arguments in public discourse becomes interesting. Imagine two people agreeing based on entirely different grounds: one using magical (irrational) thinking and the other dismissing rational arguments on moral grounds. While agreeing with each other, both disagree with a third person who appears to be the only one free from biases and logical fallacies but supports an ethically deplorable position. Which of the three would you trust?

Because we don’t have the time, energy, or will-power to make a balanced decision, we may simply decide that all politicians cannot be trusted, or support one party no matter the discussion. Who we trust is then not fully determined by standpoints but by other aspects that we will discuss later on.

Truth and decision-making in AIs

AI systems can be interpreted along similar lines, considering the different forms of truth and levels of decision making. In the last ten years, machine learning classifiers have been trained on ever larger amounts of data. While training a classifier can take a long time, once it is trained it can predict from new input data almost instantly. These statistical algorithms are based on factual data. The choice of training data determines the truths it uncovers about the world. An AI system based on factual data can make accurate predictions on new input data that is not part of the training set, as long as it originates in the same world as the training data. When the training data is biased, the predictions will be too.

Unfairness in algorithmic decisions can often be traced back directly to an imbalance or bias in the personal data (for instance concerning gender or ethnicity) that is fed into these algorithms. Consider the news headline of October 10th, 2018: “Amazon scraps secret AI recruiting tool that showed bias against women”. The algorithm appeared to put women at a disadvantage compared to men. It was not rating candidates for software developer jobs and other technical posts in a gender-neutral way. The outrage was profound. Understandable, but also short-sighted, because the discrimination in the Amazon case primarily reflected Amazon HR’s prejudices against women. A fun read related to this is Hannah Fry’s Hello World: How to be Human in the Age of the Machine.

In summary, these machine learning algorithms learn from historical and factual data. They are characterized by slow learning, but fast decision making, and they are prone to biases. We can draw a parallel with our fast human intuitive decision-making process based on factual truths and proneness to cognitive biases.

During the AI winter of the 1970s and 1980s, much research was conducted on AIs that try to analyze natural language, so-called natural language processing. Such symbolic AIs use algorithms and programming languages such as Prolog and Lisp that are based on formal logic. The resulting systems are, for instance, expert systems that mimic the decision-making process of a human domain expert. They do this by representing the knowledge of the human expert in a knowledge base and then applying a relatively slow inference engine to deduce new facts or rules based on the existing knowledge. Because knowledge bases had to be filled by human experts, systems that are trained on data have taken their place. (Often, though, in supervised learning, the labels associated with the data have to be provided by experts as well, which is costly and time-consuming.) Initiatives such as the semantic web or the Google knowledge graph, which try to create a relational base of all knowledge, may eventually bring back logical inference AIs.

In summary, these algorithms use a fixed knowledge base to deduce new logical truths using an inference engine. They are relatively slow and may fall prey to logical fallacies when the knowledge base or its relations are ill-defined or incomplete. Drawing a parallel to our slower, rational decision-making system is easy.

Moral truths are not an integrated part of AIs. Some moral truths are trained implicitly via the (morally biased) training data and may also be encoded in the knowledge base of symbolic AIs. Until, if ever, AIs are developed that can learn to be ethically acceptable citizens of our society, we can resort to manually applying a set of base moral rules to AIs as boundary conditions. Most famous are the Three Laws of Robotics devised by Isaac Asimov. More recently, others have suggested that moral AI is feasible (Pearl, 2018).

What is our reason for drawing parallels between human and machine decision making? It clarifies that there are multiple ways to represent knowledge, and in practical cases, there is no best way to represent knowledge. Just as we use multiple kinds of knowledge to solve commonsense problems, we should also seriously consider using multiple kinds of knowledge representation for machine decision making.

How can AIs induce trust in humans?

“Trust involves the juxtaposition of people’s loftiest hopes and aspirations with their deepest worries and fears.” (Simpson, 2007). This quote tells us that trust involves intimate interpersonal relations that balance expectations with the risk and fear of betrayal of this intimate relationship. Fear and uncertainty are often seen as the opposite of trust. We do not trust people of whom it is uncertain whether they will say or do the right thing, nor do we trust people who consistently do the wrong thing.

Simpson describes a two-fold model of trust, in which both parties in a trust relationship should be aware of each other’s disposition toward working together. To enter a trust relationship, there should be a willingness from both parties. Then they can engage in trust-diagnostic situations. In trust situations, both parties make decisions that are mutually beneficial and in this way build more and more trust, while decisions that only benefit one party will decrease trust. In stress-strain situations, trust can be lost or built when one party sacrifices, or refuses to sacrifice, his gain to benefit the other. The exact dynamics depend, of course, on the disposition of each partner and on attachment issues.

Statistical machine learning methods will make mistakes. Often, mistakes may not be a big problem for an end user, such as a misplaced recommendation in a movie recommendation engine. For other applications, accuracy is more critical. Either way, if an algorithm is consistently making mistakes, it will erode the trust of the end user (Stumpf et al., 2009). This erosion of trust can be mitigated by involving the end user in the entire decision-making process. Indirectly, the accuracy of the system will also improve and help maintain trust in the system.

Enhanced user interaction with a decision-making algorithm is key to building trust. This interaction can be broken down into three steps. First, the machine learning algorithm should be able to explain the decision it has made. Second, the user should be able to give feedback about the decision. Third, the system should be able to use the feedback to improve the system.

Implementing these steps is far from trivial. Explaining the decisions of a machine learning algorithm is hard. Even decision trees, which allow for clear insight into the decision-making process, are not easily presented to end users. Opacity due to technical illiteracy and our limited mental capacity is a real concern here. Explanations by the machine learner must be usable and comprehensible to the end user. They must also be represented in such a way that the end user can give meaningful feedback and indicate where in the decision-making process a mistake was made (according to the user). Finally, the machine learning system should be able to translate the user feedback into new trainable information. This, too, is not trivial, since the domain language of the end user may differ from that of the domain expert who trained the model and from that of the training data itself.

4. Strategies for increasing transparency

More accuracy

An engineer’s first response to increasing transparency and trust in machine learning models is simply to improve the algorithms to better capture hidden patterns or generalize from a dataset. Our inquisitive nature fuels the quest for different and better algorithms almost automatically. Sadly, increasing accuracy and increasing the amount of data do not fully address the trust issue, as explained in the previous section.

Fight fakes

Around the same time that the existence of adversarial attacks was discovered, defenses were proposed. Hiding training data or the algorithm is not enough. One can try to detect when an adversary is probing the system, or provide misleading feedback. In the last few years, many defenses have been devised, particularly for deep neural networks, ranging from blurring input images to remove the harmful perturbations in adversarial examples, to using Principal Component Analysis to detect images that are not natural, to looking at the internal structure of the networks. All these defenses were bypassed (Carlini & Wagner, 2017), showing that while looking for better algorithmic defenses, we should also consider other options at the same time. An arms race follows, of the kind that has been going on for many years with spam filters. We see this now in efforts to fight fake news, although in practice such efforts still rely on human fact-checkers.

Remove biases and ensure privacy

A more straightforward task than staying ahead of bad actors is devising ways to decrease bias and discrimination in our machine learning models. While humans are susceptible to biases (and sadly often downplay that they are), computers can be programmed to detect and then mitigate biases in datasets. A good effort is the IBM AI Fairness 360 Open Source Toolkit, which brings together some 70 fairness metrics and multiple bias mitigation algorithms. In addition, privacy is a battleground fought with, for instance, the GDPR and ideas such as the decentralized web (Solid).
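
As an illustration of the kind of measure such toolkits compute, here is a small, self-contained sketch of two common group-fairness metrics (disparate impact and statistical parity difference) on made-up decisions; AIF360 provides these and many more, together with mitigation algorithms.

```python
import numpy as np

def fairness_metrics(y_pred, protected):
    """Group-fairness metrics for a binary favorable outcome.
    y_pred: 1 = favorable decision (e.g. loan granted); protected: 1 = protected group member."""
    y_pred, protected = np.asarray(y_pred), np.asarray(protected)
    rate_protected = y_pred[protected == 1].mean()   # favorable-outcome rate in the protected group
    rate_rest = y_pred[protected == 0].mean()        # favorable-outcome rate in the remaining group
    return {
        "disparate_impact": rate_protected / rate_rest,             # 1.0 is parity; < 0.8 is often flagged
        "statistical_parity_difference": rate_protected - rate_rest,
    }

# Made-up decisions for ten applicants, five of whom belong to the protected group.
print(fairness_metrics(y_pred=[1, 0, 0, 1, 0, 1, 1, 1, 0, 1],
                       protected=[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]))
```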

Explain decisions

The hardest route towards more transparency is finding out how a model decides by letting it explain itself. While in the previous solutions we can unleash the power of math and brute computational force, now we have to deal with the technical illiteracy and limited mental capabilities of humans.

First, we discern two different target audiences: the experts who have created, or are auditing, a specific algorithm, and the end users who are affected by its decisions. The experts have enough technical knowledge to understand the workings of the algorithm, and the quality of the algorithm can be assessed with statistics or abstract visual representations. Trusting the quality of a model will be a more rational decision based on these parameters. End users cannot be assumed to have the same technical know-how, and for them to understand the model a translation has to be made that befits their level of understanding. Also, for an end user, trust is based more on the interaction with the algorithm, as described earlier. Models are easier to understand when they are linear and monotonic, because they can then be translated into simple rule-based decisions.

Second, we can differentiate between understanding and trusting isolated decisions made by the machine learning model and understanding and trusting the model as a whole. This is often referred to as local interpretability and global interpretability, respectively. While experts will often consider global interpretability, by checking if the model as a whole seems sound, end-users will mainly be concerned with the decisions that affect them and demand local interpretability.

Third, some strategies are model agnostic and can be used independently of the underlying machine learning algorithm, while others are specific for an algorithm or type of input or output.

To test whether a model gives sound and trustable output, experts can look at many parameters (Hall, 2017). They can look, for instance, at a 2D representation of the data using dimension reduction (PCA, MDS, t-SNE, autoencoder, SOM) to see whether categories are smoothly distributed. They can look at partial dependence plots to see whether the response of the system changes smoothly for specific parameter changes. They can check whether the difference between predicted and recorded values (the residuals) is randomly distributed, so that there are no hidden relations left unmodeled. They can do sensitivity analyses and use basic measures to check for over- or underfitting. They can also check whether their algorithm can withstand documented forms of adversarial attacks.
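
A few of these expert checks can be sketched with scikit-learn; the gradient-boosted regressor and synthetic data below merely stand in for the model and dataset under audit.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence
from sklearn.model_selection import train_test_split

# Stand-ins for the audited model and its data.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# 1. Residuals: they should look like unstructured noise centered on zero.
residuals = y_test - model.predict(X_test)
print("residual mean:", round(residuals.mean(), 2), "std:", round(residuals.std(), 2))

# 2. Dimension reduction (PCA here; t-SNE, MDS, an autoencoder or SOM work similarly)
#    to eyeball whether the data is smoothly distributed in a 2D view.
X_2d = PCA(n_components=2).fit_transform(X_test)
print("2D projection shape:", X_2d.shape)

# 3. Partial dependence of the prediction on one feature: does the response change smoothly?
pd_result = partial_dependence(model, X_test, features=[0], grid_resolution=20)
print("partial dependence on feature 0:", np.round(pd_result["average"][0], 1))
```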

Some machine learning methods are more straightforward to interpret than others. If possible, a decision tree or Naïve Bayes model should be used because it allows for direct interpretation. When problems become too complicated and need to be solved with Deep Neural Networks (DNNs), such direct interpretation is no longer possible. The internal representation of the neurons often bears no contextual resemblance to features understandable by humans. One solution is to fit the model (locally) with another model that is easier to interpret, such as a decision tree. This interpretable model will often be less accurate, and it only offers an alternative view of how the DNN could also be interpreted in cases where both models agree, not of how the DNN actually interprets the input data to reach a particular decision.
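
One common way to obtain such an approximating view is a global surrogate: train an interpretable model on the opaque model's own predictions and report how faithfully it mimics them. In the sketch below, a random forest stands in for the opaque model (a DNN in the text), purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier   # stand-in for the opaque model
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Global surrogate: fit a shallow tree to the *black box's* predictions, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box. Only where they agree
# does the tree offer a plausible alternative reading of the black box's behavior.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"surrogate fidelity: {fidelity:.2f}")
print(export_text(surrogate))   # human-readable rules of the surrogate
```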

Locally, however, this may be enough. Small parts of the input space around a given input can be represented by a simpler model. A decision tree or linear regression may work fine locally to explain an individual decision. For end users, a proper visual encoding and representation should be chosen so that they can easily understand the local decision. A decision tree is already hard to read for a typical end user. One can simplify the tree, called pruning, or find other smart ways of representing the data. A particular method that can locally create interpretable insight into machine learning decisions is LIME (Ribeiro, 2016); a comparison with other methods is given by Budzik (2018). While it chooses a simple list of keywords to explain an e-mail categorization decision, for complex image labeling LIME shows which pixels contribute to a particular label. This gives the user direct insight into whether the algorithm bases its decision on the right data. For instance, users were shown that a model labeled an image as containing a wolf (instead of a husky) because of background snow pixels and not inherent wolf features. The users decided the model could not be trusted to make correct decisions, even though its accuracy on the training set was very high.
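
For tabular data, a LIME explanation of a single decision can look roughly like the sketch below; the data set and random forest are placeholders, and the exact LIME API may differ slightly between versions.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Placeholder opaque model; any classifier exposing predict_proba() will do.
data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    discretize_continuous=True,
)

# Explain one individual prediction: which features pushed it toward which class?
explanation = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
for feature, weight in explanation.as_list():
    print(f"{feature:>35s}  {weight:+.3f}")
```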

From such local explanations, global understanding or trust can be achieved by taking a sample of representative inputs. However, for models with a large input space, or models that concern less intuitive features than pictures of animals, this gives only limited insight.

5. Dealing with complex systems

So far, the different aspects of transparency, trust, and corresponding strategies have been discussed. Good progress is being made everywhere.

However, there is one elephant in the room that has so-far eluded the discussion and may cast a different perspective on the issue of transparency. Many machine learning systems and deep learning systems can be classified as complex systems. Here, we do not mean complex in the colloquial sense of being hard to understand but as defined in formal complex system theory. Such a complex system comprises interdependent small elements that are adaptive and have non-linear responses. We know that a deep learning neural network of small artificial neurons can be used for classification and prediction. But this is an emergent property of the network that is not obvious from the elements by themselves.

Complex systems share a few other properties that are worth noting. They are very stable for the known context (in our case, the training dataset), which makes them generalize well and is what makes them so successful in modern AI systems. But their non-linear dynamical behavior allows them to be pushed off balance into a very unstable regime. This is what adversarial attacks exploit to obtain any desired outcome.

Even worse, complex systems are known to produce extreme events. Often we expect a normal distribution of events, where outliers far away from the average can be neglected. Complex systems follow a power law instead, which means that extreme outliers can still occur. A strange and unforeseen combination of circumstances can cause a self-driving AI to make a fatal mistake. When such events seem impossible beforehand, they are sometimes called Black Swans. These Black Swans remain even if you try to exert more control over the system by increasing the amount of data or tweaking the hyperparameters to improve accuracy (What are black swan events?).
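
A small numerical illustration of the difference between thin and fat tails: the probability of an event six standard deviations above the mean under a normal distribution versus under a Pareto (power-law) distribution. The tail index below is an arbitrary, illustrative choice, not a property of any particular ML system.

```python
from scipy import stats

# Thin tail: probability of exceeding the mean by six standard deviations under a normal law.
p_normal = stats.norm.sf(6)

# Fat tail: the same question for a Pareto distribution with (illustrative) tail index b = 3.
pareto = stats.pareto(3)
threshold = pareto.mean() + 6 * pareto.std()
p_pareto = pareto.sf(threshold)

print(f"P(> mean + 6 std), normal:       {p_normal:.1e}")   # on the order of 1e-9
print(f"P(> mean + 6 std), Pareto(b=3):  {p_pareto:.1e}")   # several orders of magnitude larger
```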

This means that a machine learning system alone can never be 100% trusted to not make any severe mistakes or be immune to hacking. Efforts to prove the contrary can only lead to a false sense of security.

But hopefully, our efforts to better explain machine learning decisions are not in vain? A fundamental property of complex systems is their unpredictability. To understand and predict a system’s behavior, you must compute every step of the system, which is basically what you do when you infer a classification from a deep learning model (Mok, 2017). You can, luckily, understand its general behavior by deducing basic laws. This is what physicists have done in describing gases with the laws of thermodynamics, or complex flow dynamics with the Navier-Stokes equations. And while even these explanatory models can become increasingly elaborate, they will only ever describe the system partially, in regimes where behavior is idealized and predictable. They will be of limited value when the system classifies or predicts in unexpected ways.

Because complex systems are unpredictable by nature, any explanation of a deep learning decision is constructed after the decision was made and is not 100% trustworthy. Compare this to our mind’s desire to think up explanations for the decisions and actions we take. We may be perfectly capable of coming up with plausible explanations for why we ate that scrumptious cookie while trying to follow that strict diet, but will those convince anyone else? We admit that in most current systems understanding a model’s behavior with tools such as LIME is very useful, but with increasingly complex models, local explanations may only give a false sense of understanding why a decision was made. They will perhaps satisfy the need to find a scapegoat for a Black Swan event. However understandable the explanation may be, it is not the full and true story.

The cry for more transparency in machine learning follows from the human desire to turn complex systems into simple models to give us the illusion of being in full control. The success of traditional hand-coded rule-based algorithms has been in the control that programmers have over them and that they can be relied on to always work as intended. With machine learning models becoming more and more complex we should not get too hung up on the fallacy that full transparency should be attainable, or that it is the definitive way to enhance trust by the general population.

6. Beyond the transparency fallacy

Progress on developing defenses against adversarial attacks and on interpreting machine learning decisions should continue, just as should the development of better algorithms and new types of networks better suited to specific tasks, but this should not receive all of our attention.

By realizing we are dealing with complex systems, we should embrace rather than work around their properties. Instead of trying to harness a specific machine learning algorithm to prevent Black Swan-like events, we should learn from how we deal with extreme events in other complex systems. This means adding fail-safes, redundancies, checks and balances. We should strive for even more complexity instead of less.

Combining strategies

Ensemble learning is a strategy in which many models are trained and policies are used to combine the many outcomes into a final decision. An ensemble of decision trees is called a random forest, but it is also possible to stack or blend machine learning models of entirely different types. While transparency is not increased by increasing the number of models used, the amount of consensus between models can indicate the soundness of a decision. However, such ensembles are not better protected against adversarial attacks, since they are all trained with the same data.
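
A minimal sketch of using model consensus as a confidence signal, with three classifiers of entirely different types standing in for an ensemble; the data and models are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three models of entirely different types, trained on the same task.
models = [
    LogisticRegression(max_iter=1000).fit(X_train, y_train),
    RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train),
    KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train),
]

predictions = np.stack([m.predict(X_test) for m in models])   # shape (3, n_samples)
majority = np.round(predictions.mean(axis=0)).astype(int)     # simple majority vote (binary labels)
consensus = (predictions == majority).mean(axis=0)            # fraction of models agreeing per sample

# Low consensus flags decisions that deserve extra scrutiny, or a human in the loop.
print("fraction of samples with full agreement:", (consensus == 1.0).mean())
print("least agreed-upon test samples:", np.argsort(consensus)[:5])
```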

Taking it a step further, we should combine different types of knowledge to solve machine learning problems. While our visual cortex handles most image processing, we use rational thinking to not be completely fooled by optical illusions once we are made aware of them, and we use symbolic representations of objects and relations to assess what we see and to explain why we see what we see. Minsky already said that there is no best way to represent knowledge: to solve most real-world commonsense problems, a mind must have at least several kinds of knowledge (Minsky, 1990).

Take, as an example, a simple machine learning algorithm for detecting street signs. An adversarial attacker may fool a deep neural network into classifying a STOP road sign as a speed limit sign instead. However, a symbolic representation of the image should raise a red flag: a sign that is mainly red and octagonal cannot be a speed limit sign, and logically a speed limit sign should not be spotted at an intersection where one would expect, among others, stop signs to occur. Deep Symbolic Networks (DSNs) have been proposed that would allow for image analysis based on a symbolic interpretation of the images, which would be more easily interpretable (Zhang, 2017). Similarly, probabilistic logic reasoning has been proposed to allow machine learning models to be learned with logic relations (Orsini et al., 2017).
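
To make the idea tangible, here is a toy sketch of a symbolic sanity check layered on top of a hypothetical neural sign classifier: a couple of hand-coded rules about color, shape, and context veto implausible labels. A real DSN would learn far richer relations; this is only meant to show the principle.

```python
from dataclasses import dataclass

@dataclass
class SignObservation:
    label: str             # output of the (possibly fooled) neural classifier
    dominant_color: str    # from a separate, simple color-histogram check
    shape: str             # from a separate contour/shape detector
    at_intersection: bool  # from map or scene context

def plausible(obs: SignObservation) -> bool:
    """Hand-coded symbolic ground rules that veto implausible classifier outputs."""
    if obs.label == "speed_limit" and (obs.dominant_color == "red" or obs.shape == "octagon"):
        return False   # a mainly red, octagonal sign cannot be a speed limit sign
    if obs.label == "speed_limit" and obs.at_intersection:
        return False   # suspicious context: stop or yield signs are expected here
    return True

# A stickered STOP sign that the neural network misreads as a speed limit sign:
obs = SignObservation(label="speed_limit", dominant_color="red",
                      shape="octagon", at_intersection=True)
if not plausible(obs):
    print("Red flag: classifier output contradicts symbolic knowledge; defer to a fail-safe.")
```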

It is not that we should focus only on symbolic learning, because those systems are also prone to biases and mistakes. The main point we are trying to make is that we can increase stability and resistance against biases and attackers, and increase explainability and trust, by linking several adaptive elements into a new whole. This can encompass several machine learning algorithms able to decide based on different inputs and different principles, (ethical) ground rules, and adversarial algorithms trying to find flaws in the above algorithms and to check compliance with the ground rules.

The success of AlphaGo Zero lies not solely in its sheer computational power and reinforcement learning algorithm, but also in its use of ground rules about the playing board and proven search tree algorithms. Critics may call this cheating or a weakness, but it is exactly this that allowed it to become the most powerful and stable system thus far.

The human factor

Finally, as has been said already in the section on human decision making and trust: the relation between the complex dynamic system and humans is of utmost importance. The tendency is to put forward machine learning models as smart, more accurate than humans, and eventually infallible; if not now, then soon, when we tweak the algorithms and add better data. Systems portrayed as omnipotent and omniscient will only scare us more and erode trust in the long run.

Instead, we propose developing humble AI: algorithms that try their best but know that they have biases and are prone to making ludicrous and potentially lethal predictions. Such algorithms exhibit doubt, ask (humans) for help, second-guess themselves, allow themselves to be reprimanded, and critically assess feedback to learn from their mistakes. But in essence, it is we who have to change. We should develop a different perspective towards AI, so that we can build AI with a different self-image.

Conclusion

We have always struggled to make sense of the complex world around us, trying to create understanding by modeling and simplifying what we see around us. In the last century, we have been helped enormously by the advances of computers, allowing us to compute, simulate, and automate much of the world with algorithms written using those simplified models. Our understanding of the physical world, economics, and societies has grown immensely because of it. But now, with deep learning algorithms, we are losing the control we thought we were gaining over the world, by introducing inherently unpredictable complex systems into our software. Once again, we bump into the limits of our understanding. It is no surprise that we try to force these new models to become controllable again, demanding transparency: fairness, accuracy, confidentiality, and explainability.

Despite all the good efforts to improve and constrain single machine learning models, we should not forget what makes us trust (properly organized) institutions, each other, and, in a possibly far future, AIs as moral agents: they are all embedded in a stable, adaptive network of moral agents.

Educating the best judges who can explain their decisions well is an asset to our judiciary system, but we trust the legal system for its hierarchy of appeal and its grounding in law and constitution. We would also still be wise to trust a democratic system of checks and balances over the rule of a genius and eloquent despot. Comparing the ways in which certain individual world leaders and governmental institutions react to a Black Swan event such as the covid-19 pandemic exposes where we should put our trust.

We should embed AI systems into AI-assisted institutions, distributing responsibilities over several elements. Transparency is then required of the way the institution works, which is more compact and within our range of understanding, while the constraints on the individual AI elements are relaxed: they are now allowed to be humble and incomprehensible.

( This work was supported by NWO as part of the project “Verantwoorde Waardecreatie met Big Data (VWData)”.)

Olaf T.A. Janssen

Lecturer/researcher IT & media design and applied data science, physicist, and dabbling in the rest