TL;DR: Machine learning and artificial intelligence (AI) are beginning to govern ever-greater parts of our lives. If we want to trust their analyses and recommendations, it’s crucial that we understand how they reach their conclusions, how they work, which biases are at play. Alas, that’s pretty tricky. This article explores why.
As machine learning and AI gain importance and manifest in many ways large and small wherever we look, we face some hard questions: Do we understand how algorithms make decisions? Do we trust them? How do we want to deploy them? Do we trust the output, or focus on process?
Please note that this post explores some of these questions, connecting dots from a wide range of recent articles. Some are quoted heavily (like Will Knight’s, Jeff Bezos’s, Dan Hon’s) and linked multiple times over for easier source verification rather than going with endnotes. The post is quite exploratory in that I’m essentially thinking out loud, and asking more questions than I have answers to: tread gently.
In his very good and very interesting 2017 shareholder letter, Jeff Bezos makes a point about not over-valuing process: “The process is not the thing. It’s always worth asking, do we own the process or does the process own us?” This, of course, he writes in the context of management: His point is about optimizing for innovation. About not blindly trusting process over human judgement. About not mistaking existing processes for unbreakable rules that are worth following at any price and to be followed unquestioned.
Bezos also briefly touches on machine learning and AI. He notes that Amazon is both an avid user of machine learning as well as building extensive infrastructure for machine learning — and Amazon being Amazon, making it available to third parties as a cloud-based service. The core point is this (emphasis mine): “Over the past decades computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder.”
That’s right: With machine learning, we can learn to get desirable results but without necessarily knowing how to describe the rules that get us there. It’s pure output. No — or hardly any — process in the sense that we can interrogate or clearly understand it. Maybe not even instruct it, exactly.
Let’s keep this at the back of our minds now, we’ll come back to it later. Exhibit A.
“Machine learning techniques — most recently and commonly, neural networks — are getting pretty unreasonably good at achieving outcomes opaquely. In that: we really wouldn’t know where to start in terms of prescribing and describing the precise rules that would allow you to distinguish a cat from a dog. But it turns out that neural networks are unreasonably effective (…) at doing these kinds of things. (…) We’re at the stage where we can throw a bunch of images to a network and also throw a bunch of images of cars at a network and then magic happens and we suddenly get a thing that can recognize cars.”
Dan goes on to speculate:
“If my intuition’s right, this means that the promise of machine learning is something like this: for any process you can think of where there are a bunch of rules and humans make decisions, substitute a machine learning API. (…) machine learning doesn’t necessarily threaten jobs like “write a contract between two parties that accomplishes x, y and z” but instead threatens jobs where management people make decisions.”
“Neural networks work the other way around: we tell them the outcome and then they say, “forget about the process!”. There doesn’t need to be one. The process is inside the network, encoded in the weights of connections between neurons. It’s a unit that can be cloned, repeated and so on that just does the job of “should this insurance claim be approved”. If we don’t have to worry about process anymore, then that lets us concentrate on the outcome. Does this mean that the promise of machine learning is that, with sufficient data, all we have to do is tell it what outcome we want?”
The AI, our benevolent dictator
Now if we answered Dan’s question with YES, then this is where things get tricky, isn’t it? It opens the door to a potentially pretty slippery slope.
In political science, a classic question is what the best form of government looks like. While a discussion about what “best” means — freedom? wealth? health? agency? for all or for most? what are the tradeoffs? — is fully legitimate and should be revisited every so often, it boils down to this long-standing conflict:
Can a benevolent dictator, unfettered by external restraints, provide a better life for their subjects?
Does the protection of rights, freedom and agency offered by democracy outweigh the often slow and messy decision-making processes it requires?
Spoiler alert: Generally speaking, democracy won this debate a long time ago.
(Of course there are regions where societies have held on to the benevolent dictatorship model; and the recent rise of the populist right demonstrates that populations around the globe can be attracted to this line of argument.)
The reason democracy — a form of government defined by process! — has surpassed dictatorships both benevolent and malicious is that overall it seems a human endeavor to have agency and freely express it, rather than be governed by an unfettered, unrestricted ruler of any sorts.
Every country that chooses democracy over a dictator sacrifices efficiency for process: A process that can be interrogated, understood, adapted. Because, simply stated, a process understood is a process preferred. Being able to understand something gives us power to shape it, to make it work for us: This is true both on the individual and the societal level.
Messy transparency and agency trumps blackbox efficiency.
Let’s keep that in mind, too. Exhibit B.
Who makes the AI?
Andrew Ng, who was heavily involved in Baidu’s (and before Google’s) AI efforts, emphasizes the potential impact of AI to transform society: “Just as electricity transformed many industries roughly 100 years ago, AI will also now change nearly every major industry — healthcare, transportation, entertainment, manufacturing — enriching the lives of countless people.”
“I want all of us to have self-driving cars; conversational computers that we can talk to naturally; and healthcare robots that understand what ails us. The industrial revolution freed humanity from much repetitive physical drudgery; I now want AI to free humanity from repetitive mental drudgery, such as driving in traffic. This work cannot be done by any single company — it will be done by the global AI community of researchers and engineers.”
While I share Ng’s assessment of AI’s potential impact, I got to be honest: His raw enthusiasm for AI sounds a little scary to me. Free humanity from mental drudgery? Not to wax overly nostalgic, but mental drudgery — even boredom! — has proven really quite important for humankind’s evolution and played a major role in its achievements. Plus, the idea that engineers are the driving force seems risky at least: It’s a pure form of stereotypical Silicon Valley think, almost a cliché. I’m willing to give him the benefit of the doubt and assume that by “researchers” he also meant to include anthropologists, philosophers, political scientists, and all the other valuable perspectives of social sciences, humanities, and other related fields.
Something as transformative as this should not, in the 21st century, be driven by a tiny group of people with very homogenous backgrounds. Diversity is key, in professional backgrounds and ways of thinking as much as in gender, ethnic, regional and cultural backgrounds. Otherwise, algorithms are bound to encode and help enforce unhealthy policies.
Engineering-driven, tech-deterministic, non-diverse expansionist thinking delivers sub-optimum results. File under exhibit C.
Bezos writes about the importance of making decisions fast, which often requires making them with incomplete information: “most decisions should probably be made with somewhere around 70% of the information you wish you had. If you wait for 90%, in most cases, you’re probably being slow. Plus, either way, you need to be good at quickly recognizing and correcting bad decisions. If you’re good at course correcting, being wrong may be less costly than you think, whereas being slow is going to be expensive for sure.”
This, again, he writes in the context of management — presumably by and through humans. How will algorithmic decision-making fit into this picture? Will we want our algorithms to start deciding — or issuing recommendations — based on 100 percent of information? 90? 70? Maybe there’s an algorithm that figures out through machine learning how much information is just enough to be good enough?
Who is responsible for making algorithmically-made decisions? Who bears the responsibility for enforcing them?
If the algorithmic load-optimizing (read: overbooking), tells airline staff to remove a passenger from a plane and it ends up in a dehumanizing debacle, who’s fault is that?
More Dan Hon! Dan takes this to its logical conclusion (s4e11): “We’ve outsourced deciding things, and computers — through their ability to diligently enact policy, rules and procedures (surprise! algorithms!) give us a get out of jail free card that we’re all too happy to employ.” It is, in extension, a jacked up version of “it’s policy, it’s always been our policy, nothing I can do about it.” Which is, of course, the oldest and laziest cop-out there ever was.
He continues: “Algorithms make decisions and we implement them in software. The easy way out is to design them in such a a way as to remove the human from the loop. A perfect system. But, there is no such thing. The universe is complicated, and Things Happen. While software can deal with that (…) we can take a step back and say: that is not the outcome we want. It is not the outcome that conscious beings that experience suffering deserve. We can do better.”
I wholeheartedly agree.
To get back to the airline example: In this case I’d argue the algorithm was not at fault. What was at fault is that corporate policy said this procedure has priority, and this was backed up by an organizational culture that made it seem acceptable (or even required) for staff to have police drag a paying passenger off a plane with a bloody lip.
Algorithms blindly followed, backed up by corporate policies and an unhealthy organizational culture: Exhibit D.
In the realm of computer vision, there’s been a lot of advances through (and for) machine learning lately. Generative adversarial networks (GANs), in which one network tries to fool another, seem particularly promising. I won’t pretend to understand the math behind GANs, but Quora got us covered:
“Imagine an aspiring painter who wants to do art forgery (G), and someone who wants to earn his living by judging paintings (D). You start by showing D some examples of work by Picasso. Then G produces paintings in an attempt to fool D every time, making him believe they are Picasso originals. Sometimes it succeeds; however as D starts learning more about Picasso style (looking at more examples), G has a harder time fooling D, so he has to do better. As this process continues, not only D gets really good in telling apart what is Picasso and what is not, but also G gets really good at forging Picasso paintings. This is the idea behind GANs.”
So we got two algorithmic networks sparring with one another. Both of them learn a lot, fast.
Impressive, if maybe not lifesaving results include so-called style transfer. You’ve probably seen it online: This is when you can upload a photo and it’s rendered in the style of a famous painter:
/images/style_transfer_junyanz.jpg “Collection Style Transfer refers to transferring images into artistic styles. Here: Monet, Van Gogh, Ukiyo-e, and Cezanne. (Image source: Jun-Yan Zhu)”
Maybe more intuitively impressive, this type of machine learning can also be applied to changing parts of images, or even videos:
Wait, how did we get here? Oh yes, output v process!
Machine learning requires new skills (for creators and users alike)
What about skill sets required to work with machine learning, and to make machines learn in interesting, promising ways?
Google has been remaking itself as a machine learning first company. As Christine Robson, who works on Google’s internal machine learning efforts puts it: “It feels like a living, breathing thing. It’s a different kind of engineering.”
Technology Review features a stunning article absolutely worth reading in full: In The Dark Secret at the Heart of AI, author Will Knight interviews MIT professor Tommi Jaakkola who says:
“Deep learning, the most common of these approaches, represents a fundamentally different way to program computers. ‘It is a problem that is already relevant, and it’s going to be much more relevant in the future. (…) Whether it’s an investment decision, a medical decision, or maybe a military decision, you don’t want to just rely on a ‘black box’ method.’”
And machine learning doesn’t just require different engineering. It requires a different kind of design, too. From Machine Learning for Designers (ebook, free O’Reilly account required): “These technologies will give rise to new design challenges and require new ways of thinking about the design of user interfaces and interactions.”
Machine learning means that algorithms learn from — and increasingly will adapt to — their own performance, user behaviors, and external factors. Processes (however oblique) will change, as will outputs. Quite likely, the interface and experience will also adapt over time. There is no end state but constant evolution.
Technologist & researcher Greg Borenstein argues that “while AI systems have made rapid progress, they are nowhere near being able to autonomously solve any substantive human problem. What they have become is powerful tools that could lead to radically better technology if, and only if, we successfully harness them for human use.”
Borenstein concludes: “What’s needed for AI’s wide adoption is an understanding of how to build interfaces that put the power of these systems in the hands of their human users.”
Future-oriented designers seem to be at least open to this idea. As Fabien Girardin of the Near Future Laboratory argues: “That type of design of system behavior represents a future in the evolution of human-centered design.”
Computers beating the best human Chess and Go players have given us Centaur Chess in which humans and computers play side-by-side in a team: While computers beat humans in chess, these hybrid team of humans and computers playing in tandem beat computers hands-down.
In centaur chess, software provides analysis and recommendations, a human expert makes the final call. (I’d be interested in seeing the reverse being tested, too: What if human experts gave recommendations for the algorithms to consider?)
How does this work? Why is it doing this?
Now, all of this isn’t particularly well understood today. Or more concretely, the algorithms hatched that way aren’t understood, and hence their decisions and recommendations can’t be interrogated easily.
Will Knight shares the story of a self-driving experimental vehicle that was “unlike anything demonstrated by Google, Tesla, or General Motors, and it showed the rising power of artificial intelligence. The car didn’t follow a single instruction provided by an engineer or programmer. Instead, it relied entirely on an algorithm that had taught itself to drive by watching a human do it.”
What makes this really interesting is that it’s not entirely clear how the algorithms learned:
“The system is so complicated that even the engineers who designed it may struggle to isolate the reason for any single action. And you can’t ask it: there is no obvious way to design such a system so that it could always explain why it did what it did (…) It isn’t completely clear how the car makes its decisions.”
Knight stresses just how novel this is: “We’ve never before built machines that operate in ways their creators don’t understand.”
We know that it’s possible to attack machine learning with adversarial examples: So-called adversarial examples are intentionally designed to cause the model to make a mistake, to train the algorithm incorrectly. Even without a malicious attack, algorithms also simply don’t always get the full — or right — picture: “Google researchers noted that when its [Deep Dream] algorithm generated images of a dumbbell, it also generated a human arm holding it. The machine had concluded that an arm was part of the thing.”
This — and this type of failure mode — seems relevant. We need to understand how algorithms work in order to adapt, improve, and eventually trust them.
Consider for example two areas where algorithmic decision-making could directly decide about life or death: Military and medicine. Speaking of military use cases, David Dunning of DARPA’s Explainable Artificial Intelligence program explains: “It’s often the nature of these machine-learning systems that they produce a lot of false alarms, so an intel analyst really needs extra help to understand why a recommendation was made.” Life or death might literally depend on it. What’s more, if a human operator doesn’t fully trust the AI output then that output is rendered useless.
Should we have a legal right to interrogate AI decision making? Again, Knight in Technology Review: “Starting in the summer of 2018, the European Union may require that companies be able to give users an explanation for decisions that automated systems reach. This might be impossible, even for systems that seem relatively simple on the surface, such as the apps and websites that use deep learning to serve ads or recommend songs. The computers that run those services have programmed themselves, and they have done it in ways we cannot understand. Even the engineers who build these apps cannot fully explain their behavior.”
It seems likely that this could currently not even be enforced, that the creators of these algorithmic decision-making systems might not even be able to find out what exactly is going on.
There have been numerous attempt of exploring this, usually through visualizations. This works, to a degree, for machine learning and even other areas. However, often machine learning is used to crunch multi-dimensional data sets. We simply have no great way of visualizing this in a way that makes it easy to analyze (yet).
This is worrisome to say the least.
But let me play devil’s advocate for a moment: What if the outcomes are really so good, so much better than the human-powered analysis or decision-making skills. Might not using them be simply irresponsible? Knight gives the example of a program at Mount Sinai Hospital in New York called Deep Patient that was “just way better” at predicting certain diseases from patient records.
If this prediction algorithm has a solid track record of successful analysis, but neither developers nor doctors understand how it reaches its conclusions, is it responsible to prescribe medication based on its recommendation? Would it be responsible not to?
Philosopher Daniel Dennett who studies consciousness of the mind takes it a step further. An explanation by an algorithm might not be good enough. Humans aren’t great at explaining themselves, so if an AI “can’t do better than us at explaining what it’s doing, then don’t trust it.”
It follows that an AI would need to provide a much better explanation than a human in order for it to be trustworthy. Exhibit E.
Now where does that leave us?
Let’s assume that the impact of machine learning, algorithmic decision-making and AI will keep increasing. A lot. Then We need to understand how algorithms work in order to adapt, improve, and eventually trust them.
Machine learning allows us to get desirable results, but without necessarily knowing how (exhibit A). It’s essential for a society to be able to understand and shape its governance, and to have agency in doing so. So in AI just like in governance: Transparent messiness is more desirable than oblique efficiency. Black boxes simply won’t do. We cannot have black boxes govern our lives (exhibit B). Something as transformative as this should not, in the 21st century, be driven by a tiny group of people with very homogenous backgrounds. Diversity is key, in professional backgrounds and ways of thinking as much as in gender, ethnic, regional and cultural backgrounds. Engineering-driven, tech-deterministic, non-diverse, expansionist thinking delivers sub-optimum results (exhibit C). Otherwise, algorithms are bound to encode and help enforce unhealthy policies. Blindly followed, backed up by corporate policies and an unhealthy organizational culture, this is bound to deliver horrible results (exhibit D). Hence we need to be able to interrogate algorithmic decision-making. And if in doubt, an AI should provide a much better explanation than a human in order for it to be trustworthy (exhibit E).
Machine learning and AI hold great potential to improve our lives. Let’s embrace it, but deliberately and cautiously. And let’s not hand over the keys to software that’s too complex for us to interrogate, understand, and hence shape to serve us. We must apply the same kinds of checks and balances to tech-based governance as to human or legal forms of governance — accountability & democratic oversight and all.