Power to the People: How One Unknown Group of Researchers Holds the Key to Using AI to Solve Real Human Problems

13 min readJun 30, 2016

In the last few years, a series of spectacular research results have drawn the world’s attention to the field of machine learning. Excitement for AI hasn’t been so white hot intense since the onset of the last AI Winter. But, despite the explosion of interest, most people are paying attention to the wrong research. And, in the process, they’re missing the work of a small set of researchers who are quietly building the foundation we’ll need to use machine learning to actually solve real human problems.

The current wave of AI excitement started with Hinton et al’s breakthrough success with deep convolutional neural networks on image classification. In a field that typical progresses by single percentage points, their results destroyed the previous state of the art. Hinton’s compatriots such as Yoshua Bengio, Yann LeCun, Andrew Ng and others quickly followed, using related techniques to set new benchmarks in speech recognition, face recognition, and a number of other research problems. The world of machine learning researchers rapidly became first aware of (and then profoundly obsessed by) this new suite of approaches, which was gathered together under the banner of Deep Learning.

And then, as Deep Learning gained more support from big companies like Google and Facebook, it started to produce achievements that were legible — and extremely impressive — to the wider public. AlphaGo won historic victories against the world’s leading Go players. IBM Watson dominated human players at Jeopardy on network TV. Smaller efforts like Neural Style Transfer and Deep Dream produced impressive visual memes that spread across social media.

All this success kindled a continuously burning flame of press attention and speculation that has drawn towards it executives, front-line technologists, and designers across a wide range of businesses. Venture capitalists are starting to talk about investing in an AI First World. Half of startups want to use these AI advances to build conversational UIs for their web and mobile apps and the other half want to use them to improve their Internet of Things products. I recently spoke at a conference put on by The Economist in Hong Kong and one of the major questions was about AI’s impact on marketing.

But now for a splash of cold water: while AI systems have made rapid progress, they are nowhere near being able to autonomously solve any substantive human problem. What they have become is powerful tools that could lead to radically better technology if, and only if, we successfully harness them for human use.

What’s stopping AI from being put to productive use in thousands of businesses around the world isn’t some new learning algorithm. It’s not the need for more programmers fluent in the mathematics of stochastic gradient descent and back propagation. It’s not even the need for more accessible software libraries. What’s needed for AI’s wide adoption is an understanding of how to build interfaces that put the power of these systems in the hands of their human users. What’s needed is a new hybrid design discipline, one whose practitioners understand AI systems well enough to know what affordances they offer for interaction and understand humans well enough to know how they might use, misuse, and abuse these affordances.

Look at history. It wasn’t some advance in cutting edge math or programming technique that produced the “killer app” for the personal computer. It was Dan Bricklin’s connection between the possibilities of programming and the working methods of real people that produced VisiCalc, the first “electronic spreadsheet”.

“I thought, if only we had a blackboard where I could erase a number and write a new number in, and everything would recalculate.” — Dan Bricklin, bored Harvard Business School Student

And hidden beneath the spectacle of Deep Learning’s much ballyhooed success, an entire field of research has quietly grown up that’s dedicated to exactly this problem of designing human interactions with machine learning systems. Interactive Machine Learning, as this small but exciting field is known, lives at the intersection of User Experience and Machine Learning research. And almost everyone reading this — almost anyone wondering how to incorporate AI into their own business or creative tool or software product or design practice — would be better off studying this field than maybe any other part of the AI landscape.

As Recurrent Neural Nets surpass Convolutional Neural Nets only to be outpaced by Deep Reinforcement Learning which in turn is edged out by the inevitable Next Thing in this incredibly fast moving field, the specifics of any given algorithm that temporarily holds the title of best performance on some metric or benchmark will fade in importance. What will stay important are the principles for designing systems that let humans use these learning systems to do things they care about.

Those principles are exactly the subject of Interactive Machine Learning. And if you’re a designer or a manager or a programmer working to use AI to make something for human use they’re the principles you’ll have to master.

To help get you started, I thought I’d summarize a few of the field’s results and provide links to some of its most interesting research. A couple of years ago, I was lucky enough to take an MIT Media Lab course on Interactive Machine Learning that was taught by Brad Knox, one of the most interesting practitioners in the field. Nearly all of what I’m going to describe here I learned from Knox or by studying the reading he assigned. (In fact, what follows is primarily a layperson’s summary of Knox’s paper, Power to the People: The Role of Humans in Interactive Machine Learning, written with Saleema Amershi, Maya Cakmak, and Todd Kulesza — all amongst IML’s leading lights.)

One additional note: unlike the wall-of-equations that make up most machine learning papers, the IML literature is profoundly inviting and largely friendly to non-experts. I encourage you to dive into the original papers wherever a particular topic piques your interest. I’ve gathered links to all of the papers in Knox’s syllabus here to make doing so especially convenient.

Use Active Learning to Get the Most Help from Humans

The core job of most machine learning systems is to generalize from sample data created by humans. The learning process starts with humans creating a bunch of labeled data: images annotated with the objects they depict, pictures of faces with the names of the people, speech recordings with an accurate transcript, etc. Then comes training. A machine learning algorithm processes all that human-labeled data. At the end of training the learning algorithm produces a classifier, essentially a small standalone program that can provide the right answer for new input that was not part of the human-labeled training data. That classifier is what you then deploy into the world to guess your users’ age, or recognize their friends’ faces, or or transcribe their speech when they talk to their phone.

The scarce resource in this equation is the human labor needed to label the training data in the first place.

Many impressive Deep Learning results come from domains where enormous amounts of labeled data is available because it was shared by a social network’s billion users or crawled from across the web. However, unless you’re Facebook or Google, you’ll likely find labeled data relevant to your problem somewhat more scarce, especially if you’re working in some new vertical that has its own jargon or behavior or data sources. Hence you’ll need to get your labels from your users. This entails building some kind of interface that shows them examples of the texts or images or other inputs you want to be able to classify and gets them to submit the correct labels.

But, again, human labor — particularly when it’s coming from your users — is a scarce resource. So, you’ll want to only ask your users to label the data that will improve your system’s results the most. Active Learning is the name for the field of machine learning that studies exactly this problem: how to find the samples for which a human label would help the system improve the most. Researchers have found a number of algorithmic approaches to this problem. These include techniques for finding the sample about which the system has the greatest uncertainty, detecting samples for which a label would cause the greatest change to the system’s results, selecting samples for which the system expects that its predictions would have the highest error, and others. Burr Settles’ excellent survey of Active Learning provides a great introduction to the field.

As a concrete example of these ideas, here’s a video demonstrating a hand gesture recognition system I built that uses Active Learning principles to request labels from the user when it sees a gesture for which it cannot make a clear prediction (details about this work here):

Don’t Treat the User as an “Oracle”

Active Learning researchers have shown success in producing higher accuracy classifiers with fewer labeled samples. Active Learning is a great way to pull the most learning out of the the labeling work you get your users to do.

However, from an interaction design perspective, Active Learning has a major downside: it puts the learning system in charge of the interaction rather than the human user. Active Learning researchers refer to the human who labels the samples they select as an “oracle”. Well, Interactive Machine Learning researchers have shown that humans don’t like being treated as an oracle.

Humans don’t like being told what to do by a robot. They enjoy interactions much more, and are willing to spend more time training the robot if they are in charge of the interaction.

In a 2010 paper, Designing Interactions for Robot Active Learners, Cakmak et al studied user perceptions of passive and active approaches to teaching a robot to recognize shapes. One option put the robot in charge. It would use Active Learning to determine the shape it wanted labeled next. Then it would point at the shape and the user would provide the answer. The other option put the users in charge, letting them select which examples to show the robot.

When the robot was in charge of the interaction, selecting which sample it wanted labeled in the Active Learning style, users found the robot’s stream of questions “imbalanced and annoying”. Users also reported a worse understanding of the state of the robot’s learning making them worse teachers.

In a software context, Guillory and Blimes found similar feelings while attempting to apply active learning to Netflix’s movie rating interface.

Choose Algorithms for Their Ability to Explain Classification Results

Imagine you have a persistent health problem that you need diagnosed. You have the choice of two AI systems you can use. System A has a 90% accuracy rate, the best available. It takes in your medical history, all your scans and other data and gives back a diagnosis. You can’t ask it any questions or find out how it arrived at that diagnosis. You just get back the latin name for your condition and a wikipedia link. System B has an 85% accuracy rate, substantially less than System A. System B takes all your medical data and also comes back with a diagnosis. But unlike System A it also tells you how it arrived at that diagnosis. Your blood pressure is past a certain threshold, you’re above a certain age, you have three of five factors from your family history, etc.

Which of these two systems would you choose?

There’s a cliche from marketing that half of the advertising budget is wasted but no one knows which half. Machine learning researchers have a related cliche: it’s easy to create a system that can be right 80% of the time, the hard part of figuring out which 80% is right. Users trust learning systems more when they can understand how they arrive at their decisions. And they are better able to correct and improve these systems when they can see the internals of their operation.

So, if we want to build systems that users trust and that we can rapidly improve, we should select algorithms not just for how often they produce the right answer, but for what hooks they provide for explaining their inner workings.

Some machine learning algorithms provide more of these types of affordances than others. For example, the neural networks currently pushing the state-of-the-art in accuracy on so many problems provide particularly few hooks for such explanations. They are basically big black boxes that spit out an answer (though some researchers are working on this problem). On the other hand, Random Decision Forests provide incredibly rich affordances for explaining classifications and building interactive controls of learning systems. You can figure out which variables were most important, the system’s confidence about each prediction, the proximity between any two samples, etc.

You wouldn’t select a database or web server or javascript framework simply because of its performance benchmarks. You’d look at the API and see how much it supported the interface you want to provide your users. Similarly, as designers of machine learning systems we should expect to have the ability to access the internal state of our classifiers in order to build richer, more interactive interfaces for our users.

Beyond our own design work on these systems, we want to empower our users themselves to improve and control the results they receive. Todd Kulesza, at Microsoft Research, has done extensive work on exactly this problem which he calls Explanatory Debugging. Kulesza’s work produces machine learning systems that explain their classification results. These explanations themselves then act as an interface through which users can provide feedback to improve and, importantly, personalize the results. His paper on Why-Oriented End-User Debugging of Naive Bayes Text Classification provides a powerful and concrete example of the idea.

Empowering Users to Create Their Own Classifiers

In conventional machine learning practice, engineers build classifiers, designers integrate them into interfaces, and then users interact with their results. The problem with this pattern is that it divorces the practice of machine learning from knowledge about the problem domain and the ability to evaluate the system’s results. Machine learning engineers or data scientists may understand the available algorithms and the statistical tests used to evaluate their results, but they don’t truly understand the input data and they can’t see problems in the results that would be obvious to their users.

At best this pattern results in an extremely slow iteration cycle. Machine learning engineers return to their users with each iteration of the system, slowly learning about the domain and making incremental improvements. In practice, this cumbersome cycle means that machine learning systems ship with problems that are obvious to end users or are simply too expensive to build for many real problems.

To escape this pattern we have to put the power to create classifiers directly in the hands of users. Now, no user wants to “create a classifier”. So, in order to give them this power we need to design interfaces that let them label samples, select features, and do all the other actions involved in a way that fits with their existing mental models and workflows.

When we figure out how to do this the results can be extremely powerful.

One of the most impressive experiments I’ve seen in Interactive Machine Learning is Saleema Amershi’s work on Facebook group invites, ReGroup: Interactive Machine Learning for On-Demand Group Creation in Social Networks.

The current Facebook event invite experience goes like this: you create a new event and go to invite friends. Facebook presents you with an alphabetical list of all of your hundreds of friends with a checkbox by each one. You look at this list in despair and then click the box to “invite all”. And hundreds of your friends get invites to events they’ll never be able to attend in a city where they don’t live.

The ReGroup system Amershi and her team put together improves on this dramatically. It starts you with the same list of names with checkboxes. But then when you check a name it treats that check as a positively labeled sample. And it treats any names you skipped as negatively labeled samples. It uses this data to train a classifier, treating profile data and social connections as the features. It computes a likelihood for each of your friends that you’ll check the box next to them and sorts the most likely ones to the top. The features that determine event revelance are relatively strong and simple — where people live, what social connections you have in common, how long ago you friended them etc. — the classifier’s results rapidly become useful.

This work is an incredibly elegant match between existing user interaction patterns and what’s needed to train a classifier.

Another great example is CueFlik, a project by Fogarty et al that improves web-based image search by letting users create rules that automatically group photos by their visual qualities. For example (as shown above), a user might search for “stereo” and then select just the “product photos” (those on a clean white background). CueFlick takes these examples and learns a classifier that can distinguish product photos from natural photos that users can later choose to apply to other searches beyond the initial search for “stereo”, for example to “cars” or “phones”.

Conclusion

When imagining a future shaped by AI, it’s easy to fall back on cultural tropes from sci-fi movies and literature, to think of The Terminator or 2001 or Her. But these visions reflect our anxieties about technology, gender, or the nature of humanity far more than the concrete realities of machine learning systems as we’re actually building them.

Instead of seeing Deep Learning’s revolutionary recent results as incremental steps towards these always receding sci-fi fantasies, imagine them as the powerful new engines of a thousand projects like ReGroup and CueFlik, projects that give us unprecedented abilities to understand and control our world. Machine learning has the potential to be a powerful tool for human empowerment, touching everything from how we shop to how we diagnose disease to how we communicate. To build these next thousand projects in a way that capitalizes on this potential we need to learn not just how to teach the machines to learn but how to put the results of that learning into the hands of people.