Human-centered Machine Learning: a Machine-in-the-loop Approach

In 1950, Alan Turing asked the question: “can machines think?” This question has inspired excellent research in the area of artificial intelligence and machine learning. Today, machine learning shows great promise in a wide range of applications, from self-driving cars, to face recognition, to medical diagnosis, to bail decisions, where judges decide the fate of arrestees. As machine learning becomes more and more integrated into daily life, applying a human-centered perspective to these tools becomes necessary.

Implicit in Turing’s question are two assumptions related to humans that have profoundly impacted the development of machine learning. First, when reading his question, it is understood that when he uses the verb “think,” he is referring to a distinctly human activity — he is not asking whether a machine can think like a sponge can think. The phrasing of the question implies that the emulation of human thought and performance is the target. Second, despite clever human design, the ultimate goal is for machines to “think” on their own. In the context of these two assumptions, it makes sense that machine learning methods have mostly been designed for automation — in a way, to replace humans.

However, there are many situations where neither assumption holds, and in many of these situations, machine learning could still be applied to improve our lives. In this blog, I’ll revisit these two assumptions and provide a simple framework to think about potential machine learning applications in terms of difficulty for humans and whether a task should be delegated to machines. This framework suggests a machine-in-the-loop approach that aims to assist and improve humans, in contrast with human-in-the-loop machine learning, in which humans help improve machine learning algorithms, e.g., by providing labels and features.

Human performance is not always an upper bound for machine learning.

In many applications, an implicit assumption for machine learning is that machines are designed to emulate human intelligence. A corollary is that human performance is the upper bound for machine learning. This explains why computer vision systems aim to match human performance in object recognition.

However, humans, including experts, do not always perform well. Kleinberg et al. have carefully shown that machine learning algorithms outperform judges in deciding whether to release arrestees on bail; Lillian Lee, Bo Pang, and I have also shown that machines outperform humans in predicting which tweet was retweeted more. Behavioral biases and incomplete information and knowledge make these tasks challenging for humans in any realistic scenario. Therefore, human performance is not an upper bound in these tasks. Machine learning approaches often outperform humans and may in fact help us better understand the nature of human error and offer useful insights to humans, instead of imitating human intelligence.

Clarifying this implicit assumption is useful for understanding some interesting recent work on interpretable machine learning. For instance, LIME by Ribeiro et al., an interpretable machine learning approach, asks humans to distinguish trustworthy features from untrustworthy ones. This approach implicitly assumes that such judgments are easy for humans, or at least that humans have good intuitions about the features used. For tasks that are difficult for humans, such as medical diagnosis and bail decisions, such an approach may run the risk of confirming false intuitions.

Automation is not always the goal of machine learning.

Automation can improve the efficiency of our society and benefit humans in many cases. It is thus framed as the eventual goal in public discourse about many machine learning applications, e.g., autonomous driving and face identification.

However, there are many tasks that we should not, or do not want to, delegate to machines regardless of whether we can. Tasks that involve complex payoff functions that we cannot articulate or quantify are one example; tasks we enjoy for their own sake are another. It is not desirable to completely delegate a task where ethics and morality are vaguely defined and constantly evolving, like bail decisions, college admissions, and hiring. And, no matter how good machines get at writing novels, it is unlikely that an aspiring writer will be deterred from writing or would get the same satisfaction from delegating their writing to an automated machine. In these tasks, it is important to realize that the goal is more nebulous than can be captured in a metric, and optimizing for any particular metric may not advance the goal of the task. Therefore, the goal of machine learning in these tasks is not automation. We need further research on how to develop machine learning systems that aim to assist humans.

Whether we should delegate a task to machines is a subjective and even murky question. The discussion of delegability is an important conversation whose answers vary across cultures and individuals. For example, consider the following question:

Which medical diagnosis system do you prefer, an automated system with an accuracy of 90%, or a system that can provide explanations with an accuracy of 85%?

When I posed this question to the students in my human-centered machine learning class, the majority of the students preferred the system that provided explanations. However, when I “lowered” the explainable system’s accuracy from 85% to 80%, the majority preferred the automated system.

By combining these two dimensions, difficulty for humans and delegability to machines, we obtain the following four quadrants.

  • Easy for humans, delegable to machines, e.g., OCR and object recognition.
  • Difficult for humans, not delegable to machines, e.g., bail decisions and creative writing.
  • Difficult for humans, delegable to machines, e.g., network routing and earthquake rescue. Multiplication and other simple calculations also belong here, though they do not require machine learning.
  • Easy for humans, not delegable to machines, e.g., eating food and small talk. However, whether small talk is easy for humans remains an open question.
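These two dimensions can be made concrete with a small sketch. The task list and the boolean judgments below are illustrative assumptions drawn from the examples above, not definitive classifications; delegability in particular is up for debate, as discussed next.

```python
# Hypothetical sketch: placing example tasks in the four quadrants.
# The (difficult_for_humans, delegable_to_machines) judgments are
# illustrative assumptions, not ground truth.
tasks = {
    "OCR":             (False, True),
    "bail decisions":  (True,  False),
    "network routing": (True,  True),
    "eating food":     (False, False),
}

def quadrant(difficult_for_humans, delegable_to_machines):
    """Map the two boolean dimensions to a quadrant label."""
    d = "difficult" if difficult_for_humans else "easy"
    g = "delegable" if delegable_to_machines else "not delegable"
    return f"{d} for humans, {g} to machines"

for task, (difficult, delegable) in tasks.items():
    print(f"{task}: {quadrant(difficult, delegable)}")
```

Treating the two dimensions as booleans is, of course, a simplification; both are better thought of as continuous and contested, as the next paragraph notes.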

As mentioned earlier, human performance varies across individuals, and the delegability of most tasks is often up for debate and could be broken down into many more dimensions. However, I believe a preliminary framework with these two dimensions can facilitate discussions about machine learning applications from a human-centered perspective.

A machine-in-the-loop approach

  • Human-in-the-loop machine learning: humans help machines improve, e.g., by interactively training a machine learning system such as Crayons through correcting machine outputs in each round; recommender systems similarly involve a loop between humans and machines in which machines keep improving by learning from user feedback.
  • Machine-in-the-loop: humans take full agency and machines play a supporting role, e.g., by providing suggestions that inspire creativity and help writers overcome cognitive inertia.

A natural outcome of this framework is a new paradigm for machine learning. Human-in-the-loop machine learning has been an important direction. For instance, humans can provide labels and features interactively with machines to obtain a better machine learning system. According to this framework, these tasks tend to be easy for humans and the ultimate goal is automation.

However, human-in-the-loop approaches do not work in many important tasks that are difficult for humans and are not delegable to machines, because it’s both unclear how humans would help machines and undesirable for a machine to take full agency. Instead, we should pursue a machine-in-the-loop approach, where humans take full control of the final decision/outcome and machines play a supporting role by giving suggestions in various forms.
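The contrast between the two paradigms can be sketched as two loops with the roles reversed. This is a minimal illustration, not any real system's API: `model`, `human_label`, and `human_decide` are hypothetical placeholders.

```python
# Hypothetical sketch contrasting the two paradigms.

def human_in_the_loop(model, examples, human_label):
    """Humans serve the machine: they supply labels so the model
    improves; the eventual goal is automation."""
    for x in examples:
        y = human_label(x)   # human provides supervision
        model.update(x, y)   # machine learns from it
    return model             # the machine makes future decisions

def machine_in_the_loop(model, cases, human_decide):
    """The machine serves the human: it offers suggestions, but the
    human retains full agency over every final decision."""
    decisions = []
    for x in cases:
        suggestion = model.suggest(x)                   # supporting role
        decisions.append(human_decide(x, suggestion))   # human decides
    return decisions         # a human made every decision
```

The key structural difference is who consumes the output of the loop: in the first, human effort flows into the model; in the second, model output flows into a human decision.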

Such an approach broadens the horizon of machine learning applications and (re-)introduces tasks that are understudied in machine learning. Elizabeth Clark, Anne Ross, Yangfeng Ji, Noah Smith, and I have done some preliminary work on creative writing with a machine in the loop, providing suggestions to inspire writers’ creativity, but much more work is needed in this vein. This approach also connects with mixed-initiative design by Eric Horvitz and many other works in human-computer interaction, as well as the debate on intelligence augmentation.

Although machine-in-the-loop is still a nascent concept for machine learning, it will become increasingly important as machine learning keeps improving and human–ML interaction becomes ubiquitous. We should recognize that human performance does not limit machine learning and that machine learning is not limited to automation. We need to discuss the nature of various machine learning applications and investigate possible machine-in-the-loop approaches to enhance individuals and our society.

This blog has immensely benefited from comments by and discussions with Elizabeth Clark, Sidney D’Mello, Aaron Jiang, Ningzi Li, Michael Mozer, Amit Sharma, Noah Smith, Adith Swaminathan, Hunter Wapman, Tom Yeh, and all students in my human-centered machine learning class.