How do we design automation to support, rather than replace, humans?

Ian Moura
Human-Machine Collaboration
Sep 4, 2019

The idea that automation and AI will replace or supersede humans gets a lot of attention, often at the expense of discussion about fostering human-machine collaboration. However, as technological development continues to stretch the limits of what is automatable, designing systems and algorithms with human cooperation and benefit in mind is of increasing importance. Preparing and equipping humans to work and live with machines is far easier when creating those machines involves thoughtful consideration of human abilities and human needs. Given our interest in these issues, Bob Stark and I decided to create a discussion group for the purpose of research and problem-solving; the following article summarizes the background information that we covered in our inaugural meeting, and closes with some of the specific topics we plan to address at future events.

Automation is not a new trend: broadly, it means having a machine do something that was previously a human task. Seen this way, artificial intelligence is best understood as a recent step in a process of industrialization that began with the refinement of steam power in the 18th century. There are many models of automation, spanning a range of technical and theoretical approaches. For the purposes of this article, which focuses on artificial intelligence, automation means having a computer execute some or all of the stages of information processing, described by Parasuraman, Sheridan, and Wickens (2000) as information acquisition, information analysis, decision selection, and action implementation. A key element in any discussion of automation’s societal implications is the degree to which a task or action is automated, often summarized as “automation levels.” For example, Cummings (2004) describes a scale on which, at the lowest level, the “computer offers no assistance” and the “human must take all decisions and actions”, whereas at the highest level, “[t]he computer decides everything and acts autonomously, ignoring the human.”

Automation levels (Cummings 2004)
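
One way to make this framework concrete is to treat an automated system as an assignment of an automation level to each of the four information-processing stages. The sketch below is illustrative only: the class, the example level values, and the navigation-aid scenario are assumptions made for demonstration, not anything specified by Parasuraman, Sheridan, and Wickens (2000) or Cummings (2004).

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Information-processing stages from Parasuraman, Sheridan, and Wickens (2000)."""
    INFORMATION_ACQUISITION = "information acquisition"
    INFORMATION_ANALYSIS = "information analysis"
    DECISION_SELECTION = "decision selection"
    ACTION_IMPLEMENTATION = "action implementation"


# Levels run from 1 ("the computer offers no assistance") to 10
# ("the computer decides everything and acts autonomously, ignoring the human").
MIN_LEVEL, MAX_LEVEL = 1, 10


@dataclass
class AutomationDesign:
    """Hypothetical description of how automated each processing stage is."""
    levels: dict  # maps Stage -> level of automation (1-10)

    def __post_init__(self):
        for stage, level in self.levels.items():
            if not MIN_LEVEL <= level <= MAX_LEVEL:
                raise ValueError(f"{stage.value}: level {level} outside {MIN_LEVEL}-{MAX_LEVEL}")


# Example: a navigation aid that gathers and analyzes map data almost fully
# automatically and suggests routes, but leaves the final decision and the
# driving itself to the human.
nav_aid = AutomationDesign(levels={
    Stage.INFORMATION_ACQUISITION: 8,
    Stage.INFORMATION_ANALYSIS: 7,
    Stage.DECISION_SELECTION: 4,
    Stage.ACTION_IMPLEMENTATION: 1,
})
print(nav_aid.levels[Stage.DECISION_SELECTION])  # 4
```

Separating the stages this way makes it easier to describe systems that are highly automated in some respects (acquisition, analysis) while leaving decisions and actions to people.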

A key component of successful automation — regardless of level — is an appropriate degree of trust. Lee and See (2004) describe three dimensions along which trust in automation can be matched, or mismatched, to automation capability: calibration, resolution, and specificity. Calibration is the degree to which a person’s trust corresponds to the automation’s actual capabilities: overtrust results when trust exceeds capability (leading to misuse), and distrust results when capability exceeds trust (leading to disuse). Resolution is the degree to which trust differentiates between levels of automation capability; with poor resolution, large changes in capability are met with roughly the same degree of trust. Finally, specificity describes how trust relates to particular components or aspects of the automation, as well as how trust fluctuates with context and time. Low specificity occurs when trust is uniform over the long term or across an entire system, while high specificity reflects trust that tracks moment-to-moment changes in automation capability and the individual components or subfunctions of the automation.

The relationship between calibration, resolution, and automation capability in creating appropriate trust (Lee & See 2004)
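
As a rough illustration of calibration, the toy function below compares a hypothetical trust score against a hypothetical capability score. The 0-1 scales, the tolerance parameter, and the lane-keeping example are assumptions for the sake of illustration, not measurements from Lee and See (2004).

```python
def classify_trust(trust: float, capability: float, tolerance: float = 0.1) -> str:
    """Toy illustration of trust calibration in the sense of Lee and See (2004).

    Both scores are hypothetical values on a 0-1 scale; real calibration is
    assessed empirically, not computed from two numbers.
    """
    if trust > capability + tolerance:
        return "overtrust (risk of misuse)"
    if trust < capability - tolerance:
        return "distrust (risk of disuse)"
    return "calibrated trust (appropriate reliance)"


# A driver who fully trusts a lane-keeping system that only handles
# well-marked highways would fall in the overtrust region.
print(classify_trust(trust=0.95, capability=0.60))  # overtrust (risk of misuse)
```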

Poor matching between human trust and automation capability is involved in many unanticipated problems and failures. In particular, overtrust can produce overreliance on automation and the challenges that accompany it. Detraining can occur as humans lose practice at tasks they once routinely completed; one example is how reliance on Google Maps and similar programs has eroded many people’s ability to navigate an unfamiliar (or even a familiar) environment, particularly when a mapping program behaves unexpectedly or when changes to the physical route or area are not reflected in the mapping data. Detraining is of particular concern in the context of self-driving cars. In a hypothetical future where humans are not routinely expected to drive, how can autonomous vehicles be designed so that human drivers maintain sufficient skill to control the vehicle when necessary? Given that autonomous vehicles will occasionally fail, how can designers ensure that, even without regular reinforcement, human drivers can take control of the car on short notice and operate it without assistance, potentially in extremely high-stakes situations?

Another consequence of overtrust is a shift in responsibilities: people move away from completing tasks independently and toward directing the automation. To return to the example of Google Maps and similar automated navigation programs, people now ask the automation to direct them to a destination rather than looking at a map and working out the directions themselves. This shift often accompanies reduced situational awareness (itself a potential result of overtrust). People may be adept at using automation while understanding little about how or why it operates as it does; more concerningly, when the automation fails to behave as anticipated, a lack of attention to situational details can leave people less able to solve a problem or complete a task on their own. The growing number of news stories documenting the predicaments drivers have found themselves in after following automated directions reinforces the findings of researchers who study automation bias — the tendency of humans to fail to search for, or even to ignore, evidence that contradicts a computed solution.

There’s a tendency to frame artificial intelligence in hyperbolic terms, or to position it as something new and unprecedented. However, in many ways, AI is simply the most recent development in an ongoing movement toward increasing automation. In fact, AI can be seen as a “flawed” form of automation, or at the very least, one that is subject to particular limitations. For example, algorithms may be constructed from or trained on incomplete or biased data. An especially well-documented example is that of the COMPAS risk assessment tool, which is intended to predict the likelihood that a person will go on to commit a future crime. Although none of the questions used by COMPAS to calculate recidivism risk ask about race, the results are nonetheless racially biased, with the formula significantly more likely to incorrectly flag Black defendants as high risk. While the predictions that result from the COMPAS algorithm are particularly egregious, it is important to note that all predictive algorithms and models can encode ways in which individuals (and society) are biased.
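
ProPublica’s analysis of COMPAS (Angwin et al. 2016) compared error rates across racial groups. The sketch below shows, with entirely made-up data, the kind of calculation involved: the false positive rate per group, i.e., the share of people who did not reoffend but were nonetheless flagged as high risk. The record format and group labels are hypothetical.

```python
from collections import defaultdict


def false_positive_rates(records):
    """False positive rate per group: the share of people who did NOT reoffend
    but were still flagged as high risk.

    `records` is a list of (group, flagged_high_risk, reoffended) tuples;
    the data below are made up purely to show the calculation.
    """
    flagged = defaultdict(int)
    did_not_reoffend = defaultdict(int)
    for group, high_risk, reoffended in records:
        if not reoffended:                # only count people who did not reoffend
            did_not_reoffend[group] += 1
            if high_risk:
                flagged[group] += 1
    return {g: flagged[g] / n for g, n in did_not_reoffend.items() if n}


# Hypothetical toy data, not the actual COMPAS data set.
toy = [("A", True, False), ("A", False, False), ("A", False, False),
       ("B", True, False), ("B", True, False), ("B", False, False)]
print(false_positive_rates(toy))  # {'A': 0.33..., 'B': 0.66...}
```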

The risk of bias is of particular concern when AI relies on historical data, which often bears the marks of overtly prejudiced policies; ultimately, though, every data set contains some bias. It’s important to consider, for instance, how outcomes were measured (and who decided which outcomes were worth measuring in the first place). Artificial intelligence also runs the risk of replicating (and even amplifying) what some researchers call “natural stupidity” (Rich and Gureckis 2019). Just like humans, AI decision-making can fail when applied to rare events, fall prey to illusory correlations, and suffer from other limitations that commonly affect human judgment. Furthermore, in light of increasing globalization, consideration of cultural context is a critical part of ensuring AI works as intended. Microsoft’s infamous chatbot Tay devolved into a xenophobic train wreck after less than 24 hours on Twitter; in China, however, Microsoft’s chatbot XiaoIce has a dedicated following and composes poetry, not hateful screeds. The vastly different outcomes demonstrate the need to consider context — cultural, functional, and otherwise — in creating and deploying AI.

Artificial intelligence can also create new types of failures. A commonly expected benefit of automation is a reduction in human error, but rather than decreasing errors overall, automation frequently introduces new and different kinds of error, transforming rather than eliminating their occurrence. Researchers have ascribed such errors to “gaps and misconceptions in operators’ models of the automated systems…bugs in operators’ mental model can make it difficult or impossible to form accurate expectations of system behavior” (Sarter, Woods, and Billings 1997). As automation spreads to more, and more varied, settings, operators without relevant domain expertise will increasingly have to respond to scenarios for which they lack the necessary contextual knowledge.

In light of the various risks and challenges that accompany automation generally, and AI in particular, practitioners should carefully consider strategies to avoid perpetuating human bias and cognitive shortcomings, and to avoid introducing novel error types. Historically, such strategies have included cognitive systems engineering and limiting automation to well-defined tasks and environments. More recently, usability, user experience, and design thinking have allowed automation to be implemented more effectively across relatively well-defined tasks and in diverse environments. However, while designing to help machine-augmented humans complete tasks works well for easy-to-understand systems, it is a poorer fit for AI, particularly as neural networks and other “black box” models grow in both popularity and feasibility. Defining tasks becomes harder when designers are less able to state clearly what will happen, and the user experience is complicated when users cannot see what is happening inside a system.

A better approach, perhaps, is to enable human-machine collaboration. Humans and machines have different strengths, and rather than treating those strengths as being in direct competition, automation (and AI) can be designed so that humans and machines are each responsible for the areas in which they typically excel; examples of this approach are already emerging. Additionally, the ongoing interest in “handmade” products and the cultural cachet of “artisan” or “bespoke” goods amid automation and mass production speaks to the way in which value is shaped by societal context. Furthermore, looking to past waves of automation — during the first Industrial Revolution, for example — can suggest strategies that allow humans to thrive alongside automation rather than compete with it (the past is, of course, also a valuable source of cautionary tales).

Human and computer strengths in decision-making (Cummings 2004)
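
One simple way to operationalize this division of labor is a task-allocation policy that lets the machine act autonomously only when it is confident and the stakes are low, and otherwise keeps the human in the decision loop. The sketch below is a minimal, hypothetical example of such a policy; the confidence threshold and the stakes labels are assumptions for illustration, not recommendations from the sources cited here.

```python
def route_decision(model_confidence: float, stakes: str, threshold: float = 0.9) -> str:
    """Toy task-allocation policy: the machine acts on its own only when it is
    confident and the stakes are low; otherwise a human stays in the loop.

    The threshold and the stakes labels are illustrative assumptions.
    """
    if stakes == "high" or model_confidence < threshold:
        return "refer to human (machine supplies analysis and suggestions)"
    return "machine acts (human monitors and can override)"


print(route_decision(model_confidence=0.97, stakes="low"))   # machine acts ...
print(route_decision(model_confidence=0.97, stakes="high"))  # refer to human ...
```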

Since both automation and human-machine collaboration are broad topics, we plan to focus on specific aspects of each during future events. Our next meeting will cover AI in healthcare. Other topics we hope to address include interpretable models; societal implications of AI use; trust and dependence on AI; responses to worker replacement; the role of policy; and automating broader, more human-like skills. If you are in the San Francisco Bay Area and are interested in participating, future events will be listed through the Berkeley AI Meetup Group.

About the Human-Machine Collaboration Publication and the Berkeley AI Meetup

Preparing and equipping humans to work and live with machines is far easier when creating those machines involves thoughtful consideration of human abilities and human needs. Given our interest in these issues, Bob Stark and Ian Moura decided to create a discussion group for the purpose of research and problem-solving through the Berkeley AI meetup group. This Medium publication summarizes the background information that we cover in our meetings.

References and Recommended Reading

Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Bainbridge, L. (1983). Ironies of Automation. https://www.ise.ncsu.edu/wp-content/uploads/2017/02/Bainbridge_1983_Automatica.pdf

Cummings, M.L. (2004). Automation Bias in Intelligent Time Critical Decision Support Systems. https://web.archive.org/web/20141101113133/http://web.mit.edu/aeroastro/labs/halab/papers/CummingsAIAAbias.pdf

DARPA’s eXplainable AI (XAI). https://www.darpa.mil/program/explainable-artificial-intelligence

Hollnagel, E. (2016). Resilience Engineering. http://erikhollnagel.com/ideas/resilience-engineering.html

Hollnagel, E., & Woods, D. D. (1982). Cognitive Systems Engineering: New wine in new bottles. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.2247&rep=rep1&type=pdf

Horvitz, E. (1999). Principles of Mixed-Initiative User Interfaces. http://courses.ischool.berkeley.edu/i296a-4/f99/papers/horvitz-chi99.pdf

Lee, J.D., & See, K.A. (2004). Trust in Automation: Designing for Appropriate Reliance. https://pdfs.semanticscholar.org/8525/ef5506ece5b7763e97bfba8d8338043ed81c.pdf

Parasuraman, R., Sheridan, T.B., & Wickens, C.D. (2000). A Model for Types and Levels of Human Interaction with Automation. https://www.ida.liu.se/~729A71/Literature/Automation/Parasuraman,%20Sheridan,%20Wickens_2000.pdf

Rich, A.S., & Gureckis, T.M. (2019). Lessons for artificial intelligence from the study of natural stupidity. https://sci-hub.tw/10.1038/s42256-019-0038-z

Rudin, C. (2018). Please Stop Explaining Black Box Models for High-Stakes Decisions. https://arxiv.org/pdf/1811.10154.pdf

Sarter, N.B., Woods, D. D., & Billings, C.E. (1997). Automation Surprises. https://pdfs.semanticscholar.org/f4c7/caebecd0f1b42d1eb8da1061e464fcccae11.pdf

Stark, R. F., Farry, M., Thornton, W., Wollocko, A., Woods, D. D., & Morison, A. (2012). Modeling Resilient Submarine Decision Making. https://www.dropbox.com/s/dn4s35m88phcra3/Stark-et-al-2012%20BRIMS.pdf?dl=0

Stark, R. F., Farry, M., & Pfautz, J. (2012). Mixed-Initiative Data Mining With Bayesian Networks. https://www.dropbox.com/s/m6yarozba98kb4l/Stark-et-al-2012%20CogSIMA.pdf?dl=0

Stark, R. F., Roth, E. M., & Farry, M. P. (2013). Incrementally formalizing graphical models for collaborative operations research. https://www.dropbox.com/s/pa9ez2ka76sojku/Stark-et-al-2013%20HFES.pdf?dl=0

Stark, R. F., Woods, D. D., Farry, M., Morison, A., Thornton, W., & Wollocko, A. (2012). Visualizations and Interaction Methods for Resilient Submarine Decision Support. https://www.dropbox.com/s/l8b10uglras1z1r/Stark-et-al-2012%20HFES.pdf?dl=0
