Using interface designs to explore challenges in AI-assisted child welfare decision making
Every day, trained experts must make decisions based on social perceptions and judgments: Should this applicant be interviewed for a job? Should this person be released on bail? Should this child’s family be investigated for child maltreatment?
In recent years, artificial intelligence (AI) algorithms have been developed with the promise of making these decisions more accurate and consistent, and public agencies around the US are listening. But all too often, algorithms fall short of those promises when put into practice, and workers are left with yet another mandatory tool that doesn’t actually help them. What would it mean to design AI-based tools that truly help workers make better decisions?
In a research paper we’re presenting at the 2022 Conference on Designing Interactive Systems (DIS ’22), we explored child maltreatment call screeners’ perceptions of ways to change or enhance how they work with AI-based decision support tools. What we found forms part of a broader story about the fundamental challenges of decision-making in complex social contexts. Most importantly:
To create AI-based decision aids that work, we need new algorithms and interfaces to accommodate the social complexities that decision-makers grapple with every day.
So, what do we mean by social complexities? For a real-world example, let’s dive into the challenging, ethically fraught world of child welfare — where AI-based decision support tools are already impacting decisions every day.
How is AI used in child maltreatment screening?
The child welfare system in the United States is large and diverse, comprising hundreds of thousands of children and families. For example, in Allegheny County, Pennsylvania, social workers serving as hotline call screeners review over 10,000 cases of alleged child maltreatment or neglect each year. To assist social workers’ decision-making on these cases, Allegheny County has implemented an AI-based tool called the Allegheny Family Screening Tool (AFST).
The AFST works by collecting information about a family from several public databases, then collapsing it into a single score from 1–20. A score of 20 signifies a high risk that the child will ultimately be removed from their home and placed in foster care. Social workers use this score, alongside details about the alleged maltreatment and other information sourced from public records, to decide whether to screen in the case for further investigation.
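To make the mechanics a bit more concrete, here is a minimal sketch of how a tool like this might turn a model’s predicted probability into a 1–20 score by binning it against historical predictions. This is purely illustrative: the data, function names, and binning scheme below are assumptions for the sake of the example, not the actual AFST implementation.

```python
import numpy as np

# Hypothetical illustration only -- not the actual AFST implementation.
# Assume `historical_probs` holds a model's predicted probabilities of
# out-of-home placement for past referrals, and `p` is the prediction
# for the current referral.

def risk_score(p: float, historical_probs: np.ndarray) -> int:
    """Map a predicted probability to a 1-20 score via percentile bins."""
    # Cut points that split the historical predictions into 20 equal-sized bins.
    cut_points = np.quantile(historical_probs, np.linspace(0.05, 0.95, 19))
    # Count how many cut points the current prediction exceeds, then shift to 1-20.
    return int(np.searchsorted(cut_points, p, side="right")) + 1

rng = np.random.default_rng(0)
historical_probs = rng.beta(2, 8, size=10_000)   # stand-in for past model outputs
print(risk_score(0.05, historical_probs))        # a relatively low prediction
print(risk_score(0.60, historical_probs))        # a relatively high prediction
```

In this sketch the score is relative: a 20 only means that the prediction falls in the top slice of past predictions, not that removal is certain.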
The AFST has been around since 2016, and although its impact on families is disputed, it has certainly been studied extensively and has inspired similar algorithms at other child welfare agencies across the US, making it an ideal test case for our study.
Interface design concepts for AI decision aids
Now let’s take a look at the interface workers currently use to view the AFST score:
As you can see, this interface doesn’t convey a lot of information. Because of this, and because of various other organizational and model design challenges surfaced in our prior research, workers found it difficult to meaningfully integrate the score into their decision process, and often had to make assumptions about why the AFST score was high or low for a family.
That’s where our study came in: by showing workers interface design concepts for the AFST, we identified important design implications for future AI-based decision support tools. We also surfaced more fundamental concerns around the assumptions underlying statistical prediction in complex social settings.
We created ten different design concepts, which served as the basis for interviews with 13 Allegheny County hotline call screeners and supervisors. Each design concept envisioned a different way of augmenting or redesigning the AFST’s capabilities:
You might recognize some of these concepts from the field of interpretability in machine learning, which seeks to help people use AI through explanations of system behavior or processes. But other design concepts, particularly those inspired by previous worker feedback, went beyond those kinds of approaches. For example, one concept was born from several workers’ desire to imbue the score calculation with their knowledge of relevant context, such as by removing specific individuals from the calculation:
“[How] can I balance out father went to jail 10 years ago, but now father is out here being a productive citizen doing what it is that he needs to do? But based on where [the AFST is] pulling everything from, it’s pulling that, but it’s not pulling it that father is now a productive citizen.”
Giving workers the ability to apply such contextual knowledge to adjust the AI’s prediction introduces many technical challenges. For example, how would we measure the “accuracy” of the algorithm, if the algorithm’s outputs could be changed based on factors that cannot easily be captured in data?
Yet empowering workers with greater flexibility to adjust AI predictions might ultimately offer more value to decision-makers in social contexts. Why? Letting workers document how they are adjusting the AI’s predictions could help them reflect on their decisions in the moment, while leaving records of their thought processes to support collaboration with other social workers.
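As a thought experiment, here is a minimal sketch of what such an adjustable, documented score might look like in code. Everything in it is hypothetical: the record fields, the stand-in scoring model, and the `Adjustment` log are illustrations of the concept, not a real prototype or the workers’ actual interface.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical sketch of the "contextual adjustment" concept: exclude one
# person's records from the score and keep a written record of why.
# `score_model` stands in for whatever model produces the risk score.

@dataclass
class Adjustment:
    excluded_person: str
    reason: str
    timestamp: datetime = field(default_factory=datetime.now)

def adjusted_score(records, exclude, reason, score_model, log):
    """Recompute the risk score without one person's records, and log why."""
    kept = [r for r in records if r["person_id"] != exclude]
    log.append(Adjustment(excluded_person=exclude, reason=reason))
    return score_model(kept)

# Toy usage with a stand-in "model" that just counts prior system contacts.
records = [
    {"person_id": "father", "prior_contacts": 4},
    {"person_id": "mother", "prior_contacts": 0},
]
toy_model = lambda recs: sum(r["prior_contacts"] for r in recs)
audit_log = []
print(adjusted_score(records, exclude="father",
                     reason="Father's records are over 10 years old; no recent involvement.",
                     score_model=toy_model, log=audit_log))
print(audit_log[0].reason)
```

The interesting part is less the exclusion itself than the audit log: each adjustment carries a written reason that a colleague or supervisor could later review, which is the kind of record-keeping that could support the reflection and collaboration described above.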
New directions for AI interface design
In our paper, we highlight 13 implications for the design of AI decision aids. Here are just a few examples:
- Design AI techniques and interfaces that display an awareness of the factors that a human decision-maker would be expected to use, in contrast to those used by the AI. Popular interpretability techniques today, like LIME and SHAP, are focused on explaining the features that a model uses to make a particular prediction. But decision-makers in our study were also interested in the features that the AI couldn’t use — in other words, having interfaces explain some of the reasons why their judgment might differ from the AI. To help highlight information that is unobservable to the AI, it may be helpful to perform deeper analyses of unstructured data about decision-making, such as case worker notes.
- Explore ways for explainable AI interfaces to support workers’ needs beyond direct decision support, for example by supporting workers in justifying agreement or disagreement with the risk score to their supervisors or peers. Ultimately, an AI interface is just another tool that workers can use to make decisions. Building better ways for those tools to integrate into people’s broader workflows (e.g., by providing support for justifying or discussing decisions with others) is essential to making such tools useful in real-world organizational contexts.
- Design AI interfaces that help workers calibrate their own uncertainty, and adjust their decision-making accordingly. When we showed workers a visualization of the AFST score with an uncertainty interval around it, they generally found it unhelpful, confusing, and easy to ignore. But a different form of uncertainty did resonate with them: one that highlighted how unusual a case was. Workers were interested in having tools that drew their attention to unusual features, which would heighten their sense of uncertainty and prompt them to be more cautious (one way such unusual cases might be flagged is sketched after this list).
- Explore ways to directly involve workers in co-designing appropriate predictive targets for AI tools, as well as appropriate ways to evaluate AI-assisted human decision-making. Pushes for data-driven decision making often go hand-in-hand with data-driven ways to evaluate decision quality. But as participants pointed out, the metrics on which AI tools are built often clash with the way workers actually make decisions. Measurements that rely on the outcomes the model is trained to predict may artificially disadvantage workers who are internally “optimizing” for different decision targets, ultimately obscuring the question of whether the AI truly complements workers’ decision-making. So what can tool builders do instead? We suggest drawing not just on the metrics that are easily available, but also on workers’ notions of what it means to make a good decision.
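To illustrate the “unusualness” idea from the third bullet above, here is a hedged sketch of how a tool might flag cases that look unlike the referrals a model was trained on. The features and data are made up, and the outlier detector (scikit-learn’s IsolationForest) is just one plausible choice, not something our participants saw.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hedged sketch of the "unusualness" idea: instead of an uncertainty
# interval, flag cases that look unlike past referrals so workers know
# to slow down. Feature names and data here are invented for illustration.

rng = np.random.default_rng(0)
# Stand-in for historical referrals: [child_age, n_prior_referrals, household_size]
historical = np.column_stack([
    rng.integers(0, 18, 5000),
    rng.poisson(1.5, 5000),
    rng.integers(1, 8, 5000),
])

detector = IsolationForest(random_state=0).fit(historical)

def unusualness_note(case: np.ndarray) -> str:
    """Return a short note if the case looks unlike past referrals."""
    score = detector.decision_function(case.reshape(1, -1))[0]
    if score < 0:  # negative scores indicate likely outliers
        return "Unusual case relative to past referrals: interpret the risk score with extra caution."
    return "Case resembles past referrals."

print(unusualness_note(np.array([7, 1, 4])))    # a typical-looking case
print(unusualness_note(np.array([17, 30, 2])))  # an unusually high number of prior referrals
```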
Broader challenges for AI in social decision-making
Many of our findings reflect broader themes in social decision-making, a concept that ties together research on human-AI interaction applied to criminal justice, social work, business, and other domains.
At its core, social decision-making is about people — making inferences and predictions about their intentions, behaviors, or beliefs. And because of this, decision-makers in social contexts…
- …often work with nuanced, unstructured knowledge. As developers of AI tools, we often assume that the variables we have at our disposal will capture the information most relevant to the decisions at hand. But in domains like child maltreatment screening, human decisions can hinge on crucial information (e.g., details mentioned during a phone call) that was never encoded in a database. How can we develop AI tools that account for, and complement, workers’ use of unstructured knowledge?
- …recognize that people won’t always agree on what decision should be made. For example, in social media content moderation, automatically flagging posts as “toxic” can invite controversy when people don’t agree on what constitutes rude or disrespectful speech. Yet such disagreements should not simply be dismissed as “noise” to be smoothed away: they may represent genuine differences in perspectives and values across different groups of people. How might we effectively assess decision quality when there is no single, uncontested ground truth about what it means to make a “good” decision?
- …operate under uncertainty that can’t be reduced by collecting more information. Because these decisions ultimately come down to human behavior, at some point it becomes impossible to make better predictions about what will happen in the future, no matter how much data we collect. It’s easy to write this off as “variance” in the aggregate, but human decision-makers have to reckon with this uncertainty in every single decision. How can we help people use data to reduce uncertainty, and how can we help them realize when they should feel uncertain regardless of what the data says?
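The last point can be made concrete with a tiny synthetic simulation: when outcomes are genuinely stochastic given the available features, even a model that knows the true probability for every case still gets many individual cases wrong. The numbers below are entirely made up; only the statistical point carries over.

```python
import numpy as np

# Synthetic illustration of irreducible uncertainty: an "oracle" model that
# knows each case's true outcome probability still cannot predict perfectly,
# because each case is ultimately resolved by chance.

rng = np.random.default_rng(0)
n = 100_000
true_prob = rng.uniform(0.2, 0.8, n)          # true outcome probability per case
outcome = rng.random(n) < true_prob           # each case still resolves randomly

oracle_prediction = true_prob > 0.5           # best possible decision rule
accuracy = np.mean(oracle_prediction == outcome)
print(f"Accuracy of the best possible predictor: {accuracy:.2f}")  # about 0.65, not 1.0
```

Collecting more of the same data would not push that number toward 1.0; the remaining error is a property of the setting, not of the model.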
Understanding these challenges is particularly important for tackling social decision-making, and it matters for many other applications of AI as well. But it’s often easy to recognize that these challenges exist and hard to actually design something that overcomes them. Our work shows how engaging with experienced decision-makers, such as hotline call screeners, can lead to tangible new directions for algorithm and interface development. These decision-makers can shed light on why existing solutions aren’t working. And they can push us as designers and developers to solve harder problems, yielding more effective and more appropriate forms of human-AI partnership.
Interested in learning more about AI in social decision-making? We invite you to check out our paper, as well as these other recent books and articles:
- A Human-Centered Review of the Algorithms used within the U.S. Child Welfare System (Devansh Saxena et al.)
- Improving Human-AI Partnerships in Child Welfare: Understanding Worker Practices, Challenges, and Desires for Algorithmic Decision Support (Anna Kawakami et al.)
- Automating Inequality (Virginia Eubanks)
- Shifting Concepts of Value: Designing Algorithmic Decision-Support Systems for Public Services (Naja Holten Møller et al.)
- The disagreement deconvolution: Bringing machine learning performance metrics in line with reality (Mitchell Gordon et al.)
- How to recognize AI snake oil (Arvind Narayanan)
- A case for humans-in-the-loop: Decisions in the presence of erroneous algorithmic scores (Maria De-Arteaga et al.)
- The games we play: critical complexity improves machine learning (Abeba Birhane and David Sumpter)
(Post written by Venkatesh Sivaraman, Anna Kawakami, Haiyi Zhu, and Ken Holstein)