Building Explainable AI (XAI) Applications with Question-Driven User-Centered Design
Two years ago, IBM Research released AI Explainability 360 (AIX360), an open-source toolkit to help machine learning developers use state-of-the-art algorithms that support explainability and interpretability of ML models. Since then, we have been actively working on leveraging these explainable AI (XAI) techniques to build explainable AI applications, including teaming up with IBM Design for AI to make IBM’s AI offering more explainable and trustworthy.
In this post, I would like to share some lessons we learned. I will discuss the rich application opportunities of XAI, the need to develop XAI with a user-centered approach, and a method you can use to pick the right XAI technique (e.g., using AIX360) and create user-friendly XAI applications. We will delve more into the technical details of XAI algorithms in a future post of this series. You can also check out the Resources and Demo parts of AIX360 to learn more about some popular XAI techniques.
Recently, AI explainability has moved beyond a demand by data scientists to comprehend the models they are developing. It is now frequently discussed as an essential requirement for people to trust and adopt AI applications deployed in numerous domains, fueled by regulatory requirements such as GDPR’s “right to explanation”. Examples of XAI features have already emerged in data analytics tools, healthcare applications, and consumer recommender systems. With the availability of open-source XAI toolkits like AIX360, we are likely to see more and more AI applications placing explainability as a front-and-center element.
When you start thinking about how to make an AI application explainable, you may quickly realize that it is difficult to pin down a specific solution. The Guidance for AIX360 gives a glimpse of the rich, and still rapidly growing, collection of XAI algorithms. More fundamentally, if we define the success of explainability as “making AI understandable by the user”, XAI applications must be developed based on what the user needs to have such an understanding: What do they already know? What do they hope to achieve with this understanding? What aspects of the AI do they want to understand? However, these questions can be challenging to answer because users of XAI are far from a uniform group.
Who needs explainability and for what?
The short answer might be “anyone who comes in contact with AI”. Here are some common user groups that may demand explainability and what they may use AI explanations for:
- Model developers, to improve or debug the model.
- Business owners or administrators, to assess an AI application’s capability, regulatory compliance, etc.
- Decision-makers, who are direct users of AI decision support applications, to form appropriate trust in the AI and make informed decisions.
- Impacted groups, whose life could be impacted by the AI, to seek recourse or contest the AI.
- Regulatory bodies, to audit for legal or ethical concerns such as fairness, safety, privacy, etc.
This is not an exhaustive list, but it is enough to illustrate the diverse types of people and their diverse needs for AI explainability. What further complicates the issue is that the same user may need different kinds of explanations when they engage in different tasks. For example, a doctor using a patient risk-assessment AI (i.e., a decision-maker) might want an overview of the application during the on-boarding stage, but delve into the AI’s reasoning about a particular patient’s risk assessment when treating that patient.
One way to think about a user’s explainability needs is to consider what kinds of questions they would ask to understand the AI. Below I list some common tasks that users perform with AI, and questions they may ask to complete each task. (To access a GitHub Gist version of this chart that is accessible by a screen reader, click here)
Question-Driven Explainable AI
These examples demonstrate that, while there are many types of users and many types of tasks requiring XAI, we can understand users’ explainability needs through the kinds of questions they ask. In our human-computer interaction (HCI) research, published as a paper at the ACM CHI 2020 conference (where it received a Best Paper Award Honorable Mention), we looked across 16 AI products at IBM and summarized common questions their users would ask. Based on these questions, we developed an XAI Question Bank, which lists common user questions for AI categorized into nine groups (bolded in the examples above):
- How: asking about the general logic or process the AI follows to have a global view.
- Why: asking about the reason behind a specific prediction.
- Why Not: asking why the prediction is different from an expected or desired outcome.
- How to change to be that: asking about ways to change the instance to get a different prediction.
- How to remain to be this: asking what change is allowed for the instance to still get the same prediction.
- What if: asking how the prediction changes if the input changes.
- Data: asking about the training data.
- Output: asking what can be expected or done with the AI’s output.
- Performance: asking about the performance of the AI.
These questions show that end users seek a holistic understanding of an AI application. Explainability should therefore be considered broadly: not limited to explaining the model internals, but also covering the training data, performance, scope of output, and other dimensions.
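To make a few of these question types concrete, here is a minimal, self-contained Python sketch. It uses a toy linear risk model whose feature names and weights are entirely made up for illustration; it is not an AIX360 API, just a demonstration of how a “Why” question maps to per-feature contributions and a “What if” question maps to re-scoring a perturbed input:

```python
# Toy linear "risk score" model used only to illustrate question types;
# the weights, bias, and feature names are hypothetical.
WEIGHTS = {"age": 0.04, "blood_pressure": 0.02, "smoker": 0.5}
BIAS = -3.0

def risk_score(patient):
    """Return a raw risk score for a patient (dict of feature values)."""
    return BIAS + sum(WEIGHTS[f] * v for f, v in patient.items())

def why(patient):
    """'Why this prediction?' -> each feature's contribution to the score."""
    return {f: WEIGHTS[f] * v for f, v in patient.items()}

def what_if(patient, feature, new_value):
    """'What if the input changes?' -> re-score a perturbed copy."""
    perturbed = dict(patient, **{feature: new_value})
    return risk_score(perturbed)

patient = {"age": 70, "blood_pressure": 120, "smoker": 1}
print(risk_score(patient))            # baseline score
print(why(patient))                   # contribution of each feature
print(what_if(patient, "smoker", 0))  # score if the patient were a non-smoker
```

For a real (non-linear) model, the same questions would be answered by post-hoc explainers rather than by reading weights directly, but the question-to-answer structure is the same.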
Once we know the user questions, we can choose the right XAI algorithms or explanation content based on what is asked. Below is a suggested mapping from the nine question groups to XAI techniques and the kinds of explanations they can generate. Here we focus on techniques available in AIX360 and other IBM Research Trustworthy AI toolkits; many more XAI techniques from academic work and other XAI toolkits can be mapped to these user questions. (To access a GitHub Gist that is readable by a screen reader, click here)
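As a rough sketch of what such a mapping can look like in code, here is an illustrative lookup table. The pairings below are indicative examples drawn from the general XAI literature, not a verbatim copy of the chart referenced above:

```python
# Illustrative mapping from the nine question groups to candidate
# explanation techniques; pairings are indicative, not exhaustive.
QUESTION_TO_TECHNIQUES = {
    "How": ["global surrogate model", "directly interpretable model (e.g., rule sets)"],
    "Why": ["local feature attribution (e.g., LIME, SHAP)"],
    "Why Not": ["contrastive explanation"],
    "How to change to be that": ["counterfactual explanation"],
    "How to remain to be this": ["counterfactual / feasible-range explanation"],
    "What if": ["interactive what-if probing of the model"],
    "Data": ["training data facts (provenance, distributions, prototypical examples)"],
    "Output": ["description of output scope and suggested use"],
    "Performance": ["performance metrics, uncertainty estimates"],
}

def candidate_techniques(question_group):
    """Look up candidate techniques for an elicited question group."""
    return QUESTION_TO_TECHNIQUES.get(question_group, [])

print(candidate_techniques("Why Not"))
```

In a product team, a table like this becomes the shared artifact that designers and data scientists negotiate over when scoping the XAI feature set.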
Question-Driven User-Centered Design for XAI
One reason we wanted to map out the space of user questions and corresponding XAI techniques is to encourage product teams to follow a user-centered design (UCD) process: start by understanding user needs, then use that understanding to guide the choice of XAI techniques. With UCD we can prioritize the user experience and avoid accruing technical debt. Towards this goal, working with IBM Design for AI, we developed a UCD method and a design thinking framework, following IBM Design’s long tradition of enterprise design thinking practices. Below we give a brief overview of this UCD method, which you can follow to build explainable AI applications. More details are described in our recent paper, with a real use case of designing an explainable AI application for patient adverse event risk prediction.
This question-driven XAI design method consists of four steps, and is ideally carried out collaboratively by designers and data scientists (or by one person performing both roles).
Step 1: Question Elicitation.
This is ideally part of formative user research; otherwise, carry it out as an exercise with your team. After defining the AI tasks and/or user journey, elicit or come up with the questions your users may ask to understand the AI. Also articulate the intentions behind these questions and the expectations for the answers. When access to real users is limited, our XAI Question Bank can be used as a customizable list to identify applicable questions.
Step 2: Question Analysis.
Cluster similar questions into categories and identify priorities, e.g., by ranking categories by the number of questions collected. The XAI Question Bank can be used to guide the categorization. You should also cluster and summarize the user intentions and expectations behind the questions collected in Step 1, to identify key user requirements for the XAI user experience (UX).
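One lightweight way to do this prioritization is a simple frequency count over the labeled questions, sketched below in plain Python (the questions and labels are hypothetical examples of Step 1 output):

```python
from collections import Counter

# Elicited questions, each labeled with a question-group category.
# These labels are hypothetical examples of Step 1 output.
elicited = [
    ("Why was this patient flagged as high risk?", "Why"),
    ("What data was this trained on?", "Data"),
    ("Why high risk and not medium risk?", "Why Not"),
    ("How accurate is it on patients like mine?", "Performance"),
    ("Why did it weigh this lab value so heavily?", "Why"),
]

# Rank categories by how many questions were collected for each.
priorities = Counter(category for _, category in elicited).most_common()
print(priorities)  # e.g., [('Why', 2), ('Data', 1), ...]
```

In practice the ranking would also weigh how critical each question is to the users’ tasks, not just raw counts, but a count is a useful starting point.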
Step 3: Mapping Questions to XAI Solutions.
Identify candidate techniques for each prioritized question category. Depending on the kind of model used, your team may need to implement its own solutions to surface the model internals or other facts, or use an XAI technique suggested in the mapping chart above. Pay attention to the detailed questions users ask as well; for example, it is often helpful to decide which specific facts about the model’s performance and data to provide based on your users’ questions. With the mapping output, data scientists can start the implementation and designers can proceed to create a design prototype in Step 4.
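The model-dependence in this step can be sketched as a small decision rule, assuming a simple white-box vs. black-box distinction (the technique names here are illustrative, not the chart’s exact recommendations):

```python
def pick_solution(question_group, model_is_white_box):
    """Pick a candidate explanation approach for a prioritized question
    group, depending on whether model internals are accessible."""
    if question_group == "How":
        # With access to internals, show them directly; otherwise
        # approximate the model with a global surrogate.
        return ("inspect model internals" if model_is_white_box
                else "fit a global surrogate model")
    if question_group in ("Why", "Why Not"):
        # Post-hoc local explainers can treat the model as a black box.
        return "local feature attribution or contrastive explanation"
    if question_group in ("Data", "Output", "Performance"):
        # These are facts about the application, not the model internals.
        return "report documented facts (data, output scope, metrics)"
    return "interactive probing (what-if / counterfactual)"

print(pick_solution("How", model_is_white_box=False))
```

A real mapping exercise considers more than model access (e.g., data modality, latency budget, user expertise), but encoding even a coarse rule makes the team’s choices explicit and reviewable.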
Step 4: Iterative Design and Evaluation.
Create an initial design with the set of candidate solutions identified in Step 3. Evaluate the design holistically with the user requirements identified in Step 2 to identify gaps, ideally with user feedback. Iteratively improve the design to close the gaps. Designers and data scientists should continue having frequent touch points as design iterations can impact modeling solutions. New questions and user requirements can emerge and should be incorporated in future iterations.
For example, below is a design prototype we created for the healthcare risk prediction AI use case described in our paper, informed by the user questions and requirements the team gathered through user research early on. (For a GitHub Gist with a description of this prototype, click here)
Building trustworthy AI requires centering the technical development around the needs of users and other stakeholders. Making AI explainable is a first and critical step. Here I have shared an example of our work at IBM across organizational boundaries of research, design, and product to innovate the AI development process from within. Together with IBM Design for AI, we are working on embedding this kind of design thinking for explainability and AI ethics broadly into as many IBM AI product teams as possible.
To learn more about AIX360, please visit the home page or join the AIX360 Slack channel to ask questions and learn from other users. Also, feel free to look at the page for our Intro to XAI course that we gave at CHI 2021.