User Research Makes Your AI Smarter

Some things we’re learning about doing UX research on AI at Microsoft

Penny Marsh Collisson

Published in Microsoft Design · 6 min read · Jul 1, 2019

By Penny Marsh Collisson and Gwenyth Hardiman, illustrations by Michaelvincent Santos.

As AI grows more prevalent, it’s changing what people expect from technology and how they interact with it. That means that every UX-er and customer-obsessed product person needs to consider not only how to create effective AI experiences, but also how to collect customer feedback along the way. Easy…right?

The good news is that many traditional research tools can help gauge customer reactions to AI. Ethnographies, focus groups, prototype research, customer surveys, and logs are all still relevant. However, AI systems differ from classical systems in that they’re context aware, personal, and able to learn over time. They also have limited accuracy and unique failure modes. These things introduce new challenges and opportunities when researching the UX of AI.

Today we’ll share practical tips for researching the UX of AI that we’ve learned along the way at Microsoft.

Diversify your recruit

As UX-ers, it’s our responsibility to ensure that the experiences we deliver embrace diversity and respect multiple contexts and capabilities. That’s especially important with AI. If your AI UX is only usable for a subset of users, potentially harmful bias will creep into your AI models. An arbitrary sample of participants, or even a split on basic demographics like gender and age, will not be enough to ensure your AI is inclusive.

Even during early feedback stages, recruit for a wide array of characteristics such as these:

Attitudes toward AI and privacy
Profiles of tech adoption
Levels of tech self-efficacy
Geographies
Social contexts and norms
Physical, cognitive, or emotional abilities
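
One way to keep a screener honest is to tally your recruit against characteristics like these before sessions begin. Here is a minimal Python sketch of that idea, not a tool from our studies. The dimension names, their values, and the `coverage_gaps` helper are all hypothetical, so swap in whatever your own screener captures.

```python
from collections import Counter

# Hypothetical screener dimensions and the values we hope to see for each.
TARGETS = {
    "ai_attitude": {"enthusiastic", "neutral", "skeptical"},
    "tech_adoption": {"early", "mainstream", "late"},
    "self_efficacy": {"low", "medium", "high"},
}


def coverage_gaps(participants, targets=TARGETS, minimum=1):
    """Return {dimension: sorted values with fewer than `minimum` recruits}."""
    gaps = {}
    for dimension, wanted in targets.items():
        # Count how many recruits reported each value for this dimension.
        counts = Counter(p.get(dimension) for p in participants)
        missing = sorted(v for v in wanted if counts[v] < minimum)
        if missing:
            gaps[dimension] = missing
    return gaps


# Made-up recruits, for illustration only.
recruits = [
    {"ai_attitude": "enthusiastic", "tech_adoption": "early", "self_efficacy": "high"},
    {"ai_attitude": "neutral", "tech_adoption": "early", "self_efficacy": "medium"},
    {"ai_attitude": "skeptical", "tech_adoption": "late", "self_efficacy": "high"},
]

print(coverage_gaps(recruits))
```

A check like this only tells you which groups are missing. Deciding which characteristics matter for your product is still a research judgment.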

Fake it till you make it with Wizard of Oz techniques

During early prototyping stages, it can be hard to get a good read on how your AI is going to work for people. Your prototype might be missing key functionality or interactivity that will impact how participants respond to the AI experience. In a Wizard of Oz study, participants interact with what they believe to be an AI system, while a human “behind the curtain” simulates the behavior that the AI system would demonstrate. For example, a participant might think the system is providing recommendations based on her previous selections when a person in another room is actually providing them. When people can earnestly engage with what they perceive to be an AI, they form more complete mental models and interact with the experience in more natural ways.

Integrate people’s real stuff into your AI prototype

If study participants see generic content, their reactions may mislead you. People respond differently when the experience includes their real, personalized content such as photos, contacts, and documents. Imagine how you feel about a program that automatically detects faces in photographs. Now, imagine seeing the faces of your loved ones identified by the system. Your reaction may be very different when you see people you know. You’ll need to spend extra time pre-populating your prototype with people’s “real” content, but it will be worth the effort.

Reference a person instead of an AI

AI has a lot of hype and folklore around it. For that reason, referencing AI can cue participants to make certain assumptions — both good and bad — about their experience. For example, participants might key in on highly publicized stories of bias or failure in AI systems. Or they could assume AI is more capable and perfect than it will ever be. Getting participants to think about how a human could help them can be a good way to glean insight about where AI can be useful.

Here are some alternatives to talking about AI:

Invite participants to share how they currently enlist other people to achieve their goals.
Ask participants how they would want a human expert to behave.

Understand the impact of your AI getting it wrong

AI isn’t perfect. It’s probabilistic, fallible, and will make mistakes. Especially early in the design cycle, it can be easy to create perfect prototypes and get an overly optimistic response to your UX. While planning evaluations, build in realistic quirks or pitfalls to bridge the gulf between the shiny concept and realistic product execution. Once you understand how your AI’s failure modes impact people, you can design to mitigate their impact.

Here are a few methods to consider:

Intentionally introduce things into your prototype that are likely to be “wrong.”
Ensure that system interactions in your Wizard of Oz studies include different kinds of errors.
Take participants down different paths: things are right, a little right, a little wrong, totally wrong.
Invite conversation about where failures would be most impactful to their experience.
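
If each participant will see all four paths, it may help to vary the order so that no path is always seen first. The sketch below is one simple way to plan that in Python. It assumes every participant sees every path, which the tips above do not require. The participant IDs and the `assign_orders` helper are hypothetical. It uses a plain rotation, which puts each path in each position equally often but does not balance which path follows which.

```python
# The four paths named above, from fully correct to fully incorrect.
PATHS = ["right", "a little right", "a little wrong", "totally wrong"]


def rotated_orders(paths):
    """Return every rotation of `paths`; each path appears once per position."""
    return [paths[i:] + paths[:i] for i in range(len(paths))]


def assign_orders(participant_ids, paths=PATHS):
    """Cycle through the rotations so the orders are spread across participants."""
    orders = rotated_orders(paths)
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}


# Hypothetical participant IDs.
plan = assign_orders(["P1", "P2", "P3", "P4", "P5"])
for pid, order in plan.items():
    print(pid, "->", ", ".join(order))
```

With a small sample, recruit in multiples of four if you can, so that each order is used the same number of times.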

Dive into mental models

People don’t need to understand the nuts and bolts behind the technology powering AI to have a positive experience with it. But they need a mental model of the system — what it delivers, when, and why — to have realistic expectations of the system’s capabilities. It can be easy to assume that people correctly understand how your AI works, when frequently their understanding is wrong (even if they’re confident about it!). Once we locate the gaps in people’s mental models, we’re better equipped to shore them up with our designs.

To understand how participants envision your AI system, try this:

Ask participants to write down the “rules” for how the system works. For example, give them a result and ask them to explain why and how the system produced it.
Have participants imagine that a human gave them a specific result. Ask what it is about the data, or their interactions, that would have caused the human to give them that result.

Highlight exceptions as well as trends

People will have different experiences with an AI depending on their context, the content they bring in, and the way they interact with the system. When every person’s experience is so personal, it is hard to draw qualitative insights about an AI system from what most people do or how most people react. As you roll up results, pay close attention to outliers, and work out why those participants had the experience they did. This is particularly important when evaluating the experience across a diverse audience.

Talk with us

These are ideas we’ve been recommending to other people researching AI, but we’re sure there are other ideas out there. What do you think? What did we miss? We’d love to hear about your experiences. Please share them in the comments. You can also catch Penny, Gwen, and Michaelvincent on LinkedIn.

Authors

Gwenyth Hardiman is a user researcher working on AI in Office.

Michaelvincent Santos is a designer working on collaboration in Office.

Penny Collisson is a user research manager working on AI in Office.

With special thanks to Trish Miner, Erin Wilcox, Joline Tang, Thomas Dunlap, Paul Scudieri, Sara Bell, Brandon Haist, Jared LeClerc, Andrew Lambert, Joshua Tabak, and Nitya Nambisan for their teamwork and contributions.