UX research methods for testing generative AI

Martina Dove PhD - Senior UX Researcher
Published in Bootcamp · 5 min read · Nov 11, 2024
Picture shows two lego robots side by side
Picture credit: https://unsplash.com/@helloimnik

A typical procedure for testing any new product or service is to design a prototype and do some user research to see if:

· The concept makes sense — is it easily understood and appreciated?

· Usability is good — intended users can understand labels, calls to action, and information, and complete tasks easily.

Whilst this research process works well with a good prototype, it is not easily applicable to gen AI tools, such as digital assistants, bots, or information-generating tools and features like Copilot or Gemini.

A traditional product or feature typically has an ideal user journey or flow: the path a user takes to accomplish a task. The onus is on the product team to make this journey intuitive and easy to complete. That journey is typically the simplest way for the user to accomplish their goal, and whilst some users take unusual paths, well-designed interaction should steer most users onto the quickest route.

User flows are unpredictable with generative AI

Generative AI output is not something that can be easily predicted, mapped, and prototyped. Each interaction is a unique combination of the user’s input and the tool’s output, shaped by the quality of the underlying data. Gen AI also has its own quirks, and the same question can produce slightly different results each time. The user journey also depends on the prompting skills of the user, the conversational design, and the accuracy of the data. For example, a user can be highly skilled at prompting but encounter inaccurate data, resulting in poor, irrelevant outputs, which will likely lead to prompt refinement and frustration. The same user, using the same gen AI tool but with better quality data, will have a completely different experience.

Picture shows a diagram that depicts how users prompt and receive information from generative AI tools
User — AI assistant/bot interaction

Therefore, generative AI tools have an added layer, one that touches on usability but is not usability as we know it. This interaction between a user and a tool is difficult to explore using a prototype. Additionally, prototypes typically assume the ideal scenario, whereas users are unpredictable. They often take cues from past experiences with different products, from the information available, and from UI elements. Their behavior also often comes down to personality (e.g., reads the documentation vs. learns by doing).

So how can you effectively test generative AI tooling such as digital assistants or similar? Well, I had to think about this as I worked on such tools, and here is what worked for me.

Usability

Exploring usability can be done by referring to usability heuristics: rules that designs should follow to optimize product usability. NN Group defines usability heuristics as ‘general principles for interaction design’, and I often use them to conduct UX audits.

Research can be costly in terms of time and resources (paying participants, or having to recruit employees for internal tooling). In comparison, performing a UX audit to spot common heuristic violations is quicker and easier, allowing design teams to iterate rapidly. I reflect on these heuristics to guide the audit because doing so reduces bias and personal preference, and it results in focused, actionable recommendations for improvement.

Picture shows a list of 10 usability heuristics by NN group

How to:

  1. Keep the usability heuristics handy and create steps or prompts that test each one, making sure you ask about things you know are not in the database, to examine how error messages and dead ends are handled.
  2. Note whether there are any shortcuts and, if not, what would constitute one (e.g., suggested follow-up prompts). Is there enough information to guide the user? Are users kept informed while output is being generated?
  3. Look at outputs and evaluate them for consistency — do outputs follow industry standards, and are the tone and voice consistent throughout?
  4. Provide a color-coded overview of findings, so your team knows what to prioritize first as you identify areas of improvement (see my grid example and the sketch below). Take plenty of screenshots for your team.
Picture shows a table grid identifying opportunities for improvement, according to 10 usability heuristics.
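If it helps to make the grid concrete, here is a minimal sketch of what such an audit grid could look like in code. The heuristic names are NN Group’s ten usability heuristics; the severity scale and the example findings are illustrative assumptions, not results from a real audit.

```python
# A minimal sketch of a heuristic audit grid, assuming a simple
# severity scale (0 = no issue, 1 = minor, 2 = major, 3 = blocker).
# The findings below are placeholder examples, not real audit results.

HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

SEVERITY_LABELS = {0: "OK", 1: "Minor", 2: "Major", 3: "Blocker"}

# One entry per heuristic: (severity, note). Fill in as you audit.
findings = {
    "Visibility of system status": (2, "No indicator while output is generating"),
    "User control and freedom": (1, "No obvious way to stop or edit a running response"),
    "Help and documentation": (3, "No onboarding or prompt guidance at all"),
}

def print_grid() -> None:
    """Print a plain-text version of the color-coded audit grid."""
    for heuristic in HEURISTICS:
        severity, note = findings.get(heuristic, (0, ""))
        print(f"{SEVERITY_LABELS[severity]:<8} {heuristic:<60} {note}")

if __name__ == "__main__":
    print_grid()
```

In practice this usually lives in a spreadsheet with cell colors standing in for the severity labels; the point is simply to record one row per heuristic so nothing gets skipped.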

Output accuracy

Accuracy is equally important and should not be ignored. This can also be tested without participants.

How to:

  1. First, check the database for old documents and information that are no longer valid, to improve overall accuracy.
  2. Define the areas of information that you want to test (go broad so you can catch any old information).
  3. Create ideal prompts to get to the information you want to cover and start prompting.
  4. For each prompt, define the parameters you want to track, such as accuracy and completeness of the information (e.g., no extra prompting was needed), whether sources are available and accurate, what happens when the information is not available, and whether the tool generates a good summary or leaves information out. What you deem important will vary depending on the focus of your audit; a simple logging sketch follows below.
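As one way to make step 4 concrete, here is a minimal sketch of an accuracy-audit log, assuming you record each test prompt as one row and export everything to CSV for review. The field names are my own suggestions rather than a standard; adapt them to whatever your audit focuses on.

```python
# A minimal sketch of an accuracy-audit log. Each test prompt becomes
# one record; the whole log is written to CSV for the team to review.
# Field names are illustrative assumptions, not a fixed schema.

import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class PromptAuditRecord:
    topic: str          # information area under test
    prompt: str         # the prompt you ran
    accurate: bool      # output matched the source of truth
    complete: bool      # no extra prompting needed for the full answer
    sources_ok: bool    # cited sources exist and are current
    notes: str = ""     # e.g., how dead ends or missing data were handled

records = [
    PromptAuditRecord(
        topic="Expenses policy",
        prompt="What is the per-diem rate for international travel?",
        accurate=False, complete=True, sources_ok=False,
        notes="Cited a 2021 policy document that has since been superseded.",
    ),
]

with open("accuracy_audit.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(PromptAuditRecord)])
    writer.writeheader()
    writer.writerows(asdict(r) for r in records)
```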

Expectations and mental models

The hardest part is the unpredictability that users bring to the mix. One thing to remember is that for a long time we searched for information using keywords, so many people have established mental models for how to search for and find the information they need. Generative AI is more conversational and needs more elaborate prompts to generate better outputs. This is where understanding the expectations and behavior of your user base or personas becomes very important if you want to foster adoption.

How to:

  1. Do a usability study/cognitive walkthrough of the AI tool, where participants are free to search for whatever they want (to see what they would do in the wild) but also complete set tasks (e.g., find information about X), so some generalizations can be made.
  2. Pay attention to how they prompt, and ask why they prompt this way, what type of output they expected, and why.
  3. Ask if the prompt gave them all the information they were hoping to get (your success metric) and note what they do to refine the prompts (e.g., are they making the prompt more conversational or adding extra keywords?). A simple coding sketch follows below.
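If you want to quantify what you observe across sessions, here is a minimal sketch for coding prompt-refinement behavior. The refinement categories and the success metric are illustrative assumptions, not an established coding scheme.

```python
# A minimal sketch for coding prompt behavior in moderated sessions,
# assuming you log each attempt a participant makes on a task.
# The style categories are illustrative assumptions.

from collections import Counter
from dataclasses import dataclass

@dataclass
class PromptAttempt:
    participant: str
    task: str
    attempt: int             # 1 = first prompt, 2+ = refinements
    style: str               # "keyword", "conversational", "added_context"
    got_expected_info: bool  # participant's own judgement (success metric)

attempts = [
    PromptAttempt("P1", "Find information about X", 1, "keyword", False),
    PromptAttempt("P1", "Find information about X", 2, "conversational", True),
    PromptAttempt("P2", "Find information about X", 1, "conversational", True),
]

# Which refinement styles appear after a failed first attempt,
# and how often participants end up with what they hoped to get.
refinement_styles = Counter(a.style for a in attempts if a.attempt > 1)
success_rate = sum(a.got_expected_info for a in attempts) / len(attempts)

print(refinement_styles)               # e.g., Counter({'conversational': 1})
print(f"Overall success: {success_rate:.0%}")
```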

Having this knowledge is important because it can be used to create suggested prompts and onboarding information/instructions, which can drastically reduce the frustration that comes from unrealistic expectations of what AI is capable of.

Why this matters

These methods can help the product team improve the experience in many ways.

  1. Find and fix problems that affect usability, and add any affordances that are missing.
  2. Optimize accuracy and identify areas where the LLM could benefit from some tweaking.
  3. Establish user expectations and existing knowledge around generative AI, to outline what type of onboarding information and help users need to get up to speed quickly and to reduce barriers to adoption.

Tackling these areas can greatly improve the experience and foster positive first impressions, which will help adoption in the long run.
