Interaction Design Policies: Design for the opportunity, not just the task.

People + AI Research @ Google
Jan 20, 2024

By Mahima Pushkarna, design lead of the People + AI Guidebook.

This post previews more of our forthcoming generative AI updates to the People + AI Guidebook, which offers practical guidance for designing human-centered AI products. Hearing from you helps us provide timely and useful guidance, so please share feedback in the comments, or email us at pair-guidebook@google.com. Follow People + AI Research here on Medium for ongoing previews of our Guidebook content.

Generative AI gives people the ability to control and customize AI using everyday, conversational language alone. For example, a graphic designer can create an image using a text-to-image (TTI) model by providing a description like, “A pastel castle with turrets and a drawbridge, like one you’d find in a children’s book, rendered as a realistic photo.” Or a web developer at a startup that sells strong coffee products can use a large language model (LLM) to help with software development by entering a request such as, “Create a Python script that implements a recommendation system for a website that sells canned coffee. The script should be able to recommend coffee products based on their previous purchases and the ratings of similar coffee lovers.”
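To make that example concrete, here is a minimal sketch of the kind of script such a prompt might produce. The product names, ratings, and the simple user-similarity approach below are our own illustrative assumptions, not the output of any particular model.

```python
# Minimal, illustrative user-based collaborative filtering for coffee products.
# All product names and ratings are made up for demonstration purposes.
from math import sqrt

# ratings[user][product] = rating from 1 to 5
ratings = {
    "ana":  {"dark_roast_cans": 5, "cold_brew_cans": 4, "decaf_cans": 1},
    "ben":  {"dark_roast_cans": 4, "cold_brew_cans": 5, "espresso_shots": 4},
    "cara": {"decaf_cans": 5, "cold_brew_cans": 2},
}

def cosine_similarity(a, b):
    """Similarity between two users, based on the products both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=3):
    """Recommend products the user hasn't bought, weighted by similar users' ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for product, rating in other_ratings.items():
            if product not in ratings[user]:
                scores[product] = scores.get(product, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ana", ratings))  # ['espresso_shots']
```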

This new ease marks a noteworthy advancement in AI: anyone can now describe what they want AI to do in everyday language, and the AI can often understand them. While the outcomes aren’t always perfect, this is a significant change in how easy it is for anyone using a GenAI product, like Google’s Bard, to customize and control AI to fulfill very specific individual goals.

For example, the prompt box in Bard can have an enormous variety of uses: to debug Python code, write a thank-you note, plan road trips, or even generate an 8-color palette in hex codes from an image. In our conversations with design teams across Google working on GenAI features, we discovered that a common thread is designing for such open-ended scenarios.

More flexible user journeys

GenAI models can produce different outcomes with the change of a single phrase, or with the addition of very convincing details, like names, events, and links to webpages. So when people use a GenAI product and frame the same problem with different prompts, new complexities arise for user experience (UX) designers. For example, one parent using a service like Bard may type, “Help me plan my daughter’s birthday party,” while another might type, “Plan a dinosaur-themed birthday party for an 11-year-old.” Instead of designing for specific tasks, which is the typical approach for AI, designing for generative AI requires looking for clusters of similar tasks and the variations that exist within each task.

In addition, different people experience the same design problem in different ways. Consider the well-documented history of inequitable AI outcomes for people whose communities have been marginalized across different dimensions of identity. Their expectations of how a GenAI system will perform for them may differ from how majority groups expect it to perform for them.

Keeping these factors in mind, we needed a new approach to complement critical user journeys (CUJs) for our own GenAI products at Google, one that offers more flexible user journeys. This requires collaboration between engineering and UX teams to make well-considered product decisions from the outset.

Designing for “interaction design policies”, not just tasks

It’s important to begin by defining how a product responds to different inputs from different people and how individuals can safely navigate model outcomes at “critical moments” in a user journey. These are moments when the user interacts with the AI system in your product, and they are opportunities for users to update their mental models of the AI product and calibrate their trust in it. With the wide variety of open-ended outcomes of GenAI-powered products, it is crucial to have clear policies in place to offer a consistent experience for users. Teams can use these policies to define machine learning (ML) and UX requirements early in the product development process, with fewer unanticipated outcomes.

Interaction design policies are the set of criteria that govern the user’s experience at the point in a critical user journey (CUJ) where the user interacts with an AI system in a product.

Interaction design policies are made up of four key parts centered around a critical moment: acceptable actions, unacceptable actions, thresholds of uncertainty, and vulnerabilities. Interaction design policies ground the design of the experience and the AI system in the context of the user, weighing the consequences of AI outcomes and translating UX research insights into product and model decisions.
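As a rough sketch of how a team might capture these four parts as a shared, structured artifact (the field names and example strings below are our own assumptions, not a prescribed schema):

```python
from dataclasses import dataclass, field

@dataclass
class InteractionDesignPolicy:
    """One policy, centered on a single critical moment in a user journey."""
    critical_moment: str
    acceptable_actions: list[str] = field(default_factory=list)
    unacceptable_actions: list[str] = field(default_factory=list)
    thresholds_of_uncertainty: list[str] = field(default_factory=list)
    vulnerabilities: list[str] = field(default_factory=list)

# Hypothetical example for an invitation-drafting feature.
invite_policy = InteractionDesignPolicy(
    critical_moment="User asks the product to draft an event invitation",
    acceptable_actions=["Draft invitation text in the tone the user describes"],
    unacceptable_actions=["Introduce cultural caricatures or unfair stereotypes"],
    thresholds_of_uncertainty=["Ask a clarifying question rather than guess an under-specified cultural context"],
    vulnerabilities=["Hallucinated dates, venues, or times in the draft"],
)
```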

Anatomy of an interaction design policy

With most GenAI products, users can perform a variety of tasks at the critical moment in a product journey to meet different objectives. Acceptable actions defined in a policy help designers and engineers answer how GenAI systems can help people meet their objectives within the user journey. Identify acceptable actions by evaluating clusters of tasks that people typically perform, as well as variations within the tasks.

Equally important are unacceptable actions, which help identify the kinds of outcomes that we want to discourage — even if the user unintentionally asks for these. Avoid making any assumptions that users know how to control GenAI models correctly, or that the GenAI model will produce outputs that line up 100% with the user’s expectations, even if the outcomes are correct. Understanding how people frame problems, what sub-goals they need to achieve, and what parts of the problem people frequently under-specify are useful signals to identify unacceptable actions.

[Image: Anatomy of an interaction design policy]
Interaction design policies are made up of four key parts centered around a critical moment: acceptable actions, unacceptable actions, thresholds of uncertainty, and vulnerabilities.

AI results are rarely error-free, so your team will need to make an informed trade-off between the benefits of getting predictions right and the consequences of getting some wrong. Errors introduce vulnerabilities for people. Keep in mind that, unlike with classifiers and recommenders, model confidence scores aren’t available for all GenAI systems. In these scenarios, we propose framing uncertainty in model outcomes in terms of a user’s capacity to recover from a weak prediction in their user journey. Unlike errors or risks that can actively harm users, weak predictions can reduce performance or slow a user down. Framing uncertainty this way helps set the thresholds of uncertainty needed to decide the limits within which AI outcomes can be surfaced in the product, and to determine the feedback mechanisms and controls necessary to ensure a useful and helpful user experience.
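Even without a confidence score, a product can route an outcome differently depending on how easily the user could recover if the prediction turns out to be weak. Here is a hedged sketch of that routing logic; the recovery-effort categories and the controls offered are assumptions for illustration, not a fixed recipe.

```python
# Illustrative routing of a GenAI outcome based on how easily a user could
# recover from a weak prediction. Categories and controls are assumptions.
def surface_outcome(outcome: str, recovery_effort: str) -> dict:
    if recovery_effort == "low":       # e.g. fixing a few words in a draft
        return {"show": outcome, "controls": ["edit", "regenerate"]}
    if recovery_effort == "medium":    # e.g. rewriting a whole paragraph
        return {"show": outcome, "controls": ["pick_alternative", "edit", "regenerate"]}
    # High recovery effort: ask a clarifying question instead of guessing.
    return {"show": None, "controls": ["ask_clarifying_question"], "draft": outcome}

print(surface_outcome("Join us for a Mehendi celebration...", recovery_effort="low"))
```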

Acknowledge that errors can seep through the user experience. It’s important to identify mitigations and adversarial tests from the get-go. To that end, take a people-first approach to defining the classes of errors that models need to be evaluated for, in addition to typical model performance metrics.

While errors are usually easy vulnerabilities to identify, risks may be harder to diagnose and predict. For example, ask: what risks are posed to a user, or to the people they interact with, when they cannot recover from an error? Inversely, what error-free outcomes can have negative consequences for users without them knowing it? Identify the types of sociotechnical harms (such as representational, allocative, quality of service, interpersonal, or societal harms) that product teams may unwittingly leave users vulnerable to, and which introduce risk.

Many errors and risks can be identified by auditing the product journey with potential users or third-party experts, assessing violations of an organization’s values, and reviewing local and regional legal and compliance requirements. This allows time to plan for necessary UX interventions (for example, instituting a no-show policy) and ML requirements (for example, conducting adversarial tests or adding safety classifiers). Use knowledge of common GenAI errors and risks to plan feedback mechanisms and escalation paths that users can leverage in case a bad model outcome gets past interventions and requirements.

Each part of an interaction design policy can be framed as a people-focused statement describing your product’s critical moment. Use this image as a worksheet in your product development process to inform UX and ML requirements for more robust and resilient GenAI experiences.

There are several approaches to evaluating and acting on GenAI model outcomes; we’ll discuss these in a future post. To ground interaction design policies in real-world product experiences, we’re sharing how we’ve articulated prompts that help frame each part of the design policy as people-focused statements. For example, “When GenAI predictions are weak, people won’t mind being asked to…” is a good way to start framing tasks or actions that users are willing to perform when they encounter a weak prediction. Weigh the effort these actions require against the value delivered by your product to determine what counts as a weak prediction. These statements help leverage UX research and participatory design practices to capture insights from the lived experiences and contexts of our users.

You can download these as a worksheet here.
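As a complement to the worksheet, here is a sketch of how these fill-in-the-blank stems could live alongside other product requirements. Only the “weak predictions” stem is quoted from this post; the other stems are our own illustrative phrasings of the same idea.

```python
# Fill-in-the-blank stems for framing each part of a policy as a people-focused
# statement. Only the "weak predictions" stem is quoted from the post above;
# the others are illustrative phrasings.
policy_worksheet = {
    "acceptable_actions": "At this critical moment, people should be able to ...",
    "unacceptable_actions": "Even if asked to, the product should not ...",
    "thresholds_of_uncertainty": "When GenAI predictions are weak, people won't mind being asked to ...",
    "vulnerabilities": "If the product gets this wrong, people are exposed to ...",
}

for part, stem in policy_worksheet.items():
    print(f"{part}: {stem}")
```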

Case-in-point: Plannerific

[Image: A mock of a GenAI feature in the hypothetical event-planning app Plannerific. The feature helps people create beautiful, multimedia invitations from a user-provided prompt and additional content previously provided about the event.]

In our GenAI update to the People + AI Guidebook, we introduce two new hypothetical GenAI applications to illustrate our guidance. We developed them with feedback from teams across Google, founders in Google for Startups, and the Equitable AI Research Roundtable.

Plannerific, one of the Guidebook’s hypothetical product examples, is an end-to-end planning app that helps plan any event, big or small. Plannerific strives to ensure that all GenAI features are inclusive across broad dimensions of identity. This means accounting for the fact that many events and festivals are celebrated by different communities using their local traditions and cultures.

One critical moment in an event host’s user journey with Plannerific is when they interact with a GenAI feature to create beautiful, multimedia invitations. Let’s take an example in which the event host wants to craft an invitation for a Mehendi or henna ceremony for their friend. This is a pre-wedding celebration for the bride, typical of cultures in the Indian subcontinent and the Middle East. These are primarily female-centric celebrations that may be held as part of Hindu, Muslim, and Christian weddings.

The interaction design policy for Plannerific’s multimodal invite generation feature breaks down the experience of drafting invitation content into clear criteria for a safe, useful, and beneficial experience. These, in turn, are used to derive important UX and ML interventions.

Acceptable and unacceptable actions:

In this case, the goal of the model’s output is to produce text and imagery that’s appropriate for the cultural context of the Mehendi celebration, and is inviting enough that guests from different cultures feel like they’re set up to attend the event without any cultural faux pas. Another goal is to make sure that the model outputs don’t send any messages that might unintentionally introduce caricatures or promote unfair stereotypes — even if the host intends such a message to be playful.

A UX requirement might be to evaluate how different user groups express gender or culture, and how they experience gender-related harms, cultural tropes, or stereotyping in the context of a variety of cultural / religious events. The results can then be applied to uniform model evaluations in which ML model performance is gauged using metrics to capture correlations between culturally-ambiguous inputs and culturally-sensitive outcomes.
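As a sketch of what such an evaluation could look like in practice: the helper functions generate_invite and flags_cultural_issue below stand in for a model call and a human rater or classifier, respectively, and are assumptions rather than real APIs.

```python
# Illustrative evaluation loop: measure how often culturally ambiguous prompts
# lead to outputs flagged as culturally insensitive. generate_invite() and
# flags_cultural_issue() are hypothetical stand-ins, not real APIs.
culturally_ambiguous_prompts = [
    "Write an invitation for my friend's Mehendi",
    "Invite guests to a pre-wedding celebration for the bride",
]

def evaluate(prompts, generate_invite, flags_cultural_issue):
    flagged = 0
    for prompt in prompts:
        output = generate_invite(prompt)          # model under evaluation
        if flags_cultural_issue(prompt, output):  # rater or safety classifier
            flagged += 1
    return flagged / len(prompts)                 # rate of problematic outcomes
```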

Thresholds of uncertainty:

If a user wants to use GenAI to help them write content, assume that it’s easier for them to correct a few words rather than rewrite the text of an entire invitation. Users may not always explicitly specify gender attributes in their inputs to the model, so producing gender-agnostic language in the absence of any gender indicative terms is a reasonable approach.

A UX requirement may be to run studies to understand users’ preferences between gendered and non-gendered outcomes for gendered input. Giving users control to pick from alternatives and edit generated content gives them the agency to decide when and how to introduce gendered language in their invite. This, in turn, establishes an ML requirement in which edits to an AI outcome update the context for the AI model in the next turn / call.
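A minimal sketch of that ML requirement, assuming a simple chat-style context: the user’s edit is appended to the conversation so the next generation respects that choice. The call_model function is a hypothetical stand-in for whichever model API the product uses.

```python
# Illustrative multi-turn context update: a user's edit to the generated
# invitation becomes part of the context for the next model call.
# call_model() is a hypothetical stand-in for the actual model API.
def next_turn(context: list, user_edit: str, call_model) -> str:
    context.append({"role": "user", "content": f"I edited the draft to say: {user_edit}"})
    revised = call_model(context)
    context.append({"role": "model", "content": revised})
    return revised
```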

Vulnerabilities:

As discussed, GenAI introduces a whole new set of vulnerabilities for users, such as errors in the generated content in the form of “hallucinations.” In the Plannerific example, this could look like incorrect dates, venues, or times. These can be easily remedied by ensuring that users review and validate this information before sending the invite. The success of social celebrations is based on building and preserving relationships, so it’s important to prevent risks from the presentation of unfairly biased, culturally offensive, or insensitive generated content in the Plannerific invite.

Mitigations against this type of harm could include identifying and blocking new sets of adversarial inputs, having the ML team introduce post-hoc safety classifiers to filter out unsafe content, and having the UX team design a flow that asks the user to try again if all model outcomes are unsafe or contain offensive content.
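Here is a hedged sketch of how those mitigations might fit together in the serving path. The components generate_invite, is_adversarial, and safety_classifier are hypothetical stand-ins, not specific Google APIs.

```python
# Illustrative serving path: block known adversarial inputs, filter generated
# candidates through a post-hoc safety classifier, and fall back to a
# "please try again" flow if nothing safe remains. All components are stand-ins.
def generate_safe_invite(prompt, generate_invite, is_adversarial, safety_classifier,
                         max_candidates=3):
    if is_adversarial(prompt):
        return {"status": "blocked", "message": "This request can't be completed."}
    candidates = [generate_invite(prompt) for _ in range(max_candidates)]
    safe = [c for c in candidates if safety_classifier(c) == "safe"]
    if not safe:
        return {"status": "retry", "message": "Please rephrase and try again."}
    return {"status": "ok", "invites": safe}  # let the user pick and edit
```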

Conclusion

Interaction design policies can also reflect brand standards or UX guidelines. Operationalizing these mandates within the user experience means that product teams design in terms broader than a single task with specific outcomes. Interaction design policies can help teams design more robust and resilient GenAI experiences further upstream in the product development process, while also helping to fulfill compliance requirements and prepare for AI regulation.

There’s no one way to get started with interaction design policies. You might choose to define your CUJs on the basis of a single interaction design policy, or craft different interaction design policies based on a specific CUJ. Rather than thinking of these policies as end artifacts, think of them as boundary objects — used for collaborative decisions to inform the development of GenAI models and the interfaces people use to interact with them. Whichever approach you take, remember that the best interaction design policies are those crafted from research with the people who will use your product.

Interaction design policies are a new, forthcoming addition to the People + AI Guidebook, so we encourage you to try using them in your product design process and give us your feedback in the comments section. We’re excited to learn from your experience, and to collaborate with you as we navigate this new paradigm.

Acknowledgements

Rida Qadri, for helping us extend interaction design policies for building culturally inclusive GenAI. Josh Lee, Chris Butler, Carston Johannsen, Dan Littlewood, and Vanessa Milan, for parsing the relationship between critical user journeys, design practices, and interaction design policies. Quinn Madison, Pratheek I, and Iris Chu, for giving us valuable feedback and sharing their design experiments. Victoria Wirtala and Kylan Kester, for helping us validate interaction design policies with Google for Startups founders and their teams. Reena Jana, Lucas Dixon, Michael Terry, and Ayça Çakmakli, for giving feedback on this post.
