Design, Test, and Repeat

Vaidehi
MHCI ’24 Capstone | Consumer Reports
10 min read · Jul 2, 2024

In Sprint 7, we consolidated stakeholder feedback from our collaborative session in New York and conducted our first user tests to validate the riskiest concepts that make up Negotiation Helper. We conducted 5 Wizard of Oz tests and synthesized our findings, which we used to rapidly prototype a second iteration of our service. We continued this cadence of rapid prototyping, testing, synthesis, and iteration in Sprint 8. We now find ourselves swiftly transitioning from the Develop phase of the Double Diamond framework to the Deliver phase: conducting rapid testing to determine what works and what does not, and making design changes until a final solution emerges that effectively mediates customer service interactions between consumers and businesses.

Findings from Round 1 Prototype Testing

Prototype Testing Round 1: What, Why, How
To recap, in our initial prototype testing, we focused on evaluating the following concepts:

  • Compiling Company Contact Information
  • Customer Service Ratings (Consumer Reports)
  • Customer Service Reviews (Community and Social Proof)
  • Simultaneous Multi-Modal Interaction (Customer Service Call & Text-Based Tips)
  • Building Arguments Using Text-Based AI-Generated Tips
  • Call Summary & Transcript

These concepts were prioritized in our first round of testing as they represent our riskiest assumptions and were highlighted by our clients during the Buy A Feature activity in our collaborative session.

We conducted this initial round of testing with 5 participants using a Wizard of Oz testing protocol. Participants were provided with a customer service issue scenario (an Xfinity overcharge) and called an Xfinity customer service “agent” (portrayed by a teammate). During the call, they received real-time text tips and policy insights from a “CR agent” (another teammate).

The objectives of this testing round were:

  • To gather insights on users’ preferences and needs when contacting customer service.
  • To identify any challenges or difficulties users face with the design of the Negotiation Helper prototype.
  • To investigate potential differences in user experience based on age groups, especially concerning multi-modal/sensory interactions.
  • To assess participants’ reactions to “unconventional features,” specifically customer service metrics, community reviews, and real-time text arguments and tips.

Prototype Testing Round 1: Overall Findings
Overall, our tests revealed the following:

  • Information overload: Simultaneous text messages during calls were overwhelming for all users; reading and processing text tips while on a call proved to be a challenge.
  • Information before the call: Many users expressed a desire to receive information and actionable tips before initiating the call with customer service. This could include insights into company policies, scripts for navigating the conversation, or summaries of relevant reviews.
  • Concise information presentation: There was a preference for clear and concise information presentation. Users found the combination of star ratings, percentages, and written reviews overwhelming.
  • AI agents preferred over human agents: Some users expressed a preference for AI assistance over human agents listening to their calls.
  • Limited use cases: A few users preferred contacting their credit card companies or using official company apps for customer service issues.
  • Tips deemed valuable: Most users found policy arguments and tips valuable.

Further feature-specific findings are discussed in the section below.

Synthesis & Design Changes

What did we learn from this round of testing? To make sense of all the raw findings, we came up with the F.U.N. evaluation framework, which categorizes a feature as Favorable (something that enhances the user’s experience), Unfavorable (something that takes away from the experience), or Neutral (the user is indifferent to the feature).

Synthesis Data Sheet

Rainbow chart of concepts, features, and their respective user desirability under the F.U.N. framework. Concepts are ranked from most to least favorable, top to bottom.
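To make the framework concrete, below is a minimal sketch of how such a tally could be scripted. The participant labels are hypothetical placeholders rather than our actual session data, and the scoring rule (+1 per Favorable, -1 per Unfavorable) is a simplification we made for the sketch; the real synthesis was done qualitatively on the chart above.

from collections import Counter

# Hypothetical placeholder labels (NOT the actual study data) for each concept,
# one label per participant: "F" = Favorable, "U" = Unfavorable, "N" = Neutral.
ratings = {
    "Call Summary & Tips": ["F", "F", "F", "F", "N"],
    "Building an Argument Using LLM": ["F", "F", "F", "N", "U"],
    "Centralizing Company Contact Channels": ["N", "N", "N", "N", "N"],
    "CR Community Reviews": ["F", "N", "U", "U", "U"],
    "CR Customer Service Ratings": ["N", "U", "U", "U", "U"],
    "Multi-sensory Interaction": ["U", "U", "U", "U", "U"],
}

def favorability_score(labels):
    """Assumed scoring rule: +1 per Favorable, -1 per Unfavorable, 0 per Neutral."""
    counts = Counter(labels)
    return counts["F"] - counts["U"]

# Rank concepts from most to least favorable, mirroring the rainbow chart.
for concept, labels in sorted(ratings.items(), key=lambda kv: favorability_score(kv[1]), reverse=True):
    print(f"{concept}: {dict(Counter(labels))} (score {favorability_score(labels)})")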

#1: Call Summary & Tips

After consumers call the company, Negotiation Helper displays a summary of the interaction, including the company name, contact date/time, the issue(s) discussed, and the claim result. This received overall positive feedback: the majority of participants found it favorable, some were neutral, and none were unfavorable. Some users even requested additional post-call communication, like confirmation emails. Thus, no design changes will be made to this feature, and it will not be tested in round 2 prototype testing.

#2: Building an Argument Using LLM

The majority of participants still found it favorable for Negotiation Helper to help structure arguments and tips for their service request. P4 found it unfavorable, feeling the tips were too confrontational, especially since they did not want to have to call the company about the issue in the first place. P5, while neutral, expressed a similar sentiment: providing argument assistance contributes to a culture of rudeness toward customer service agents. Given these mixed reviews, this concept will be tested further in round 2.

#3: Centralizing Company Contact Channels

This feature acts like a “phone book” that collects contact information for product/service companies and serves as a central hub for consumers who need to reach any company. While it theoretically addresses the pain point, uncovered in our research, that consumers struggle to find company contacts, all Round 1 participants found this feature to be neutral. P1 and P5 shared a similar sentiment: why get a new app to contact companies when they could go to the provider’s app to find a phone line or chat? On top of that, P3 and P4 shared that they have little desire to call anyone at all and would rather start with a chatbot associated with the company. Given the mixed reviews, this feature will also be tested again in future testing sessions.

#4: CR Community Reviews

Before calling a company, we originally provided an overview of community reviews from people who have interacted with that company in the past. Despite P3’s positive comment that seeing how other consumers solved similar issues could help them “prepare for the call,” this feature was perceived as unfavorable by the majority of participants. P4 and P5 shared that these reviews are “setting oneself up for failure” and “riling people up beforehand to lead to unproductive or negative interactions.” In essence, there seems to be an element of emotional priming at play: users get negatively swayed by the sentiment of the reviews and become pessimistic about the outcome of the call. We therefore plan to test this feature further in round 2; however, it will be altered as described in the Design Changes section below.

#5: CR Customer Service Ratings

Further along in the flow, we display a data visualization of CR customer service ratings on the given company’s profile page to give an overview of how effective its customer service is. This was also overwhelmingly unfavorable, largely because of poor timing: at this point in the purchase journey it no longer matters how good the customer service is, since the consumer has already bought something from the company. As with #4, seeing these ratings right before calling the company only primes consumers negatively and provides no substantial help.

#6: Multi-sensory Interaction

Lastly, we tested the meat of our solution: providing real-time text tips while consumers are on a call with the company’s customer service. This requires a person to divide their attention across auditory, visual, and cognitive stimuli, and then speak a response after processing them all. All participants found this form of interaction unfavorable because it was too cognitively demanding. Note, however, that this form of interaction is separate from the underlying concept of providing tips during customer service interactions, which led us to think about alternative ways to present tips that lessen this burden while remaining helpful to users.

Design Changes

As we synthesized our findings, we made decisions about design changes along the way, following a simple rule of thumb: keep the favorable features, distinguish dislike of a concept from dislike of its presentation, discard the conceptually unfavorable features, and iterate on the ones that just needed better presentation or different timing.

User Choice of Interaction on the Landing Page

We decided to keep this feature because there were no unfavorable sentiments, and our clients expressed high interest in improving accessibility to companies. However, we will modify it to give users more flexibility: contact channels may be presented either as a phone book or as access to company chatbots. This way, a user can choose to call or chat with the company, or we can take it one step further, as described in the next point.

Act on Users’ Behalf — Have “CR Wizard” Call for Me

Having discovered that preferences for calling versus texting differ based on personality, age, and other factors, we also identified an archetype that naturally avoids any confrontational situation in which they must interact with a company to request post-purchase service. We therefore reintroduced the idea of “acting on the user’s behalf,” inspired by one of our talented clients who shared this concept in our co-design session: a CR Wizard persona calls the company on the user’s behalf under their supervision. Users listen in on the call and can correct the CR Wizard or supplement details through text, or they can take over the call from the CR Wizard entirely.

Mid-fi wireframes for Negotiation Helper — having CR Wizard do it for me

Reconfiguring the Timing of Tips

Since real-time text tips during a call are cognitively overwhelming, we split this concept into two setups to test separately: giving tips before users call, or providing tips in real time within a group chat among the user, the company’s customer service, and the CR Wizard. We hope this lets us evaluate how effectively tips are delivered in each modality.

Reframing Community Reviews as Actionable Tips

To mitigate the emotional priming effect of community reviews, we opted not to show these reviews as literally written. Instead, CR will synthesize them with an LLM and convert them into actionable tips with objective wording before the customer service interaction. These actionable tips will be combined with the built-in LLM argument construction so that all tips are integrated seamlessly, along with order details pulled from the user’s CR account.
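As a rough sketch of what this synthesis step might look like, the snippet below turns raw reviews into neutrally worded tips via a generic text-generation callable. The function names, prompt wording, and stand-in model are our own assumptions for illustration, not CR’s actual pipeline.

from typing import Callable, List

def reviews_to_actionable_tips(reviews: List[str], issue: str, llm: Callable[[str], str]) -> str:
    """Convert community reviews into objective, actionable tips (illustrative sketch only)."""
    prompt = (
        "You are helping a consumer prepare for a customer service interaction "
        f"about: {issue}.\n"
        "Summarize the following community reviews into 3-5 short, actionable tips. "
        "Use objective, non-emotional wording and drop anything that is not actionable.\n\n"
        + "\n".join(f"- {review}" for review in reviews)
    )
    return llm(prompt)

# Usage with a stand-in model call; in practice `llm` would wrap a real LLM client.
if __name__ == "__main__":
    fake_llm = lambda prompt: "1. Ask for the retention department early.\n2. Mention the advertised rate."
    reviews = [
        "Spent two hours on hold, total nightmare!!",
        "Asking for the retention department got my overcharge refunded.",
    ]
    print(reviews_to_actionable_tips(reviews, "billing overcharge", fake_llm))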

Moving CS Ratings to Pre-purchase Stage

Now that we know it is of little use to inform people about a company’s customer service performance after purchase, we will test whether these ratings serve a better purpose when displayed while users are conducting product research and deciding whether to buy something. At the same time, we hope to learn whether users prefer quantitative, objective CR ratings or subjective but socially proofed community reviews. To test this, we will send out a survey to gather a larger sample size to inform our decision.

Round 2 Prototype Testing Plan

As a result, the design modifications outlined above were integrated into the prototype for the second round of testing. Additionally, we included a new, highly risky concept: an AI agent acting on behalf of the user.

Prototype Testing Round 2: What, Why, How
In our second round of prototype testing, we aimed to evaluate the following concepts:

  • Asynchronous Multi-Modal Interaction (Tips Before Customer Service Call)
  • Synchronous Single-Modal Interaction (In-Chat Tips During Customer Service Call)
  • Building Arguments Using Text-Based AI-Generated Tips
  • Actionable AI-Summarized Tips from Customer Service Reviews (Community and Social Proof)
  • AI Agent Acting on User’s Behalf
  • AI Agent Error and Repairability
  • AI Agent Tone and Personability

This week, we conducted this round of testing with 10 participants using a Wizard of Oz testing protocol. Participants were given a customer service issue scenario (a TV replacement) and asked to either call or chat with the TV company’s customer service “agent” (portrayed by a teammate). Five participants called the customer service “agent” and received argument and policy tips before their call, while the other five engaged in a live chat with the customer service “agent” and received real-time text tips within their chat interface from the “CR agent” (portrayed by another teammate). Additionally, all 10 participants experienced a scenario in which a “CR Wizard” (an AI agent portrayed by a teammate) called customer service on their behalf. This “CR Wizard” test was conducted twice: first with a regular-sounding AI agent that made no mistakes during the customer service call, and second with a “friendly” AI agent that purposely made an error, which the participant could correct via text message or address by taking over the call by saying “Take over!”

The objectives of this testing round were:

  • To determine the value of pre-call tips and/or real-time chat tips for customer service interactions.
  • To evaluate the effectiveness of pre-call tips and/or real-time chat tips in preparing customers for interactions with customer service.
  • To gather participant feedback on and reactions to an AI agent acting on their behalf to resolve customer service issues.
  • To evaluate the discoverability of AI errors.
  • To assess participant reactions to AI agent errors when acting on their behalf to resolve customer service issues.
  • To determine the preferred correction modality when the AI makes an error (text corrections vs. complete takeover).
  • To assess participant reactions to “regular” vs. “friendly” AI agents.
  • To determine the value of aggregated community tips prior to customer service interactions.

Next Steps

As we write this post, our final prototype test is in progress. Our next step is to analyze the findings from round 2 testing and synthesize directional insights over the next two days. During this synthesis, we will decide which features and concepts to retain, discard, or test further. After making additional design changes and establishing research goals and protocols for round 3 prototype testing, we aim to begin testing early next week. We plan to maintain this rapid prototyping and testing cadence until the second week of July to maximize feedback and rounds of iteration.

While we continue to iterate on our prototype, we also plan to research and account for the technical feasibility and the ethical and legal implications of our design. We will meet with CR’s engineering team to understand the technical considerations needed to ensure our product-service is realistically achievable. Additionally, we will research the legal implications and feasibility of agentic technology, specifically an AI agent acting on behalf of consumers.

(note: The work and knowledge gained from this project are only intended to be applicable to the company and context involved and there is no suggestion or indication that it may be useful or applicable to others. This project was conducted for educational purposes and is not intended to contribute to generalizable knowledge.)
