Your Call Is Important to Us — Speech UX Tips to Show You Really Care

Cari Jacobs
IBM Data Science in Practice
8 min read · Jan 5, 2021
Photo by yang miao on Unsplash

I have a passion for good UX. I also have great disdain for poor user experiences. Over the past few years, I’ve had the pleasure of working with many clients to design and build virtual agents that use speech technology. I’ve also enjoyed the opportunity to provide assessments for existing cognitive speech solutions. There is one fundamental concept which I find continually reinforced by these experiences.

My biggest takeaway over the years is that a good user experience doesn’t ‘just happen’; it is the result of intentional, thoughtful design. That is not easy to achieve in cognitive solutions, though it is vital for success, and it becomes even more difficult when you add the complexity of speech-to-text and text-to-speech.

For that reason, I wanted to share some principles to keep in mind as you design and build your speech-enabled virtual agents:

Start from a recognition that your caller’s time is valuable to them

A speech-enabled solution differs greatly from text interactions such as web chat. For example, in a text-based solution, the user can:

  • Visually parse and absorb information (text and images) at whatever speed is natural to them
  • Respond at their convenience
  • Skip over wordy disclaimers, instructions, and legalese
  • Easily refer back to information previously provided (such as scrolling up to re-read earlier messages in a web chat)

Contrast the experience of interacting with an IVR or Voice Agent solution, where the caller:

  • Must hear and understand the information as it is provided audibly, in real time
  • Must respond immediately (most solutions will terminate or escalate calls to a live agent after a configured response timeout threshold has been reached)
  • May be forced to listen through information that is not relevant to their reason for calling
  • Cannot easily (if at all) refer back to information that was provided earlier in the call

These differences need to inform every aspect of your conversational design approach.

Take out the trash

Photo by Gary Chan on Unsplash

As a follow-on to the points above, avoid superfluous messages such as, “Your call is important to us” (which will never sound sincere coming from an automated system), or “Visit us on the web.” (The user probably found your number on the web and couldn’t, for whatever reason, accomplish their goal online!)

Also be mindful of generic or informational messages. If such a message is lengthy, or is not relevant to at least 80% of your callers (and the reasons they call), try to make it optional.

I had a recent experience calling a pharmacy for refills. The greeting contained an additional message telling me that a COVID-19 vaccine was not currently available at their pharmacy locations. Then it directed me to a website for more information about general vaccine availability. To make matters worse, speech barge-in was disabled until the entire message readout was complete.

All told, it took a full 30 seconds to reach a point where I could speak the purpose of my call. Now, I’m not saying that this kind of information isn’t important to some people. I just find it hard to believe that 80% of the calls coming into the number printed on a medication box are questions about in-store vaccination availability. I would want to see data to justify a design choice like that. Thirty seconds feels like an eternity when you are on the phone, especially when you know what you need and are waiting for your turn to talk.

An easy fix to this greeting could be something like, “For questions about COVID-19, press 8.” (Of course, the reasons people call will change over time. At some point, I would expect this to need to change to something like, “We now offer COVID-19 vaccinations. For more information, press 8.”)

Photo by Icons8 Team on Unsplash

KISS (Keep It Short & Simple)

A general guideline for your copywriter is to keep the number of syllables in each output response to a minimum. Use the more concise word wherever appropriate: “use” rather than “utilize” (1 syllable vs. 3), or “Member ID” rather than “Member Identification Number” (4 syllables vs. 10). This strategy has two benefits: it cuts down the total call time, and it results in messages or instructions that are easier for the caller to hear and understand.
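
One way to compare candidate wordings is a rough syllable estimate. The sketch below is a heuristic, not a linguistic tool: it counts vowel groups, drops a likely silent trailing "e", and assumes that short all-caps tokens such as "ID" are spoken letter by letter. The function names `estimate_syllables` and `prompt_syllables` are made up for this illustration.

```python
import re


def estimate_syllables(word: str) -> int:
    """Roughly estimate the spoken syllables in one word (heuristic only)."""
    letters = re.sub(r"[^A-Za-z]", "", word)
    if not letters:
        return 0
    # Assumption: short all-caps tokens ("ID", "SSN") are spelled out,
    # one syllable per letter, except "W", which has three.
    if letters.isupper() and len(letters) <= 4:
        return sum(3 if ch == "W" else 1 for ch in letters)
    lowered = letters.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # A trailing "e" is usually silent ("use"), but not in "-le" or "-ee".
    if lowered.endswith("e") and not lowered.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def prompt_syllables(prompt: str) -> int:
    """Sum the estimated syllables of every word in a prompt."""
    return sum(estimate_syllables(word) for word in prompt.split())


# Compare a concise wording with its wordier alternative.
for concise, wordy in [("use", "utilize"),
                       ("Member ID", "Member Identification Number")]:
    print(f"{concise!r}: {prompt_syllables(concise)}  vs.  "
          f"{wordy!r}: {prompt_syllables(wordy)}")
```

Because the estimate is only approximate, treat it as a way to rank alternatives, and let a read-aloud test make the final call.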

Remember that every second counts in a voice solution!

Consider the cognitive load you are placing on your caller. Output responses designed for speech often stray from “proper” grammar rules. Ask what you need of the caller in plain and simple terms. Use enough words to be polite and direct, but don’t waste syllables on trying to sound grammatically correct, overly friendly, or apologetic.

Photo by Arnaud Mariat on Unsplash

Optimize your dialog design for natural language requests

Traditional IVR solutions can be tedious. They served a purpose when the keypad was the best input option. Personally, I find it disruptive to constantly pull the phone away from my face to punch in numbers. When you harness the power of speech and natural language technologies, callers are able to navigate or digress by simply stating their need — in their own words.

Start with an open question, and switch to a more directed dialog on retries. (To reduce the need for retries, customize your Speech-to-Text model with the most likely responses for your domain and use case.)

  • Example A, 1st attempt: “What is the zip code?”
  • Example A, retry: “Sorry, please say or enter the five-digit zip code.”
  • Example B, 1st attempt: “Was that 90210?”
  • Example B, retry: “Was that 90210? Press 1 for yes, 2 for no.”
  • Example C, 1st attempt: “What is the date of birth?”
  • Example C, retry: “Please enter the date of birth as a 2-digit month, 2-digit day, and 4-digit year.”
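
The pattern in these examples can be kept in a simple lookup: one list of wordings per step, ordered from most open to most directed. This is a minimal sketch that is not tied to any particular voice platform; the step names and the `prompt_for` function are hypothetical, and only the prompt wording comes from the examples above.

```python
# Prompt wording is taken from the examples above; the step names
# ("zip_code", ...) and the function name are hypothetical.
PROMPTS = {
    "zip_code": [
        "What is the zip code?",
        "Sorry, please say or enter the five-digit zip code.",
    ],
    "confirm_zip": [
        "Was that 90210?",
        "Was that 90210? Press 1 for yes, 2 for no.",
    ],
    "date_of_birth": [
        "What is the date of birth?",
        "Please enter the date of birth as a 2-digit month, "
        "2-digit day, and 4-digit year.",
    ],
}


def prompt_for(step: str, attempt: int) -> str:
    """Return the open prompt on attempt 0 and the directed one on retries."""
    variants = PROMPTS[step]
    # Any attempt past the last variant reuses the most directed wording.
    return variants[min(attempt, len(variants) - 1)]


print(prompt_for("zip_code", 0))  # first attempt: open question
print(prompt_for("zip_code", 1))  # retry: directed instructions
```

Keeping both wordings side by side also makes it easy for a copywriter to review the first attempt and the retry together.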

Photo by Matt Collamer on Unsplash

Avoid making the caller feel as if they have done something wrong

If you need to retry due to invalid input, simply state what kind of response you are looking for. Don’t waste syllables on what the problem was (or appears to be) unless it is critical for correcting the problem.

Compare these retry message alternatives for a use case where the user must enter a long account number (first attempt: “What is your account number?”):

“You did not provide the right number of digits. Say or enter your thirteen-digit account number.”

vs.

“Sorry, please say or enter your thirteen-digit account number.”

There are so many causes for this type of error. The user might have spoken thirteen digits, but the speech service didn’t recognize all of them. Poor reception or background noise on the caller’s end may have interfered with the transcription. Maybe they fat-fingered the dial pad? In most cases, it really doesn’t matter why — blaming the caller will always sound rude and is not productive.
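
In code, this means the validation step does not need to report why the input failed. The sketch below assumes the speech or keypad input arrives as a raw string; the function name `collect_account_number` and its return shape are invented for illustration, while the prompt wording comes from the example above.

```python
import re
from typing import Optional, Tuple

ACCOUNT_DIGITS = 13
RETRY_PROMPT = "Sorry, please say or enter your thirteen-digit account number."


def collect_account_number(raw_input: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (account_number, None) on success or (None, retry_prompt)."""
    # Keep only the digits; spaces, dashes, and stray words are ignored.
    digits = re.sub(r"\D", "", raw_input)
    if len(digits) == ACCOUNT_DIGITS:
        return digits, None
    # Same neutral retry whatever the cause (misrecognition, noise,
    # a mistyped key): the caller only hears what is needed next.
    return None, RETRY_PROMPT


print(collect_account_number("1234 5678 9012 3"))
print(collect_account_number("12345"))
```

The cause of the failure can still be logged for tuning the speech model; it just stays out of what the caller hears.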

Don’t go overboard with “please” or apologies

It’s a good instinct to want your virtual agent to sound as polite as your star call center agents, but try to limit how many responses contain these words.

An overuse of these words becomes more obvious if you listen to the call experience from start to finish. The conversation will feel repetitive and unnatural.

Consider an authentication flow that requires 3 pieces of information. The caller might hear these consecutive outputs:

  • “Please enter your account number.”
  • “Please enter your date of birth.”
  • “Please enter the last four digits of your social security number.”

That’s a lot of pleases! On a voice call, these unnecessary syllables add up quickly and don’t provide much value to the overall experience. In general, keep them out of your first attempts for data collection. They are better suited for retries. They are a gentle shorthand to indicate there is a minor problem — that we haven’t yet progressed to the next step.
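
A quick check can flag this before anyone listens to the full call. The sketch below counts how many prompts in a flow contain "please" or "sorry"; the function name `politeness_report` and the idea of storing first-attempt prompts as a plain list are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Words that are better reserved for retries than for first attempts.
POLITENESS = re.compile(r"\b(please|sorry)\b", re.IGNORECASE)


def politeness_report(prompts: List[str]) -> Tuple[int, int]:
    """Return (prompts containing a politeness word, total prompts)."""
    flagged = [p for p in prompts if POLITENESS.search(p)]
    return len(flagged), len(prompts)


# First-attempt prompts from the authentication flow above.
auth_flow = [
    "Please enter your account number.",
    "Please enter your date of birth.",
    "Please enter the last four digits of your social security number.",
]

flagged, total = politeness_report(auth_flow)
print(f"{flagged} of {total} first-attempt prompts contain 'please' or 'sorry'")
```

A high ratio on first attempts is a hint to move those words into the retry wording instead.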

Photo by UX Indonesia on Unsplash

Optimize the call flow according to your user persona

After all, the user persona was your starting point, right? Go beyond thinking about what the user needs to do. Observe and reflect on how they get it done. Are your users interacting with your system daily or weekly (such as a medical assistant calling to check claim statuses)? If so, prioritize expediting a task flow in a way that best serves repeat callers. (Bonus points for solutions that use context from prior interactions to further enhance the experience!)

Conversations designed for one-time or infrequent callers will need to provide more guidance and help options. (Think: reporting an auto insurance claim or scheduling a dentist appointment.) Set expectations on what the caller can do through this interaction and explain what will be needed from them. Design these interactions with good retry logic and an ability to repeat instructions, get to a menu of options, or return to an earlier step within a process flow.

Final Thoughts

Your callers don’t want to hear how much you care about their time — they want to accomplish a task and move on with their day.

Good design almost always goes unnoticed. That’s the way it should be.

A speech-enabled virtual agent powered by natural language processing allows users to express their goals and needs in their own terms. If you have designed an interaction that allows your user to momentarily forget that they are interacting with a machine, you will almost certainly succeed in delivering business value.

If you would like help in building or improving a cognitive solution, reach out to IBM Data and AI Expert Labs and Learning.

Special thanks to the reviewer: Andrew Freed

Cari Jacobs is a Cognitive Engineer at IBM and has been working with companies to implement Watson AI solutions since 2014. Cari enjoys kayaking, science fiction, and Brazilian jiu-jitsu.
