“For Here or To Go?” Menus Are Conversational Too

Photo by JC Gellidon on Unsplash

Many people who create Actions for the Google Assistant start their voice-controlled interactions with a menu of options or some other type of question that constrains the user’s possible responses. It may feel to you, the designer, that such an approach is robotic or, at the very least, counterproductive to your goal of creating a naturally conversational experience. But not every turn in a conversation needs to be open-ended. I’m here to tell you that it’s all right to use menus.

This article will explain some of the reasons why directed dialog is totally fine. In fact, we’ve been directing dialog for tens of thousands of years, and it’s been in our conversational technology toolboxes for decades, so it’s natural to use constrained, hand-holding questions in your voice designs. If you’ve created — or plan to create — Actions that include them, fear not. Here are some reasons why they’re invaluable conversational design tools.

The solution to the hardest problem is not the only solution

Advances in speech recognition and natural language understanding are allowing computers to understand more advanced, nuanced human interactions. If you’d called an airline in the early ’90s and said, “I want to fly to Boston next week,” nothing would have happened except a prompt asking you to choose an option on your keypad. If you say the same thing now to the Google Assistant on your phone, you will see a list of potential flight options, and those options will take your location into account in order to choose a departure city. These improvements have occurred at every level of language, from the morphemic to the pragmatic.

Without question, providing users with the ability to say more at any given dialog turn — and then responding appropriately — has always been the hardest challenge facing the field of conversational technologies. And because it’s the hardest problem, it has become the primary focus of our work.

But being the hardest problem doesn’t make it the entire goal. When Google cracked the code on search in 1998, that advance did not eliminate the need for clickable links on the results page. Both remain useful tools for reaching web content. Open-ended questions and menus can and should co-exist as different means of achieving a conversational goal.

Menus are actually conversational

I must admit that sometimes I eat fast food. And when I do stop in for a helping of shame at the nearest rest stop, the conversation with the person behind the counter almost inevitably goes the same way:

Employee: Hi there, welcome to Burger Trough.

Me: Hello. I’ll take, uhhhh, a cheeseburger, uhhhh, a small fries, aaaaand a small soda.

Employee: Is that for here or to go?

Me: To go, please.

Employee: That’ll be nine dollars and twenty-five cents. Card?

Me: Yeah. [Takes out card and places it into the machine. Tap tap. Beep boop. Done.]

Employee: [Handing me an empty cup for soda and the receipt] Your number’s three four five. [Points to the number on the receipt]

Me: [Takes receipt and cup] Thanks.

Note that when the employee asked, “Is that for here or to go?” my response was not “What are you, some kind of a robot?” That’s because providing explicit options is a completely normal part of conversation.

Granted, there was a time when we all had to deal with phone menus (interactive voice response systems, or IVRs), and the menus were often frustrating. But that was not because they were menus. They were frustrating for two reasons. First, menus were the only strategy available, so they felt repetitive, plodding, and endless. No open-ended, user-initiated dialog turns (like today’s invocation steps on a smart assistant) existed, even where they would have made sense. Second, many of those menus were poorly designed (more on that later). The lack of alternative dialog strategies and the poor design led people to feel that menus and other very directed, specific questions were “robotic.” But as the fast food example shows, such questions are quite natural when used appropriately.

A conversation is a means for two or more people to complete a task, even if that task is just to pass the time. And in order to complete those tasks, we work with our conversational partner to establish a foundation of mutual understanding (I encourage you to review the work of Susan Brennan at SUNY Stony Brook and Herb Clark at Stanford for more on this subject). Open-ended questions and more restrictive questions are both tools we use to establish that understanding.

Talking without any help from your conversational partner is hard. That’s why quiet therapists from a strict Freudian background can often be frustrating for clients. And that’s why people memorize and rehearse before giving a presentation. Such long, unstructured speaking jags impose a heavy cognitive load. And with speech technology, the open-endedness is further complicated by the time constraints put on the user. If they don’t answer an open-ended prompt within a few seconds, the interaction will fail. Open-endedness and time restriction are serious challenges to put to a person, especially if they don’t know what the system can and cannot understand.

Directed dialogs ease the pressure following an open-ended state. If the user neglects to provide a key piece of information in their initial utterance, then very specific follow-up questions can prompt the user to add details (I never mentioned to the fast food employee during my original order that I wanted it to go, so he asked). Think about someone saying “Hey Google, call Kate.” There are two Kates in the person’s contacts, neither of whom the user has called before. In such a situation, it’s fine for the Assistant to ask “Which Kate?” Over time, the Assistant can learn which Kate the user prefers, if either, and stop asking the follow-up. “Which Kate?” is not robotic. It’s completely natural. However, if the Assistant never learns the preferred Kate and continues to ask “Which Kate?” every time the person tries to place the call, then it becomes robotic.
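To make the “Which Kate?” pattern concrete, here is a minimal sketch in Python. It is not how the Google Assistant works internally; the `ContactResolver` class, its method names, and the `min_picks` threshold are all hypothetical, invented only to illustrate the idea of asking a directed follow-up until a preference has been learned.

```python
from collections import Counter, defaultdict


class ContactResolver:
    """Hypothetical helper: asks "Which Kate?" only until a preference is learned."""

    def __init__(self, contacts, min_picks=1):
        self.contacts = list(contacts)      # full contact names, e.g. "Kate Adams"
        self.min_picks = min_picks          # picks needed before we stop asking
        self.picks = defaultdict(Counter)   # spoken name -> how often each contact was chosen

    def resolve(self, spoken_name):
        """Return ("call", contact), ("ask", question), or ("not_found", None)."""
        key = spoken_name.lower()
        matches = [c for c in self.contacts if c.split()[0].lower() == key]
        if not matches:
            return ("not_found", None)
        if len(matches) == 1:
            return ("call", matches[0])
        if self.picks[key]:
            favorite, count = self.picks[key].most_common(1)[0]
            if count >= self.min_picks:
                return ("call", favorite)   # preference learned: skip the follow-up
        return ("ask", f"Which {spoken_name}?")

    def remember(self, spoken_name, chosen_contact):
        """Record the user's answer to the follow-up question."""
        self.picks[spoken_name.lower()][chosen_contact] += 1


resolver = ContactResolver(["Kate Adams", "Kate Brown", "Sam Lee"])
print(resolver.resolve("Kate"))            # ambiguous, so the follow-up is asked
resolver.remember("Kate", "Kate Brown")
print(resolver.resolve("Kate"))            # the learned preference is used
```

A real system would probably want a higher threshold and some way to forget stale preferences, but the shape is the same: the directed question is a fallback, and what the user answers feeds back into the next turn.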

Menus are fine, if you design them well

At this point, the conversational design community has over twenty years of experience. In that time, we have collectively learned a lot. To give just one example of a design guideline: finish your prompt with the question itself. Don’t follow it with some additional information. It’s okay for a system to say, “We open at 5pm. So what time’s the dinner reservation for?” It’s not okay for the system to say, “What time’s the dinner reservation for? We open at 5pm.” The latter leads the user to start answering while the system is still talking, which causes stammering and potentially a misrecognition of their response. Conversational designers now have scores of these handy guidelines to follow. The menus and other constrained questions of today are far superior to the ones from decades ago. Granted, not everyone who designs Actions today is a conversation designer, but those guidelines are available for all to review. With these, the overwhelming, clunky questions of older IVRs are hopefully disappearing, replaced by frictionless, elegant questions that are easy to answer.
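The “question goes last” guideline is simple enough to check automatically. The sketch below is one possible way to do that, assuming prompts are stored as plain strings; `build_prompt` and `ends_with_question` are hypothetical helper names, not part of any Actions tooling.

```python
from typing import Optional


def build_prompt(question: str, context: Optional[str] = None) -> str:
    """Place any supporting information before the question, never after it."""
    question = question.strip()
    if not question.endswith("?"):
        raise ValueError("the final sentence of a prompt should be the question")
    if context:
        return f"{context.strip()} {question}"
    return question


def ends_with_question(prompt: str) -> bool:
    """Lint check: True when nothing follows the question."""
    return prompt.rstrip().endswith("?")


good = build_prompt("So what time's the dinner reservation for?", "We open at 5pm.")
bad = "What time's the dinner reservation for? We open at 5pm."
print(good)
print(ends_with_question(good), ends_with_question(bad))
```

A check like `ends_with_question` could run over every prompt in a design before it ships. It only looks at the last character, so it will not catch every badly ordered prompt, but it does flag the reservation example above.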

Also, just because a question asks for specific information doesn’t mean you can’t create a grammar that listens for other likely information as well. When someone calls for a train status, Amtrak’s phone system asks, “Do you know the train number?” After asking, the system is not just listening for ‘yes’ and ‘no’ responses, but also for train numbers. Such design allows for even less friction.
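Here is a rough sketch of what such a grammar might look like if it were written as a small Python function working on recognized text. This is an illustration only, not Amtrak’s actual grammar: the function name `interpret_reply`, the word lists, and the assumption that a train number is one to four digits are all mine.

```python
import re

# Assumption: train numbers are 1 to 4 digits; adjust for the real numbering scheme.
TRAIN_NUMBER = re.compile(r"\b\d{1,4}\b")
YES_WORDS = {"yes", "yeah", "yep", "sure"}
NO_WORDS = {"no", "nope", "nah"}


def interpret_reply(utterance: str) -> dict:
    """Interpret a reply to "Do you know the train number?"."""
    text = utterance.lower()

    # Check for a number first, so "Yeah, it's 2150" skips the extra turn.
    number = TRAIN_NUMBER.search(text)
    if number:
        return {"intent": "train_number", "value": number.group()}

    words = set(re.findall(r"[a-z']+", text))
    if words & YES_WORDS:
        return {"intent": "yes"}
    if words & NO_WORDS:
        return {"intent": "no"}
    return {"intent": "no_match"}


for reply in ["Yes", "No", "Yeah, it's 2150", "171"]:
    print(reply, "->", interpret_reply(reply))
```

The design choice worth noticing is the order of the checks. Because the number is looked for before the yes/no words, a caller who answers the question and volunteers the number in the same breath is never asked a second time.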

It minimizes existential dread (seriously)

I suspect that perhaps the most fundamental, preconscious reason why designers and developers want to get away from menus has to do with notions of Freedom, with a capital F. It goes without saying that the majority of the modern world holds Freedom in extremely high regard. The design world is no different. While most of these sentiments likely go unmentioned, some critics overtly label designs that restrict users as acts of paternalism. And in the speech community, we also see this love of Liberty. The very first statistical language model (SLM) put out by SpeechWorks was called “SpeakFreely.” Seriously, who’s going to rail against freedom of speech? Isn’t it absolutely the case that more freedom in conversations is a good thing, regardless of whether it’s with another person or a computer?

But consider this. In the field of existential psychology, which studies the most primal drives of human beings, Freedom is inherently scary. Think about the last time a phone system said to you, “Briefly describe the reason for your call.” The moment it goes silent, giving you absolute Freedom, and while the system waits for your answer, you experience a slight discomfort. You think, “How do I phrase this? What words do I use? What does it understand? What do I truly need?” You feel pressure, uncertainty. That little moment of stress? That’s the hard part of Freedom.

Of course there’s value to Freedom, and open-ended dialog is needed at times, but my point is that the value of that Freedom comes at a slight existential price, a little taste of stress, of existential dread. In short, Freedom isn’t free. Consider this cost the next time you remove guardrails from your design.

In closing

Think of all of the “paternalistic” questions you’ve come to expect when interacting with people in customer service. Do they bother you when you hear them?

  • To stay or to go?
  • Bubbling or still?
  • On the rocks? Neat?
  • Window or aisle?
  • Economy seat?
  • Are you thinking a touring bike, a mountain bike, maybe a hybrid?
  • Open-faced or closed?

If you’re like me, then you find these questions completely understandable and even welcome if you didn’t mention the information earlier. Conversational technologies need not be any different.

Some of you may be wondering about that last example, “Open-faced or closed?” That’s what the guy at my local bagel store asks every time I order. It’s actually a little frustrating, because I go in there quite regularly and, at this point, he should know I prefer open-faced. Conversational systems have an obligation to learn user preferences so that in the long run, very constrained follow-up questions are no longer needed. But until they get there, it’s okay for these systems to learn by asking. It’s what people do all the time.