A Voice UI to Optimize In-Car Navigation

Let your car recommend the best route for you, like literally!

Pallavi Patke-Mujumdar
Women in Voice
8 min read · Apr 21, 2022


Photo by Brock Wegner on Unsplash

About a year ago, I was dictating a shopping list to Alexa just before heading out for an appointment, with the goal of picking up those items on my way back. I quickly got into the car, entered the destination address, and drove off. On the way back, not only did I forget a few items, but I also didn’t have enough time to make a stop I had initially planned, due to traffic congestion. After running into this problem a few times, I wondered whether other people experienced the same challenges!

My initial hypothesis was that people might need voice assistance with shopping on the go. But surveying over 50 participants aged 25–45 revealed that most people preferred to multitask on their commute. Parents with young kids especially expressed the need for a way to optimize their journey to the destination — more of a route manager than assistance with creating the shopping list itself.

My goal with this essay is to build a case for a voice UI/UX design proposal for an in-car navigation system. Along the way, I’ll draw on the insights I gained from my low- to mid-fidelity prototype testing sessions to illustrate my points.

The Problem

Currently, there is no way to use voice with Google Maps to add additional stops to a journey — for example, verbally searching for generic places or services such as coffee, food, fuel, or a rest area. One of the common problems a user runs into with Google Maps is that when adding multiple stops, the system is unable to take the user’s context into account. Clearly, there is a strong use case for voice here. Of course, for it to work as expected, the voice assistant would need access to the user’s location and the navigation GUI while scheduling stops. As Cathy Pearl states, contextual awareness and memory of past interactions are what will make VUI truly conversational.¹ That means thinking beyond one-off conversations and training the VUI to anticipate what the user might say next. Below is a video capturing a friend’s interaction with Google Maps while trying to optimize her route to Target via Scotty’s Donuts and Walgreens.

The user manually picks from multiple nearby options to find the one closest to their destination. This easily involves 10+ steps! Let’s look at the user journey below:

  • Opens the Google Maps app
  • Searches for Scotty’s Donuts and tries to decide which location she wants to go to (there are two)
  • Searches for Walgreens, but the results aren’t shown in the context of her already-planned route, so she tries out two different Walgreens locations to see which route she likes better
  • Searches for Target. Again, results aren’t shown in the context of the planned route, so she guesses which one makes more sense
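At its core, this manual process is a detour-minimization problem: for each errand, pick the candidate location that adds the least extra travel to the already-planned route. Here is a minimal sketch of that selection logic — the place names, coordinates, and straight-line distances are invented placeholders for what a real system would get from a routing API:

```python
# Sketch: for each errand, choose the candidate location that adds the least
# detour to an already-planned route. Straight-line distance stands in for
# drive time, which a real routing API would provide.
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def detour_cost(origin, destination, stop):
    """Extra distance incurred by inserting `stop` between origin and destination."""
    return dist(origin, stop) + dist(stop, destination) - dist(origin, destination)

def best_candidate(origin, destination, candidates):
    """Pick the candidate location with the smallest detour."""
    return min(candidates, key=lambda c: detour_cost(origin, destination, c["loc"]))

# Hypothetical coordinates: home -> Target, with two Walgreens to choose from.
home, target = (0.0, 0.0), (10.0, 0.0)
walgreens = [
    {"name": "Walgreens on Main St", "loc": (5.0, 1.0)},
    {"name": "Walgreens on Oak Ave", "loc": (2.0, 6.0)},
]
print(best_candidate(home, target, walgreens)["name"])  # → Walgreens on Main St
```

A voice assistant with route context could run exactly this comparison and simply announce the winner, instead of making the user eyeball two candidate routes.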

Wouldn’t it be nice to have a system remember the user’s preferences, make recommendations and most importantly optimize their driving route accordingly?

Solution: an Alexa skill that takes the user’s context into account and supports multi-channel, multi-modal interactions.

While interviewing participants for the project, I observed two things. First, the user is often trying to accomplish more than one goal when navigating from point A to point B — usually finding a convenient stop en route or in close proximity to the main destination. Time feels wasted when a person cannot accomplish the goals of their journey sequentially and has to re-travel part or all of the distance on a second trip. Second, time can be understood and expressed in many different ways, and the system needs to recognize each of those different utterances as representations of time. One user said:

“If I want to make multiple stops, I would like the voice assistant to tell me which is the best way to go and where I should go first.”

Based on user research and stories I came up with an initial set of intents that would help the users of this in-car voice assistant accomplish their goals.

Intents derived from User stories. Author’s own image.

The Design

Some of the basic functions that I wanted the voice assistant to carry out include:

  • Identify stops based on a task
  • Categorize the most common types of stops based on commonly said or referenced words, such as ‘gas’ or ‘groceries’
  • Plan or organize stops based on distance and traffic
  • Add stops to, or remove stops from, existing navigation
  • Save locations for later
  • Offer the option to rename or label a location, such as ‘favorite ice-cream place’
  • Give feedback upon successful navigation to a destination
  • Offer confirmation after adding or removing stops from navigation
  • Let the user know when what they said was not understood or is unsupported by the system
  • Begin navigation guidance by prompting the user as soon as they open the navigation app on their phone or in the car

The list of basic functions, as you can tell, could be endless. Clearly, I was pushing for high standards! But I knew it was prudent to focus on the most essential functions for my MVP (minimum viable product).
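In a skill backend, functions like these typically become intent handlers plus a fallback for anything the system doesn’t understand. Here is a rough sketch of that dispatch pattern — the intent names, replies, and route representation are my own illustrations, not from a deployed skill:

```python
# Sketch: mapping a few of the core functions to intent handlers, with a
# fallback handler for unrecognized intents. All names are illustrative.

def handle_add_stop(slots, route):
    """Add a requested place to the current route and confirm it."""
    place = slots.get("place", "a stop")
    route.append(place)
    return f"Ok, adding a stop at {place}."

def handle_remove_stop(slots, route):
    """Remove a place from the route, confirming either way."""
    place = slots.get("place")
    if place in route:
        route.remove(place)
        return f"Removed {place} from your route."
    return f"I couldn't find {place} on your route."

def handle_fallback(slots, route):
    # Covers the "not understood / unsupported" function above.
    return "Sorry, I didn't catch that. You can say things like 'add a stop at Walgreens'."

HANDLERS = {
    "AddStopIntent": handle_add_stop,
    "RemoveStopIntent": handle_remove_stop,
}

def dispatch(intent, slots, route):
    """Route a recognized intent to its handler, or fall back."""
    return HANDLERS.get(intent, handle_fallback)(slots, route)

route = ["Yosemite National Park"]
print(dispatch("AddStopIntent", {"place": "Walgreens"}, route))
```

The registry makes it cheap to start with a handful of handlers for the MVP and grow the list later without touching the dispatch logic.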

The first step was to identify the top intents of the user I was designing for. In order to script my prompts for the dialogue flow, I started by thinking about situations in which a user might want to use voice to create navigation guidance and optimize routes on the go. For example, here are a couple of the different situations that I designed for:

Situation #1: The car is in motion. Driver wants to add a stop. Voice assistant locates & routes to pharmacy on the way.

U: Alexa, let’s stop at a pharmacy on the way.

S: Ok, sure. I found a Walgreens, 2.4 miles away from you. Would you like me to add it to your main navigation?

U: Yes, please. Go ahead and add the stop.

S: Ok, adding a stop to Walgreens on your way to Yosemite National park.

U: Great, thanks!

S: Stop added! You are welcome.

Situation #2: The car is in motion. Driver wants to add a stop. Voice assistant cannot locate a pharmacy on the way.

U: Alexa, let’s stop at a pharmacy on the way.

S: Ok, sure. Hmm, I cannot find a pharmacy on your route. But I see a drug store only 2 miles away. Want me to add it to your navigation?

U: Yes

S: Ok, Civic Drug Store is added to your navigation.

U: Great, thanks!

S: You are welcome!
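Both situations reduce to one flow: search on-route for the requested category, and fall back to a related category when nothing matches. A small sketch of that flow — the place data and the category-to-alternatives mapping are invented for illustration:

```python
# Sketch of the add-a-stop flow from Situations #1 and #2: try the requested
# category first, then fall back to a related category. Data is invented.
RELATED = {"pharmacy": ["drug store"], "coffee": ["cafe", "donuts"]}

def find_on_route(category, places):
    """Return the nearest on-route place matching `category`, or None."""
    matches = [p for p in places if p["category"] == category and p["on_route"]]
    return min(matches, key=lambda p: p["miles_away"]) if matches else None

def add_stop_reply(category, places):
    place = find_on_route(category, places)
    if place:  # Situation #1: a direct match exists on the route.
        return f"I found a {place['name']}, {place['miles_away']} miles away. Add it?"
    for alt in RELATED.get(category, []):  # Situation #2: try related categories.
        place = find_on_route(alt, places)
        if place:
            return (f"I cannot find a {category} on your route. But I see "
                    f"{place['name']}, a {alt} only {place['miles_away']} "
                    f"miles away. Want me to add it?")
    return f"Sorry, I couldn't find a {category} or anything similar on your route."

places = [{"name": "Civic Drug Store", "category": "drug store",
           "on_route": True, "miles_away": 2.0}]
print(add_stop_reply("pharmacy", places))
```

The fallback tier is what keeps the conversation moving in Situation #2 instead of ending at “no results.”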

For the purpose of building my prototype, I narrowed down to four main intents: Drive now, Drive later (allowing the user to save routes for later), Plan route (for a trip with a lengthier duration), and Access/edit saved stops.

Next, I reconfigured a Tesla navigation UI to match my script and then layered in the Alexa design elements to create a series of interactive prototypes. The video below demonstrates my interaction with the prototype.

A video of me testing the prototype. Tools: Adobe XD, Quickplay.

Observations

While testing the voice skill, I observed the different ways in which participants responded to, and interacted with the prototype while commanding the VUI:

  • Each participant timed the stops along the way using a variety of utterances such as: 9:15, 10 o’clock, in 1 hour, a few minutes later, 10, 10 am, then, in 10 miles (approx. 10 min.)
  • One of the participants wanted to be reminded of what time they were going to get to a certain place, like grandma’s house, if they decided to optimize their commute by making stops along the way. Once they knew how long a certain detour could take, then they would work backwards from the ETA of their final destination.
  • “It would be great,” another participant exclaimed with excitement during a session, “if the voice app were able to recommend a stop that is closer to my current location or easier to get to.”

It is common for people to confuse time with duration, and even with distance. Framing the question is therefore critical to eliciting a suitable response from your users. When using variables such as time or location, it is a good idea to use slots when defining utterances. Unfortunately, in this case I was limited by the platform’s constraints. Below is an example where the voice assistant could not understand the utterance “in 15 min.”

The system does not understand the utterance ‘in 15 min’. Author’s own image.
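On platforms that do support slots, a built-in duration slot such as Alexa’s AMAZON.DURATION resolves utterances like “in 15 min” or “in 1 hour” into an ISO 8601 duration string (e.g. PT15M), which the skill backend then converts into something usable. A simplified, regex-based sketch of that conversion:

```python
# Sketch: converting an ISO 8601 duration string (the format a duration slot
# like Alexa's AMAZON.DURATION returns, e.g. "PT15M" for "in 15 min") into
# total minutes. Simplified to hours and minutes only.
import re

def iso_duration_to_minutes(value):
    """Parse durations like 'PT15M', 'PT1H', or 'PT1H30M' into total minutes."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", value)
    if not match or not any(match.groups()):
        raise ValueError(f"Unsupported duration: {value!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes

print(iso_duration_to_minutes("PT15M"))  # "in 15 min"
print(iso_duration_to_minutes("PT1H"))   # "in 1 hour"
```

With the slot doing the heavy lifting of recognizing “9:15,” “in 1 hour,” and friends, the backend only has to handle one canonical format — exactly the normalization my prototype platform couldn’t express.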

Feedback & Key Takeaways

Evidently, periodic reminders about the ETA, as well as the number of stops planned along the way, could help set accurate expectations for the journey. While Adobe XD’s really cool feature allowed me to layer voice on top of the GUI, it is still fairly limited in its capabilities. For example, there’s no way to add slots for duration, which I could do in, say, Voiceflow.

What I also realized is that advanced features such as trip-planning suggestions, as well as nudges upon automatically recognizing longer driving periods, could go a long way toward enhancing the user experience while putting the driver and/or the navigator at ease. For example: “Hey, you’ve been driving for over 3 hours. Do you want to stretch your legs or stop for coffee?” One user felt that the prototype involved multiple points of interaction, such as the map, the buttons, and the car view, all of which could increase the cognitive strain on the driver while driving. A veteran conversation designer recommended that I make the system prompts more conversational by replacing ‘Try saying…’ with a direct question such as ‘Do you want to start navigation?’

“The navigation mode is primary. Map could be minimized when focusing on directions, nudges or recommendations. Maps should occupy focus as soon as user starts navigation.”

The flows for saving stops and adding/removing stops that were scripted but not yet built out are critical for multi-user interactions. For instance, take a case where a husband and wife take turns driving the car, and each day the wife drops their son off at school. One day, the husband decides to do the school drop-off and quickly asks the system to route to school, assuming that the word ‘school’ is common knowledge — or “common ground,” as Clark and Schaefer put it in their essay ‘Contributing to Discourse’.² It’s possible that users prefer to use call signs or labels for frequently visited places, such as “Bob’s school.” It would make for a very poor user experience if the voice assistant responded with ‘I’m sorry, I didn’t understand that,’ requiring the husband to enter the school’s address again. A smart UX here would have the system associate the keyword ‘school’ with the address frequently entered by the wife.
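One way to support that shared “common ground” is a household-level label store, so a place saved as ‘school’ by one driver resolves for everyone in the household. A minimal sketch, with invented names and data:

```python
# Sketch: household-level place labels, so a label saved by one driver
# (e.g. 'school') resolves in another driver's utterance. Data is invented.
class HouseholdPlaces:
    def __init__(self):
        self._labels = {}  # label -> address, shared across all drivers

    def save(self, label, address):
        """Store a label for the whole household, case-insensitively."""
        self._labels[label.lower()] = address

    def resolve(self, utterance):
        """Return the address of the first saved label found in the utterance."""
        text = utterance.lower()
        for label, address in self._labels.items():
            if label in text:
                return address
        return None

home = HouseholdPlaces()
home.save("school", "42 Elm St")           # saved during the wife's commute
print(home.resolve("navigate to school"))  # later resolves for the husband
```

A production system would scope this store to an account or vehicle profile and handle ambiguous labels, but even this tiny shared map avoids the ‘I’m sorry, I didn’t understand that’ dead end.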

If you are someone who’s already working on a similar project and you like what I’ve presented here, hit me up on LinkedIn! Otherwise, feel free to comment on this piece below. As my next steps, I’d love to continue refining and iterating on this concept and ship it to production.

[1] Cathy Pearl. (2016). Designing Voice User Interfaces: Principles of Conversational Experiences. https://www.cathypearl.com/book

[2] Herbert Clark and Edward Schaefer. (1989). ‘Contributing to Discourse’, Cognitive Science. https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1302_7

[3] Special thanks to Ryuka Ko and Courtney Artuso Berg for their expert guidance and insights on this project.

