Alexa, do you copy? Voice UI for professional use.

Rick de Groot
Published in Design@ING
Oct 10, 2022 · 7 min read
Radio operators in action. No comms.. no bombs..

A while ago there was a lot of buzz around the voice user interface (VUI) as a way to interact with technology. I still think that technology like Alexa, Google Assistant, Siri and Duplex will be the next big thing in the field of user experience. In general we tend to overestimate the impact of technology in the short term (hype), but underestimate its impact in the long run. So, now that the hype is over, the real implementation of VUI can begin. Because voice is a very natural way for our species to communicate, I believe the impact will be substantial in the long run, provided it is implemented in the right way, for the right type of users, in an environment where it makes sense.

At the moment VUI is, in my view, still a very generic technology that could apply to any type of user, and it is mostly implemented in and around the house or through a personal device. For me the most obvious application of voice is for the professional user. In this article I will explain why I think implementing voice for professional users first could make more sense than focusing on voice for “everyone” first. Implementing it for professional users could be the main catalyst for VUI to go mainstream.

A lot of the benefits of implementing VUI seem evident, but at the same time a lot of flaws are also very obvious. Looking at how we use voice now, it can be a convenient and easy way of giving input to a system. The flaws occur when that input is wrong or misunderstood, which causes a lot of frustration on the user side. One of the reasons these difficulties arise, I think, is that we want the system to be “natural”: we are supposed to speak to it as if we were speaking to another human being. While this seems understandable, I don’t necessarily agree that the implementation of voice user interfaces should focus solely on “natural” language. We humans don’t always understand each other too well when speaking in a “natural” way; misunderstandings are rather common in spoken language and can have a huge impact. That holds even more for communicating with technology that doesn’t (yet) understand all the nuances and quirks of spoken language. And then I’m not even addressing the problems of dialect, regional differences, sayings and different spoken languages as a whole.

Of course, in time, technology can probably overcome all those issues, but we must also understand that as humans we do not only create technology; technology also influences our own behavior. Instead of looking only at technology to overcome these shortcomings, we can also look at ourselves and at what is already used today to improve the way we utilize technology. To implement VUI successfully, we can look at existing protocols for how spoken information is passed between two entities today. I’m clearly not talking about the chatter of everyday life but about communication in a professional setting. The most clear-cut examples are the military radio operator and the air traffic controller. These jobs require clear radio protocols to communicate in a structured manner and avoid as many mistakes as possible.

“Unnatural” communication

Let me explain how “unnatural” military conversations could be conducted. When military personnel spot the enemy, they use the term “contact, wait out” over the radio to report enemy sightings or attacks. The next thing that follows will be a SITREP (situation report). If things get intense an AIRSUPREQ (air support request) can follow, and in the worst case a 9-line MEDEVACREQ (medical evacuation request) could be needed (see example below). All of these messages need to be communicated in a very precise manner in order to be effective. Imagine air support hitting the wrong area, or a medevac that is expecting one wounded person and instead gets ten.

NATO MEDEVACREQ radio protocol

Not only does the structure of the message have to fit its purpose, the language itself is also used differently in a military setting. To avoid any miscommunication or confusion, certain words and letters are pronounced in a deliberately different way to provide clarity. The push for this consistency eventually resulted in the creation of the NATO phonetic alphabet.

NATO Phonetic Alphabet
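
To make the idea concrete, here is a minimal Python sketch of spelling a call sign or code with the NATO phonetic alphabet. The code words are the standard ones; the function name `spell_out` and the choice to pass digits through unchanged and skip other characters are my own assumptions for illustration, not part of any radio procedure.

```python
# Standard NATO phonetic alphabet code words, keyed by letter.
NATO_ALPHABET = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}


def spell_out(text):
    """Spell text letter by letter (hypothetical helper for illustration).

    Letters become code words, digits are kept as-is (an assumption),
    and every other character is skipped.
    """
    words = []
    for ch in text.upper():
        if ch in NATO_ALPHABET:
            words.append(NATO_ALPHABET[ch])
        elif ch.isdigit():
            words.append(ch)
    return " ".join(words)


print(spell_out("ING"))  # India November Golf
```

The point of the fixed vocabulary is that every code word is hard to confuse with another, even over a bad connection, which is exactly the property a voice interface could borrow.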

So instead of just talking in a “natural” way, the military setting requires structure and standardization to increase the chance that the message comes across as the sender intended. Communicating “unnaturally” actually makes it more likely that the intended message gets across. When it comes to implementing voice user interfaces, we should not only focus on “natural” use, especially when looking at the implementation for professional users. I can imagine such protocol rules will not easily be adopted in the ordinary usage of voice interfaces. If I’m at home and I ask Alexa for the weather, for example, I want to do this in a natural way. I do however feel that there are more than enough possibilities to create spoken protocols for VUI for the professional user, to make sure communication with technology is as precise as possible.

Radio chatter in war.

VUI for professional users

So what will VUI for professional users look (or rather sound) like? Imagine a surgeon who needs both hands for a very precise job and wants a certain status update: “System, tell me stats X, Y, Z”. Following a precise protocol helps the surgeon get the desired information. Or imagine a CFO who wants a certain financial insight: “System, show me all US Dollar transactions this week”. The focus should not necessarily be on natural language, but rather on correct formatting. If a person instead said “System, show me all Dollar transactions this week”, the system would not know which dollar is meant, as more than one currency is called the dollar (Canadian and Australian, to name two). So, to get the desired results, some protocols should be in place.
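
The CFO example can be sketched as a small, strict command grammar. This is only an illustration under my own assumptions: the transcribed speech arrives as text, and the phrase pattern, the currency list, the period list and the function name `parse_command` are all hypothetical, not an existing product's API.

```python
import re

# Hypothetical protocol vocabulary: only fully qualified currency names
# are accepted, so a bare "Dollar" is rejected instead of guessed at.
CURRENCIES = {
    "us dollar": "USD",
    "canadian dollar": "CAD",
    "australian dollar": "AUD",
    "euro": "EUR",
}
PERIODS = {"today", "this week", "this month"}

# Fixed phrase structure: "System, show me all <currency> transactions <period>"
COMMAND = re.compile(
    r"^system,?\s+show me all\s+(?P<currency>.+?)\s+transactions\s+(?P<period>.+?)\.?$",
    re.IGNORECASE,
)


def parse_command(utterance):
    """Parse a transcribed command; return a dict describing the result."""
    match = COMMAND.match(utterance.strip())
    if match is None:
        return {"ok": False, "error": "unrecognized command"}
    currency = match.group("currency").lower()
    period = match.group("period").lower()
    if currency not in CURRENCIES:
        return {"ok": False, "error": "unknown or ambiguous currency: " + currency}
    if period not in PERIODS:
        return {"ok": False, "error": "unknown period: " + period}
    return {"ok": True, "currency": CURRENCIES[currency], "period": period}


print(parse_command("System, show me all US Dollar transactions this week"))
print(parse_command("System, show me all Dollar transactions this week"))
```

The first call yields a structured request for USD; the second is refused because “Dollar” alone is not in the protocol vocabulary. Rejecting the ambiguous phrase, rather than guessing, is the design choice the protocol approach argues for.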

I understand the system will eventually be able to ask a follow-up question to refine the request, but as shown this is not the only way to go. We can also improve our own ability to communicate with technology by using protocols that give the system structure and clarity to process. As said, this is probably harder to expect from regular users, but professional users already use speech protocols today for good reason, so leveraging that experience in the implementation of VUI for professionals makes a lot of sense to me. Especially in dire situations, when miscommunication can be harmful, it is good to rely on refined and shared protocols that remove most if not all ambiguity. This is what we currently do when communicating with each other in such situations, and we should do the same when communicating with technology.

Professionals first, but everyone else will follow.

Another great use of VUI is in vehicles. Professional drivers like truck drivers would benefit substantially if they could properly navigate, alter vehicle settings or enter cargo data through voice input, and of course taxi drivers could benefit too. The transition to regular drivers is right around the corner, and a lot of VUI implementation is already being done in cars. I do believe that thinking about and creating proper protocols could make the actual implementation of VUI in cars a lot more beneficial. It has nothing to do with being “natural”, just as learning to operate the vehicle itself has nothing to do with something “natural”. Learning voice protocols to adjust car settings, navigate and so on could be just as common as learning to shift gears.

So, implementing voice for professionals in an “unnatural” way could potentially be the catalyst that VUI needs to become mainstream. If we understand that the adoption of technology depends both on the tech and on our own behaviour and our willingness to learn that behaviour, we can look for better initial implementation opportunities for a new technology. For voice, targeting the professional user first could make more sense than targeting mainstream users, and in my view it is the way to go.

I’m really curious to hear your opinions on the subject, so leave a comment with your thoughts on whether the implementation of VUI for professional usage could indeed be the driver for VUI to go mainstream and eventually become a common way of interacting with technology.

Rick de Groot
Design@ING

Design lead with extensive experience designing for financial institutions. Former Dutch Marine, father of three, Tech philosophy & Design leadership.