Designing Voice UIs for Healthcare: 7 Tips to Get You Started

As of 2021, there are hundreds of millions of Alexa-enabled devices in households around the world. This estimate alone is convincing enough for many industries to experiment with developing voice-driven services. With a natural language user interface sitting on the kitchen counter, we have an opportunity to expand healthcare services to outside a hospital or care center setting.

While we are still perfecting how assistants handle things like complex queries, accents, context, and silence, there are some known fundamentals to designing conversations that improve a user’s quality of care. This guide is intended for designers of all types who are curious about conversational UX for patients, and it assumes this is your first rodeo.

Emergency / 911

It’s important to get this out of the way first: in a health conversation, no assistant is equipped to handle emergencies. Your users may build a mental model that substitutes your automated conversation for a medical professional, so every interaction is an opportunity to reinforce that this conversation is not designed to connect them to emergency services. Most home assistant devices can’t call 911 for regulatory reasons, anyway.

You can simply mirror the same kind of message you might get when you schedule an appointment with your doctor:

Welcome to ____________. If this is a medical emergency, please call 911. How can I help you today?

Privacy and Authentication

Depending on the nature of your conversation, you may need to abide by certain privacy measures for a given country. This can be pretty murky, particularly in the USA where the laws are deliberately written to be non-specific in regard to the technical implementation of security. For the purpose of this guide, we’ll focus solely on factors that impact the user experience.

This guidance is based on a document circulated by the Department of Health and Human Services, which outlines a few methods for authenticating a person or entity. We can call this the “Three Ms” approach to user authentication: Memory, Material, and Metric:

  • Memory: something the user knows (password, PIN, etc.)
  • Material: something the user has (smart card, USB key, etc.)
  • Metric: something the user is (voice, fingerprint, any biometric)

For VUIs, a good pattern might consist of:

  • A web-based, one-time configuration of a username and password
  • A PIN that can be remembered and uttered, constructed in the one-time configuration above
  • A voiceprint for seamless verification of a given speaker / user
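As a rough sketch of how the Memory and Metric factors might be combined, consider the following. This is hypothetical code: `voiceprint_confidence` stands in for whatever score your platform’s speaker-verification service returns, and the 0.9 threshold is illustrative.

```python
# Hypothetical sketch of layered authentication for a healthcare VUI.
# A real implementation would delegate PIN storage and speaker
# verification to your platform's security services.

import hashlib

def hash_pin(pin: str, salt: str) -> str:
    """Never store a raw PIN; keep only a salted hash."""
    return hashlib.sha256((salt + pin).encode()).hexdigest()

def authenticate(spoken_pin: str, stored_hash: str, salt: str,
                 voiceprint_confidence: float) -> bool:
    """Require both the remembered PIN (Memory) and a voiceprint
    match (Metric) before unlocking any health data."""
    pin_ok = hash_pin(spoken_pin, salt) == stored_hash
    voice_ok = voiceprint_confidence >= 0.9  # threshold is illustrative
    return pin_ok and voice_ok

# Example: a PIN created during the one-time web configuration
salt = "per-user-random-salt"
stored = hash_pin("4821", salt)
print(authenticate("4821", stored, salt, voiceprint_confidence=0.95))
print(authenticate("4821", stored, salt, voiceprint_confidence=0.40))
```

Requiring two factors means a housemate who overhears the PIN still fails the voiceprint check, and a voice that merely sounds similar still fails the PIN check.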


Limit Your Options

You can toss out Miller’s Law right now. Even short interactions can fill your user’s short-term memory with contextual information that compensates for the lack of visual feedback. Because we are looking at VUIs for healthcare, we should also assume this is not the place to test the limits of a digital assistant’s comprehension. It’s best to simply state the possible commands that will yield a response from the system. The recommendation is to provide up to five options. If you have more than five, first reconsider whether that’s necessary. Otherwise, start chunking, and reserve the fifth option as an opportunity to request more options. This is more in line with Nielsen’s recommendation of 3 +/- 1 options.
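That chunking strategy might be sketched like this (illustrative Python; the option names are made up for the example):

```python
def speak_options(options, chunk_size=4):
    """Yield utterances that present at most four options at a time,
    reserving a fifth slot for 'more options' when more remain."""
    for i in range(0, len(options), chunk_size):
        chunk = options[i:i + chunk_size]
        utterance = "You can say: " + ", ".join(chunk)
        if i + chunk_size < len(options):
            utterance += ", or say 'more options'"
        yield utterance

options = ["refill a prescription", "check side effects",
           "report a symptom", "schedule a visit", "talk to a nurse"]
for line in speak_options(options):
    print(line)
```

The first utterance carries four real choices plus the escape hatch; anything past the fourth option only surfaces when the user asks for it.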

Later in the exchange, consider repeating the selection or directive back to the user after they have responded. This is good practice outside of a healthcare context, too, but the consequences here are potentially more dire. Things like symptoms and medication names need to be accurate; imagine a voice assistant placing an order for the incorrect medication. Even if this is a user error, that user may live or die based on the accuracy of the medication request. If you think this is an exaggeration: the NIH estimates that approximately 250,000 deaths may be attributed to medical errors annually.


Plain Language vs. Clinical Language

If you are designing this interaction, there is a possibility that you and your collaborators are speaking different languages: plain versus clinical. It is important that we don’t make assumptions about a user’s reading level or clinical literacy. Therefore, a VUI that is intended for patient interactions needs to accept plain language inputs and produce plain language outputs. Let’s look at a possible example where our assistant will be confounded by a user response.

User: Alexa, ask __________ if stomach ache is a side effect of Alprazolam.

This is more than likely what is returned from the query:

"results": [
    { "term": "NAUSEA", "count": 65714 },
    { "term": "FATIGUE", "count": 50157 },
    { "term": "PAIN", "count": 48880 },
    { "term": "DYSPNOEA", "count": 44892 },
    { "term": "HEADACHE", "count": 40019 },
    { "term": "DIARRHOEA", "count": 39439 },
    { "term": "DIZZINESS", "count": 38078 },
    { "term": "ARTHRALGIA", "count": 35537 },
    { "term": "VOMITING", "count": 32024 },
    { "term": "ASTHENIA", "count": 30893 },
    { "term": "FALL", "count": 27200 },
    { "term": "MALAISE", "count": 26464 },
    { "term": "MYOCARDIAL INFARCTION", "count": 25439 },
    { "term": "OFF LABEL USE", "count": 25075 },
    { "term": "RASH", "count": 24877 },
    { "term": "PNEUMONIA", "count": 23085 },
    { "term": "PRURITUS", "count": 22168 },
    ...
]

This is a truncated result from a query to OpenFDA that returns adverse events and their reported quantities for a given drug. Notice terms like “myocardial infarction” and “pruritus” are present in these results. There are plenty of clinical terms that can be components of a stomach ache, but that term itself is not present.
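For context, a result like the one above comes from openFDA’s drug adverse event endpoint, using its `count` parameter to tally reaction terms. Here is a sketch of how such a request might be built; the endpoint and field names come from the public openFDA API, and the actual fetch is left commented out since it requires network access:

```python
from urllib.parse import urlencode

BASE = "https://api.fda.gov/drug/event.json"

def adverse_event_count_url(drug_name: str) -> str:
    """Build an openFDA query that counts reported reaction terms
    (MedDRA preferred terms) for a given drug."""
    params = {
        "search": f'patient.drug.medicinalproduct:"{drug_name}"',
        "count": "patient.reaction.reactionmeddrapt.exact",
    }
    return f"{BASE}?{urlencode(params)}"

url = adverse_event_count_url("alprazolam")
print(url)

# Fetching would look something like:
# import json, urllib.request
# results = json.load(urllib.request.urlopen(url))["results"]
```

The important design point is that every term in the response is a clinical (MedDRA) term, which is exactly why the plain-language mismatch below occurs.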

If this is a potential interaction you’re designing for, there is good news and bad news. The bad news: you may need to support a dictionary of synonyms for these entities. The good news: this may be tedious, but it is technically simple. You can also save yourself some effort by testing your prototype and observing how users already translate clinical terms to plain language. You can do this without any development effort through the Wizard of Oz method, and you can set up your tools for this style of testing in less than five minutes.
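A minimal sketch of such a synonym dictionary follows. The entries here are illustrative and not clinically vetted; real mappings should come from observing how your users actually phrase things.

```python
# Hypothetical plain-language-to-clinical synonym dictionary.
# Entries are examples only; populate from user research.

PLAIN_TO_CLINICAL = {
    "stomach ache": ["ABDOMINAL PAIN", "ABDOMINAL DISCOMFORT", "DYSPEPSIA"],
    "trouble breathing": ["DYSPNOEA"],
    "itching": ["PRURITUS"],
    "joint pain": ["ARTHRALGIA"],
}

def match_plain_term(plain_term, reported_terms):
    """Return the clinical terms from an adverse-event result that
    correspond to the user's plain-language phrase."""
    candidates = PLAIN_TO_CLINICAL.get(plain_term.lower(), [])
    return [t for t in candidates if t in reported_terms]

reported = {"NAUSEA", "PRURITUS", "DYSPNOEA", "ABDOMINAL PAIN"}
print(match_plain_term("stomach ache", reported))
print(match_plain_term("itching", reported))
```

With a lookup like this, the assistant can answer “is stomach ache a side effect” by checking whether any of the mapped clinical terms appear in the query results, and can speak the plain term back rather than the MedDRA one.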

Pain Scales

If you haven’t designed patient interactions through a VUI before, you may be surprised to learn that you don’t have to invent your own measurements for seemingly subjective things like pain. Plenty of research has gone into understanding the usefulness of pain scales, and for VUIs we have one obvious choice: the Verbal Numerical Rating Scale (VNRS). This is simply a scale of 0–10 in relation to Activities of Daily Living (ADLs):

0 = No Pain

1–3 = Mild Pain (nagging, annoying, interfering)

4–6 = Moderate Pain (interferes significantly)

7–10 = Severe Pain (disabling; unable to perform)
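Encoded in code, these brackets amount to a small lookup, e.g.:

```python
def vnrs_bracket(score: int) -> str:
    """Map a Verbal Numerical Rating Scale score (0-10) to its bracket."""
    if not 0 <= score <= 10:
        raise ValueError("VNRS scores run from 0 to 10")
    if score == 0:
        return "No Pain"
    if score <= 3:
        return "Mild Pain"
    if score <= 6:
        return "Moderate Pain"
    return "Severe Pain"

print(vnrs_bracket(0))
print(vnrs_bracket(5))
print(vnrs_bracket(9))
```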

These self-reported conditions are often considered the “fifth vital sign” and are surprisingly helpful to clinicians. Because of the scale’s apparent simplicity, users don’t need the full breakdown of the pain brackets described above. You can simply say:

VUI: On a scale of 0–10, with 0 meaning no pain, how much pain are you experiencing?

The added benefit here is speed. Voice interactions typically take longer than visually scannable experiences, so the fact that the VNRS can be administered in less than a minute makes it particularly effective.

Bedside Manner

Bedside manner is a ubiquitous term with roots in describing the expectations of a doctor-patient relationship. The conversations your interface has with a patient should be held to the same standard, because the interface acts as a surrogate clinician in the eyes of the user. If you’re a UX designer, this is a great opportunity to exercise your empathy muscle. Put simply, bedside manner matters in our patient experience for two reasons:

  1. It humanizes the experience
  2. It reinforces that the system has collected the information correctly

Let’s look at our pain scale dialog to see how our VUI can support good bedside manner with our theoretical patient:

VUI: On a scale of 0–10, with 0 meaning no pain, how much pain are you experiencing?

Patient: I would say 9 right now.

VUI: Sorry to hear that you are experiencing pain. Would you like me to connect you to a nurse?
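A sketch of how a response template might encode both the acknowledgment and the escalation (the wording and the threshold of 7 are illustrative choices, not clinical guidance):

```python
def pain_response(score: int) -> str:
    """Acknowledge the patient's report, echo the value back, and
    offer escalation to a human when the pain is severe.
    Threshold and wording are illustrative only."""
    if score == 0:
        return "Glad to hear you have no pain right now."
    if score >= 7:
        return (f"I've recorded your pain at {score} out of 10. "
                "Sorry to hear that. Would you like me to connect you "
                "to a nurse?")
    return (f"I've recorded your pain at {score} out of 10. "
            "Sorry to hear that you are experiencing pain.")

print(pain_response(9))
```

Echoing the number back serves the second bedside-manner goal above: it confirms the system captured the value correctly before acting on it.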

Multi-Modal Interactions

This is our Back button or Escape key in the healthcare VUI experience. There are a number of reasons why switching modes may be useful to patients:

  1. The VUI has simply not yielded the results expected by the patient.
  2. The user has suffered physical or cognitive impairments that reduce the usability of a VUI.
  3. The system requires inputs that are impossible or inefficient to collect via voice alone (e.g., taking a photo of a dermatological side effect of a medication during a clinical trial).
  4. The situation requires immediate human intervention or response (keeping in mind that a VUI is unable to call emergency services).

This is probably the most complicated and ambiguous part of designing an effective, voice-driven patient experience, as there are a number of technical approaches to switching modes of interaction. You can use an architecture that enables your various modes to hand off to each other seamlessly, or consider building your interaction on a platform that already offers multichannel support.

That’s it!

Just kidding. There are many more nuances to patient care via Voice User Interfaces. I hope that this short guide is helpful to people who are looking to quickly prototype a new healthcare VUI that at least has a solid foundation. In addition to this article, I recommend reading this excellent article about NN/g’s classic usability heuristics as applied to Voice. Are you an expert on the topic? Please comment below!

Senior Design Technologist for AWS