UX Design Challenges for Voice-Enabled SaaS Platforms

Key user experience insights gained while building voice solutions for healthcare

Published in

Bola AI

May 15, 2019

The genesis of voice recognition technology dates back to the 1950s, when early systems could understand only spoken digits. Since then, faster processors and the availability of huge amounts of data for training machine learning algorithms have enabled researchers to build systems that can comprehend voice commands in a fraction of a second. This is what allows voice assistants like Alexa or Google Assistant to sit in your pocket, waiting for your next command.

Challenges in using Voice Recognition in Healthcare

Throughout our journey to build Bola, we faced four major challenges in creating a voice interface that could provide a frictionless user experience without compromising response time or accuracy.

* Adapt to the user, not vice versa

We started off by building Bola as a skill on Alexa. If you have played around with your phone’s voice assistant or your Echo Dot, you know that these systems time out if you don’t say anything for a while after issuing a command like “Set a reminder” or “Send a message”. This means you have to say “Alexa” or “Hey Google” again to continue where you left off.

In healthcare settings, such as a dental visit, hygienists take time to record readings of your teeth; in pharmaceutical settings, researchers take time to analyze results before they can record their notes. A system that times out, such as Alexa, adds an extra layer of friction, because the user has to wake it up again after every short interval. To solve this issue, we built our own models on top of cloud-based services that wait for the user to pause before processing a command. This allowed Bola not only to adapt to each user’s pace, but also to provide a continuous voice interface with domain-specific understanding.
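The post does not describe Bola’s models, but the core idea of waiting for a pause instead of timing out the whole session can be illustrated with a simple energy-based endpointer. This is a minimal sketch, assuming the audio has already been reduced to one energy value per frame; `segment_on_pauses`, `silence_threshold` and `pause_frames` are hypothetical names and values, not Bola’s.

```python
from typing import Iterable, List


def segment_on_pauses(
    frame_energies: Iterable[float],
    silence_threshold: float = 0.1,  # hypothetical: energy below this counts as silence
    pause_frames: int = 3,           # hypothetical: this many silent frames end a command
) -> List[List[int]]:
    """Group speech-frame indices into commands, closing one only after a pause.

    There is no session timeout: a long silence simply ends the current
    command, and listening continues so the next command needs no wake word.
    """
    commands: List[List[int]] = []
    current: List[int] = []
    silent_run = 0

    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            # Speech frame: add it to the command in progress.
            current.append(i)
            silent_run = 0
        elif current:
            # Silence after speech: close the command once the pause is long enough.
            silent_run += 1
            if silent_run >= pause_frames:
                commands.append(current)
                current = []
                silent_run = 0
        # Silence with no command in progress is ignored; we keep listening.

    if current:
        # Flush a command that was still open when the stream ended.
        commands.append(current)
    return commands
```

For example, the energies `[0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.0, 0.7, 0.8, 0.0]` yield two commands, frames `[1, 2]` and `[7, 8]`, with no wake word needed between them. A short dip in energy shorter than `pause_frames` stays inside the same command, which is what lets a slow speaker finish a thought.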

* No screen, unlike your favorite SaaS platform

If you remember any experiment from your high school chemistry lab, you know that while adding chemicals to a conical flask, you try not to get distracted, because one extra drop of acid might ruin everything. A similar situation arises in dental care and pharmaceutical research. While examining a patient’s teeth (or running an experiment), it is inconvenient for a hygienist (or a researcher) to look at a screen. They either rely on assistants to record the correct values or must interrupt their work to record data themselves.

A voice system that continuously processes commands after every pause needs a way to notify the user that the previous command was accepted. Without such feedback, users either sit there wondering whether the system heard them and whether they can proceed or, even worse, they don’t wait, and the system gets overwhelmed.

To solve this challenge, we started off with short spoken phrases such as “Bola AI started” and “Bola AI shutting down”, plus short, quick beeps for intermediate responses. During testing, we noticed that our users didn’t like waiting for the system to say “Bola AI started”; they wanted it to be quick. A three-word phrase that we thought sounded cool, like something out of a sci-fi movie, wasn’t effective in a real scenario. To resolve this, we replaced the phrases with short beeps for start and stop. We made sure that the start/stop sounds were different from the intermediate notification sounds, so users could easily tell when the system had started and when it had processed their last command.
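The design rule here, distinct sounds for session start/stop versus per-command acknowledgments, can be illustrated by rendering each cue as a tiny WAV clip. This is a minimal sketch, assuming 16 kHz mono 16-bit audio; the `Event` names, frequencies and durations are hypothetical choices, not Bola’s actual sounds.

```python
import io
import math
import struct
import wave
from enum import Enum


class Event(Enum):
    START = "start"
    STOP = "stop"
    ACK = "ack"  # intermediate cue: "previous command accepted"


# Hypothetical tone design: start/stop use two-note figures so they can never
# be confused with the single short acknowledgment blip.
EARCONS = {
    Event.START: [(660, 0.08), (880, 0.08)],  # rising pair (Hz, seconds)
    Event.STOP: [(880, 0.08), (660, 0.08)],   # falling pair
    Event.ACK: [(1000, 0.05)],                # one short, higher blip
}

SAMPLE_RATE = 16_000


def render_earcon(event: Event, volume: float = 0.3) -> bytes:
    """Render the cue for an event as an in-memory mono 16-bit WAV file."""
    samples = []
    for freq_hz, seconds in EARCONS[event]:
        for i in range(int(SAMPLE_RATE * seconds)):
            value = volume * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            samples.append(int(value * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()
```

Using a different shape (two notes versus one) rather than only a different pitch makes the start/stop cues distinguishable even in a noisy clinic where a single beep is easy to mishear.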

* Real-Time Migration vs. After-the-Job Migration

The fundamental requirement while designing Bola was that it should integrate easily with whatever software our clients were using. With this in mind, we built our first MVP to migrate data to the client’s software in real time. During testing at one of our client locations, we found that real-time migration added a significant delay compared with the speed at which hygienists examined teeth. Using paper notes as a backup triggered a “eureka” moment for us: we realized that the final result (the data recorded in the client’s software) matters more than the approach taken to record it.

We redesigned the system to record all data inside Bola first and then, on the user’s command, transfer it to the client’s software at the end. This not only reduced the latency caused by real-time migration but also gave our users more flexibility to correct errors through our UI.
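The record-first, transfer-later design can be sketched as a local buffer that accepts corrections and is flushed in a single batch. This is a minimal sketch; `ChartSession`, integer readings keyed by tooth number, and the `send` callback (standing in for whatever integration talks to the client’s software) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Batch = List[Tuple[int, int]]  # hypothetical (tooth number, reading) pairs


@dataclass
class ChartSession:
    """Buffers readings inside the voice app; nothing reaches the client until transfer()."""

    readings: Dict[int, int] = field(default_factory=dict)

    def record(self, tooth: int, value: int) -> None:
        # Local write only: no round trip to the client software mid-exam,
        # so recording keeps pace with the examination.
        self.readings[tooth] = value

    def correct(self, tooth: int, value: int) -> None:
        # Errors can be fixed (e.g. from the UI) before anything is exported.
        if tooth not in self.readings:
            raise KeyError(f"no reading recorded for tooth {tooth}")
        self.readings[tooth] = value

    def transfer(self, send: Callable[[Batch], None]) -> int:
        """Send every buffered reading to the client software in one batch."""
        batch = sorted(self.readings.items())
        send(batch)
        return len(batch)
```

For example, recording teeth 3 and 1, correcting tooth 3, and then calling `transfer` sends one batch in tooth order. The slow integration step happens once, after the exam, instead of once per spoken reading.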

* Flexible Error Handling

In probabilistic systems such as voice recognition, errors are guaranteed to occur, so the real UX challenge is how to handle them when they do. Showing meaningful error messages greatly improves the user experience. In a continuous voice-based system, however, communicating errors by voice is a big challenge. The system should not only flag errors with short phrases but also guide the user to fix them.

While building error handling for Bola Dental, we initially considered asking the user to fix errors after every sixteen teeth. During testing, we noticed that some hygienists felt obstructed when the system asked them to correct errors every sixteen teeth, as this pulled them away from their main task of examining the patient’s teeth. To resolve this, we built flexible error handling that lets users ask Bola whether there are any errors so far, whenever they want. This helped us provide a smooth data-charting experience with robust error handling that adapts to the user.
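The shift from a forced checkpoint every sixteen teeth to on-demand checking can be sketched as a tracker that records silently and only reports errors when asked. This is a minimal sketch; the post does not say how Bola detects errors, so treating low recognizer confidence as an error, the 0.8 cutoff, and all class and method names here are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    tooth: int
    text: str          # what the recognizer heard
    confidence: float  # recognizer confidence in [0, 1]


@dataclass
class ErrorTracker:
    # Hypothetical cutoff: readings below this are treated as possible errors.
    min_confidence: float = 0.8
    readings: List[Reading] = field(default_factory=list)

    def add(self, reading: Reading) -> None:
        # Record silently: no forced "fix your errors" prompt interrupts the exam.
        self.readings.append(reading)

    def errors_so_far(self) -> List[Reading]:
        return [r for r in self.readings if r.confidence < self.min_confidence]

    def answer_error_query(self) -> str:
        """Short spoken reply for when the user asks whether there are errors."""
        errors = self.errors_so_far()
        if not errors:
            return "No errors so far."
        if len(errors) == 1:
            return f"1 error, tooth {errors[0].tooth}."
        teeth = ", ".join(str(r.tooth) for r in errors)
        return f"{len(errors)} errors, teeth {teeth}."
```

Because the reply is built only when the user asks, the hygienist decides when to switch from examining to correcting, and the reply stays short enough to be spoken without breaking their focus.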

Conclusion

Voice recognition has been around for decades, but the last decade has brought a huge leap in this domain, resulting in systems that rival human accuracy. These highly accurate systems will become the foundation of the next generation of software. Designing a frictionless user experience is much more challenging than for screen-only interfaces, because the added layer of voice requires weighing trade-offs to determine what share of user tasks should be completed by voice versus on screen.