Testing for Voice
Challenges and best practices for testing voice applications such as Alexa Skills and Actions on Google
By Shriya Chhabra
Over the last couple of years, we’ve experienced a huge influx of speech recognition technologies, to the extent that it feels like we have another person in our homes. Voice platforms are relatively new, which makes them attractive not only to consumers but also to companies that want to be among the first to launch on these platforms.
Since voice recognition technology is becoming such an important part of our lives, software companies like TribalScale are working to build and deliver best-in-class products for consumers. To accomplish this, TribalScale has a team of test engineers who make sure the desired level of quality is maintained.
At TribalScale, I’ve had the opportunity to test an Amazon Alexa application developed for one of our major media clients with operations in the United States. As a test engineer, I was responsible for testing every feature and user flow throughout the application. Below, I share some of the challenges I faced and the best practices I adopted when testing for voice.
Three Challenges in the Testing Process
1. Time required for manual testing
Voice applications are currently tested manually, and test engineers have to rely on trial and error to find the most suitable test strategies. This creates bottlenecks in the pipeline. Test engineers must be efficient and attentive to catch the device’s audio responses, since audio is the only way most voice devices communicate. On a few newer devices, such as the Amazon Echo Show, test engineers must not only listen to the audio but also monitor the display. This is vital in cases where the audio stream breaks or when an unknown skill is invoked. Paying attention to such details ensures a smooth and enjoyable user experience and adds to the quality of the application built by the project team. However, this process is extremely time-consuming, especially when the number of test cases is very large.
2. Overloading the skill
“Overloading the skill” means adding more capabilities than the skill can support. One current drawback of voice platforms is that as new features are added to a skill, the device is not always able to recognize which command the user is trying to invoke. Typically, a skill’s commands (invocations) share many of the same words. For example, in a media application the invocations could be “play podcast”, “play artist” and “play songs”. All three share the word “play”, which may confuse the device when it determines what needs to be played. This can result in unexpected responses, skill crashes or stream crashes, all of which disrupt the user experience. Because these failures cannot be reproduced consistently, they are easy to miss, which makes identifying and reporting these bugs a challenge for test engineers.
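When planning test cases for this problem, it can help to list every sample utterance and flag the words that several commands start with, since those commands deserve repeated testing. The sketch below is a minimal illustration, assuming the utterances are kept in a plain dictionary of intent names to phrases; the intent names are hypothetical, and this is not the Alexa Skills Kit interaction-model format.

```python
from collections import defaultdict


def shared_leading_words(utterances: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each first word to the sorted intents whose utterances start with it,
    keeping only words shared by two or more intents."""
    intents_by_word: dict[str, set[str]] = defaultdict(set)
    for intent, phrases in utterances.items():
        for phrase in phrases:
            words = phrase.lower().split()
            if words:  # skip empty utterances
                intents_by_word[words[0]].add(intent)
    return {
        word: sorted(intents)
        for word, intents in intents_by_word.items()
        if len(intents) > 1
    }


# Hypothetical intents mirroring the media example above (not a real interaction model).
sample_utterances = {
    "PlayPodcastIntent": ["play podcast"],
    "PlayArtistIntent": ["play artist"],
    "PlaySongsIntent": ["play songs"],
    "PauseIntent": ["pause"],
}

print(shared_leading_words(sample_utterances))
# {'play': ['PlayArtistIntent', 'PlayPodcastIntent', 'PlaySongsIntent']}
```

A shared word does not guarantee a recognition failure; it only points to the command groups worth invoking many times, since these bugs do not show up on every attempt.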
3. Unintended speech pauses
Users are expected to make every invocation perfectly and at a constant pace, because the majority of voice platforms are designed so that the device stops processing once:
- it hears a short pause, or
- it has reached the time allotted to invoke a command
This can be extremely frustrating for testers who are trying to test multiple long flows within the application.
Best Practices for Testing on a Voice Platform
1. Manual testing to improve design
Even though manual testing can be time-consuming, it is reliable and useful for improving the design and product experience, because it replicates what a real user goes through. While testing our project, we sometimes realized that some responses could be framed better, or that we could improve the flow of certain invocations. We wouldn’t have known this without experiencing the flow as a user, so we tweaked the design to accommodate these changes.
2. The human experience
Another finding was that using recorded voice commands for invocations may not be the best way to test. Recorded voices are consistent in pitch and pace, whereas human speech varies. Since voice devices allot a fixed time frame to hear an invocation, it is likely that many real users will exceed it. We therefore redesigned these invocations to resolve the issue, something we would only have uncovered by testing the application ourselves.
3. Exploratory testing
Exploratory testing is key when testing a voice project. It is vital for a test engineer to make sure that no flow breaks or crashes the application. For example, we not only checked that each skill was working, but also checked that the user could jump from one skill type to another without any errors.
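Jumps between skill types can be turned into an explicit checklist so that none are skipped during a session. A minimal sketch, assuming a hypothetical list of skill types, that enumerates every ordered pair (start in one, then switch to another) as a test case:

```python
from itertools import permutations

# Hypothetical skill types for a media skill; replace with the real ones.
SKILL_TYPES = ["radio", "podcast", "music", "news"]


def transition_cases(skill_types: list[str]) -> list[tuple[str, str]]:
    """Return every ordered (start, target) pair of distinct skill types."""
    return list(permutations(skill_types, 2))


cases = transition_cases(SKILL_TYPES)
for start, target in cases:
    print(f"While in {start}, ask for {target}; expect {target} to start with no errors")
print(len(cases), "transition test cases")  # 12 for four skill types
```

With n skill types this yields n × (n − 1) cases, so the list grows quickly; it works best as a checklist to prioritize rather than a replacement for exploratory testing.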
Being creative is very important when testing applications on location-based devices. A good test engineer thinks outside the box, uses external tools to create different scenarios, and tries different locations to replicate the experience of a user moving around different environments. For example, if you are in Toronto and want to play a local radio station, the test engineer checks that the right station is played when the call is made to the server. To check for accuracy, we used Google Maps to measure the distance between the local station’s tower and the location of the device. Another example is location coverage: we changed the location to remote places to get an idea of how well the server covers these areas. By doing this, we were able to give feedback to the company providing the API server and could request wider coverage if necessary.
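The distance check we did by hand with Google Maps can also be approximated in code. A minimal sketch using the haversine (great-circle) formula, assuming latitude/longitude pairs for the device and the station’s tower; the coordinates and the 100 km “local” radius are placeholder assumptions, not values from our project:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against floating-point values just above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def station_is_local(device: tuple[float, float], tower: tuple[float, float],
                     max_km: float = 100.0) -> bool:
    """Hypothetical check: treat a station as local if its tower is within max_km of the device."""
    return haversine_km(*device, *tower) <= max_km


# Placeholder coordinates: downtown Toronto and a nearby point standing in for a tower.
device_location = (43.6532, -79.3832)
tower_location = (43.6426, -79.3871)
print(f"{haversine_km(*device_location, *tower_location):.2f} km")
print(station_is_local(device_location, tower_location))  # True
```

The haversine formula treats the Earth as a sphere, which is accurate enough for a sanity check like this; it says nothing about actual signal coverage, which still has to come from the API provider.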
Testing on a voice platform is a great learning experience for a test engineer because of its unique testing process and the unexpected findings that come with it. Every project adds value to the platform and helps shape new testing strategies. Testing bleeding-edge technology requires patience, creativity and a range of techniques to make sure the skill is ready for even the most demanding user.
Voice recognition technology is the future, and by finding and solving problems through testing voice applications, we will help make this technology useful and dependable for its users.
Shriya is a Computing and Financial Management student at the University of Waterloo. She has been working as a Software Test Engineer (Co-op) at TribalScale, where she anchored various projects and helped her team deliver valuable products to the market.