Beginner's Guide to Automated Voice App Testing

Florian Treml
Published in The Startup
5 min read · Feb 3, 2021


This guide suggests best practices, infrastructure and tools to ensure your voice app continues to deliver an outstanding user experience.

Questions When Testing Voice Apps

Applying the suggested practices helps answer these questions:

  • Is my voice app following the designed conversation flow? Is the conversation flow working as expected?
  • How does my voice app work under real-life conditions? Does it handle low audio quality? Does it handle slow network connections?
  • Is my voice app available 24x7, or are there any interruptions in service?

The Art of Challenging Chatbots

Testing chatbots, especially voice-enabled ones, poses different challenges than testing apps with a graphical user interface. A graphical user interface restricts the possible user interactions to the controls it offers, whereas with natural language the number of possible user inputs is limitless. With voice as user input there are even more variables to take into account: the individual nuances in voices, the quality of the microphone, the background noise surrounding the speaker, and more. By contrast, when testing a graphical user interface, a button click is always perceived the same way by the application, regardless of who actually clicked it.

The platforms behind powerful voice applications are still evolving and are subject to constant improvements, which means that developers have to rely on components that they do not own and can influence only to a limited extent.

Testing the Voice Conversation Flow

The open source product Botium provides you with all the tools required for implementing a comprehensive, holistic test strategy for your voice apps. You can read about Botium and the background on testing conversation flow in the official Botium documentation.

We will use Bring! Shopping List as an example of a voice app to test. It is published as an Alexa Skill, so we can use the Botium Connector for Amazon Alexa with AVS to simulate voice input and output with Botium.

For details about the presented steps and tools please take a look at the Botium Wiki and our Blog!

Record Test Cases

The quickest way to get started is to use the Live Chat in Botium Box to record your own voice with your microphone. You can immediately see and listen to the response of your voice app.

Depending on the technology of your voice app, the text response, the audio response, or both are shown.

Botium Box Live Chat — Recorder

You can save the conversation as a test case and make some changes afterwards:

  • Refine the input and output text and audio
  • Use wildcard matching or utterance lists instead of full text
  • Add additional test steps or asserters

Botium Box Voice Test Case
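
To illustrate why wildcards and utterance lists make test cases more robust than full-text assertions, here is a minimal Python sketch. It is not Botium's implementation: the function names `matches_expected` and `matches_any` are made up for this example, and it assumes that `*` stands for "any text".

```python
import re


def matches_expected(expected: str, actual: str) -> bool:
    """Return True if `actual` matches `expected`, where `*` matches any text."""
    # Escape everything except the wildcard, then join the pieces with ".*".
    parts = [re.escape(part) for part in expected.split("*")]
    pattern = ".*".join(parts)
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(pattern, actual.strip(), flags=flags) is not None


def matches_any(utterances: list[str], actual: str) -> bool:
    """Return True if `actual` matches at least one entry of an utterance list."""
    return any(matches_expected(u, actual) for u in utterances)


if __name__ == "__main__":
    # The item name may vary, so it is replaced by a wildcard.
    print(matches_expected("okay * is on your list", "Okay milk is on your list"))
    # Several acceptable greetings are collected in one utterance list.
    print(matches_any(["hello*", "hi*"], "Hi there, what can I do for you?"))
```

With a wildcard, the same test step accepts any item name, and with an utterance list it accepts any of several phrasings, so harmless variations in the response no longer break the test.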

Synthesize Test Cases with Text-To-Speech

Instead of recording your own voice for the test cases, you may decide to use synthesized voice samples instead (or additionally). Botium has its own Text-To-Speech and Speech-To-Text platform, Botium Speech Processing, based on the best open source and cloud engines available.

Test cases now show plain text instead of audio input:

Botium Box Live Chat — Text Input

Eliminating Flakiness — Homophone Mappings

A typical problem when testing voice apps is that audio transcriptions, especially for low-quality audio, can be rather unstable. Test automation usually relies on hard facts (fixed text assertions), so unstable transcriptions lead to increased flakiness of the test results.

In this example, instead of "okay milch ist auf deiner liste" (German for "okay, milk is on your list") the transcription says "okay milch is auf seiner liste". A difference of only a character or two is enough to make the test case fail:

Transcription problem

Botium provides the option to specify homophone mappings to deal with audio snippets that are often misinterpreted by the Speech-To-Text engine.

Specifying Homophone Mappings

Test cases use these mappings to decide whether a transcription result counts as passed or failed.

Transcription Problem — Homophone Mapping applied
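
The idea behind homophone mappings can be sketched in a few lines of Python. This is an illustration under assumptions, not how Botium implements it: the mapping format (canonical word to a list of variants) and the names `HOMOPHONES`, `normalize` and `transcription_matches` are hypothetical.

```python
import re

# Hypothetical mapping format: canonical word -> variants that the
# Speech-To-Text engine tends to produce instead.
HOMOPHONES = {
    "deiner": ["seiner"],
    "ist": ["is"],
}


def normalize(text: str, homophones: dict[str, list[str]]) -> str:
    """Lowercase the text and replace every known variant by its canonical word."""
    variant_to_canonical = {
        variant.lower(): canonical.lower()
        for canonical, variants in homophones.items()
        for variant in variants
    }
    words = re.findall(r"\w+", text.lower())
    return " ".join(variant_to_canonical.get(word, word) for word in words)


def transcription_matches(expected: str, transcription: str,
                          homophones: dict[str, list[str]]) -> bool:
    """Compare expected text and transcription after applying the mappings."""
    return normalize(expected, homophones) == normalize(transcription, homophones)


if __name__ == "__main__":
    expected = "okay milch ist auf deiner liste"
    heard = "okay milch is auf seiner liste"
    print(transcription_matches(expected, heard, {}))          # strict comparison
    print(transcription_matches(expected, heard, HOMOPHONES))  # with mappings
```

Both sides are normalized before the comparison, so the assertion stays a fixed text assertion while tolerating the known misinterpretations.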

Testing Real-Life Scenarios

Using your own microphone in front of your laptop might be a good starting point, but in real life voice apps are used differently: on a smartphone, on a home automation or entertainment device like Alexa or Google Home, or in a car. To come up with meaningful end-to-end test cases, you will have to make your test data resemble these scenarios.

  • Add background noise on various levels
  • Pitch volume up or down
  • Simulate various levels of distance
  • Simulate technical restrictions like a GSM phone line or low bandwidth
  • Simulate otherwise bad audio quality like interruptions or various levels of silence
  • … you name it …

In Botium Box you can apply various effects for simulating real-life usage scenarios to your own clean recordings or synthesized audio samples.

Botium Box Voice Effects
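
To give a feeling for what such effects do to the audio, the following Python sketch applies two of them to a synthetic tone: background noise at a chosen signal-to-noise ratio and a volume change. It is a simplified illustration using only the standard library, not the processing Botium Box performs. The helper names, the white-noise model and the 16 kHz sample rate are assumptions made for this example.

```python
import math
import random


def sine_wave(freq_hz: float, seconds: float, sample_rate: int = 16000) -> list[float]:
    """Generate a clean test tone with amplitude 0.5 as a list of float samples."""
    n = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def power(samples: list[float]) -> float:
    """Mean power of the signal."""
    return sum(s * s for s in samples) / len(samples)


def add_noise(samples: list[float], snr_db: float, seed: int = 0) -> list[float]:
    """Mix in white Gaussian noise so that the signal-to-noise ratio is about snr_db."""
    rng = random.Random(seed)
    noise_power = power(samples) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def change_volume(samples: list[float], gain_db: float) -> list[float]:
    """Scale the amplitude by gain_db decibels, clipping to the range [-1, 1]."""
    factor = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


if __name__ == "__main__":
    clean = sine_wave(440, 0.5)
    # One clean sample, several degraded variants for the same test case.
    variants = {
        "noisy_20db": add_noise(clean, snr_db=20),
        "noisy_5db": add_noise(clean, snr_db=5),
        "quiet": change_volume(clean, gain_db=-12),
    }
    for name, samples in variants.items():
        print(name, len(samples), round(power(samples), 4))
```

Generating several degraded variants from one clean recording lets the same test case run at increasing difficulty, which shows at what point the voice app stops understanding the input.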

Continuous Monitoring

The recipe for ensuring availability of your voice app is actually rather simple — all you need is:

  • a smoke test for checking basic behaviour (for instance, just sending a simple hello to the voice app and listening for a response)
  • a scheduler to run the smoke test every few minutes
  • a notification mechanism to inform you in case of failures

With Botium Box, everything you need comes out of the box.
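
If you want to see how the three ingredients fit together, here is a minimal Python sketch. It is independent of Botium Box: `run_monitor`, `smoke_test` and `notify` are hypothetical names, and the actual smoke test (sending hello to the voice app) and the notification channel are left as callables that you would have to supply.

```python
import time
from typing import Callable


def run_monitor(
    smoke_test: Callable[[], bool],
    notify: Callable[[str], None],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run smoke_test max_runs times, interval_seconds apart.

    Calls notify with a message for every failed run and returns the
    number of failures.
    """
    failures = 0
    for run in range(1, max_runs + 1):
        try:
            ok = smoke_test()
            detail = "no or unexpected response"
        except Exception as exc:  # a crash or timeout counts as a failure too
            ok = False
            detail = f"error: {exc}"
        if not ok:
            failures += 1
            notify(f"Smoke test run {run} failed ({detail})")
        if run < max_runs:
            sleep(interval_seconds)
    return failures


if __name__ == "__main__":
    # Stand-ins for a real voice app check and a real alerting channel.
    responses = iter([True, False, True])
    failed = run_monitor(
        smoke_test=lambda: next(responses),
        notify=print,
        interval_seconds=0.1,
        max_runs=3,
    )
    print("failures:", failed)
```

Passing the sleep function as a parameter keeps the loop testable without waiting. In production you would more likely let a scheduler such as cron trigger a single smoke test run.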


Now that you know what is needed for automated testing of your voice app, you may give Botium Box a try, or you can stick to the free and open source plan with Botium Core.

  • Record your own voice or use synthesized voice
  • Apply audio effects for real-life simulation
  • Test the conversation flow with Botium

See this article in Spanish here! 🇪🇸



Florian Treml

Co-Founder and CTO Botium🤓 — Guitarist 🎸 — 3xFather 🐣