Voice of Kamu — Experimenting with chatbot voice technology at Migri
Our Migri chatbot Kamu is now 1 year old. The team has taught the chatbot to answer questions in many of the content areas that Migri deals with. This means that the team, with the support of Inland, is now looking at how to take Kamu to the next level. Within the chatbot development team, Inland is in charge of the experiments but is less and less involved in the everyday work of producing new content. One of the experiments, for which we received funding from the Ministry of Finance this spring, is “Voice of Kamu” (PuheKamu in Finnish). The goal is to get a better understanding of the maturity of voice technologies as well as the type of work required to put a voice-based version of Kamu into use with Migri’s customers.
When Inland does user testing and customer interviews, we see the variety of customers that Migri has: from specialist experts, literate in many languages and technologies, to illiterate immigrants and people who are very sceptical about the use of new technologies. With the current text-based Kamu implementation we can serve the specialist experts, the students and everyone who is digitally literate and enthusiastic. But a text-based chatbot cannot serve the tech-sceptics and the illiterate among Migri’s clients. This is why we want to experiment with a voice-based version of Kamu. Unlike our human customer service, a talking Kamu could be online 24/7 and answer all the easy questions, so that human service advisors could concentrate on more complicated cases.
In our experiment we used an out-of-the-box solution, which we connected to our existing text-based Kamu.
This means that we did not make any adjustments to the speech-to-text and text-to-speech algorithms. We simply chose a standard voice for Kamu’s answers and set up a phone number to call to try out the implementation. Upon calling, the user is asked to choose between Finnish and English; after selecting the language with a button press, the conversation is purely spoken. Voice inputs were interpreted as American English by default: for this experiment we had to choose one single accent, and this was the one we chose.
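As an illustration, the webhook behind such a phone number could return TwiML along these lines to handle the keypad language choice. The element and attribute names below are standard Twilio TwiML; the endpoint path and exact wording are hypothetical, not our real configuration:

```python
import xml.etree.ElementTree as ET

def language_menu_twiml() -> str:
    """Build a TwiML response that asks the caller to pick a language
    with a keypad press, as in the setup described above."""
    response = ET.Element("Response")
    # <Gather> collects one DTMF digit and posts it to the action URL.
    gather = ET.SubElement(response, "Gather", {
        "input": "dtmf",
        "numDigits": "1",
        "action": "/kamu/language",  # illustrative endpoint, not our real one
    })
    say = ET.SubElement(gather, "Say", {"language": "en-US"})
    say.text = "For Finnish, press one. For English, press two."
    return ET.tostring(response, encoding="unicode")

print(language_menu_twiml())
```

After the digit is received, the call would continue with speech-only `<Gather input="speech">` turns in the chosen language.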
Links to external websites, called external links in our system, were hidden from users in this version. We did this because users cannot follow hyperlinks during a phone call anyway.
The testing setup
When we had the implementation in place, we did one round of user testing with foreigners who have moved to Finland. The tests took place in Helsinki on a single day at the end of March 2019. We tried to recruit participants from different language groups in order to test different accents. Eight participants spoke English in the test, while only one participant tried out the Finnish version. This means that some of the outcomes of this user test apply only to the English-language version. However, for most of the outcomes we do not expect differences between the English and Finnish versions.
Each user was given a mobile phone from which they called the voice-based version of Kamu. They used the phone’s loudspeaker, so that the user as well as the observers and the interviewer could hear both sides of the conversation. We know that the echo of the room might have affected the speech-to-text transcription.
Each user was then given 2–4 tasks that we asked them to complete in separate phone calls. After each phone call the interviewer asked the user how they experienced the conversation, what came to their mind and how they felt about the answer Kamu gave them. Already during the conversation, the interviewer and observer took notes about obvious misunderstandings.
The research question in these user tests was:
Can the current voice implementation of Kamu satisfy the same user needs as Kamu’s text-based implementation?
The testing outcomes & possible solutions
First and foremost, most users enjoyed the conversations they had with the voice-based version of Kamu. For most users it was one of the first times they had talked to a robot, which brought a sense of excitement to the room.
Many users had some frustrating experiences during the user test; nevertheless, all users tried to complete their tasks as well as Kamu allowed them to.
In more detail we have identified outcomes in five areas:
Speech-to-text-transcription varies a lot
With the wide variety of users, backgrounds and levels of English, the speech-to-text transcription sometimes works well, but at other times it does not work at all.
Here are some examples of what the users said and what Kamu understood:
To solve the transcription problems, we could:
· try out the effects of using a default accent other than American English
· try to deduce the dialect from geolocation or by using a fall-back cycle during the conversation to get best possible transcription. This means the system would go through possible dialects one after the other and choose the one that produces the best transcription.
· narrow down a use-case for voice-based Kamu, where inputs are less complex,
· try out other providers than Twilio
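The fall-back cycle in the second bullet could look roughly like the sketch below, where `transcribe` stands in for a real speech-to-text call that returns a text and a confidence score. The locale list, the stubbed engine and its scores are made up for illustration:

```python
# Sketch of the dialect fall-back idea: transcribe the same audio with
# several locale settings and keep the result the engine is most
# confident about. `transcribe` is a hypothetical stand-in for a real
# speech-to-text call returning (text, confidence).

CANDIDATE_LOCALES = ["en-US", "en-GB", "en-IN", "en-AU"]

def best_transcription(audio, transcribe):
    """Try each locale and return (locale, text) with the highest confidence."""
    best = max(
        ((locale, *transcribe(audio, locale)) for locale in CANDIDATE_LOCALES),
        key=lambda item: item[2],  # item = (locale, text, confidence)
    )
    return best[0], best[1]

# Stubbed engine for illustration: pretends en-GB fits this caller best.
def fake_transcribe(audio, locale):
    scores = {"en-US": 0.41, "en-GB": 0.87, "en-IN": 0.55, "en-AU": 0.39}
    return ("how do I renew my residence permit", scores[locale])

print(best_transcription(b"...", fake_transcribe))
# → ('en-GB', 'how do I renew my residence permit')
```

In practice this trades extra transcription calls (and latency) for accuracy, which is why narrowing the candidate list, for example by geolocation, would matter.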
In our eyes, this is mostly a technical issue that needs to be solved technically. At this point, we do not want to limit the scope of the voice-based Kamu, because this might cause different problems for our customers.
Change of language during the conversation is not supported
Our chatbot provider Boost AI supports changing the language in the middle of the conversation, but the standard Twilio setup does not. Changing the language during the conversation resulted in a strange intonation of the replies, which made them almost impossible to understand.
We need to look into technical solutions to this. If it is not possible to change the language during the conversation, we need to disable this option in our voice-based chatbot.
Unexpected and unresponsive behaviour
During our testing sessions Kamu sometimes suddenly ended the phone call or did not react to users’ inputs. Sometimes Kamu did not wait until the user had finished their question, or it took a very long time to reply at all. This category of unexpected behaviour is rather large and needs to be tested in more detail when we get closer to production use.
For most of the problems in this area, we think that by better understanding what happens technically we can find solutions. One example would be adjusting the maximum time a user can talk before Kamu assumes that they have finished their question.
Talking speed and additional commands
Problems in this area include
· Kamu speeding up during long reply texts,
· missing pauses between options the users can choose from, and
· long pauses before a reply.
Additionally, users requested features such as repeating a reply and interrupting Kamu while it talks.
Each of the issues in this category needs a specific technical solution. However, they all seem feasible to implement with a bit more time and budget. We need to remember that we were testing an off-the-shelf solution. Possible resolutions include
· shortening of reply texts
· consistent talking speed implemented technically
· pauses between action link options added technically
· additional features like “repeat this reply”, “stop talking”, “speak slower”.
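Several of these fixes map naturally onto SSML, of which Twilio’s `<Say>` verb supports a subset. A minimal sketch of how a fixed speaking rate and pauses between options could be expressed; the helper function is our own illustration, not a real API, and the rate and pause values are guesses that would need tuning:

```python
import xml.etree.ElementTree as ET

def options_ssml(intro, options):
    """Wrap a list of answer options in SSML with a fixed speaking rate
    and a pause after each option, so callers can keep up."""
    speak = ET.Element("speak")
    # Fixed, slightly slowed rate instead of the default pacing.
    prosody = ET.SubElement(speak, "prosody", {"rate": "95%"})
    prosody.text = intro + " "
    for option in options:
        sentence = ET.SubElement(prosody, "s")
        sentence.text = option
        # Explicit pause between options.
        ET.SubElement(prosody, "break", {"time": "800ms"})
    return ET.tostring(speak, encoding="unicode")

print(options_ssml("You can ask about:", ["residence permits", "citizenship", "asylum"]))
```

Features like “repeat this reply” or “stop talking” would sit on top of this, in the dialogue logic rather than in the markup.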
Content adjustments needed
To serve voice users better, Kamu’s content needs adjustments in several areas to create a more natural-feeling voice conversation:
· replace “click” and similar words that suggest a display-based interface
· hide weblinks & send as email or SMS (needs to be checked how this can be done technically)
· shorten answers (to support a more natural spoken conversation)
· feature to repeat answers
· avoid one-word selections since they are harder to predict reliably
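For the weblink adjustment, a simple post-processing step could strip links from a reply before it is spoken and collect them for a follow-up text message. A minimal sketch, assuming such a step exists in the pipeline; the example URL and the closing phrase are illustrative, and the actual SMS sending is not shown:

```python
import re

# Hypothetical post-processing step: before a reply is spoken, pull out
# any web links and offer to send them by SMS instead.
URL_PATTERN = re.compile(r"https?://\S+")

def prepare_for_voice(reply):
    """Return (spoken_text, links): the reply with links removed,
    plus the links collected for a follow-up text message."""
    links = URL_PATTERN.findall(reply)
    spoken = URL_PATTERN.sub("", reply)
    spoken = re.sub(r"\s{2,}", " ", spoken).strip()
    if links:
        spoken += " I can send the links to this information as a text message."
    return spoken, links

spoken, links = prepare_for_voice(
    "Find more information here: https://example.com/processing-times"
)
print(spoken)
print(links)
```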
So, going back to the research question:
Can the current voice implementation of Kamu satisfy the same user needs as Kamu’s text-based implementation?
The answer is: No.
But at the same time, we think we can solve many of the challenges that the voice-based technology currently has. The difficulties with speech-to-text transcription worry us most. We have a wider variety of users than most other Finnish government organisations (or at least a bigger percentage of these types of users), which is why the speech-to-text transcription needs to be more reliable than it currently is.
Text- and voice-based chatbots
Based on our user testing session, we have also learned some new things about the difference between text- and voice-based chatbot conversations:
For both the text- and voice-based Kamu it is challenging to answer complex inquiries. This is more prominent and more visibly annoying for voice-based users, because they expect the same affordances that a human-to-human conversation has.
We realised that in voice-based conversations users seem to be more insecure about the answers. We think this has to do with the fact that users cannot see their input, and therefore are not 100% sure if Kamu got their question right.
In voice-based conversations users cannot read an answer several times to understand it fully. This means that we need to put even more effort into writing answers in simple language and keeping them short. As mentioned before, we also need to introduce additional voice commands, like “repeat your last answer” or “pause for a moment”.
When there are long lists of requirements or attachments, users have difficulties remembering the content when they only hear it spoken aloud. In these cases we will need to design the content even more specifically for voice-based interaction. Kamu might have to pause after every list item, or it might introduce a way to send those lists via text message or email.
Compared to text-based chats users ask more follow-up questions in voice-based conversations. They tend to ask, for example, what a specific requirement means in detail.
In our current implementation users do not have control over the speed of the conversation. There are no commands for “speak slower” or “repeat this”. A text-based chatbot lets you take your own time, read at your own speed, go back to a previous answer, select a different option and so on. We need to think carefully about how all these affordances can be transferred to voice technology.
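Commands like these could be intercepted before a caller’s utterance is passed on to the chatbot engine. A minimal sketch of that routing; the phrase lists are made up for illustration, not from our real configuration:

```python
# Conversation-control intents checked before the utterance goes to the
# chatbot engine. Phrase lists are illustrative only.
COMMANDS = {
    "repeat": ("repeat", "say that again", "once more"),
    "slower": ("slower", "slow down"),
    "stop": ("stop talking", "be quiet", "stop"),
}

def match_command(utterance):
    """Return the control command the utterance matches, or None if it
    should go to the chatbot as a normal question."""
    text = utterance.lower().strip()
    for command, phrases in COMMANDS.items():
        if any(phrase in text for phrase in phrases):
            return command
    return None

print(match_command("Could you say that again, please?"))  # → repeat
print(match_command("How long is my permit valid?"))       # → None
```

A production version would need fuzzier matching than plain substring checks, since the same transcription problems described above apply to the commands themselves.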
We have done the first round of our testing in English; only one test was run in Finnish. Our next step will be a low-level testing round with native Finnish speakers. Most importantly, we want to see how the speech-to-text transcription works for Finnish.
Secondly, this has been a technology-centred experiment: we have not yet thought about how the voice-based chatbot would work in combination with our human phone line personnel. We need to create a concept for this if we decide that voice technology is mature enough to use in production.
author: Suse Miessner
edited: Mariana Salgado