Alexa Skills and the German Language
Nowadays Amazon’s Echo devices are part of many homes in the German-speaking parts of Europe. Apart from the voice assistants built directly into smartphones, the standalone Alexa device is used regularly by people in Austria, Germany and Switzerland to enjoy their favourite music, listen to radio stations, set timers and even control their smart homes. Beyond simple commands like “Alexa, play some Metal” or “Alexa, set a timer for 5 minutes”, however, more in-depth and complex conversations can hardly be accomplished with Amazon’s Natural Language Understanding (NLU).
After using and developing a fair number of Alexa Skills targeting the German market, I came to the sobering conclusion that there is one major reason why deep conversations with voice assistants are not common in German-speaking homes: Natural Language Understanding.
More specifically, Amazon’s NLU. German may be considered a particularly hard language to learn and even harder to master, but it is definitely not the most difficult language on earth. However, its grammatical, syntactic and phonetic rules are very complex, and even native German speakers regularly fail to abide by these rules in written and spoken conversations. My guess is that this grammatical complexity of the German language is the reason why current systems are not able to handle German conversations as well as, for example, English ones.
Admittedly, speech-to-text conversion works quite well, especially considering that Austrians mumble and pronounce words completely differently from Germans, who speak “pretty” German. The major challenge is the semantic understanding of German text. Since this is not a scientific article I will spare you the details, but in case someone is interested in a deep dive into the differences between English and German, I highly recommend this book: A Comparative Typology of English and German by J. A. Hawkins (ISBN 9781317419723).
I daresay the NLU provided by Amazon is not sufficient for advanced German conversations, not to mention the variety of German dialects with their differing pronunciation, words known only in specific regions and different word order in sentences. When actually trying out the vast number of skills for Alexa, one can imagine why Amazon promotes the quantity of available skills rather than their actual quality and usefulness. Most skills targeting the German market do not even deserve to be listed on the Alexa Skills page, since they neither serve a proper purpose nor can be considered useful. And yes, this might sound very polemical, but I intend to provoke at this point. It is important to understand that I do not want to blame the skill developers at all.
I solely blame Amazon and its restrictions, which prevent developers from building useful and more complex skills by connecting alternative NLUs in place of the built-in, immutable German language model. I am sure that Amazon’s skill building structure will become more sophisticated in order to cope with more complex use cases, but at this point in time it is just not sufficient for useful skill development. It may be sufficient for simple command-based skills without actual conversations, where the conversational context does not matter.
Someone might ask why I think I am in a position to accuse Amazon of such things. Well, at our company (Leftshift One) we build language models focused mainly on the German language in order to provide more accurate and fine-tuned conversation building. As such, we wanted to connect Amazon’s Alexa devices with our ecosystem to be able to build skills that are actually smart and can handle complex conversations that use context information. So we planned to use Alexa as a speech-to-text interface for our conversations.
Two years ago we already ran into one of Alexa’s weaknesses: it does not fully support handing the user’s raw utterance to a skill. At first there was hope because a workaround was available. However, Amazon made changes and that workaround no longer works. After some research and experiments we found a way to bypass Amazon’s NLU again. We used Alexa’s custom slots to get the user’s utterances. To do so, we created a catch-all intent that matches all incoming requests and called it “AllIntent”. This intent has just one sample utterance, which consists of a custom slot called “All”, as can be seen in the image below.
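To make the idea more tangible, here is a minimal sketch of what such a catch-all intent could look like in the JSON form of an Alexa interaction model, built as a Python dictionary. The helper name build_all_intent is made up for illustration, and using “All” as the name of the slot type as well as of the slot is an assumption; this is not our exact model.

```python
import json


def build_all_intent(slot_name: str = "All", slot_type: str = "All") -> dict:
    """Return a catch-all intent whose only sample utterance is a single slot.

    The layout follows the intent entries of an Alexa interaction model;
    the default names are illustrative assumptions.
    """
    return {
        "name": "AllIntent",
        # One slot that is supposed to capture everything the user says.
        "slots": [{"name": slot_name, "type": slot_type}],
        # The single sample utterance consists of nothing but that slot.
        "samples": ["{" + slot_name + "}"],
    }


intent = build_all_intent()
print(json.dumps(intent, indent=2, ensure_ascii=False))
```

Because the only sample is the bare slot, anything the user says is supposed to end up in that slot instead of being mapped to a more specific intent.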
Now it comes down to randomness. Custom slot types are required to have sample values as well; in our case there were more than 3000. As can be seen in the image below, all we did was shuffle the most frequently used German words and provide the result as samples for our custom “All” slot type.
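The following is a rough sketch of how such samples might be generated. The short word list is only a placeholder for a real frequency list, and the way words are combined (random phrases of one to four distinct words) is an assumption made for illustration, as are the helper names make_samples and build_slot_type.

```python
import random

# Placeholder list; a real run would load a few thousand frequent German words.
COMMON_WORDS = ["der", "die", "und", "in", "zu", "den", "das", "nicht",
                "von", "sie", "ist", "des", "sich", "mit", "dem"]


def make_samples(words, count, max_len=4, seed=42):
    """Build up to `count` distinct random phrases from `words` (assumed scheme)."""
    if not words:
        raise ValueError("words must not be empty")
    rng = random.Random(seed)          # fixed seed keeps the output reproducible
    max_len = min(max_len, len(words))
    samples = set()
    attempts = 0
    # The attempt limit prevents an endless loop when `count` is unreachable.
    while len(samples) < count and attempts < count * 100:
        attempts += 1
        length = rng.randint(1, max_len)
        samples.add(" ".join(rng.sample(words, length)))
    return sorted(samples)


def build_slot_type(name, samples):
    """Wrap the samples in the structure of a custom slot type definition."""
    return {"name": name, "values": [{"name": {"value": s}} for s in samples]}


slot_type = build_slot_type("All", make_samples(COMMON_WORDS, count=20))
print(len(slot_type["values"]), "sample values generated")
```

The resulting dictionary has the shape of an entry in the types list of an interaction model, so it could sit next to the intent definition.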
As can be seen in the JSON extract of the request to our server, the slot “All” holds the actual utterance I spoke to Alexa. Unfortunately, there are caveats to this “solution”. When we tried to publish a skill based on it, the skill was rejected because of the senseless, random samples provided for our custom slot type “All”. And apart from the fact that we could not even publish a skill with this workaround, the question remains whether it will still work after future changes to Alexa’s skill building platform.
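On the server side, reading the utterance then comes down to picking the slot value out of the request body. Here is a minimal sketch, assuming the usual shape of an Alexa IntentRequest; the function name extract_utterance and the sample utterance are invented for illustration and are not taken from our logs.

```python
from typing import Optional


def extract_utterance(envelope: dict,
                      intent_name: str = "AllIntent",
                      slot_name: str = "All") -> Optional[str]:
    """Return the text captured by the catch-all slot, or None if it is absent."""
    request = envelope.get("request") or {}
    if request.get("type") != "IntentRequest":
        return None
    intent = request.get("intent") or {}
    if intent.get("name") != intent_name:
        return None
    slot = (intent.get("slots") or {}).get(slot_name) or {}
    return slot.get("value")


# Hand-written example request, reduced to the fields that matter here.
sample_request = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "AllIntent",
            "slots": {
                "All": {"name": "All", "value": "wie wird das wetter morgen in graz"}
            },
        },
    }
}

print(extract_utterance(sample_request))
```

The extracted string can then be handed to an external NLU, which is exactly the step Alexa does not support directly.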
If Amazon keeps preventing developers from plugging in external language models or NLU APIs, I do not think the quality and usefulness of skills will increase in the German-speaking areas. I am confident, however, that Google Assistant will strengthen its market position and that more people will use Google Assistant rather than boring Alexa. It has to be mentioned that Google Assistant is not much better when it comes to German NLU, but at least Google provides a way to directly get the user’s utterances.