Siri, Alexa, and Other Voice Assistants Struggle With Bilinguals
OK, let me be honest with you. I never use Siri on my iPhone. When I'm not home, it's easier for me to grab my phone than to deal with Siri and the chance that she will misunderstand me.
Even if I want to ask Siri to call someone for me, there is a high chance she will not understand me. Let me explain.
I speak three languages, and I somehow use all three at once. Even though my iPhone is set to English, many of my contacts have German or Arabic names. So even if I ask Siri to call someone for me, she will probably be too focused on English pronunciation and get me wrong; she isn't that smart. Even Google on my iPhone doesn't perform correctly when it comes to names, street names, and other simple functions.
Keep in mind that 60% of the world’s population speak more than one language, and we feel left behind.
I set Siri on my iPhone to Arabic for a previous story I wrote about voice assistants and Arabic, and she has spoken Arabic ever since. Again, I never use Siri, so there was no need to switch her back to English.
Last week, I received a text while wearing my AirPods. The text was in English. However, Arabic Siri tried to read it, and the result was gibberish.
Suddenly it hit me how stupid Siri must be to even try to read it in Arabic. If you are not familiar with Arabic: it uses the Arabic alphabet and is written from right to left. An Arabic Siri should immediately and easily detect that an SMS written in the Latin alphabet is not Arabic, and she shouldn't even have tried to read it aloud. If she at least did that, I'd consider her a bit smart.
Apparently, Siri is programmed to read text aloud while running simple algorithms to detect its language. However, language detection is not as trivial a task as we might think.
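To be fair, telling the *script* apart is the easy half of the problem. Here is a minimal sketch in Python using the built-in unicodedata module; the function name, the two-script scope, and the counting heuristic are my own illustration, not how Siri actually works:

```python
import unicodedata

def dominant_script(text: str) -> str:
    """Guess whether a text is mostly Arabic or Latin script by
    checking the Unicode name of each letter."""
    counts = {"ARABIC": 0, "LATIN": 0}
    for ch in text:
        if ch.isalpha():
            # Unicode character names start with the script, e.g.
            # "LATIN SMALL LETTER A" or "ARABIC LETTER MEEM".
            name = unicodedata.name(ch, "")
            if name.startswith("ARABIC"):
                counts["ARABIC"] += 1
            elif name.startswith("LATIN"):
                counts["LATIN"] += 1
    return max(counts, key=counts.get)

print(dominant_script("Hello, how are you?"))  # LATIN
print(dominant_script("مرحبا كيف حالك"))        # ARABIC
```

A check like this would have stopped Arabic Siri from reading a Latin-alphabet SMS. The genuinely hard case is when two languages share a script, say English and German, which is where simple heuristics stop working.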
Don't even get me started on Google. Google Maps can't read German street names correctly if the phone language is set to English; it reads them the way a white middle-class American would. I've told Google not to translate German for me, and my phone even tells Google that I reside in Germany. Yet those voice assistants fail us bilinguals.
Is there a reason why smart assistants struggle with languages? Even when the two languages use different alphabets (Arabic and Latin), why can't smart assistants detect the different languages we have set up on our phones?
This is an important topic, believe it or not. The voice assistant market is projected to reach over $5 billion by 2025, and I'm sure I'm not the only bilingual person using Siri and other smart assistants.
How do machines speak?
Alexa, Siri, Cortana, and other voice-activated AIs understand our speech using natural language processing (NLP) techniques.
NLP is a subfield of linguistics, computer science, and AI concerned with the interactions between computers and human languages. The main challenges in NLP are speech recognition, language understanding, and language generation.
Computers do not learn a language the way we humans do; computers learn languages through statistics and data science. For example, in its early days back in 2006, Google Translate translated the original text into English and then into the target language by cross-referencing millions of documents and transcripts from the United Nations and the European Parliament.
Today, Google Translate is still faulty, but it captures the broader intent of the translated text. In 2016, Google announced that the service now runs on neural machine translation, which can translate whole sentences at a time and draws on a broader range of linguistic sources.
So, the more linguistic and language data we feed into a computer, the easier it gets to teach it a language. However, when it comes to detecting different languages as input, all AI assistants still fall behind.
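The statistical flavor of this is easy to demonstrate. Below is a toy character-trigram language identifier in Python; the tiny training snippets, the scoring rule, and the function names are my own simplification of how statistical language ID works, not any vendor's implementation:

```python
from collections import Counter

def trigrams(text):
    """Split text into overlapping three-character chunks,
    padded so word boundaries count too."""
    text = f"  {text.lower()}  "
    return [text[i:i + 3] for i in range(len(text) - 2)]

# Tiny training corpora -- real systems use millions of sentences.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund",
}
PROFILES = {lang: Counter(trigrams(txt)) for lang, txt in SAMPLES.items()}

def guess_language(text):
    """Score each language by how often the text's trigrams
    appear in its profile; return the best match."""
    grams = trigrams(text)
    scores = {lang: sum(profile[g] for g in grams)
              for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)

print(guess_language("the dog jumps"))     # en
print(guess_language("der hund springt"))  # de
```

This works fine on a sentence that stays in one language. Feed it a code-switched sentence like "call Mohammed und sag ihm hello" and the scores blur together, which hints at why mixed-language input trips up the assistants.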
Smart assistants speak different languages, but
As I told you before, Siri, Alexa, and all other smart assistants speak multiple languages. However, they cannot understand these languages together. Let me give you an example.
The Siri on my iPhone speaks American English. She has no problem with English names; she will pronounce and understand them, no questions asked. However, when I ask Siri to call my friend "Mohammed," she does not understand his name if I pronounce it correctly in Arabic. I have to butcher the name with a typical American English pronunciation for Siri to understand me, which is frustrating.
Even if I ask Alexa to play a certain Spanish or Arabic song, Alexa will never understand me unless I speak like a German who has no idea how to pronounce Spanish or Arabic correctly.
P.S.: Even though my iPhone is in English, our Amazon Echo had to be set up in German to use Alexa's skills in Germany.
By the way, smart assistants also have a problem with my accent when I speak German or English to them, and my accent isn't even that strong compared to others around me. This is also why I only use Alexa to control the lights and set timers, and nothing else.
The Washington Post released "The Accent Gap," a study in which the Post found that smart speakers do not perform well when people speak to them with an accent. When it comes to American English, smart speakers and their assistants rely on what is called "broadcast English," which the paper describes as the "predominantly white, nonimmigrant, non-regional dialect of TV newscasters."
Smart speakers and smart assistants are programmed, trained, and tested by native speakers; therefore, they struggle to understand people with accents. The more data and input we give these smart assistants, the better they will eventually get. For example, I can report every Alexa mistake to Amazon if I want to share this information with them. However, people might be reluctant to share these mistakes with big tech companies for privacy reasons.
Now imagine the challenge for smart assistants and their programming. For voice assistants to serve the 60% of the world's population who speak more than one language, they must understand different accents. On top of that, the AI has to recognize code-switching: conversations that mix two or more languages.
Teaching voice assistants language and speech recognition is difficult. A word's position in a sentence and its prefixes and suffixes are among the features the computer relies on to recognize your commands. Add idioms, colloquial sayings, and regional dialects on top of that, and you have everything a voice assistant must handle to recognize speech and be classified as smart. And all of this covers one language only.
The complexity of speech recognition makes it challenging for Siri and her friends to be bilingual. Sentence structure is important for understanding our commands, different languages have different structures, and the AI cannot keep track of more than one at a time.
With the increasing popularity of smart speakers and voice assistants, big tech companies are looking at bilingual users and their problems. The competition in this field is fierce, especially between Amazon and Google, and each company aims to be the first to solve this problem and sell its devices to the 60% of us who speak more than one language.
Google Assistant on Google Home and mobile devices has been bilingual since August 30, 2018, and can be used in two different languages. One user can use Google Home in English, and another person in the same home can use Spanish to communicate with the smart speaker. Nevertheless, the assistant will always answer in the language it was addressed in, and it still does not properly understand accents and code-switching. But Google is on the right track to dominate this segment.