Machine Learning and Voice Assistants

Voice recognition is starting to show up everywhere in everyday life. Currently, two voice assistants dominate the consumer market: Google Now and Apple’s Siri. While both have improved greatly since their launch, they still aren’t 100% reliable, especially for people with foreign accents or regional dialects.

Voice assistants have long been a product of machine learning, and for years they were considered more of a joke than something serious. Only when Apple began shipping Siri on iPhones by default and marketing it heavily in 2011 did voice recognition seem to catch on with the public. Even then, it had numerous issues, often returning the infamous “let me google that for you,” which was not always helpful. In 2012, Google released its own voice assistant, Google Now. Where Siri searches for information from specific validated websites, Google Now takes its data from any relevant website. This allowed it to return much more information, but not always correct information. Additionally, Google Now had the same issue as Siri with foreign accents and unusual name pronunciations. The only solution was to gather more training data through consumer usage.

In 2014, Google Now made significant progress with accents when it developed a method of identifying the source of the accent. This made it much more accurate with foreign accents than before. Additionally, it now allowed you to train and “personalize” your Google Now voice recognition, giving it better data about your voice so that it could hopefully understand you better.

In 2015, Apple released iOS 9, which included the “Hey Siri” voice recognition training. This worked much like Google Now’s training: you spoke a few different sentences to Siri so that it could build personalized voice recognition data for you.

If you are an American English speaking user of Siri or Google Now, they work pretty well, with many studies finding 70–80% accuracy in returned or relevant messages. But if you are from another country with a different dialect or a thick accent, that 70–80% falls very fast. This is the current Achilles heel of voice assistants: the inability to detect an accent and properly adjust for the dialect that goes with it. Google has made some headway here, but it mostly only works for Americans with light accents, and does not carry over well to dialects like British Cockney or Australian Broad.
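
To make the accuracy gap concrete, here is a minimal sketch of how per-accent accuracy could be tallied from a log of assistant responses. Everything in it is an assumption for illustration: the function name `accuracy_by_group`, the group labels, and the sample results are made up, and they are not data from Siri, Google Now, or any study.

```python
from collections import defaultdict


def accuracy_by_group(results):
    """Return {group: fraction of correct responses}.

    `results` is an iterable of (group, was_correct) pairs, where
    `was_correct` is True when the assistant's answer was relevant.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, was_correct in results:
        totals[group] += 1
        if was_correct:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical, made-up evaluation log (not real measurements).
sample_results = [
    ("American English", True),
    ("American English", True),
    ("American English", True),
    ("American English", False),
    ("Cockney", True),
    ("Cockney", False),
    ("Cockney", False),
    ("Cockney", False),
]

scores = accuracy_by_group(sample_results)
for group, score in scores.items():
    print(f"{group}: {score:.0%}")
```

Breaking the results out by group like this is what exposes the problem: a single overall accuracy number would hide how much worse the assistant does for some speakers.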

References:

http://www.imore.com/siri

http://www.androidcentral.com/google-now