Private & Context-Aware Speech Recognition with Snips

Joseph Dureau
Snips Blog
5 min read · Apr 24, 2018

The development of Voice Interfaces over the last few years has been exhilarating. They have evolved from spotting limited and predetermined keywords in a sentence to understanding and handling any formulation of a given intention. You can now simply talk to Voice Assistants using natural language. They have also become much more reliable: state-of-the-art Speech Recognition engines have reached human level. They basically make as few mistakes as professional transcribers. Last, the speed at which the voice ecosystem evolves is breathtaking. Every day, new integrations are made with new connected devices, so they can be controlled by voice. The way we interact with technology is undergoing a fundamental revolution.

Yet, something’s missing

However fast they progress, Voice Assistants still feel somewhat limited. Something’s missing. I’ll take Google Assistant, one of the best in class, as an example. Privacy concerns set aside 😓, it works very well. Regarding music, it is excellent at recognising popular artists, and even artists slightly below the tip of the iceberg. But when it comes to discovery mode, one of the best features brought by music streaming platforms, things get trickier. Each week, I spend hours listening to my Discover Weekly playlist on Spotify, through which I regularly stumble upon artists I really like. I can listen to an artist like Lucas Hart for hours, but whenever I say “Ok Google, play me Lucas Hart”, Google Assistant systematically fails, however carefully I enunciate. And don’t get me started on my brothers’ band, which I simply cannot get my Google Assistant to play from Spotify!

Systematically testing this issue on my account, and on those of a few friends, revealed that about 1 out of 3 artists suggested in Discover Weekly playlists fails to be recognized by Google Assistant, even when the assistant is given access to the corresponding Spotify account. The logs of the assistant show that the issue comes from Speech Recognition: the names of the unrecognized artists get wrongly transcribed. To be honest, more often than not, a human without knowledge of my musical tastes would not have done better.

It is likely that these artist names actually belong to the vocabulary of the Google Assistant Speech Recognition engine, but from a generic standpoint they are just less likely than other possible transcriptions.

Giving these artist names more weight in a context in which the user is more likely to say them is called Context Awareness. This Context Awareness is a piece that is critically missing in current Voice Assistants.
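To make the idea of "giving more weight" concrete, here is a minimal sketch in Python. It is a toy illustration only, not how Google Assistant or Snips actually decode speech: it assumes a recognizer that returns a few candidate transcriptions with log-probabilities, and it adds a fixed bonus to candidates that mention a name from the user's context. The function name `rescore`, the `boost` value, and the probabilities are all made up for the example.

```python
import math


def rescore(hypotheses, context_entities, boost=2.0):
    """Return the best transcription after boosting context-relevant candidates.

    hypotheses: list of (text, log_prob) pairs from a generic recognizer.
    context_entities: names the user is likely to say (artists, contacts...).
    boost: log-score bonus for mentioning a context entity (hypothetical value).
    """
    entities = [entity.lower() for entity in context_entities]

    def score(item):
        text, log_prob = item
        mentions_entity = any(entity in text.lower() for entity in entities)
        return log_prob + (boost if mentions_entity else 0.0)

    return max(hypotheses, key=score)[0]


# Made-up candidates: the generic model prefers the more common words.
hypotheses = [
    ("play me look as heart", math.log(0.6)),
    ("play me lucas hart", math.log(0.3)),
]

generic_best = rescore(hypotheses, [])
contextual_best = rescore(hypotheses, ["Lucas Hart"])
```

Without context, the more generic transcription wins. Once the artist name is known to be relevant to this user, the same acoustic evidence leads to the right transcription.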

Like talking to a stranger

Granted: Discover Weekly playlists are updated weekly, the featured artists are generally not famous, and these artists are ones Spotify thinks I’ll like rather than artists I’ve actually listened to for hours. So I thought I’d test my Google Assistant with a more straightforward Context Awareness example: when Google Home lets me define new calendar events, how good is it at recognizing participant names taken from my Address Book?

My Address Book is a stable and explicit list of people I care about, and that I’m likely to organize things with. But yet again, the fact that Google Assistant’s Speech Recognition is not Context Aware led to frustrating performance. Similar to what we observed for Discover Weekly artists, about 1 out of 3 contact names from my Address Book and those of my friends did not get recognized. That’s just the level of ambiguity there is when you lack context. It feels like talking to a stranger.

Snips: a Private, and soon Context Aware Voice Platform

We believe Context Awareness is going to be the milestone that will bring Artificial Intelligences to the next level. The performance of current solutions is basically as good as that of any professional transcriber. Yet, a close friend would fare much better, because they know you.

For Voice Assistants to become Context Aware, you need to trust them with data about your context. We are talking about your Spotify streaming history, your Address Book, and potentially many other sensitive data sources. Which means that Context Awareness is tightly linked to the question of Privacy.

This is a question the Snips team has addressed head on, by building a Private by Design solution. The Snips Platform runs Wakeword Detection, Automatic Speech Recognition (ASR), and Natural Language Understanding (NLU), 100% on-device. Performance similar to or better than that of cloud solutions is achieved by specializing the vocabulary understood by the platform, based on the use cases it is expected to handle: from turning lights on and off to playing music, or registering events in your calendar.

Until now, ASR and NLU models were trained once and for all on Snips servers, based on popular entity values, and deployed for inference on device. To make our solution Context Aware, we needed to make these entity values dynamic. We needed to be able to locally update the vocabulary these models could handle, adding new artist names, new contact names, etc., depending on the context.

Working on this feature over the last few months has been a very fun ride. On the NLU side, we made sure that our models would generalize with very high accuracy beyond the set of entity values (artists, contacts, etc.) seen in the training set. On the ASR side, we completely restructured our decoding strategy so entity sets could be dynamically updated, on-device. It is now possible to extend the vocabulary of the Snips Recognition engine while keeping only a single core busy, for up to a few minutes. In the meantime, the assistant remains live, which makes the update process completely transparent to the user.

We call this new feature Entity Injection. It is now part of the Snips platform. Because it takes the form of a simple local API, it is trivial to implement Skills that know the music the end user likes, the people they care about, etc. Entity Injection is a significant milestone toward making Snips not only the first Private, but also the first Context Aware Voice Platform.
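As a rough sketch of what a Skill might do before calling such an API, the Python snippet below gathers new entity values and serializes them into an injection request. Everything specific here is an assumption made for illustration and should be checked against the Snips documentation: the `"operations"` / `"add"` payload shape, the entity names `artist` and `contact`, and the helper name `build_injection_payload` are not taken from this article.

```python
import json


def build_injection_payload(entity_values):
    """Serialize new entity values into a JSON injection request.

    entity_values: dict mapping an entity name to an iterable of new values.
    The payload layout below is an assumed shape, not a confirmed spec.
    """
    cleaned = {}
    for entity, values in entity_values.items():
        unique_values = []
        for value in values:
            value = value.strip()
            # Skip blank entries and duplicates, keeping the original order.
            if value and value not in unique_values:
                unique_values.append(value)
        if unique_values:
            cleaned[entity] = unique_values
    return json.dumps({"operations": [["add", cleaned]]})


# Hypothetical context gathered locally by a Skill.
payload = build_injection_payload({
    "artist": ["Lucas Hart", "Lucas Hart", " "],
    "contact": ["Joseph Dureau"],
})
```

The resulting string would then be handed to the platform's local API. Since both the data and the request stay on the device, the user's listening history and Address Book never have to leave it.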

We’re looking forward to seeing what you build with it!

If you liked this article and want to support Snips, please share it!

Follow us on Twitter jodureau and snips.

If you want to work on AI + Privacy, check our jobs page!
