An Introduction to Snips NLU, the Open Source Library behind Snips Embedded Voice Platform
The Snips Embedded Voice Platform allows any device manufacturer to build a Private by Design voice interface for their product. It handles Wakeword Detection, Speech Recognition, and Natural Language Understanding fully on-device, so that none of your private voice data goes to the cloud. The Snips platform is also open for non-commercial use, for anyone who would like to hack a voice assistant at home. With a Raspberry Pi 3, a microphone, and an hour of your time, you can start controlling your home by voice while keeping full control of your data!
Today, the Snips team is taking a further step to promote privacy-preserving Artificial Intelligence: we are fully open sourcing Snips NLU, our Natural Language Understanding library.
Snips NLU has been developed with accuracy and footprint in mind. We wanted our embedded solution to deliver performance similar to or better than cloud-based NLU solutions (Dialogflow, Amazon Lex, etc.). We also wanted it to run anywhere: not only on servers, but also on tablets, mobile phones, or any connected device. To achieve this, we had to completely rethink how to build an NLU engine, both in terms of engineering and machine learning. Let us introduce you to Snips NLU.
Natural Language Understanding
NLU engines power chatbots and voice assistants. Their objective is to identify the user's intention (a.k.a. the intent) and the parameters of the query (a.k.a. the slots). The developer can then use this structured output to determine the appropriate action or response.
Let’s start by looking at a simple example, and see what you would expect from a good NLU engine. Consider the following dataset, used to train a simple weather assistant with a few query examples:
Give me the weather for [tomorrow](snips/datetime)
Show me [Paris](location)'s weather for [Sunday](snips/datetime)
The dataset also includes a short list of cities as values for the custom location entity.
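Put together, such a dataset might look roughly like this (a hypothetical Python sketch; Snips NLU's actual dataset format differs in its details, and the city list here only reuses names appearing in this article):

```python
# Hypothetical dataset sketch: two annotated training utterances for a
# "GetWeather" intent, plus values for the custom "location" entity.
dataset = {
    "language": "en",
    "intents": {
        "GetWeather": {
            "utterances": [
                "Give me the weather for [tomorrow](snips/datetime)",
                "Show me [Paris](location)'s weather for [Sunday](snips/datetime)",
            ]
        }
    },
    "entities": {
        # snips/datetime is built in, so only custom entities need values.
        "location": {"values": ["Paris", "Beijing"]},
    },
}
```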
The first thing you want is for all the examples used to train the model to be correctly supported by the engine. This makes the system predictable and easy to use: if a query is not correctly parsed, add it to the dataset and it will work right away.
Having this deterministic behavior is great for robustness and predictability, but a powerful NLU engine also needs some generalization power. You want the system to recognize not only the patterns provided in the training set, but also all the variations that come from speaking naturally. If we go back to the previous dataset, it is reasonable to expect the NLU engine to parse a query like "What's the weather in Beijing right now?" even though it is not part of the training examples.
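Concretely, for that Beijing query, a well-behaved engine should return a structured result along these lines (a hypothetical sketch, not Snips NLU's exact output schema; the slot names are made up for illustration):

```python
# Hypothetical structured result for "What's the weather in Beijing right now?"
expected = {
    "input": "What's the weather in Beijing right now?",
    "intent": "GetWeather",
    "slots": [
        {"rawValue": "Beijing", "entity": "location", "slotName": "location"},
        {"rawValue": "right now", "entity": "snips/datetime", "slotName": "date"},
    ],
}
```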
Lastly, you need something called Entity Resolution. Extracting the chunk “the third Sunday of March 2018” from the sentence “I need the weather for the third Sunday of March 2018” is a good first step. However, what you want to do next is call a weather API to get the forecast, and there is little chance that the API will accept raw date strings as input. It will rather expect a date in ISO format: 2018-03-18. The latter is referred to as the resolved value of the entity.
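As a minimal illustration of what resolving “the third Sunday of March 2018” involves, here is a stdlib-only Python sketch (not the algorithm Snips actually uses) that computes the n-th weekday of a month and emits an ISO date:

```python
import calendar
from datetime import date

def nth_weekday(year, month, weekday, n):
    """Return the n-th occurrence of a weekday (0=Monday .. 6=Sunday)
    in the given month, as a datetime.date."""
    matches = [
        date(year, month, day)
        for day in range(1, calendar.monthrange(year, month)[1] + 1)
        if date(year, month, day).weekday() == weekday
    ]
    return matches[n - 1]

# "the third Sunday of March 2018" -> its resolved ISO value
resolved = nth_weekday(2018, 3, 6, 3)
print(resolved.isoformat())  # 2018-03-18
```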
Dates and times are examples of entities that can be resolved. There are many others, such as numbers, temperatures, and durations. We call this special kind of entity a Built-in Entity, because the engine supports it natively without requiring the developer to provide examples (as is required for custom entities). The list of built-in entities currently supported by Snips is available here; we plan to add more in the future.
To satisfy these three objectives, deterministic behavior, generalization power, and the ability to resolve entities, we built the processing pipeline described in the figure above. It takes text as input and outputs a structured response containing the intent and the list of slots. The main processing unit of the pipeline is the NLU engine, which contains two intent parsers that are called successively: a deterministic intent parser and a probabilistic one.
The deterministic parser relies on regular expressions to match intents and slots, which yields perfect behavior on training examples but does not generalize. It is used first precisely because of its strictness.
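The idea behind the deterministic parser can be sketched in a few lines of Python: turn each annotated training example into a regular expression whose slots become capture groups, then try the resulting patterns against incoming queries. This is a toy illustration, not Snips NLU's actual implementation; the intent and slot names are taken from the examples above.

```python
import re

# Hypothetical annotated training examples, in the article's bracket notation.
EXAMPLES = [
    ("Give me the weather for [tomorrow](snips/datetime)", "GetWeather"),
    ("Show me [Paris](location)'s weather for [Sunday](snips/datetime)", "GetWeather"),
]

ANNOTATION = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<entity>[^)]+)\)")

def compile_example(example):
    """Turn an annotated example into a regex: literal text is escaped,
    and each annotated chunk becomes a named capture group."""
    pattern, last, slots = "", 0, {}
    for i, match in enumerate(ANNOTATION.finditer(example)):
        pattern += re.escape(example[last:match.start()])
        group = f"slot_{i}"  # regex group names cannot contain "/"
        slots[group] = match.group("entity")
        pattern += f"(?P<{group}>.+)"
        last = match.end()
    pattern += re.escape(example[last:])
    return re.compile("^" + pattern + "$", re.IGNORECASE), slots

def parse(query):
    """Return the intent and slots if any training pattern matches, else None."""
    for example, intent in EXAMPLES:
        regex, slots = compile_example(example)
        match = regex.match(query)
        if match:
            return {"intent": intent,
                    "slots": {slots[g]: v for g, v in match.groupdict().items()}}
    return None  # no match: hand the query over to the probabilistic parser
```

With this sketch, `parse("Give me the weather for next Friday")` returns the GetWeather intent with “next Friday” captured as a snips/datetime slot, while any query outside these patterns falls through to the probabilistic parser.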
The probabilistic parser is used whenever the deterministic parser fails to find a match. It uses machine learning to generalize beyond the set of sentences seen at training time, thus mitigating the limitations of the deterministic parser. It involves two successive steps: intent classification and slot filling. The intent classification step relies on logistic regression to identify the intent expressed by the user. Slot filling relies on a linear-chain Conditional Random Field (CRF), trained specifically to extract the slots of the identified intent.
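To make the intent classification step concrete, here is a toy, stdlib-only logistic regression over bag-of-words features, trained with stochastic gradient descent on a handful of made-up utterances. It is a sketch of the technique, not Snips NLU's actual classifier or feature set:

```python
import math
import re
from collections import defaultdict

# Toy training utterances for two hypothetical intents.
TRAIN = [
    ("give me the weather for tomorrow", "GetWeather"),
    ("show me the paris weather for sunday", "GetWeather"),
    ("what is the forecast for today", "GetWeather"),
    ("turn on the light in the kitchen", "TurnLightOn"),
    ("switch the bedroom lights on please", "TurnLightOn"),
    ("please turn the lamp on", "TurnLightOn"),
]
LABEL = {"GetWeather": 1.0, "TurnLightOn": 0.0}

def featurize(text):
    """Bag-of-words features: the set of lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))

weights = defaultdict(float)
bias = 0.0

# Plain stochastic gradient descent on the logistic loss.
for _ in range(50):
    for text, intent in TRAIN:
        feats = featurize(text)
        z = bias + sum(weights[f] for f in feats)
        p = 1.0 / (1.0 + math.exp(-z))  # P(intent = GetWeather)
        grad = LABEL[intent] - p
        bias += 0.1 * grad
        for f in feats:
            weights[f] += 0.1 * grad

def classify(text):
    z = bias + sum(weights[f] for f in featurize(text))
    return "GetWeather" if z > 0.0 else "TurnLightOn"
```

A query such as “What's the weather in Beijing right now?” is then classified from the words it shares with the training set (here, mainly “weather”), even though the sentence itself was never seen.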
Regarding the choice of models, we tried dozens of different architectures, including deep ones. We found that there was no significant gain using deep learning versus CRFs for this task, so we favored the lightest option.
In January 2018, we also reproduced an academic benchmark published the previous summer, in which the authors assessed the performance of API.ai (now Dialogflow, Google), Luis.ai (Microsoft), IBM Watson, and Rasa NLU. For fairness, we used an updated version of Rasa NLU and compared it to the latest version of Snips NLU (both shown in dark blue).
The last step, after identifying the intent and the slots, is to resolve the slot values. Converting a raw string into a resolved entity is often a complex task, for which we rely on another Snips open source library: Rustling. This is an in-house re-implementation of Facebook's great duckling library in Rust. The original algorithm was modified to make run times more stable with respect to the length of the parsed sentence. It resolves values such as dates, temperatures, and durations, as explained previously.
This whole pipeline has been designed to be both configurable and extensible. For instance, the CRFs in the slot filler can easily be replaced with something else. Each processing unit of the pipeline has its own configuration which can be tuned to adapt to custom use cases.
The Snips NLU Ecosystem
The Snips NLU ecosystem powers everything NLU-related at Snips. Snips NLU is used to train the models generated in the Snips Web console, on Python workers. Snips NLU Rust is used to run inference everywhere: in our Web console through a Scala backend, or on-device, whether on Linux, iOS, or Android. To get the same code running in such diverse and constrained environments, we bet heavily on Rust. This modern language offers high performance and low memory overhead, with memory safety and cross-compilation as first-class citizens. A JSON serialization of trained models serves as the interface between the Snips NLU libraries.
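The role of that JSON interface can be sketched as follows: training produces a plain-data artifact that any runtime able to parse JSON can reload, whether it is written in Python or Rust. The model contents below are entirely made up for illustration:

```python
import json

# Hypothetical trained-model artifact: plain data, no code, so the
# Python training side and the Rust inference side can both handle it.
trained_model = {
    "model_version": "0.1.0",
    "intent_classifier": {"bias": -0.3, "weights": {"weather": 1.7, "light": -1.5}},
    "slot_filler": {"patterns": ["^Give me the weather for (?P<slot_0>.+)$"]},
}

# Training side: serialize the model to JSON.
artifact = json.dumps(trained_model, indent=2)

# Inference side: parsing the JSON back yields the exact same plain data.
reloaded = json.loads(artifact)
assert reloaded == trained_model
```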
This makes Snips NLU the first fully portable open source NLU library.
Using Snips NLU on the edge or on premises significantly reduces inference latency compared to a round trip to a cloud NLU service. The memory footprint ranges from a few hundred KB of RAM for common cases to a few MB for the most complex assistants.
Our focus in the coming months will be on improving our models, in particular the intent classification step, improving the way we handle resources, and adding support for more built-in entities and languages. Today, Snips NLU handles English, French, German, Spanish, and Korean.
Lastly, we will pay close attention to feedback from the community, so don't hesitate to file issues and open pull requests. 😉
If you liked this article and want to support Snips, please share it!
If you want to work on AI + Privacy, check our jobs page!