Better Linguistics for Better Voice Assistance

How we use linguistics to advance multilingual NLU.

Daniel Galbraith
Mosaix
Mar 27, 2019 · 6 min read


Decoding human language into 1s and 0s.

In recent years, Language AI, the use of AI for tasks that require understanding human language, has exploded. A plethora of real-world applications are built on natural language understanding (NLU): everything from autocomplete to ordering pizza. We now expect to talk to our smart devices and receive instant, intelligible and relevant responses in an ever-expanding number of domains. Conversational AI has improved so rapidly that users are often surprised by the strides made since even a couple of years ago. It is now unremarkable that we can use voice commands to set reminders, find restaurant recommendations, ask what the weather will be tomorrow, or translate foreign text on the spot. The big data and machine learning revolution has fueled this rapid technological advance.

NLU, however, is not a completely solved problem. Humans are complex, and the human means of communication is complex, especially the multi-layered system of human language. Some issues in language processing, such as part-of-speech tagging, are considered well covered for industry purposes: we achieve a good degree of accuracy on the common use cases (though there is room for improvement even here; see e.g. this paper). Other issues are more intricate and less well understood even from a theoretical perspective, e.g. how to represent world knowledge and leverage it to imitate actual human reasoning. Part of the solution assuredly involves improving our deep learning models, tailoring them to the tasks at hand, and exploring new methods of getting computers to learn. Another important dimension, however, is having robust linguistics "under the hood": a deep understanding of how language works in the first place. In this post, I discuss two areas of difficulty in NLP, syntactic parsing and multi-language support, and how we use linguistic knowledge to continue advancing Language AI.

Syntactic parsing

Syntax, in a human language context, is the study of how meaningful sentences are structured. This requires understanding how the parts of sentences relate to each other, and how this differs across languages. One obvious application is splitting queries into machine-intelligible chunks: in "tell me the weather in San Francisco tomorrow", the command "tell me", the topic or intent "weather", and the relevant phrases "in San Francisco" and "tomorrow" must all be parsed out. Parsing in a syntactic sense means assigning a representation of labels and structure to a sentence, which in theoretical linguistics is a hotly debated topic. The most common industry solution for syntax is dependency parsing, a representation of how tokens depend on each other, which often feeds named entity recognition (NER), the step that identifies the entities of interest. Dependency parsers based on schemes such as Stanford/Universal Dependencies are integrated into most of the common NLP toolkits (e.g. NLTK, spaCy, CoreNLP). Below is an example of a dependency parse tree:

Dependency parse visualization from displaCy (spaCy Python package)

As well as the POS tags on the bottom line, two crucial pieces of information are represented here: types and relations. The types or labels, e.g. nsubj, prep, dobj, come from a pre-defined set of categories corresponding to traditional grammatical notions like "subject", "preposition", "direct object" etc. The relations are effectively asymmetric links between nodes: here, for example, "intelligence" is the dobj (direct object) of "delivers". With this information, we can improve NLU models by providing disambiguation: for instance, if we know that in the sentence "show me what I ordered just now", "now" is an advmod (adverbial modifier) dependent on "ordered", we are less likely to get bad results downstream, such as interpreting "now" as dependent on "show", or simply ignoring it altogether. Time and location phrases can be especially difficult for NLU systems: in navigation, a complex query like "find Starbucks within 10 miles of my destination" requires mapping to some kind of semantic denotation like "within [DISTANCE] of [PLACE]", a task made significantly easier if we know that "destination" is the pobj of "of". At Mosaix, we are investigating the use of dependency parsing as a layer of syntactic features to feed our deep semantic model. With the addition of domain-specific, gold-standard dependency annotations, this approach promises to advance the state of the art in voice search AI.
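To make the template-mapping idea concrete, here is a minimal sketch in plain Python of how a dependency parse could drive that mapping. The tokens, indices, and labels below are hand-written for illustration (they follow Stanford-style label conventions but are not the output of a real parser), and `pobj_of` is a hypothetical helper, not a library function:

```python
# A toy dependency parse for "find Starbucks within 10 miles of my destination".
# Each token index maps to (head index, dependency label); head 0 marks the root.
tokens = ["find", "Starbucks", "within", "10", "miles", "of", "my", "destination"]
#            1         2          3       4      5       6     7        8
parse = {
    1: (0, "root"),  # "find" is the root verb
    2: (1, "dobj"),  # "Starbucks" is the direct object of "find"
    3: (1, "prep"),  # "within" attaches to "find"
    4: (5, "num"),   # "10" modifies "miles"
    5: (3, "pobj"),  # "miles" is the object of "within"
    6: (5, "prep"),  # "of" attaches to "miles"
    7: (8, "poss"),  # "my" is a possessive modifier of "destination"
    8: (6, "pobj"),  # "destination" is the object of "of"
}

def pobj_of(prep_index):
    """Return the token that is the pobj dependent of the given preposition."""
    for i, (head, label) in parse.items():
        if head == prep_index and label == "pobj":
            return tokens[i - 1]
    return None

# Map the query onto the template "within [DISTANCE] of [PLACE]":
distance = pobj_of(3)  # object of "within"
place = pobj_of(6)     # object of "of"
print(distance, place)  # miles destination
```

In a real pipeline the `parse` dictionary would come from a parser such as spaCy's, but the lookup logic is the same: find the pobj dependent of each preposition and slot it into the semantic template.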

Multi-language NLP

Another challenge for Language AI is providing support for multiple languages. Linguistics 101 will teach you that languages are both strikingly similar and strikingly different: similar, in that we only observe a small subset of logically possible human languages, and different, in that there is a wide range of variation within that subset across many dimensions. Hence, the model we assume for English will typically not easily transfer to Mandarin, Ukrainian, or Zulu. Much has been written elsewhere on this problem, but here I will focus on one particular example of differences between languages that is often overlooked by native speakers of English.

Morphology concerns word forms and parts of words, including what has traditionally been called "inflection": the "declension" of nouns and adjectives and the "conjugation" of verbs. English is relatively impoverished when it comes to inflection, whereas languages like Hindi or Bengali rely on it to a far greater degree. One important kind of inflection is known as case. English has lost case everywhere except in pronouns: the difference between "he" and "him", "she" and "her", "they" and "them" is one of case. In languages with case systems, e.g. German, Russian, or Turkish, word endings are used to distinguish subjects and objects, i.e. the roles played by the participants in the event the verb describes. Some languages use case for a wider range of functions; for example, the equivalents of the English prepositions "to", "from", "inside", "on" etc. are conveyed in Hungarian by word endings: lakás "apartment" becomes lakás-hoz, lakás-tól, lakás-ban and lakás-on to convey those same meanings.
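A minimal sketch of how such endings might be stripped, assuming only the four suffixes named above. Real Hungarian has vowel-harmony variants (-hez/-höz, -től, -ben, -en/-ön and more), so this is illustrative only, not a production lemmatizer:

```python
# Toy suffix stripper for the Hungarian case endings mentioned above.
CASE_SUFFIXES = ["hoz", "tól", "ban", "on"]

def strip_case(word):
    """Return the stem if a known case suffix is found, else the word itself."""
    # Try longer suffixes first so e.g. "-ban" wins over any shorter overlap.
    for suffix in sorted(CASE_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word

forms = ["lakáshoz", "lakástól", "lakásban", "lakáson"]
print({form: strip_case(form) for form in forms})
```

All four forms reduce to the same stem lakás, which is exactly what an entity matcher needs.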

German graffiti showing the cases of the definite article

This presents a challenge for NLU, because it means that the same noun may have several different forms, all of which must be identified as the same entity. For instance, in Bengali the same name খান Kha-na "Khan" may take the forms খানকে Kha-na-ke or খানের Kha-ne-ra depending on its role, but all three of these must be matched to the same entity in NER. In languages with many cases, like Russian or Finnish, long lists of forms must be accounted for: the Finnish word kirja "book" must, for entity recognition purposes, match all of kirjan, kirjaa, kirjassa, kirjasta, kirjaan, kirjalla, kirjalta, kirjalle, kirjana, kirjaksi, kirjatta, kirjat, kirjojen, kirjoja, kirjoissa, kirjoista, kirjoihin, kirjoilla, kirjoilta, kirjoille, kirjoina, kirjoiksi, kirjoin, kirjoitta and kirjoineen! This is essentially a stemming problem, i.e. matching inflected word forms to their stems; it is easy for a human to see the pattern kirj- behind the Finnish example.
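The pattern-spotting a human does here can be approximated crudely in code: the longest common prefix of all the forms recovers kirj-. This is only a toy stand-in for a real stemmer (a Snowball stemmer, say, handles far more than shared prefixes), but it shows the core idea of matching inflected forms to a shared stem:

```python
import os.path

# The inflected forms of Finnish kirja "book" listed above.
forms = [
    "kirja", "kirjan", "kirjaa", "kirjassa", "kirjasta", "kirjaan",
    "kirjalla", "kirjalta", "kirjalle", "kirjana", "kirjaksi", "kirjatta",
    "kirjat", "kirjojen", "kirjoja", "kirjoissa", "kirjoista", "kirjoihin",
    "kirjoilla", "kirjoilta", "kirjoille", "kirjoina", "kirjoiksi",
    "kirjoin", "kirjoitta", "kirjoineen",
]

def common_stem(words):
    """Longest shared prefix of all forms: a crude stand-in for a stemmer."""
    return os.path.commonprefix(words)

stem = common_stem(forms)
print(stem)  # kirj

def matches_entity(form, stem):
    """Naive NER-style match: does this surface form share the entity's stem?"""
    return form.startswith(stem)
```

With the stem in hand, every one of the 26 surface forms matches the single entity kirja via `matches_entity`.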

Many good stemmers already exist in industry-standard toolkits, but many major world languages still lack these resources. At Mosaix, we are working on developing models for Hindi and Bengali text which integrate stemming and character-level features. Recent research suggests that character-based neural approaches to machine translation improve over word-fragment models, which also looks to be a promising research avenue for stemming and parsing, particularly for these morphologically rich languages.
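A simple sketch of what character-level features look like, using the Bengali forms of "Khan" from earlier. Note that this operates on Unicode code points; a production system for Bengali would also need to consider full grapheme clusters (consonant-vowel combinations):

```python
def char_ngrams(text, n=2):
    """Character n-grams, a common sub-word feature for inflected text."""
    return [text[i : i + n] for i in range(len(text) - n + 1)]

# The inflected forms খানকে and খানের differ from খান "Khan" at the end,
# so their character-bigram sets still overlap heavily with the base form,
# which lets a model relate all three even without a Bengali stemmer.
for form in ["খান", "খানকে", "খানের"]:
    print(form, char_ngrams(form))
```

The shared bigrams (the initial খা and ান) are exactly the signal a character-level model can exploit to link all three forms to the same entity.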

Finally, none of these advances in Language AI would happen without access to large quantities of high-quality data, especially when it comes to tasks which involve predicting labels or linguistic categories. Streamlined data collection pipelines have proven invaluable for this, which is why Mosaix is currently developing easily deployable interfaces for multi-language annotation. An understanding of how to properly delimit the problem in linguistic terms, and how the solutions will inevitably differ across languages, is crucial for NLU to keep pushing the boundaries of what is possible.

NLP Scientist at Mosaix. Stanford PhD graduate in Linguistics. Working on research and development of cutting-edge, multi-language AI for the emerging markets.