Natural Language Understanding: What’s the purpose of meaning? (Part 1 of 2)

Paulo Salem
Data Science at Microsoft
13 min read · Feb 1, 2022

Let me begin with my favorite memory on this topic — not a paper, but a cartoon. I recall to this day when, many years ago, an episode of Dexter’s Laboratory — as shown in the opening image above — asked the hardest question ever: What is the purpose of meaning? So difficult was this inquiry that Dexter had no alternative but to seek advice from the Grandfather of All Knowledge, a kind of science deity who, the episode being about nightmares, turned out to be his silly sister. How delighted I was when I ended up working precisely with meaning at Microsoft! After all these years, I’d understand and apply the purpose of meaning — at least regarding Natural Language!

In this first article in a two-part series, I provide an overview of some key elements of Natural Language Understanding (NLU) that I have found particularly foundational and useful, in order to help other data scientists and interested readers enter and explore this area. I have no intention of writing a canonical academic overview here. Rather, I give my own practitioner perspective, taking into account the actual problems I’ve seen and the resources I found important, including some pointers to tools and other practical assets.

To avoid misunderstandings

Before we proceed, it is important to emphasize that there are at least two ways to explore language understanding in NLP, which can be summarized as follows:

  • Explicit understanding: The extraction and direct use of meaning representations from natural language.
  • Implicit understanding: Everything else that requires (or seems to require) language understanding but does not produce explicit meaning representations. This includes many NLP tasks, particularly those making headlines today (e.g., about realistic text generation with GPT-3), because the more sophisticated an NLP task is, the more likely it is to require some level of understanding.

In this text, particularly the first part, I’ll focus on explicit understanding. Besides being a personal preference, this choice also mirrors historical developments, since many of the first NLU programs were dialogue systems that had to explicitly interpret and reason about user input (ELIZA being among the earliest and most famous ones; Wikipedia has a list of historical systems). Furthermore, this also seems to be a more productive approach to decompose such a study, because otherwise we’d need to talk about most of the NLP field — which, besides being impractical, would also dilute the central discussion about meaning we want to have.

What are you talking about? Semantic role labeling

Consider the following sentence:

“Schedule the planning meeting for next Monday.”

One thing we can do with it is to analyze its syntax, that is to say, its form: “schedule” is a verb, “the planning meeting” is an object, and so on. There are NLP tools to do just that. However, once we have the syntactic tree built, what is the computer supposed to do with it? That depends, for example, on what “schedule” and “planning meeting” mean. NLU techniques can help the software understand that it must add a meeting on the topic of planning to some calendar for next Monday. One way to denote this is through slot tagging or slot filling:

<action>Schedule</action> the <theme>planning</theme> <event>meeting</event> for <date>next Monday</date>.

This is an example of Semantic Role Labeling (SRL). Of course, the labels to use depend on where the meaning is coming from. Here, let’s assume that the sentence is being given to a personal assistant program, which handles meetings for the user. In this case, each such tag can be handled in predefined ways, their meaning given by what the assistant program will do with them! There are a variety of ways to do labeling, particularly when considering arbitrary English utterances, which we’ll see later.

The labeling itself can be implemented in many ways, for instance by any technique for sequence labeling such as Hidden Markov Models (HMM), Conditional Random Fields (CRF) or modern Deep Learning methods. From an algorithmic perspective, it is interesting to note that semantically different tasks might be handled in similar ways. For example, some existing technologies can be used both for the above SRL task and for Named Entity Recognition (NER), though it’s questionable whether “schedule” is really an “entity.”
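
To make this concrete, here is a minimal sequence-labeling sketch using the sklearn-crfsuite library, one of many possible tools for the job. The training sentence, BIO labels, and features are illustrative toys of my own, not a real SRL corpus:

```python
# A toy slot-tagging sketch with a CRF (pip install sklearn-crfsuite).
import sklearn_crfsuite

def word_features(sentence, i):
    # Very small feature set for illustration; real systems use far richer features.
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "is_title": word.istitle(),
        "prev_word": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
    }

# One toy training example in BIO format, matching the slot-tagged sentence above.
sentence = ["Schedule", "the", "planning", "meeting", "for", "next", "Monday"]
labels = ["B-action", "O", "B-theme", "B-event", "O", "B-date", "I-date"]

X_train = [[word_features(sentence, i) for i in range(len(sentence))]]
y_train = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # should recover the labels on this toy example
```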

In more complex situations, such as a chat bot that can handle multiple aspects of a user’s life (e.g., assistants like Microsoft’s Cortana, Apple’s Siri, or Amazon’s Alexa), we might need more abstract structures that are not directly represented in the provided text. In particular, it is often convenient to classify which domain (in this case, “calendar”) and intent (in this case, “schedule_meeting”) the sentence belongs to. This adds a further layer of meaning to the interpretation, which can help in disambiguation (e.g., is the query about “calendar” or “project management”, which are handled by different software?) and in selecting cases that are manageable (e.g., “meeting next Monday” is not as easy to interpret as the above example, despite the fact that we can see that it is about a meeting event and a date).
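
Domain and intent classification can often be prototyped as plain text classification. Below is a hedged sketch with scikit-learn; the intents and utterances are invented for illustration, and a production assistant would use far more data and stronger models:

```python
# Intent classification as simple text classification (pip install scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set: utterance -> "domain/intent" label.
utterances = [
    "schedule the planning meeting for next Monday",
    "cancel my 3pm appointment",
    "what tasks are blocking the release",
    "assign this bug to Maria",
]
intents = [
    "calendar/schedule_meeting",
    "calendar/cancel_event",
    "project_management/list_blockers",
    "project_management/assign_task",
]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(utterances, intents)

print(model.predict(["set up a meeting about planning"]))
```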

Dialogue systems (e.g., chat bots and personal assistants) are a rich field in their own right, with increasing popularity among consumers, and you can learn more here: A survey on dialogue systems: Recent advances and new frontiers. Because such systems often rely on spoken input, they are also considered under the field of Spoken Language Understanding (SLU). There are open source implementations of such technology, like Stanford’s Almond (plus related software and data), Rasa, and Snips NLU (which appears to be dead, though). For commercial applications, one might wish to use industrial solutions such as Azure Bot Service, Azure Language Understanding (LUIS), or AWS Lex.

A practical example from the LUIS homepage.

From syntax to semantics

The above example showcases the transformation of raw text into meaningful elements that can be easily interpreted by downstream processes in a domain-specific context (i.e., that of calendar software). We can also consider more open-ended utterances, using the whole of English (and other languages). This has been extensively explored in computational linguistics from various perspectives on what to do with the lexicon we find in texts. Some well-known examples include:

  • WordNet: Possibly the most famous resource in this category, WordNet relates the meaning of words in a graph, which can serve as the basis for many tasks we shall consider later (such as Query Expansion for search engines). It is easy to use programmatically through the NLTK library, as shown in the sketch after this list.
  • PropBank, VerbNet, FrameNet: Different approaches to capture predicate meaning, notably through the definition of standard semantic (or thematic) roles, including the kinds of arguments they take and how they relate to each other.
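
For instance, WordNet can be browsed in a few lines via NLTK (assuming nltk is installed and the wordnet corpus has been downloaded once with nltk.download):

```python
# Browsing the WordNet graph through NLTK (pip install nltk; nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

# Each synset is a node grouping words that share one meaning.
for synset in wn.synsets("meeting"):
    print(synset.name(), "-", synset.definition())

# Relations are edges in the graph, e.g., hypernyms ("is-a" generalizations):
first = wn.synsets("meeting")[0]
print(first.hypernyms())
```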

By establishing a common understanding of English, these resources provide a basis upon which to make further progress in different applications. One important use is to produce reliable annotated training data for Machine Learning semantic parsers, the original motivation for PropBank as described in the paper From TreeBank to PropBank.

Though such resources are mainly about mapping lexical data into standard meanings (i.e., what various lexemes from English mean), they also often include representation aspects (i.e., how all of this can be written down). This can be justified on historical grounds, but things become much clearer when we separate these concerns, and we do have alternatives that decouple them.

Semantic representations

Semantic content extracted from text must be represented in some way to allow inspection, analysis, and manipulation. Meaning representations are supposed to iron out the mess that natural language is, and therefore are typically expected (as adapted from Jacob Eisenstein’s NLP notes) to:

  • Be unambiguous. A chief advantage of converting a confusing linguistic representation into another format is to clarify what, exactly, is meant.
  • Link language to external entities and knowledge. Language, after all, is about things and ideas, so a semantic representation must clarify the connection.
  • Be computationally tractable. We want to ensure that analysis, inference, and composition can be done effectively by computational means.
  • Suit the domain of discourse. We need enough elements to be able to represent the language expected to be used, which can range from domain-specific themes (e.g., calendar events) to general language use (e.g., arbitrary English sentences).

There are many such representations, and one way to understand them is to recognize their different historical backgrounds: Some originate mostly from the linguistic tradition, and others from computer science and Artificial Intelligence, though they all tend to converge as computational linguistics progresses.

Meaning Representation Parsing

Given a piece of text, the task of extracting a meaning representation from it is known as Meaning Representation Parsing (MRP). There are many ways in which we may extract and write down these representations, and even competitions about it. Perhaps the most obvious one would be some form of logical formalism, for instance first-order logic, which itself has an underlying semantic representation. Logic, however, has problems: It abstracts too much, and the inference machinery, as well as the human eye, cannot use the structure of the text for free anymore! Thus, as so often happens in computer science, a more suitable representation would be welcome.
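
For a taste of the approach, here is one possible first-order rendering of our earlier calendar sentence; the predicate names are my own invention, not part of any standard:

```latex
\exists e.\ \mathit{schedule}(e) \land \mathit{event}(e, \mathit{meeting})
  \land \mathit{theme}(e, \mathit{planning}) \land \mathit{date}(e, \mathit{next\_monday})
```

Notice how the surface form of the sentence is gone: neither the inference machinery nor the human eye can recover the original word order or phrasing from this formula, which is precisely the problem just mentioned.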

In terms of domain-specific applications, we can turn to databases and information retrieval for a good example. Researchers in these areas have long tried to find ways to automatically convert Natural Language queries into executable database queries through so-called Natural Language Interfaces (NLI). There has been a recent (2017) resurgence in interest for this in the form of text-to-SQL (see Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning), and since then, techniques have been showing increasingly good results — you can even try some in Excel these days. Related datasets include WikiSQL and CoSQL. Moreover, the same idea can be expanded to other languages and applications beyond SQL and information retrieval, for instance through semantic parsers like Genie (see also the related paper Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands).
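
To make the text-to-SQL task concrete, here is a hypothetical input/output pair in the general spirit of WikiSQL; the table schema, question, and query are all invented for illustration:

```python
# A hypothetical text-to-SQL training pair (all names and values invented).
example = {
    "table": "meetings(id, topic, date, organizer)",
    "question": "How many planning meetings are scheduled for next Monday?",
    "sql": "SELECT COUNT(*) FROM meetings "
           "WHERE topic = 'planning' AND date = '2022-02-07'",
}
```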

Language meaning can also be represented in more abstract and general ways. Abstract Meaning Representation (AMR) is one alternative. It is a relatively recent (2013) and already popular graph-based meaning specification formalism. In AMR, the key semantic pieces of the underlying text become nodes in a graph, and the relationship among them is captured in the graph’s edges. Weak linguistic elements of the text are discarded and only the important components are preserved. For instance, from the AMR paper, consider the sentence “The boy wants to go”:

An example of AMR use.

Note that “The” and “to” are discarded as they are not necessary for establishing the sentence’s meaning. Furthermore, note that the verbs are marked as “want-01” and “go-01”, implying that there are very specific meanings for each of these uses. These meanings, in turn, come from the PropBank dataset we saw before, so we can now better understand how these NLU artifacts can be used together for greater impact. The actual parsing of such structures is an ongoing research topic, but we can find practical tools such as amrlib, an add-in for the popular spaCy NLP Python library.
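
As a sketch of how this looks in practice, amrlib might be used as follows, assuming the package and one of its pretrained parse models are installed (see its documentation for details):

```python
# Parsing a sentence into AMR with amrlib (pip install amrlib, plus a model download).
import amrlib

stog = amrlib.load_stog_model()  # "sentence to graph" model
graphs = stog.parse_sents(["The boy wants to go"])
print(graphs[0])

# The output should roughly match the graph from the AMR paper, in PENMAN notation:
# (w / want-01
#    :ARG0 (b / boy)
#    :ARG1 (g / go-01
#             :ARG0 b))
```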

It is possible to go even further along this line of thought and parse meaning leveraging both representations we just saw, specifically as a dataflow graph resulting from program execution. This technique is currently being developed by the Semantic Machines group at Microsoft for dialogue systems. Although this is complex state-of-the-art technology, the illustration in their main paper is quite informative.

Using graphs for meaning immediately suggests two directions to further explore semantic representations: the nodes, which are natural places to put entities; and the edges that connect them to establish relational meaning. Let us examine these aspects in more depth now.

Frames and ontologies

A crucial characteristic of meaning, as we have seen, is to connect language to the actual things it represents. One way to do so is to carefully structure the domain of discourse by establishing what entities exist and how they relate to each other. For instance, we might define that there exists a set of actions (e.g., “schedule”), that each action might potentially refer to an event (e.g., “meeting”), and that such an event has both a theme (e.g., “planning”) and a date (e.g., “next Monday”). Such a specification is now typically called an ontology, thus named after the traditional philosophy branch.

This is fundamental both for computational linguistics and Artificial Intelligence, with roots that can be traced back to the late 1960s and 1970s through some version of the notion of frames, itself inspired by research in psychology. Nonetheless, in practice it seems that the fields have historically progressed more or less in parallel, with variable overlapping along the way. In computational linguistics, we can see many of these ideas present in efforts like PropBank and FrameNet as we saw before. In Artificial Intelligence, the basic ideas around ontologies can be attributed to Marvin Minsky’s A Framework for Representing Knowledge article from 1974, though its modern formulation arises from semantic web research efforts over the past two decades. One notable result of the latter is the Web Ontology Language (OWL) standard in which to define such specifications — it provides not only a common syntax and format to use, but also a precise logical framework to support meaning representation and inference.
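
As a small illustration, ontologies like these can be built programmatically, for example with the Owlready2 Python library, which can export OWL files; the classes and properties below are invented for our calendar example, not taken from any published ontology:

```python
# A tiny invented calendar ontology with Owlready2 (pip install owlready2).
from owlready2 import Thing, get_ontology

onto = get_ontology("http://example.org/calendar.owl")

with onto:
    class Event(Thing): pass          # events exist in our domain of discourse
    class Meeting(Event): pass        # meetings are a kind of event
    class has_theme(Meeting >> str): pass  # a meeting has a textual theme
    class has_date(Meeting >> str): pass   # ...and a date (kept as text here)

m = Meeting("planning_meeting")
m.has_theme = ["planning"]
m.has_date = ["next Monday"]
onto.save(file="calendar.owl")  # writes the ontology out in OWL format
```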

Over the years, these various ideas and technologies have grown and influenced each other. In practical terms, one popular tool, under development since the 1980s, is the Protégé ontology editor, a very convenient environment in which to create and analyze ontologies, including those written in OWL. Though the semantic web per se remains elusive, much of the Web already follows semantic standards to some extent, for example:

  • Through the schemas found in https://schema.org/, used, among other things, to annotate web pages and help search engines understand what they crawl.
  • The (now quite large) knowledge base found in Wikidata. You can even use SPARQL to query it, as in the sketch after this list. I wish I had more time to play with this!
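
For example, here is a minimal Wikidata query using the SPARQLWrapper Python package; Q146 is Wikidata’s item for the house cat and P31 its “instance of” property:

```python
# Querying Wikidata with SPARQL from Python (pip install sparqlwrapper).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("""
SELECT ?cat ?catLabel WHERE {
  ?cat wdt:P31 wd:Q146 .   # items that are an instance of "house cat"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["catLabel"]["value"])  # prints the names of five famous cats
```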

Furthermore, these kinds of ontologies have also found use beyond the Web, particularly in the biomedical field, as the SNOMED CT clinical terminology demonstrates.

Though not often recognized as such, a very successful industrial outcome of the broader research effort, inheriting from both the linguistics and computer science traditions, is the Siri personal assistant now found in many Apple devices. Siri was created by a company co-founded by Tom Gruber (the researcher who popularized the term “ontology” in computer science and famously defined that “an ontology is a specification of a conceptualization”) and later acquired by Apple. This success was quickly noticed and followed by others, notably Amazon’s Alexa, Google Assistant, and Microsoft’s Cortana. Interestingly, we can thus trace what began thousands of years ago as philosophical speculation to the gadgets found in our pockets today.

Meaningful relationships as graphs

Entities hardly exist in isolation; we have seen that they often make reference to other entities. By explicitly considering and representing these relationships themselves, we obtain so-called semantic networks. This allows two things: a shift in perspective, so we can still leverage ontologies and similar formalisms while profiting more from the relations defined therein; and, more broadly, the replacement of whatever logical machinery exists to relate concepts with (possibly simpler) graph-based methods. In practice, one may then, for instance:

  • Simply follow links in a data structure to make inferences, instead of dealing with the intricacies of theorem proving (see the sketch after this list).
  • Build huge graphs to capture as much knowledge as possible about the world, instead of considering small sets of carefully defined relations regarding a limited domain of discourse.
  • Use graph databases such as Azure Cosmos DB via its Gremlin API and Neo4j.
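
As a toy illustration of the first point, the following sketch infers “is-a” facts by simply walking edges in a small invented graph, using the networkx library; real networks like WordNet or ConceptNet are of course far larger:

```python
# Link-following inference over a toy semantic network (pip install networkx).
import networkx as nx

g = nx.DiGraph()
g.add_edge("cat", "feline", relation="is_a")
g.add_edge("feline", "mammal", relation="is_a")
g.add_edge("mammal", "animal", relation="is_a")

def is_a(graph, entity, category):
    """Infer 'entity is_a category' by walking edges transitively.
    (All edges here are is_a; a real network would filter by relation type.)"""
    return category in nx.descendants(graph, entity)

print(is_a(g, "cat", "animal"))  # True: cat -> feline -> mammal -> animal
```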

As we saw above, perhaps the most famous (though surely not the oldest) such network is WordNet, which relates the meaning of words according to well-known linguistic categories (e.g., noun, verb, adjective) and relations (e.g., hypernym, hyponym), resembling a kind of dictionary. ConceptNet is another, more recent, take on capturing the relations of words and their underlying concepts. However, they fundamentally differ regarding the nature of relations to be captured: While WordNet focuses on standard linguistic relations, ConceptNet aims at recording “common sense” knowledge, thus following a longstanding AI tradition — and showing once again the influence of the different research traditions we find in NLU. The contrast is perhaps easier to grasp by inspecting an example, such as how each graph presents the word “cat”:

WordNet view of the word “cat”.
ConceptNet view of the word “cat”.

Many applications can be built on this basis. For example, take the synonyms in WordNet. In search, we can do query expansion, which is to automatically rewrite the user’s query (e.g., by adding synonyms of the search terms) in the hope of matching more relevant results (thereby increasing the search recall). The same technique can be used to perform data augmentation for training ML models, so that the model can learn from more varied examples.
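
A naive sketch of query expansion with WordNet synonyms might look like the following; real systems would weight expansions and disambiguate word senses rather than blindly adding every lemma:

```python
# Naive query expansion with WordNet synonyms via NLTK.
from nltk.corpus import wordnet as wn

def expand_query(query):
    expanded = set(query.lower().split())
    for term in list(expanded):
        for synset in wn.synsets(term):       # every sense of the term...
            for lemma in synset.lemmas():     # ...contributes its synonyms
                expanded.add(lemma.name().replace("_", " ").lower())
    return expanded

print(expand_query("schedule meeting"))  # adds terms such as "docket", "group meeting"
```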

Tech companies also leverage such graphs to directly provide services in different industries. Google and Bing have underlying knowledge graphs to present direct answers to search queries, and Microsoft has recently (2019) released Project Cortex, which organizes internal customer data into topics and relations that are meaningful (and private) to each specific customer. Smaller, more niche Web companies can also profit, like Pinterest with its Taste Graph.

Coming up next

We have seen how and why NLU is used for extracting explicit meaning representations from language, including some of the historical roots behind these developments, and practical resources to allow one to start working with it. Next up in Part 2 of this two-part article series, I continue this investigation, exploring further the relation between language and the world, showing how meaning plays a broader role in various Natural Language Processing tasks, and providing additional learning resources.

Paulo Salem is on LinkedIn and paulosalem.com.

Be sure to check out the conclusion in Part 2 of this two-part article series.
