Natural Language Understanding: What’s the purpose of meaning? (Part 2 of 2)

Paulo Salem
Data Science at Microsoft
10 min read · Feb 15, 2022

This is the second of a two-part article series investigating Natural Language Understanding (NLU). In Part 1, we saw some of the most intuitive aspects of NLU, for instance how to assign meaning to sentences (e.g., user commands) by enriching them with labels and other auxiliary information. We also saw various ways in which semantic information can be represented. In this last part, we go deeper into the muddy waters of meaning, discussing more intricate aspects of the relation between language and the surrounding world, which leads us to consider the broader role of semantics in Natural Language Processing as a whole — and, at last, the purpose of meaning! To conclude, I provide some references to courses, books, conferences, and articles for further learning.

Grounding

Few people would dispute that there is, in fact, an external world in which things happen. In language this means that, whatever is written or spoken, it refers to entities outside the text or speech itself. In the meeting scheduling task we saw earlier, the main entity is calendar software, like Microsoft Outlook or Google Calendar, which in turn has various elements, such as meetings and alerts. These actually do something in the world: A meeting might pop up an alert on your screen that produces light that hits your eye, reaching your brain and making you move. Furthermore, even in that case the task is not so simple, because a command to schedule a meeting might fail (e.g., if there are no free slots on the requested date), so some further understanding of how the calendar domain works is beneficial.
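To make this concrete, here is a minimal sketch (the calendar data and the schedule_meeting function are hypothetical, not any real calendar API) of how a parsed scheduling command must be checked against the state of the world before it can succeed:

```python
from datetime import datetime, timedelta

# Hypothetical in-memory calendar: a list of (start, end) busy intervals.
busy_slots = [
    (datetime(2022, 2, 15, 9, 0), datetime(2022, 2, 15, 12, 0)),
    (datetime(2022, 2, 15, 13, 0), datetime(2022, 2, 15, 18, 0)),
]

def schedule_meeting(start: datetime, duration: timedelta) -> bool:
    """Ground a parsed 'schedule a meeting' command in the calendar state.

    Returns False when the requested slot overlaps a busy interval,
    mirroring how a command may fail in the world it refers to.
    """
    end = start + duration
    for busy_start, busy_end in busy_slots:
        if start < busy_end and busy_start < end:  # intervals overlap
            return False
    busy_slots.append((start, end))
    return True

# The same utterance succeeds or fails depending on the state of the world.
print(schedule_meeting(datetime(2022, 2, 15, 12, 0), timedelta(minutes=30)))  # True
print(schedule_meeting(datetime(2022, 2, 15, 10, 0), timedelta(minutes=30)))  # False
```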

Another way to approach grounding is to reflect on whether intelligence can be disembodied and exist independently of a particular context. One might argue that it can, because, say, math is universal and 1–1 is always 0, no matter the context. However, one can also hold that it cannot, because, after all, such mathematical statements are apparent only to human intelligence; animals everywhere happily ignore the existence of zero and the laws of arithmetic. And there’s more: How can we be sure that our human minds can, indeed, apprehend all such mathematical (and other) truths? These philosophical reflections have practical consequences for Artificial Intelligence: If you believe in disembodied intelligence, you might — for instance — decide to work on general theorem provers detached from the world, whereas if you hold the opposite belief, you might first try to refine the scope of the specific intelligence you seek to match some desired context (physical or logical). I recently saw two good examples of the latter: one in the paper Inverse reinforcement learning with natural language goals, in which robots experiment with their environment in order to learn how to better connect it to language (using the Room-to-Room dataset, which is itself quite impressive), and another in the paper Lifelong and Continual Learning Dialogue Systems: Learning during Conversation, in which the user can teach the system the meaning of new language terms by grounding them in concrete actions.

Humans themselves seem to adopt the latter approach. A child will not perform arbitrary experiments to learn how things work, but rather the ones that his or her environment affords, eventually mastering the connection between those experiences and language. At the species level, cognitive capabilities are similarly shaped by environmental conditions and selected through evolution.

Grounding, moreover, is often relative and contextual. For example, what is the color “blue”? This question brings a few problems:

  • A painter, architect, or other color specialist may have a different conception of blue than someone who rarely thinks about colors.
  • If you are seeking a “blue” among some color options, your choice might differ depending on what these options are. Personally, I hate marine blue because it often tricks me into thinking it is black, except of course when it is beside an actual black piece and ruins my outfit.
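To see the second point in action, here is a tiny sketch of relative color naming (the RGB reference values below are merely illustrative, not color standards): the label assigned to the very same color changes with the set of candidates it is compared against.

```python
# Relative color naming: the same RGB value gets a different name depending on
# which reference colors are available for comparison.

def nearest_color(rgb, candidates):
    """Return the candidate name whose RGB value is closest (squared Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(candidates, key=lambda name: dist(rgb, candidates[name]))

sign_color = (0, 110, 110)  # a teal-like color, somewhere between green and blue

print(nearest_color(sign_color, {"blue": (0, 0, 255), "red": (255, 0, 0)}))
# -> 'blue': with only blue and red on offer, the sign counts as blue
print(nearest_color(sign_color, {"blue": (0, 0, 255), "green": (0, 128, 0), "red": (255, 0, 0)}))
# -> 'green': add green to the options and the very same color changes its name
```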

The following slide (from Stanford’s NLU course slide deck) demonstrates the potential contextual influences regarding color interpretation:

A slide demonstrating the influence of context on color names.

Here’s a related real-life example of the above color ambiguity. I recently took the following picture of the new signs at Ibirapuera Park in São Paulo. Can you decide whether they are green or blue? When I saw them in person, they looked greenish, but now, looking at the digital version, they look bluish. The color’s identity is so unstable that it cannot survive digitization!

Signs from a park.

Regarding how specialists’ perceptions differ, consider this. Watercolorists interpret colors in a very characteristic manner and have special names for their colors, including the blues and greens, as you can see here:

Consequently, grounding language firmly in the real world is not always a matter of straightforwardly mapping words to entities. It often requires careful consideration of how the world works to establish the link. This should not come as a surprise; after all, we use language almost always to talk about the world.

Syntax + semantics

So far, we have emphasized the difference between language and the world in which it is uttered, which leads to questions about how to connect them, for instance through grounding considerations. However, the structure of a language must obviously reflect the nature of the world to a considerable extent, because otherwise it would be useless for communication and thinking. Hence, it must be possible to study meaning within language alone, without direct reference to external entities. Indeed, Machine Learning advances (notably involving Deep Learning) have revealed many ways in which syntax and semantics are much closer than might be expected at first.

Distributional semantics

A central idea in this regard is that word usage is not random; words appear near other words to express meaningful concepts. As was famously stated by a celebrated linguist:

“You shall know a word by the company it keeps.” — Firth, J. R. (1957)

One classic approach is to study various distributions regarding how words are used, from which distributional semantics can be derived. Here’s a wonderful example adapted from Jacob Eisenstein’s NLP notes (originally from Automatic retrieval and clustering of similar words). Suppose we are given a mysterious word, tezgüino, and asked to determine its approximate meaning from the following examples of usage contexts:

  1. “A bottle of _______ is on the table.”
  2. “Everybody likes _______.”
  3. “Don’t have _______ before you drive.”
  4. “We make _______ out of corn.”

What do you think tezgüino is? It is probably some kind of alcoholic beverage, though not wine (which fails in context #4), right? It could be bad beer made of corn, though in that case I do not think everybody would like that (fails in context #2). It is certainly not an adjective such as “loud.” And it has some resemblance to “tortillas” (Contexts #2 and #4). If we put these considerations in a table, we can see how the meaning of each word gets distributed over the four contexts.

A table showing the relation of each word to a context.
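To make the idea concrete, here is a minimal sketch (with a toy, hand-built word-context table whose entries merely encode the judgments above) showing how similarity of usage contexts translates into similarity of meaning:

```python
import numpy as np

# Toy word-by-context matrix: 1 means the word fits the numbered context above.
words = {
    "tezgüino":  np.array([1, 1, 1, 1]),
    "wine":      np.array([1, 1, 1, 0]),  # fails context #4
    "loud":      np.array([0, 0, 0, 0]),  # fits none of them
    "tortillas": np.array([0, 1, 0, 1]),  # fits contexts #2 and #4
}

def cosine(u, v):
    """Cosine similarity between two vectors; 0 when either vector is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm else 0.0

for word, vector in words.items():
    print(f"sim(tezgüino, {word}) = {cosine(words['tezgüino'], vector):.2f}")
# Words that appear in similar contexts (wine, tortillas) come out as most similar.
```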

In Machine Learning, this idea is implemented by various techniques that assign meaningful numeric vectors to words or sentences, known as distributed representations because meaning is spread across many dimensions of a vector. What exactly is distributed (e.g., words) over what (e.g., usage contexts) can vary depending on the technique. These representations are also known as embeddings, because the word or sentence in question is embedded into a vector space prior to further processing. Notable examples include GloVe and word2vec. Topic modeling is one rather direct application of these and related techniques, implemented for instance in the Gensim Python library.
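As a small illustration (the toy corpus below is made up, and real embeddings need vastly more text), the Gensim library can train word2vec embeddings in a few lines:

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: each sentence is a list of tokens.
corpus = [
    ["a", "bottle", "of", "wine", "is", "on", "the", "table"],
    ["a", "bottle", "of", "tezgüino", "is", "on", "the", "table"],
    ["we", "make", "tezgüino", "out", "of", "corn"],
    ["we", "make", "tortillas", "out", "of", "corn"],
    ["everybody", "likes", "tortillas"],
    ["everybody", "likes", "wine"],
]

# Train word2vec embeddings (Gensim 4.x API); real corpora have millions of tokens.
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200, seed=1)

# Words used in similar contexts should end up with nearby vectors.
print(model.wv.most_similar("tezgüino", topn=3))
```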

Once a text is embedded in such vectorial spaces, various downstream tasks are made easier (or possible at all), which takes us to the next topic.

Implicit understanding

As I promised at the beginning of Part 1 of this article series, most of what we have seen so far concerns the explicit extraction and use of meaning. Even for embeddings, which produce less interpretable results, we can still find meaning if we look hard at which concepts each position in the vector correlates with. However, there are many interesting tasks that seem to require understanding but do not need (or allow) that understanding to be explicit. These include well-known problems in modern NLP research, such as:

  • Sentiment analysis: Given a text (e.g., a product review), what is the emotion it conveys (e.g., “positive” vs “negative”)?
  • Question answering: Given a question, produce or locate the appropriate answer.
  • Summarization: Given a text, generate a smaller version of it, either by picking the most informative sentences (extractive) or by paraphrasing the text’s content (abstractive).
  • Inference: Given two sentences, is there an entailment or contradiction?
  • Relational knowledge extraction: Extract meaningful relations from text.
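Several of these tasks can be tried in a few lines with pretrained models. For instance, here is a hedged sketch of sentiment analysis using the Hugging Face transformers pipeline (the default model it downloads may vary between library versions):

```python
from transformers import pipeline

# Download a default pretrained sentiment model (requires the transformers
# package and an internet connection on first use).
classifier = pipeline("sentiment-analysis")

reviews = [
    "This calendar app schedules my meetings flawlessly.",
    "The alerts never fire and I keep missing meetings.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(result["label"], round(result["score"], 3), "-", review)
```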

A key problem here is that the nature of the meaning being used remains elusive, despite impressive results. We can agree that a summary is good, but why exactly? If the summarization is abstractive, where exactly in the original text did the meaning come from? And why was it transformed in this or that way? It is hard to give a precise answer, not least because the most successful systems for these tasks are based on Deep Learning approaches, whose behavior is often hard to understand precisely. Perhaps that’s also why many clever benchmarks have been devised to help evaluate such systems, notably the General Language Understanding Evaluation (GLUE).
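As a quick pointer, the GLUE tasks can be loaded programmatically, for example via the Hugging Face datasets library (assuming it is installed; SST-2 is GLUE's sentiment subset):

```python
from datasets import load_dataset

# Load the SST-2 sentiment task from the GLUE benchmark.
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])  # a sentence together with its 0/1 sentiment label
```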

Large language models such as BERT or GPT-3, which often underlie the specific tasks above, seem to capture a lot of general understanding, though in ways we still cannot fully grasp — this is an active area of research, which some call (amusingly) BERTology. Some of this implicit understanding can even be extracted and made explicit (see, e.g., Inducing Relational Knowledge from BERT). Nevertheless, despite their amazing feats, these models do make very simple and systematic mistakes. One interesting way to expose such mistakes is behavioral testing, as demonstrated in the paper Beyond Accuracy: Behavioral Testing of NLP Models with CheckList using the related CheckList tool. Incidentally, this is a great way to test NLP models in general, not just to stress the state-of-the-art ones.
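The gist of behavioral testing can be sketched without the CheckList library itself. Below, predict_sentiment is a hypothetical stand-in for whatever model is under test, and the test checks one simple expectation (negating a positive statement should flip the prediction) across templated inputs:

```python
# Hypothetical model under test; replace with a real predictor that returns
# "positive" or "negative" for a given text.
def predict_sentiment(text: str) -> str:
    return "positive" if "not" not in text else "negative"  # placeholder logic

products = ["the app", "the new calendar", "this phone"]
failures = []
for product in products:
    positive = f"I like {product}."
    negated = f"I do not like {product}."
    # Directional expectation: adding negation should flip the predicted label.
    if not (predict_sentiment(positive) == "positive"
            and predict_sentiment(negated) == "negative"):
        failures.append((positive, negated))

print(f"{len(failures)} failures out of {len(products)} templated cases")
```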

A convenient way to try out and learn many of the models and techniques above is Microsoft’s NLP recipes GitHub repository, which includes many hands-on programming examples of NLP and NLU problems. The spaCy library can also be a good starting point for NLP projects in general. For industrial applications, besides programming custom ML solutions, one can also leverage cloud services, such as Azure Text Analytics offerings.
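For instance, here is a minimal spaCy example (assuming the en_core_web_sm model has been downloaded with python -m spacy download en_core_web_sm) that extracts named entities from a command like our scheduling example:

```python
import spacy

# Load a small pretrained English pipeline (tokenizer, tagger, parser, NER).
nlp = spacy.load("en_core_web_sm")

doc = nlp("Schedule a meeting with Elizabeth at Microsoft next Friday at 3pm.")
for ent in doc.ents:
    print(ent.text, "->", ent.label_)  # e.g., PERSON, ORG, DATE, TIME
```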

Beyond Natural Language

Once meaning is extracted from language, we need to store and use it somehow. We have seen that there are use cases in which this is immediate, for instance by creating a new calendar event in direct response to a user command. However, it is also possible to build complex long-term structures (e.g., knowledge bases), which can be reused in many future contexts and not just to act upon a single user command. Here we approach the territory of Knowledge Representation and Reasoning, which can provide and leverage representations such as the ontologies and semantic networks we saw before.
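As a toy sketch (plain Python, not any particular knowledge-base product), facts extracted from language can be stored as subject-predicate-object triples and queried later, independently of the commands that produced them:

```python
# A tiny in-memory triple store: facts extracted from language, kept for reuse.
triples = set()

def add_fact(subject, predicate, obj):
    triples.add((subject, predicate, obj))

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given (possibly partial) pattern."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Facts that might have been extracted from different utterances over time.
add_fact("meeting_42", "scheduled_on", "2022-02-15")
add_fact("meeting_42", "attendee", "Paulo")
add_fact("meeting_42", "attendee", "Elizabeth")

print(query(predicate="attendee"))  # reusable later, beyond the original command
```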

Furthermore, it is important to remember that semantic concerns exist in many other areas beyond Natural Language, such as:

  • Logic
  • Programming languages
  • Formal methods and verification

Meaning, it turns out, is pervasive in computer science; it is the other side of many coins; it is the root of many trees! So…

So… what is the purpose of meaning?

After all this, can we solve Dexter’s riddle? At least in computer science, and in NLU in particular, it seems clear: The purpose of meaning is to make computers do the things we want from the words we say or write, whether through natural or artificial languages. The better we are at establishing meaning, the better our computers can follow our purposes.

But wait! We have a new, harder, question now! What is the meaning of purpose?! Give me another 20 years please, I’ve barely begun investigating this.

Courses, references, and conferences

Here’s where to learn more.

Courses

Books

Conferences and workshops

The Association for Computational Linguistics (ACL) is the leading Computational Linguistics society. It promotes various conferences and workshops, notably:

Artificial Intelligence conferences in general (e.g., AAAI) also have good content on NLP and NLU, of course.

Acknowledgments

I’d like to thank Elizabeth Lingg, Aman Achpal, Yuancheng Tu, and other Microsoft colleagues for many of the references and discussions I have leveraged in this text.

Paulo Salem is on LinkedIn and paulosalem.com.

If you haven’t already, be sure to check out Part 1 of this two-part article series:
