Semantics, Ambiguity, and the role of Probability in NLU

Walid Saba, PhD
ONTOLOGIK
Nov 28, 2020

The purpose of this post is to dispel some misconceptions about the notion of ‘ambiguity’ and to explain (upon request!) a claim I often make: that NLU — and thus understanding — is a binary decision. Either we understood the meaning of some utterance (or, equivalently, the thought being conveyed by that utterance) or we did not; there is no in between.

Ambiguity, Understanding, and Probability

AMBIGUITY is not a sin but a beautiful and genius human invention that allows us, for effective communication, to ‘compress’ the thoughts we intend to convey into as little text as possible. Of course, since the whole purpose of communication is to convey our thoughts (in the linguistic messages we exchange), we also want to make sure that the ambiguity we use can be systematically and effectively decoded by the receiver of the message. So ambiguity in language is not some illness wreaking havoc on our NLU work; it should be thought of as a genius human ‘encoding scheme’.

UNDERSTANDING is the process of ‘decompressing’ our linguistic utterances and ‘uncovering’ all the missing text. We call this process disambiguation, but in a sense it is about recovering all the other information that is implicit in the message so that we get at the intended meaning — that is, at the thought behind the message (elsewhere I have discussed MTP, the missing text phenomenon, in some detail).

PROBABILITY has no place in the understanding process, since probability is meaningful only when we assign it to ‘possible’ outcomes. Unfortunately, this notion has led to some confusion, since probability has often been equated with uncertainty. For example, before the 5th century BC we were uncertain as to whether the earth is flat or spherical, much like we are uncertain, before we throw a die, as to what the outcome will be. The seeming ‘uncertainty’ in both cases confuses some into thinking that speaking of probability in both cases is meaningful. But that is not true. It is meaningful to speak of the probability of a ‘6’ showing up when we roll a die, because ‘6’ is a possible outcome. But speaking of the probability of the earth being flat (or not) is meaningless, because there is only one possible outcome — the earth is either flat or it is not. The decision is binary. So all we can say is that the outcome is (still) unknown; no possible outcome has any meaningful probability: Pr(‘earth is flat’), much like Pr(2 * 10 = 30), was 0, is 0, and will always be 0 — whether or not we knew the final verdict at some point.
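The distinction above can be made concrete in a few lines of code. This is only an illustrative sketch (the variable names are mine, not a formal model): a die defines a genuine sample space over which probabilities are meaningful, while a proposition whose truth value is already fixed admits no such space.

```python
from fractions import Fraction

# A fair die defines a genuine sample space: six possible outcomes.
die_outcomes = [1, 2, 3, 4, 5, 6]

# Before the roll, Pr(6) is meaningful, because 6 is a *possible* outcome.
pr_six = Fraction(1, len(die_outcomes))  # 1/6

# "The earth is flat" is not an outcome of any experiment; it is a
# proposition whose truth value is fixed, whether or not we have
# learned it yet. There is nothing to distribute probability over.
earth_is_flat = False
pr_earth_is_flat = 1 if earth_is_flat else 0  # was 0, is 0, will be 0

print(pr_six)            # 1/6
print(pr_earth_is_flat)  # 0
```

The point of the sketch is that `pr_six` is derived from a set of possible outcomes, while `pr_earth_is_flat` is just the (fixed) truth value in disguise — no sample space was ever involved.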

Ambiguity — just a genius encoding scheme

Now let us consider various types of ‘ambiguities’ in our linguistic communication. Consider the following (very common) sentences:

(1) Sara likes to play bridge
(2) Sara has a Greek statue in every corner of her apartment
(3) Sara loves to eat pizza with her kids (with pineapple)
(4) Sara enjoyed the movie
(5) The White House criticized the recent remarks made by Beijing

I claim that every linguistically competent person (LCP) knows that ‘bridge’ in (1) refers to some game and not to the physical structure we build over water or valleys. Similarly, every LCP knows that in (2) we are not talking about a single Greek statue but about several Greek statues, and thus we all read (2) as ‘in every corner of her apartment, Sara has a Greek statue’ — i.e., we reverse the scope of the quantifiers, since we all know we cannot have the same statue in every corner of an apartment. Every LCP knows that in (3) the prepositional phrase ‘with her kids’ attaches to (modifies) Sara’s eating (event) and not the pizza (since, unlike pineapple, Sara will not use her kids as a pizza topping). In (4) every LCP knows that what Sara enjoyed is watching some movie (and not making it, producing it, directing it, etc.), and in (5) we all know that the White House is an indirect reference to the president and the people in his administration and not to the building, since buildings do not criticize; similarly, Beijing is an indirect reference to the leaders of China, and not to the city, since cities do not make remarks.

What is the point here? The point is this: while all of the sentences in (1) through (5) have several ‘potential’ meanings, the thought being conveyed by each is a single one — and that one meaning is available to us because we have the background knowledge to ‘decode’ the message and discover the intended thought behind each. So while ambiguity is used as a genius ‘encoding scheme’, we also have a genius algorithm that we use to decode (decompress) the message using our background knowledge. So if you like to speak of ambiguity in natural language, remember that it is a problem for machines only, not for us humans. And it is a problem for machines because, unlike us, machines do not have access to that missing information (what we call ‘common background knowledge’) — information that we leave out and safely assume is available to all of us.
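To make the decoding idea tangible, here is a toy sketch of understanding-as-filtering for sentence (3). Everything here is hypothetical and for illustration only — the candidate readings, the knowledge strings, and the `decode` function are mine, not a proposed NLU architecture: enumerate the readings a sentence licenses, then discard any reading that clashes with background knowledge, until the single intended thought survives.

```python
# Hypothetical background knowledge, expressed as plain strings for
# illustration (a real system would need a far richer representation).
background_knowledge = {
    "bridge is a game people play",
    "kids are not a pizza topping",
    "the same statue cannot be in every corner",
}

# Candidate readings of sentence (3), each paired with the piece of
# knowledge it would clash with (None = clashes with nothing).
candidate_readings = [
    ("Sara eats [pizza with her kids]", "kids are not a pizza topping"),
    ("Sara [eats with her kids] pizza", None),
]

def decode(readings, knowledge):
    """Keep only the readings that contradict nothing we know."""
    return [r for r, clash in readings
            if clash is None or clash not in knowledge]

surviving = decode(candidate_readings, background_knowledge)
print(surviving)  # ['Sara [eats with her kids] pizza']
```

The ‘compression’ in the sentence is safe precisely because the sender can count on the receiver having this knowledge: the PP-attachment reading that treats the kids as a topping never survives the filter.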

Probability

Noam Chomsky famously said that speaking of the probability of a sentence is a meaningless notion. I like to extend this and say that speaking of the probability of a specific meaning of a sentence is equally meaningless. While a sentence could potentially have several meanings, behind every uttered sentence is a single thought being conveyed.

As we argued above, probability is meaningful only when we assign it to possible outcomes — so, since a ‘6’ showing up is a possible outcome when we roll a die, speaking of the probability of a ‘6’ showing up is meaningful. But since the earth being flat is not a possible outcome, speaking of the probability of the earth being flat, even before we knew the final verdict, is meaningless. The same applies to language: since the intended meaning behind every utterance is already decided, and all we have to do is decode the message using our background knowledge (with some smart algorithm), assigning a probability to the various possible meanings is… you guessed it, a meaningless exercise (apologies to the Peter Norvigs out there).

I guess that is the reason I go out of my way to differentiate between NLP and NLU — while various NLP/text processing tasks can use probability (correlations, etc.), and similarity (using embeddings, or whatever), language understanding is a different task: it is a systematic decoding process using background knowledge to discover the one and only thought being conveyed by a linguistic message.

In conclusion, ambiguity was not invented to create uncertainty — it was invented as a genius compression technique for effective communication. And it works like magic, because on the receiving end of the message there is a genius decoding and decompression technique/algorithm that uncovers all that was not said, to get at the intended thought behind the message. Now that we know very well how we compress our thoughts into a message using a genius encoding scheme, let us concentrate on finding that genius decoding scheme — a task that we all now call ‘natural language understanding’.

___
https://medium.com/ontologik
