The Missing Text Phenomenon, Again: the case of Compound Nominals

Walid Saba, PhD
ONTOLOGIK
Apr 28, 2021

(Last updated April 28, 2021)

Note: The literature on ‘compound nominals’ is immense, and you will find the same phenomenon discussed under the labels ‘compound nominals’ and ‘nominal compounds’, so I will use the two terms interchangeably.

What are Nominal Compounds and why do they Matter?

In simple words, a nominal compound (henceforth, NC) is zero or more adjectives followed by one or more nouns. You can think of it as some subject or topic of discussion (semantically, an entity) that can fill some ‘slot’ in a larger discourse (subject, agent, location, theme, etc.). So while I can discuss a certain ‘system’, I can also discuss a ‘computer system’, and I can further discuss a ‘distributed computer system’, and even an ‘advanced distributed computer system’, etc. In other words, I can always qualify the head noun with other nouns and adjectives. (I’m not sure whether any psycholinguistic research has experimentally tested our tolerance for the length of nominal compounds; personally, 4 is my limit, so something as long as ‘antique home furniture store’ is just about all I can tolerate. This is manageable: we’re talking about a store, more specifically a furniture store, even more specifically a home furniture store, and finally an antique home furniture store. That I can do. But longer than that and I start wishing there were something like a semantic calculator.)
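To make the definition concrete, here is a minimal Python sketch. It assumes the words arrive already tagged with a hypothetical two-tag set (`ADJ`, `NOUN`), checks the ADJ* NOUN+ pattern, reproduces the step-by-step qualification of ‘antique home furniture store’, and enumerates the binary groupings of the words. The function names are illustrative, not taken from any library.

```python
from typing import List, Tuple

# A token is (word, tag). The tags are a hypothetical, simplified set: "ADJ" or "NOUN".
Token = Tuple[str, str]

def is_nominal_compound(tokens: List[Token]) -> bool:
    """True if the tags match ADJ* NOUN+: zero or more adjectives, then one or more nouns."""
    i = 0
    while i < len(tokens) and tokens[i][1] == "ADJ":  # skip the (possibly empty) adjective run
        i += 1
    rest = tokens[i:]
    return len(rest) > 0 and all(tag == "NOUN" for _, tag in rest)

def qualification_chain(words: List[str]) -> List[str]:
    """Grow the phrase leftward from the head noun: store, furniture store, ..."""
    return [" ".join(words[k:]) for k in range(len(words) - 1, -1, -1)]

def bracketings(words: List[str]) -> List[str]:
    """Every binary grouping of the words, written with square brackets."""
    if len(words) == 1:
        return [words[0]]
    out = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                out.append(f"[{left} {right}]")
    return out

phrase = [("antique", "ADJ"), ("home", "NOUN"), ("furniture", "NOUN"), ("store", "NOUN")]
words = [w for w, _ in phrase]
print(is_nominal_compound(phrase))   # True
print(qualification_chain(words))    # ['store', 'furniture store', 'home furniture store', 'antique home furniture store']
print(len(bracketings(words)))       # 5
```

The pattern fixes the word order but not the grouping: four words already admit five binary groupings, and the store-by-store reading in the paragraph above is just one of them, the fully right-branching `[antique [home [furniture store]]]`.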

Now why are nominal compounds important, and why did they receive so much attention in (classic, old-school) computational linguistics? Mostly because they reflect the tight relationship between ordinary spoken language, human knowledge, and our cognitive capacities. Specifically, when we use a nominal compound, we assume the listener/reader can ‘discover’ the hidden relationships among all the adjectives and nouns in the sequence. So when I refer to a nuclear engineer, you all know I mean an engineer who specializes in nuclear energy, and when I refer to a nuclear plant, you all know I mean a plant that produces nuclear energy. Thus ‘specializes in’ and ‘produces’ are hidden relations that the listener/reader is assumed to recover easily. Similarly,

Can you imagine the combinations? And that is just with two components of the sequence. Besides the missing (hidden) relations, nominal compounds with three or more concepts pose another difficulty: which modifier applies to which concept? Consider this:

Everyone knows that an ‘eastern philosophy professor’ is not a philosophy professor who is eastern but a professor who teaches eastern philosophy, so ‘eastern’ modifies ‘philosophy’. In an ‘amazing philosophy professor’, on the other hand, we are not talking about a professor who teaches ‘amazing philosophy’ but about an amazing professor who teaches philosophy. Things get even more interesting when we consider what different adjectives do to the head noun. Consider this:

So an ‘expensive gun’ is a gun that is expensive (and a ‘large gun’ is a gun that is large, and so on). Simple. But what is a ‘fake gun’: a gun that is not a gun? Our computers will short-circuit! Lieutenant Commander Data of Star Trek’s Enterprise will not ‘comprehend’ this. (Likewise, a ‘toy piano’ is a ‘piano that is not really a piano’.) And what about a ‘fake passport’? Is it like a ‘counterfeit passport’? A ‘fake passport’ is in some ways an actual passport, because it may serve the purpose of a real passport, while a ‘fake gun’ is not a gun at all, because it does not function like a gun. So here the same adjective behaves differently depending on the type of the head noun.
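A toy sketch of this point, assuming a hand-written lexicon: adjectives like ‘expensive’ and ‘large’ (usually called intersective adjectives in formal semantics) simply add a property, while ‘fake’ needs a per-noun fact about whether a fake X still does what an X does. `Reading`, `INTERSECTIVE`, and `FAKE_STILL_FUNCTIONS` are hypothetical names, and the table entries merely encode the passport and gun examples above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Reading:
    is_a: bool          # is the referent really an instance of the head noun?
    functions_as: bool  # does it serve the head noun's usual purpose?

# Hypothetical lexicon: adjectives that just add a property ("a gun that is expensive").
INTERSECTIVE = {"expensive", "large"}

# Hypothetical world knowledge: does a fake X still do the job of an X?
FAKE_STILL_FUNCTIONS: Dict[str, bool] = {"passport": True, "gun": False}

def interpret(adj: str, noun: str) -> Optional[Reading]:
    """Return a toy reading of 'adj noun', or None when the lexicon has no entry."""
    if adj in INTERSECTIVE:
        # Assumed: the added property leaves both identity and function intact.
        return Reading(is_a=True, functions_as=True)
    if adj == "fake":
        works = FAKE_STILL_FUNCTIONS.get(noun)
        if works is None:
            return None  # the answer depends on the head noun, and we know nothing about this one
        # A fake X is not a genuine X; whether it still functions as one depends on X.
        return Reading(is_a=False, functions_as=works)
    return None

print(interpret("fake", "gun"))
print(interpret("fake", "passport"))
```

All of the interesting behavior lives in `FAKE_STILL_FUNCTIONS`, a table of world knowledge; nothing in the words ‘fake’ and ‘gun’ themselves supplies it.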

Want things to be even more interesting? Well, here’s one more:

So an ‘old dancer’ is a ‘dancer who is old’, but a ‘beautiful dancer’ might be a ‘dancer who is beautiful’ or a ‘dancer whose dancing is beautiful’. And what if we combine the two (Olga is a beautiful old dancer)?
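The dancer ambiguity can be sketched the same way, assuming a hypothetical lexicon that records whether an adjective may describe the person, the person’s activity, or both. Combining adjectives is modeled as a plain cross product of their individual readings; that is a simplifying assumption, not a claim about how the readings actually interact.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon: what each adjective may describe, the person ("agent")
# and/or the activity that person performs ("event"). Entries are illustrative only.
ADJ_TARGETS: Dict[str, Set[str]] = {
    "old": {"agent"},
    "beautiful": {"agent", "event"},
}

# Hypothetical world knowledge: the activity an agent noun is named after.
ACTIVITY_OF: Dict[str, str] = {"dancer": "dancing"}

def readings(adj: str, noun: str) -> List[str]:
    """List the paraphrases of 'adj noun' that the lexicon allows."""
    targets = ADJ_TARGETS.get(adj, set())
    out = []
    if "agent" in targets:
        out.append(f"{noun} who is {adj}")
    if "event" in targets and noun in ACTIVITY_OF:
        out.append(f"{noun} whose {ACTIVITY_OF[noun]} is {adj}")
    return out

def combined_readings(adjs: List[str], noun: str) -> List[Tuple[str, ...]]:
    """One reading per adjective, in every combination (a cross product)."""
    return list(product(*(readings(a, noun) for a in adjs)))

# Olga is a beautiful old dancer.
for combo in combined_readings(["beautiful", "old"], "dancer"):
    print(" and ".join(combo))
```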

So what about the Missing Text Phenomenon?

All of the above discussion of nominal compounds was meant to highlight the main reason natural language understanding is challenging (for machines): almost all of our linguistic communication is compressed. We leave out many details that we can safely assume the listener/reader knows how to recover by virtue of our shared knowledge of the commonsense world. For the sake of efficient communication, we leave out of our messages what we believe need not be sent. At the same time, since we want our message to be decoded properly, we leave out only the information that we can safely assume is available to (and recoverable by) the listener/reader.
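The ‘compression’ described above can be illustrated with the nuclear engineer and nuclear plant examples from earlier. In this minimal sketch, `SHARED_KNOWLEDGE` is a hypothetical stand-in for the commonsense knowledge that speaker and listener share: a listener who has it recovers the hidden relation, while a listener who has only the words gets nothing back. It handles two-word compounds only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical shared knowledge: (modifier, head) -> (hidden relation, object).
# None of this is present in the two words of the compound itself.
Knowledge = Dict[Tuple[str, str], Tuple[str, str]]

SHARED_KNOWLEDGE: Knowledge = {
    ("nuclear", "engineer"): ("specializes in", "nuclear energy"),
    ("nuclear", "plant"): ("produces", "nuclear energy"),
}

def decode(compound: str, knowledge: Knowledge) -> Optional[str]:
    """Expand a two-word compound into the fuller message the speaker left out."""
    modifier, head = compound.split()
    entry = knowledge.get((modifier, head))
    if entry is None:
        return None  # the relation is not in the text, and this listener lacks the knowledge
    relation, obj = entry
    return f"{head} that {relation} {obj}"

print(decode("nuclear plant", SHARED_KNOWLEDGE))  # plant that produces nuclear energy
print(decode("nuclear plant", {}))                # None
```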

This ‘missing text phenomenon’ (MTP) manifests itself in many challenging problems in language understanding. Here I have described how it manifests itself in nominal compounds.

Last Words

I have written a few posts on this blog that touch on the MTP (Missing Text Phenomenon), and I have also presented a paper on this subject (albeit in the context of suggesting a generalization of the Winograd Schema Challenge). What I wanted to convey in this short article is that most of what we say in ordinary language is just a hint at the actual thoughts we are trying to convey, and most of what we want to send in our linguistic messages is missing (or hidden). We humans can do this because we all know what we all know; machines, unfortunately, don’t know what we all know, and that is why NLU is difficult. Thus:

trying to do language understanding just by processing the text
itself is equivalent to looking for something that is not there!

___
ONTOLOGIK — Medium
