Why Commonsense Knowledge is not (and cannot be) Learned

Walid Saba, PhD
ONTOLOGIK
Aug 28, 2022 · 5 min read

(last edited August 29, 2022)

Commonsense (background) knowledge, at least the kind of knowledge that we fetch and rely on in the process of language understanding, (i) cannot be learned by processing vast amounts of text, because that knowledge is never explicitly stated in the text, and you cannot find what is not there; and (ii) cannot be learned perceptually from observation, since the crucial background knowledge is largely universal: it is neither probabilistic nor approximate, and so it is not amenable to an incremental process of individual observations. The shared background knowledge needed in the process of language understanding is the kind of knowledge that obeys and respects the laws of nature, and as such it has to be codified. In fact, that knowledge must be codified in a symbolic system that quantifies over variables of specific ontological types.

Commonsense Knowledge is not Learned 1 — It’s not in the Text

There is a consensus among researchers investigating the neurological, psychological, and evolutionary aspects of human linguistic communication that languages have evolved according to the information-theoretic principle of least effort. Specifically, it has been established that interacting communicative agents tend to produce utterances that minimize the complexity of encoding a thought (by the speaker) as well as the effort of decoding the utterance back to the intended thought (by the listener) [1], thereby finding an optimal point where the combined effort of speaker and listener is minimal.

In minimizing the efforts of both speaker and listener, an optimization process has evolved that results in a form of language compression: ambiguity is introduced into the language so that the speaker does not have to spell out everything in detail but can leave out information that the speaker can safely assume is available to the listener. In this framework, efficient linguistic communication happens when the shared background knowledge between speaker and listener is large [2]. What this means is that ‘shared background knowledge’ is hardly, if ever, stated in linguistic communication, if for no other reason than that, since it is available to (and shared by) speaker and listener, transmitting any of it is wasted effort. But while ambiguity, which is pervasive in natural language, has been shown to be necessary for efficient compression, it has proven difficult for machines to handle, precisely because machines do not know what that ‘shared background knowledge’ is. Since that shared background knowledge is not in the text, it is perplexing to find some highly visible research efforts suggesting that they are trying to extract this background (commonsense) knowledge from text (see, for example, [3]).

To show why this type of background (commonsense) knowledge is not in the text, consider the sentences in (1) and (2):

(1) Jon has a Greek statue in every room in his house.
(2) Jon broke his leg in a car accident his family had near Toronto.

A 4-year-old knows that ‘a Greek statue’ in (1) refers not to one but to many statues, since one cannot have a single Greek statue (a physical artifact) in more than one location; in NLU lingo, the challenge in (1) is usually in resolving quantifier scope ambiguities. Similarly, a 4-year-old knows where Jon broke his leg upon hearing (2), since the location of a sub-event is always the location of the main (larger) event. This can be expressed as shown in figure 1 below; the challenge in (2) is in knowing the commonsense logic of events, a logic that will never be explicitly stated in the text.

Figure 1. (a) Background knowledge needed to properly understand the sentence in (1); and (b) background knowledge needed to properly understand the sentence in (2). Note that a symbolic logic that quantifies over variables of specific ontological types is needed to express and represent this background knowledge, since these facts and rules can differ slightly for objects of different ontological categories.
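As a minimal sketch of what codifying such rules over variables of explicit ontological types could look like (all type, function, and field names here are illustrative choices of mine, not a fixed notation):

```python
# Toy codification of the two rules behind figure 1, with ontological
# types made explicit as Python classes. Illustrative only.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:            # ontological type: physical artifact
    name: str


@dataclass(frozen=True)
class Event:               # ontological type: event
    name: str
    location: Optional[str] = None
    sub_event_of: Optional["Event"] = None


def statues_needed(rooms: int) -> int:
    """Rule (a): a physical artifact occupies exactly one location at a
    time, so 'a statue in every room' forces one distinct statue per
    room -- the universal quantifier must outscope the existential."""
    return rooms


def location_of(e: Event) -> Optional[str]:
    """Rule (b): the location of a sub-event is inherited from its
    main (enclosing) event when not stated directly."""
    if e.location is not None:
        return e.location
    if e.sub_event_of is not None:
        return location_of(e.sub_event_of)
    return None


# Sentence (1): five rooms entail five statues, not one.
print(statues_needed(5))                       # 5

# Sentence (2): the leg-breaking inherits the accident's location.
accident = Event("car accident", location="near Toronto")
leg_break = Event("breaking a leg", sub_event_of=accident)
print(location_of(leg_break))                  # near Toronto
```

The point of the sketch is only that these rules are stated once, universally, over ontological types (Artifact, Event), rather than estimated from data.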

To sum up: trying to capture this type of shared background knowledge by ingesting large volumes of text is like looking for something that is not even there, since, for effective communication, this type of background knowledge is never explicitly stated in linguistic communication. Incidentally, below is the result of testing sentence (1) on one of the so-called large language models that try to understand language by finding meaningful statistical correlations in vast amounts of text.

We will not mention the name of the system we tested, but it is one of those that has spent lots of time, money, and effort on statistical and machine learning techniques in a futile attempt at understanding ordinary spoken language by memorizing patterns found in vast amounts of text.

Commonsense Knowledge is not Learned 2 — It’s not Perceptible

There are others who suggest that most of our knowledge can be ‘learned’ by perceptually observing the world and gradually building a model of it. The assumption is that what we call shared ‘background knowledge’ is also statistically and perceptually learned by observation (this is the opinion suggested in [4], for example). Unfortunately, this thesis is also faulty, since the vast amount of ‘background knowledge’ implicitly assumed in tasks that require high-level reasoning (e.g., planning and language understanding) cannot be approximate and cannot be learned differently by different individuals, and thus it cannot be the product of individual observations.

In linguistic communication, and for the efficient compression that minimizes the effort exerted by speaker and listener, the information that the speaker leaves out of the encoded message must be exactly the background knowledge that the speaker can assume the listener will use to decode, and in the process successfully disambiguate, the message. The entire process relies on quite a bit of shared background knowledge, and it would fail if that knowledge were not the same for both parties.

Since background knowledge of this kind (such as the rules shown in figure 1 above) is not ‘approximate’ and cannot differ from one individual to another, it cannot be individually and perceptually learned; learnability theory itself precludes learning these universally valid cognitive templates, unless, of course, cognitive agents have an infinite amount of time.

Final Word

An important question now (assuming the reader has accepted the above argument) is this: what is the nature of this ‘shared background knowledge’, how vast is it, and how can it be codified? I shall return to these questions in future posts.

References

  1. Fortuny, J. and Corominas-Murtra, B. (2013), On the origin of ambiguity in efficient communication, Journal of Logic, Language and Information, 22, pp. 249–267 (available on arXiv).
  2. Bao, J. et al. (2011), Towards a theory of semantic communication, In: 2011 IEEE Network Science Workshop.
  3. One Man’s Dream of Fusing A.I. With Common Sense, The New York Times, August 28, 2022.
  4. Browning, J. and LeCun, Y. (2022), AI And The Limits Of Language, NOEMA, August 23, 2022.

https://medium.com/ontologik