Published in ONTOLOGIK

Compositionality: the curse of connectionist AI

A while back, a two-day online workshop on compositionality and AI was organized by Gary Marcus and Raphael Milliere, with additional contributions from Allyson Ettinger and Paul Smolensky (one of the pioneers of tensor algebra in NNs). As is often the case, and although compositionality touches on many subjects in AI, natural language was the focus of the discussion. Rightly so: compositionality is best understood in the context of (natural) language, although it applies equally to formal (e.g., programming) languages, among other formalisms.

The Inverse of the Compositionality Function

As illuminating as the discussion was, I found it lacking in one critical respect: an appreciation of what compositionality entails, not only in semantic theories but also in relation to cognitive architecture, learnability, and computability, and thus to the evolution of human thought. Typically, and this was reiterated in the workshop, compositionality is reduced to the well-known cliche (rephrased in many ways ever since Frege):

The Compositionality Principle: the meaning of a complex expression is a function of the meanings of the constituents and the way they are combined.

While true, this on its own is (almost) a vacuous statement, as Wlodek Zadrozny [1] formally argued over two decades ago. Indeed, almost any semantic formalism is compositional in that sense, including transformers in deep neural networks, since at every node the output is generated as ‘some’ function of all the constituent inputs. The point is that the importance of compositionality, which is a function, lies in its inverse! A formalism captures the important computational and cognitive aspects of compositionality not when it can compose an output out of the values of the constituents, but when the decomposition can be maintained and recovered!

Let us take a simple string like SANTANIA. This string could be obtained using a string append function as follows:

append(‘SAN’, ‘TANIA’)
append(‘S’, append(‘ANTA’, ‘NIA’))
append(‘’, ‘SANTANIA’)
etc.
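
The point can be sketched in a few lines of Python. This is only an illustration, assuming that `append` is plain string concatenation; the helper names `append` and `all_binary_splits` are hypothetical and not taken from any library. Many different pairs of constituents compose to the same string, and the string alone does not record which pair produced it.

```python
from __future__ import annotations


def append(left: str, right: str) -> str:
    """Compose two strings by concatenation."""
    return left + right


def all_binary_splits(s: str) -> list[tuple[str, str]]:
    """Return every (left, right) pair whose concatenation yields s."""
    return [(s[:i], s[i:]) for i in range(len(s) + 1)]


target = "SANTANIA"
splits = all_binary_splits(target)

# Every candidate pair composes back to the same string ...
assert all(append(left, right) == target for left, right in splits)

# ... so the composed value cannot tell us which pair was actually used.
print(len(splits))  # 9 binary splits for an 8-character string
print(splits[3])    # ('SAN', 'TANIA')
```

Nested calls such as append(‘S’, append(‘ANTA’, ‘NIA’)) only multiply the number of candidate decompositions further.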

Similarly, 24 could be the result of many (infinitely many!) possible arithmetic operations:

12 * (8 / 4)
20 + 4
16 + (2 * 4)
etc.
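
One way to make this concrete is to contrast a value with an expression tree that evaluates to it. The sketch below is a toy, not a reference implementation: the `Num` and `BinOp` classes and the `evaluate` function are illustrative names introduced here. The three expressions above all evaluate to 24, but only the trees still carry their constituents.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class BinOp:
    op: str          # one of "+", "*", "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]


def evaluate(e: Expr) -> float:
    """Compose the value of an expression from the values of its parts."""
    if isinstance(e, Num):
        return e.value
    left, right = evaluate(e.left), evaluate(e.right)
    if e.op == "+":
        return left + right
    if e.op == "*":
        return left * right
    if e.op == "/":
        return left / right
    raise ValueError(f"unknown operator: {e.op}")


a = BinOp("*", Num(12), BinOp("/", Num(8), Num(4)))  # 12 * (8 / 4)
b = BinOp("+", Num(20), Num(4))                       # 20 + 4
c = BinOp("+", Num(16), BinOp("*", Num(2), Num(4)))  # 16 + (2 * 4)

# All three compose to the same value ...
assert evaluate(a) == evaluate(b) == evaluate(c) == 24

# ... but the trees remain distinct and their constituents stay accessible.
assert a != b and b != c and a != c
assert a.left == Num(12) and a.right.op == "/"
```

Going from a tree to 24 is a function; going from 24 back to a tree is not, unless the tree itself is kept.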

The point here is this: compositionality is important as a high-level semantic procedure for obtaining the meaning (value) of a composite as a function of the meanings (values) of its constituents. More importantly, however, compositionality is of little value if our formalism does not support maintaining and recovering the constituents, because otherwise the decomposition is undecidable (there are infinitely many ways to decompose a value). Now connectionist models, in all their variations, including modern-day deep neural networks (DNNs), do not preserve the composition, since once tensors are composed their decomposition is undecidable. In short, and as Fodor & Pylyshyn [2] showed over three decades ago, NNs do not maintain syntactic structure, and as such they cannot model compositionality.
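
A small numerical illustration of the same many-to-one behaviour, under a loudly stated assumption: the composition operation here is element-wise vector addition, chosen only because it is the simplest case. It is not a model of any particular network, and the function name `add` is hypothetical.

```python
from __future__ import annotations


def add(u: list[float], v: list[float]) -> list[float]:
    """Compose two vectors by element-wise addition (assumed operation)."""
    return [a + b for a, b in zip(u, v)]


# Two different pairs of constituent vectors ...
pair_1 = ([1.0, 0.0, 2.0], [0.0, 3.0, 1.0])
pair_2 = ([0.5, 1.0, 3.0], [0.5, 2.0, 0.0])

composed_1 = add(*pair_1)
composed_2 = add(*pair_2)

# ... yield exactly the same composite, so the composite alone
# does not determine which constituents went into it.
assert pair_1 != pair_2
assert composed_1 == composed_2 == [1.0, 3.0, 3.0]
```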

Why is that important? I will first discuss why it matters from the standpoint of language and language learnability. I will then discuss why compositionality (whose inverse cannot be modelled by NNs) is the curse of connectionism as a serious model for truly intelligent systems.

Why is Compositionality Important?

Consider the template of a simple sentence below:

[person LOVES entity]

The above is a template for (potentially) an infinite number of sentences, some of which can be quite complex. Here are some examples:

John LOVES to play guitar
Mary LOVES Carlos
Dave’s sister Linda LOVES the boy next door
Almost every person LOVES attention
Everyone in our school LOVES Ms. McDonald
etc.

Note that the “entity” a person may love is an object of a very generic type: what a person may love could be almost anything, such as another person, an activity, a status, or a property. Note also that the person (and the entity) can be expressed by a very complex syntactic expression that, in the end, will “converge” to an object of that type. How does that happen? How does a child know that the simple reference in (1) and the complex reference in (2)

(1) John
(2) the boy next door that always wears an AC/DC t-shirt

both refer to objects of the same type, namely person, and thus can occupy the same slot? The question is not only how a child knows that (1) and (2) both refer to a person, but how a child works out this algebraic composition in real time, so that they can generate a meaningful sentence (or, inversely, understand one) that fits the general template [person LOVES entity]. (Incidentally, that was the genius of Richard Montague, who developed an algebra that formally shows how such infinite compositions can happen, but Montague’s semantics will be left for another time and place.)
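
The idea of typed slots can be sketched as follows. This is a deliberately crude toy and not Montague semantics: the `Phrase` class, the string-valued type labels, and the functions `modify` and `loves` are all hypothetical names introduced for illustration. The point is only that a simple and a complex reference resolve to the same type, and that the constituents remain available for checking against the slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    text: str
    type: str  # semantic type the phrase resolves to, e.g. "person"


def modify(head: Phrase, modifier: str) -> Phrase:
    """Attach a modifier; the result keeps the type of its head."""
    return Phrase(f"{head.text} {modifier}", head.type)


def loves(subject: Phrase, obj: Phrase) -> str:
    """Fill the template [person LOVES entity], checking the subject slot."""
    if subject.type != "person":
        raise TypeError(f"{subject.text!r} is not a person")
    # The entity slot is generic, so any type is accepted there.
    return f"{subject.text} LOVES {obj.text}"


john = Phrase("John", "person")                        # reference (1)
boy = modify(modify(Phrase("the boy", "person"), "next door"),
             "that always wears an AC/DC t-shirt")     # reference (2)
attention = Phrase("attention", "property")

# Both references converge to the same type, so both fit the person slot.
assert john.type == boy.type == "person"
print(loves(john, attention))
print(loves(boy, john))
```

Calling `loves(attention, john)` raises a `TypeError`, which is the slot check the paragraph describes: it is only possible because the type of each constituent is still accessible after composition.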

Research has shown that without compositionality, language learnability could not be explained: computationally, there is no explanation of how we work out which constituents fill which slots in real time unless we have compositional structures that are decomposable. They must be decomposable because we need to verify that a certain value can sensibly occupy a certain slot in the composite. All this happens in real time (when hearing an utterance, or when converting a thought to an utterance), and that is how a single template structure productively becomes the parent of an infinite number of thoughts. This was summarised by Peter Pagin [3] as follows:

Compositional semantics allows greater expressive power, simply because a systematic but non-compositional semantics of a language with the same expressive power would be intractable.

What the above implies is that language learnability and language acquisition could not happen without compositionality: there is simply no other “technical” explanation of how we understand utterances, and convert thoughts to linguistic utterances, in real time.

Having said that, the fact that we make up the meaning of the whole compositionally (i.e., as a function of the meanings of the constituents) is the simple and trivial part. Even more crucial is the fact that we have access to the constituents, that is, that our composition is reversible (or that our formalism allows us to maintain and thus recover the decomposition). Connectionist architectures (including modern-day NNs) cannot maintain the decomposition, since tensor composition is not decidably reversible. They could preserve the composition only if they admitted symbolic structures and variables that can be instantiated with values of different types. But if they admit that, then they become nothing but an implementation of symbolic systems. Either way, true compositionality is their curse. And without true compositionality we cannot speak of any thinking and reasoning.

Final Word

Talk of compositionality is music to my ears since some of the most penetrating minds have spent centuries studying this phenomenon — not only in relation to language, semantics and understanding but also to reasoning in general and, more importantly, as the only mechanism by which language learnability can happen in an effectively computable manner. Without compositionality (and a formalism that maintains and allows the recovery of the decomposition) we would have to make the ridiculous stipulation that we require infinite memory and an infinite amount of time, just to explain how we make (or understand) a single sentence in real time.

As a final remark, those who are mistakenly excited about so-called Large Language Models (LLMs) should ask this question: before we tackle natural language with this approach, why not show that this paradigm can handle simpler languages, say programming languages? There are millions of valid Python examples out there, so can LLMs ever learn to accept valid Python programs? The answer is no, and that is exactly due to the “infinity” that compositionality can account for, but that NNs, which do not admit symbolic structures to represent infinite objects, simply cannot.
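
For comparison, here is what a symbolic acceptor for Python looks like in practice. The sketch uses the standard library's `ast.parse`, which checks syntactic validity only; the helper names `is_valid_python` and `nested` are hypothetical. It says nothing about what an LLM can or cannot do. It only shows that a grammar-based recognizer applies the same rule at every level of nesting.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def nested(depth: int) -> str:
    """Build an expression with `depth` levels of balanced parentheses."""
    return "(" * depth + "1" + ")" * depth


# The same recursive rule accepts shallow and deep nesting alike.
assert is_valid_python("x = " + nested(3))
assert is_valid_python("x = " + nested(50))

# Dropping a single closing parenthesis makes the program invalid.
assert not is_valid_python("x = " + nested(50)[:-1])
```

Real interpreters do impose practical nesting limits, which is why the depths here are kept modest.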

References

  1. Zadrozny, W. (1998), Is Compositionality Formally Vacuous?, Linguistics and Philosophy, vol. 21, pp. 629–633
  2. Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), pp. 3–71.
  3. Pagin, P. (2011), Compositionality, Complexity, and Evolution, PERILUS 2011, Symposium on Language Acquisition and Language Evolution, The Royal Swedish Academy of Sciences and Stockholm University.
