Language & Cognition: re-reading Jerry Fodor

Walid Saba, PhD
Published in ONTOLOGIK · Nov 18, 2020

--

In my opinion the late Jerry Fodor was one of the most brilliant cognitive scientists I knew of, especially if you want a deep understanding of the major issues in cognition and of the plausibility (or implausibility) of various cognitive architectures. Very few had his technical breadth and depth in tackling some of the biggest questions concerning the mind, language, computation, the nature of concepts, innateness, ontology, etc. The other day I felt like re-reading his Concepts: Where Cognitive Science Went Wrong (I have read this small monograph at least 10 times before, and I must say that I still do not fully comprehend everything that is in it).

But what did happen on this, the 11th, reading of Concepts is this: I now have a new and deeper understanding of his Productivity, Systematicity, and Compositionality arguments, an understanding that should clearly put an end to any talk of connectionist architectures being serious architectures for cognition. By ‘connectionist architectures’ I also roughly mean modern-day ‘deep neural networks’ (DNNs), which are, if we strip away the advances in compute power, essentially the same models that were the target of Fodor’s onslaught. I have always understood the ‘gist’ of his argument, but I believe I now understand it more deeply, and in the process I am more convinced than I have ever been that DNNs cannot be considered serious models for high-level cognitive tasks (planning, reasoning, language understanding, problem solving, etc.) beyond being statistical pattern recognizers (although very good ones at that).

Productivity, Systematicity, and Symbolic Systems: The Turing Story

Although Fodor presents three arguments, namely Productivity, Systematicity, and Compositionality, which he argues collectively preclude DNNs from being considered a serious architecture for cognition, the three phenomena are really manifestations of the same phenomenon, which I like to summarize as follows:

What we certainly know about language and the mind is that thoughts (and subsequently the languages we use to express our thoughts) are productive, systematic, and compositional. Moreover, to have one is to have the others. And, to have these, the cognitive architecture must admit computation over symbolic structures. DNNs do not admit computation over symbolic structures, and so they fail to capture any of these, and thus these models cannot explain how our minds can have thoughts and how we can express these thoughts in our natural languages.

Of course, before we put the nails in the DNN coffin, two things must be established:

  1. thoughts (and subsequently the languages we use to express our thoughts) are in fact productive, systematic, and compositional
  2. DNNs cannot account for productivity, systematicity, and compositionality.

If (1) and (2) are established, as I believe Fodor (at times with Zenon Pylyshyn, and at other times with Ernest (Ernie) LePore) in fact did back in the late 1980s, then any talk of DNN-based AI is just that: talk. I happen to believe that his fatal criticism back then was never refuted, and moreover that it still applies, because modern-day DNNs are essentially just the connectionist models introduced in the early 1980s but with more hidden layers (thus the ‘deep’) and with the amazing computing power needed to build networks with billions of parameters. Paradigmatically, however, DNNs are the same models Fodor cut down to size over three decades ago.

Language is Systematic, Otherwise not Learnable

Ironically, and unfortunately for the DNN community, the systematicity of natural languages (and of the thoughts they express) is a must if we want to accept that language is learnable, which I assume proponents of DNNs not only accept but advocate. Languages are learnable precisely because they are systematic, and vice versa. That is, without systematicity, language learnability cannot be explained. Let’s see how this story goes.

Languages are learnable precisely because they are systematic, and vice versa.

Fodor argues that one cannot understand a simple sentence like ‘John loves Mary’ (or, equivalently, one cannot have the thought JOHN LOVES MARY) without also understanding the sentence ‘Mary loves John’. If that were not the case, then a child’s understanding of ‘John loves Mary’ should not make us confident that she now understands any sentence of the form person-loves-person. In other words, if language/thought were not systematic, then understanding ‘Mary loves John’ would be atomic and would not entail that the child now also understands, say, ‘Steve loves Sara’. But that, ironically, makes language learnability impossible: just imagine how many sentences a child would need to hear if each one were learned atomically, without any systematicity. And we have not yet mentioned richer variations like ‘John loves the teenage girl living next door’, which is still an instance of the same template, since the teenage girl living next door ‘is a’ person. The argument Fodor is making, even after giving connectionists a lot of leeway to save themselves, is this: learning a simple structure like [person-loves-person] must be systematic, otherwise learnability cannot be explained. The irony here is that Fodor is explaining to the ML folks that to save your ‘learnability’ thesis, that is, to even have a chance of arguing that language is learned, you must admit systematicity; otherwise you would have to give a child a couple of million years to learn how to make simple subject-verb-object sentences!

The maverick Fodor then goes for the knockout by showing the contradiction: (i) any plausible cognitive architecture must admit systematicity, otherwise language learnability cannot be explained; and yet (ii) DNNs cannot account for systematicity!

That is genius.

Before we argue why DNNs cannot account for systematicity, let us explain what systematicity requires. Systematicity cannot be denied (and indeed even connectionists admit that anyone who understands what ‘John loves Mary’ means must understand what ‘Mary loves John’ means). But how does systematicity happen? It happens because ‘John loves Mary’ has a syntactic structure; that is, the sentence is not one composite blob, but a structure with constituents (think variables!), variables that can take on ‘similar kinds/types of things’. But for this ‘productive’ substitution to happen, we must have access to the constituents; in short, we must have access to the variable components:

(s ((np Noun) (vp (vb BinaryRel) (np Noun))))

The indefinite and systematic replacement of variables in the above structure is what gives rise to the Productivity property: the sentences ‘John loves Mary’ and ‘Mary loves David’ are obtained by systematic instantiation of the above template, and that allows us to produce/generate countless sentences such that, when one is understood (or, say, learned), the child has learned to make (or understand) an infinite number of similar thoughts. But this also gives rise to Compositionality. How? Well, since ‘John loves Mary’ has a different meaning from ‘John loves Sandy’, and since both have different meanings from ‘John knows Sara’, then clearly the meaning of these sentences is a function of the meanings of their components (or constituents), and that, roughly, is what compositionality is.
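
To make this concrete, here is a minimal sketch (in Python, with made-up names) of what systematic instantiation of such a symbolic template looks like: one finite structure with variable slots yields an unbounded family of sentences, each of which can be taken apart again.

    from itertools import permutations

    # A toy symbolic template with two variable slots (all names illustrative).
    nouns = ["John", "Mary", "Steve", "Sara"]

    def instantiate(x, y):
        # Systematic substitution into the variable slots of
        # (s ((np Noun) (vp (vb loves) (np Noun))))
        return ("S", ("NP", x), ("VP", ("V", "loves"), ("NP", y)))

    def surface(tree):
        # The constituents are still there, so we can read the sentence back
        # off the structure (unlike a vector sum, which forgets its parts).
        (_, (_, x), (_, (_, v), (_, y))) = tree
        return f"{x} {v} {y}"

    # Productivity and systematicity from one finite template: if the system
    # can build 'John loves Mary', it can build 'Mary loves John', and so on.
    for x, y in permutations(nouns, 2):
        print(surface(instantiate(x, y)))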

So where are we now? Well: language/thought is systematic, otherwise learnability cannot happen and cannot be explained. For systematicity to occur the system must be productive, and compositionality is the only explanation of how we come to understand a potentially infinite number of thoughts from a handful of examples. Actually, as many researchers have also argued, nothing but compositionality can explain how we understand utterances at the speed of speech: the only explanation is that we are applying compositional rules, immediately, in similar situations.

But for compositionality to work (to compute the meaning of a sentence from the meanings of its constituents and the way they are put together), we have to admit symbolic structures that contain variables/placeholders. That much is certain, at least for anyone who is a serious student of language and computability, that is, of language productivity and of the real-time computational constraints that language understanding requires. Now, if these facts about language and thought can be explained by a cognitive architecture other than the one Turing put forth, namely computing over symbolic structures, then by all means. Although connectionists believed they had an alternative architecture to explain these facts we know about thought and language, Fodor’s detailed analysis of their story proved, unequivocally, otherwise.
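
As a rough sketch of that last point (illustrative names only): once the constituents and the way they are combined are available, computing the meaning of the whole is a fixed, fast rule, which is exactly what the real-time constraint demands.

    # A purely illustrative sketch of compositionality: the meaning of a
    # sentence is computed by a fixed rule from the meanings of its parts.

    lexicon = {
        "John": "john", "Mary": "mary", "Sandy": "sandy", "Sara": "sara",
        "loves": "LOVES", "knows": "KNOWS",
    }

    def meaning(subj, verb, obj):
        # meaning(whole) = f(meaning(parts), structure): here f builds a
        # predicate-argument structure from the constituent meanings.
        return (lexicon[verb], lexicon[subj], lexicon[obj])

    print(meaning("John", "loves", "Mary"))   # ('LOVES', 'john', 'mary')
    print(meaning("John", "loves", "Sandy"))  # ('LOVES', 'john', 'sandy')
    print(meaning("John", "knows", "Sara"))   # ('KNOWS', 'john', 'sara')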

How do DNNs Fail the Systematicity Test?

In a DNN one can indeed compose the meanings of ‘John’, ‘loves’, and ‘Mary’ into one meaning; in fact, that is what most people doing ‘NLP’ in the modern-day incarnations of DNNs do: some vector composition (a sum, an average, a weighted average, etc.), as shown in figure 1 below.

Figure 1. A small network representing ‘John loves Mary’
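
Here is a minimal sketch of the kind of vector ‘composition’ meant here, with made-up embedding values (averaging is just one common choice; sums and weighted sums are others):

    import numpy as np

    # Made-up 4-dimensional 'embeddings' standing in for learned word vectors.
    emb = {
        "John":  np.array([0.2, 0.7, 0.1, 0.5]),
        "loves": np.array([0.9, 0.1, 0.4, 0.3]),
        "Mary":  np.array([0.3, 0.6, 0.2, 0.4]),
    }

    # A typical 'composition': average (or sum, or weighted sum) of the parts.
    sentence_vec = np.mean([emb["John"], emb["loves"], emb["Mary"]], axis=0)
    print(sentence_vec)

    # Note: many different triples of vectors produce this same average, and
    # nothing in sentence_vec tells you which parts it came from.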

But that is not really compositionality, since ‘John loves Mary’ in the above network is just a label for us; the node itself does not represent the components. In fact, the output of that node cannot be decomposed into its constituents: the decomposition of a tensor into components is not unique, and hence not recoverable, much like the decomposition of the scalar 15 into components is not recoverable (it could be the result of 8 + 7, or 3 * 5, or (9 - 4) * 3, etc.). Now, is that the only sentence the node labelled ‘John loves Mary’ represents? Well, not quite. While it cannot really represent any other sentence, it can at least approximate the meaning of similar sentences like ‘John loves Sandy’ and ‘John likes Mary’, using, perhaps, the fact that the vectors representing ‘Sandy’ and ‘Mary’, and those representing ‘loves’ and ‘likes’, are very similar (by cosine similarity, say). But that does not explain where all of the following would go, sentences that clearly would not even be approximately similar:

A student in the 6th grade loves Mary
A student in the 6th grade loves a girl that plays on the soccer team
John, who is in the 6th grade, loves Mary, who is a good soccer player
etc.

A child knows that all of the above are instances of the template ‘Noun loves Noun’ because the child has learned (elsewhere) that, like ‘John’, ‘a student in the 6th grade’ is also a Noun, and she then substitutes it for the variable Noun. But none of that can happen in a DNN. In fact, in a DNN, every one of the above would require its own node. And how many nodes would our network need? It seems that Fodor has put DNNs in their place!
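
A sketch of that ‘is a’ move (everything here is made up for illustration): the slot is a typed variable, so the check is on the type of the filler, not on a memorized string.

    # Illustrative sketch: the Noun slot of the template is a typed variable,
    # so anything learned to be a Noun (however complex) can fill it.

    IS_A_NOUN = {
        "John",
        "Mary",
        "a student in the 6th grade",
        "a girl that plays on the soccer team",
    }

    def matches_template(subj, verb, obj):
        # The template is [Noun loves Noun]: check the types of the fillers,
        # not the literal strings.
        return verb == "loves" and subj in IS_A_NOUN and obj in IS_A_NOUN

    print(matches_template("a student in the 6th grade", "loves", "Mary"))
    print(matches_template("John", "loves", "a girl that plays on the soccer team"))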

Again, to deny systematicity is to question language learnability, because it cannot be that we learn these sentences atomically, one by one (there is not enough time in our lifetime just for this simple pattern). But since the only knowledge in a DNN is in (the weights of) the nodes, where would that potential infinity of different syntactic structures go in a DNN? The answer is nowhere: a DNN cannot represent the infinitely productive machinery of our thoughts and of the sentences we make to express these thoughts. Recursive definitions over symbolic (variable) structures, the device that allows us to finitely represent infinite possibilities, have no place in a DNN. The only alternative for DNNs is to reject systematicity, but then they would need to explain how a 4-year-old comes to have a node for every one of the infinite number of thoughts her mind can produce. (Incidentally, this is why Chomsky once said that “the notion of ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term”, although, unfortunately, his comment was not properly understood.)
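
Here, schematically, is what ‘finitely representing infinite possibilities’ means (a toy grammar, not anyone’s actual proposal): a couple of recursive rules over symbolic variables generate an unbounded set of phrases from a finite description.

    import random

    # Toy recursive grammar (illustrative): two finite rules, unboundedly many
    # noun phrases, because the NP rule can re-invoke itself.
    #   NP -> Det N
    #   NP -> Det N 'that loves' NP

    def generate_np(depth=0):
        det = random.choice(["the", "a"])
        noun = random.choice(["girl", "student", "teacher"])
        if depth < 3 and random.random() < 0.5:
            return f"{det} {noun} that loves {generate_np(depth + 1)}"
        return f"{det} {noun}"

    for _ in range(5):
        print(generate_np())
    # e.g. "a student that loves the girl that loves a teacher"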

In layman’s terms, nodes in a neural network are atomic: a node N1 has no knowledge of (or access to) the constituents/components that produced its output, and the constituents/components contributing to some other node N2 are therefore not available to N1. There are no variables that can be bound or referenced across different nodes. Thus DNNs are stuck with an impossible mission: to account for the infinite productivity of thought, they would require an infinite number of nodes and parameters.

Incidentally, systematicity is not only needed for learnability (although that is the killer requirement, since otherwise nothing explains how language is learned); systematicity is also required to explain the inferential aspects of language: if we hear ‘John and Mary went to the store’, then clearly ‘Mary went to the store’ on its own is implied. That is, a structure like A & B must be breakable into its constituents so that I can get B out of it. And, again, the value at a node in a DNN that composes all of its inputs is not breakable into its constituents!
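
A final sketch of the inferential point (again, a made-up representation): if the structure of ‘John and Mary went to the store’ keeps its conjuncts as constituents, extracting the entailed ‘Mary went to the store’ is trivial; from a single composed vector there is nothing to break apart.

    # Illustrative: conjunction elimination over a symbolic structure. A & B
    # entails B precisely because the constituents remain accessible.

    sentence = ("AND",
                ("WENT-TO", "John", "the store"),
                ("WENT-TO", "Mary", "the store"))

    def conjuncts(s):
        # Break an A & B structure into its constituent conjuncts.
        return list(s[1:]) if s[0] == "AND" else [s]

    for c in conjuncts(sentence):
        print(c)
    # ('WENT-TO', 'John', 'the store')
    # ('WENT-TO', 'Mary', 'the store')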

In conclusion, what Fodor proved over three decades ago is that DNNs are trying to sell two contradictory stories: (i) language is learned; and (ii) language is learned without systematicity and symbolic structures. But these two claims cannot be reconciled: if you accept learnability, you have to admit systematicity and compositionality, and if you admit the latter, you must admit symbolic structures and variables, and DNNs have not been able to show (and cannot show) how they can do this.

Jerry Fodor has caused me so many sleepless nights. In revenge, I hope this short post will also cause some of you many a sleepless night :) Of course, I also hope that, like me, you will in the end realize that struggling with Fodor is worth the effort.

______
https://medium.com/ontologik
