Noam Chomsky, Linguistics and AI

Rebecca Wicker
Jan 10, 2017


After my first Future Technologies & Innovation meeting at Pearson, and the issues it raised regarding AI, I was led to The Atlantic’s 2012 article on Noam Chomsky, ‘On Where Artificial Intelligence Went Wrong’. It is quite dense and covers a variety of topics, from AI and linguistics to the evolution of language and biological systems, so I thought I would write up a summary of the AI and linguistics conversation, along with my own asides. I will try to keep it brief. Bear in mind that I am coming at this from a linguistic background and not from a technology point of view. In the spirit of the article, I believe it is good to embrace and learn from all disciplines rather than just thinking of your own, so I hope to provide a different insight into something that is otherwise seen as technical and mathematical. I very much welcome comments on anything I have written, as I do not claim to be an expert in AI either.

The origin of language is a hotly debated topic in linguistics, and the debate is dominated by Noam Chomsky. His ‘Universal Grammar’ is the little black box where he thinks human beings’ innate ability ‘to language’ comes from. He believes there’s a part of our brain where all our language is stored from birth, and that it unveils itself bit by bit as we grow older. We learn language structures little by little, and they grow in complexity until we’re finally on the language bike: no stabilisers, no one holding the bike steady, and maybe even ‘look Mum, no hands!’. This is far from how we originally thought we acquired language: by mimicking (the Behaviourist theory of monkey-see-monkey-do).

So where does this fit into AI? According to Chomsky, we’re producing AI in a behaviourist way, not in a Universal Grammar way.

“For Chomsky, the ‘new AI’, focused on using statistical learning techniques to better mine and predict data, is unlikely to yield general principles about the nature of intelligent beings or about cognition.”

The quote above made me think of Alexa and Google Home; my household is now the proud owner of both (ah, the holidays). When I ask something too complicated, outside her ‘stream of consciousness’, I get an “I am unable to answer that”. Is it because I haven’t given her what she needs, or because she hasn’t been programmed to ‘think’ of that? Is she only able to mimic what I ask because she’s repeating something she’s done before? It’s like the “I’m always learning” message I get from Crowdfire. Crowdfire can take all of my Twitter data and tell me what I should do based on what it was told was good to do, but it can’t do anything outside of its input. It can only mimic directly from that input, which is behaviourism. And that’s not ‘intelligence’ to Chomsky; it’s just parroting.

Furthermore, Chomsky basically believes that language is too complex for us to understand and that neuroscience has been on the wrong track for the past couple of hundred years. Yikes. He thinks we’re maybe “looking in the wrong place” which, I think, is a valid notion when it comes to research and experiments. I remember this being explained, very simply, as poor construct validity. Say, for example, you gave a fifth-grade class in the Bronx a maths test written in English. You may intend to test their maths skills, but on some level (depending on each student) you’re also testing their language proficiency. If they don’t fully understand the language in which the question is written, is a wrong answer down to their maths skills or their language skills? So when they don’t answer, or ‘guess’ and get it wrong, you could mistakenly conclude that their maths skills are low without taking the language variable into account. Can we ever take into account every variable that changes, though? Is Monday always awful, or do the snow and ice in NYC and the gloomy faces on the subway make it that way?

Chomsky likens this to getting rid of a physics department and replacing it with endless numbers of video tapes of what’s happening outside the window. After shoving all that footage into a ginormous machine capable of analysing gargantuan volumes of data, you should expect to get a prediction of what’s happening next, and you may even get a better prediction than the physics department will ever give. At this point, success is defined as “getting a fair approximation to a mass of chaotic unanalysed data…but you won’t get the kind of understanding that the sciences have always been aimed at — what you’ll get is an approximation of what’s happening”. He says this is like analysing the weather: you can guess based on probability, statistics and assumptions, but in the end that is not what meteorologists do. They get to the bottom of how it works.

He doesn’t disregard statistics or probability but he asks a valid question:

If you can use it, fine. But the question is what are you using it for? Is there any point in understanding noisy data? Is there some point to understanding what’s going on outside the window [of the video]?

It’s true that we do not acquire a language by sifting through data. There is trial and error, self-correction, inference and so on, so why are we trying to create AI by sifting through data? By churning away at the numbers game and never-ending data analysis, according to Chomsky, we’re never going to get closer to understanding how language works. But is that ok? As long as AI can give us decent output, do we need it to sound human?

It almost echoes what Larry Berger spoke about during NY EdTech Week: we have SO much data and don’t completely understand what to do with it, because we haven’t advanced as much as we’d like to. I felt like his ideas were very much ‘bursting the bubble’ of the tech enthusiast who thinks tech can solve all the world’s problems. Possibly, but not yet. Essentially, “you can easily be misled by experiments that seem to work because you don’t know enough about what you’re looking for.”

You’re only as good as the company you keep, and Chomsky is no exception. His friend, Dr David Marr, defined a general framework for studying complex biological systems that Chomsky is a big fan of. With this in mind, Chomsky debates whether language even has an algorithm, since language is acquired and produced so arbitrarily. By using statistical inference algorithms, he argues, we’re still not getting to the crux of how and where language works, and so we end up dealing with statistical fluff that isn’t so reliable.

If you get more and more data, and better and better statistics, you can get a better and better approximation to some immense corpus of text, like everything The Wall Street Journal archives — but you learn nothing about language.
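To make the point concrete for non-technical readers like me, here is a minimal sketch in Python of the simplest kind of statistical language model, a bigram counter. It is only an illustration: the three-sentence corpus is invented (it is not real Wall Street Journal text), the function names are my own, and real systems are far more sophisticated than this.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration (not real Wall Street Journal text).
CORPUS = [
    "the market rose today",
    "the market fell today",
    "the index rose sharply",
]

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probability(counts, prev, nxt):
    """Relative frequency of `nxt` after `prev`; 0.0 if `prev` was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def predict_next(counts, prev):
    """Most frequent follower of `prev`, or None if `prev` was never seen."""
    followers = counts[prev]
    return followers.most_common(1)[0][0] if followers else None

model = train_bigrams(CORPUS)
print(predict_next(model, "the"))                      # market
print(next_word_probability(model, "market", "rose"))  # 0.5
print(next_word_probability(model, "index", "fell"))   # 0.0
```

The model predicts ‘market’ after ‘the’ simply because that pairing was the most frequent, and it gives ‘the index fell’ a probability of zero even though it is perfectly good English, because it never saw ‘fell’ after ‘index’. Feeding it more text would improve the approximation, but all it ever stores is counts. Whether that means it has learned nothing about language is Chomsky’s claim, not something this sketch can settle.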

Chomsky’s argument, using Marr’s framework, is that there is no algorithm for the system of language itself. There are algorithms for carrying out the processes of the cognitive system, but that’s as far as it goes. Language is comparable to biological systems like the immune system: it’s vastly complex and can’t be inferred from statistics and data alone. Simply put, you can’t reduce the brain processes of language to a computer.

As a linguist, I’ve had a love-hate relationship with Noam, especially since he came up a lot in my Political Science BA. To me he was a jack-of-all-trades (but brilliant at all of them). I’ve never totally understood how the black box came to be…you can’t find it, describe it or know how it works, as if someone magically dropped it in your brain. I do, however, like the fact that he (inevitably) talks about Turing, pulling on my nostalgic heartstrings since Alan Turing went to my school and The Imitation Game was filmed in the village (and school) I grew up in (limestone arches in all their glory). Turing, to Chomsky, was able to pin down the simplest part of Marr’s framework, the basic computational operations such as ‘read’ and ‘write’, but “you’ve got to start by looking at what’s there and what’s working and you see that from Marr’s highest level [computational]”. If, indeed, AI can never be ‘intelligent’ without knowing the inner workings and mystery of Universal Grammar and the processes of language, has AI ‘gone wrong’?


