AI: Nope, the revolution is here and this time it is the real thing.

Stephen José Hanson
May 16, 2018

--

About 30 years ago, I wrote down a version of this title in a book review, attempting to draw a sharp demarcation between the end of “good old-fashioned AI” (GOFAI) and what the neural network revolution had already wrought by the late 1980s:

“The neural network revolution has happened. We are living in the aftermath.
The Sun is still shining. The Artificial Intelligence threat has not really amounted to much and neural nets are showing promise in both theory and practice.”

— S. J. Hanson and C. R. Olson, Neural networks and natural intelligence: Notes from Mudville. Connection Science, 3:332-335, 1991.

Many of us at the time realized what was happening, as we were in industrial research (Bell, IBM, Xerox PARC, etc.), where neural networks were starting to turn up everywhere, solving all sorts of problems whether anyone liked it or not. Neural networks seemed exciting and somewhat magical (“the fairy dust approach — just sprinkle hidden units” — A. Lapides), but the push-back was coming, as normative engineering and symbolic AI approaches were in direct conflict with these new methods, which had no obvious explanation of how they worked and no principles for creating one that would work (much like now). At the time I was at Bellcore (and spending half my time at the Cognitive Science Lab in Princeton), part of the residue of Bell Labs once AT&T divested. The nexus of much of the neural network research was happening between Bell Labs and Bellcore and at a handful of academic centers (UCSD, Caltech, CMU, U. Toronto). My observation in 1991 was accurate: clearly something dramatic had happened by then. Symbolic AI as we had known it had collapsed, and machine learning was being overrun with neural networks solving problems on which symbolic AI approaches had produced anemic or zero results. But alas, the revolution was not to be stable. Nonetheless, many of the new leaders in machine learning at the time (Tom Mitchell, Tom Dietterich, et al.) jumped ship in the late 80s and 90s, starting to show up at NIPS (Neural Information Processing Systems) and applying back-propagation to their various ML problems. In 1991, I was the 4th program chair of NIPS, and noted the shift in attention in the preface:

“At some point in these early days, we could say the ‘Decade of Neural Networks’ began. We are roughly halfway through the maturation process, and although the size, diversity and scope of the meetings have transformed the field many times now, the basic belief that a network of simplified computing elements can have important consequences across a multidisciplinary landscape is still at the heart of the NIPS meeting in its 5th year.”

— S. J. Hanson, J. Cowan and L. Giles, NIPS preface, 1992

So although the field was growing and applications were appearing, there was still a nagging feeling that the problems solved were often little more than demos. The scale of the problems had moved just beyond the so-called “toy problems” that used to plague GOFAI, to what neural networks were now being applied to as “hobby problems”. Clearly the shift from normative science to actual application is a difficult road, fraught with steep gradients, false forks, and dead ends. Nonetheless, by the late 1990s, prior to the collapse of the neural network field, engineering application was becoming normative. I wrote an article in the in-house SIEMENS research magazine, “Siemens Worldwide Focus”, entitled “Neural Networks at Work”, discussing the various engineering applications (rolling steel mills, cell technology, cardio medical devices, transportation, motor monitoring, etc.) where neural networks at SIEMENS had made significant cost-reduction contributions; at the same time, IEEE declared that neural networks had entered a normative engineering phase from their point of view. Nonetheless, outside of smaller niches, neural network practice had actually failed. It was clear that one-hidden-layer back-propagation was unable to “scale” to larger and larger problems, often just due to unrealistic training time — even without the incomprehensible “data-mining” sample sizes that now exist.

With this context, let me turn to Michael Jordan’s recent post on Medium. Jordan’s extremely interesting exegesis on the foundations of AI has some important and cogent points, but I think it is fundamentally flawed. One specific claim that seems compelling, but is, alas, historically inaccurate, is the idea that John McCarthy provided AI’s (and neural nets’) intellectual flag, while the field’s actual substance derives from the intellectual content of the mathematician/statistician Norbert Wiener. On this account, there was initially a science goal of defining and creating artificial intelligence, and as the field became successful it slowly morphed into good engineering practice, applied intelligence, or, as Jordan prefers, “intelligent infrastructure”. I think this is only one projection of a larger picture, which paints a quite different story and context for what has emerged lately in the form of Deep Learning.

I think that the origin and substance of AI and neural computation combine many other clusters of early machine learning/AI, especially what Norbert Wiener termed “cybernetics”, developed with John von Neumann and Warren McCulloch from the 1940s to the 1950s, and are tangled up and confounded with the origin of computation itself. In the 1940s, John von Neumann (JVN), who led the Electronic Computer Project at the Institute for Advanced Study (IAS) in Princeton (not affiliated with Princeton University), was focused on creating brain-like computation; he even referred to memory storage in the IAS computer as “neurons” and to programming as “learning”. The nexus of computational invention and novel thought was clearly intersecting in this sleepy little town in middle New Jersey. Besides von Neumann (and Einstein, who seemingly had no interest in any of this, and would often wander around looking for tea during fledgling and classic cybernetics talks), Alan Turing was a graduate student in mathematics at Princeton University, finishing his Ph.D. Walter Pitts (of McCulloch and Pitts) was brought to the IAS by von Neumann, who had been impressed with the McCulloch-Pitts paper connecting brain, logic, language and neural networks (“A Logical Calculus of the Ideas Immanent in Nervous Activity”), to help with the development of the IAS computing device, one of the very first computers. Interestingly, Claude Shannon was also hanging out at the IAS (as a Research Fellow), and von Neumann, together with McCulloch and Wiener, helped create a number of influential workshops on cybernetics and information theory (the Macy conferences), partially paid for by nearby Bell Labs in Murray Hill, NJ, that in the early days of NIPS could easily have been mistaken for a prototypical NIPS workshop. AI as conceived by John McCarthy also intersects with cybernetics, as McCarthy left Caltech to finish his PhD at Princeton alongside von Neumann and others.
The psychologist George Miller (who told me most of this as oral history) was also at the IAS during this period; he worked with von Neumann on language and Markov processes and picked up on information theory (using it to characterize working memory: “seven, plus or minus two”), prior to meeting Noam Chomsky and then heading in a new direction to create psycholinguistics and eventually cognitive neuroscience.

So contrary to Jordan’s interesting narrative, the present AI/neural information processing explosion really evolved from this yeasty mix. In point of fact, McCarthy’s creation of AI (at Dartmouth, as junior faculty) was a reaction to the first rise of neural networks, due primarily to JVN, together with what amounted to the most recent “successful” neural network model: the perceptron of the psychologist Frank Rosenblatt (a student of the psychologist Egon Brunswik). All of this “perceptron” work must have appeared terribly unprincipled to McCarthy at the time. Consequently, the 60s and 70s saw the growth of McCarthy’s AI (symbolic, logic, language) in opposition to neural networks and to the cybernetics and information theory that had led to them. And once Marvin Minsky sunk the “perceptron” with scaling and complexity arguments, and given the impoverished state of computation at the time and the lack of vast amounts of data, neural networks were left for dead. For the next 15-20 years, McCarthy’s AI, or mutations of it, flourished, until around 1985-1986.
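The limitation Minsky and Papert formalized is easy to see concretely: a single-layer perceptron carves the input plane with one line, so it learns a linearly separable function like AND but can never learn XOR. A minimal sketch in plain NumPy (function names and hyperparameters are mine, purely illustrative):

```python
import numpy as np

def perceptron_accuracy(X, y, epochs=100, lr=0.1):
    """Train a single-layer perceptron with Rosenblatt's learning rule;
    return its final training accuracy."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (ti - pred) * xi   # nudge the separating line
            b += lr * (ti - pred)
    return ((X @ w + b > 0).astype(int) == y).mean()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(perceptron_accuracy(X, np.array([0, 0, 0, 1])))  # AND: 1.0
print(perceptron_accuracy(X, np.array([0, 1, 1, 0])))  # XOR: stuck below 1.0
```

No amount of training fixes the XOR case; the model class itself cannot carve the plane twice, which is exactly the representational argument that, absent large-scale computation and data, left the field for dead.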

In the mid 1980s, several events occurred that dropped like bombs, destroying most of McCarthy’s AI. In each case, the events were directly dependent on either neuroscience observations or the modeling of human behavior. John Hopfield, Terry Sejnowski, Geoff Hinton, Dave Rumelhart, Jay McClelland and others (Steve Grossberg incredibly persisted through this dark neural network winter) revived neural network research, and the nascent von Neumann/McCulloch/Wiener information theory/cybernetics modeling, in direct contrast to AI’s symbolic logic approaches. Machine learning at that time was actually an offshoot of AI, not of neural networks or neural computation. Then NIPS appeared, initially as the so-called “Hopfest” at Caltech, reminiscent of those meetings in the 40s and 50s that von Neumann, Wiener and McCulloch initiated.

Specifically, back-propagation (Rumelhart) revived this “neural network” agenda; Boltzmann machines, ICA (independent component analysis), TD (temporal differences) and many other algorithms inspired by or based on biological systems were developed around a set of common features and principles, which Terry Sejnowski has eloquently described elsewhere (including in a new book he has written) in recounting the origin of NIPS:

“NIPS grew out of the neural networks community in the 1980s with the goal of exploring the computational principles found in brains, which is different from brain modeling. These principles included learning, massively parallel processing, a high degree of connectivity, and dealing with high-dimensional problems in vision, speech, natural language and motor control. The latest versions of convolutional neural networks borrowed additional principles from what we know about the visual cortex, such as convolutional linear filters (simple cells), pooling (complex cells), ReLU activation (sharp threshold), normalization (feedback inhibition) and deep hierarchies.”

— Terry Sejnowski (email to NIPS board, 2018)

This is, of course, in stark contrast to Jordan’s claim that the recent trends in AI never had much to do with neuroscience or cognitive science, but rather were somehow coincidental to the emergent engineering science in the larger neural network community. Dave Rumelhart, for one, who deeply understood the connections between neural computation, cognitive science and, later, cognitive neuroscience, was always looking for connections between brain function and computation. He would likely have dismissed such dogmatic claims as “proof by lack of imagination”, as he did many “just so” claims about back-propagation when it first appeared.
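The principles Sejnowski lists map almost line-for-line onto code. A deliberately tiny one-dimensional sketch in plain NumPy (all names, sizes and values are mine, for illustration only) of the simple-cell/threshold/complex-cell trio: a convolutional linear filter, a ReLU sharp threshold, and max-pooling:

```python
import numpy as np

def conv1d(signal, kernel):
    """Convolutional linear filter: the 'simple cell' of the quote."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

def relu(x):
    """Sharp threshold nonlinearity."""
    return np.maximum(x, 0.0)

def max_pool(x, width=2):
    """Pooling: the 'complex cell', tolerant to small shifts."""
    return np.array([x[i:i + width].max()
                     for i in range(0, len(x) - width + 1, width)])

signal = np.array([0., 0., 1., 1., 0., 0., 1., 1.])
edge = np.array([1., -1.])  # responds to a downward step in the signal
fmap = max_pool(relu(conv1d(signal, edge)))
print(fmap)  # [0. 1. 0.]: only the downward edge survives the stack
```

Stacking many such filter/threshold/pool stages, with the filters learned rather than hand-set, is the deep hierarchy the quote attributes to the visual cortex.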

Part of the driving force in the initiation of the neural networks field was the discovery of new methods and ideas arising from observations of the natural world, the brain in particular. Using the most recent brain imaging technologies, we can finally see the various brain areas implicated as we learn, encode, organize and retrieve episodic events, and yet we still don’t understand learning. Biological learning systems are not engineering principles, but clearly engineering principles can mimic features of biological learning systems. Consider the recent explosion of research using an algorithm called Deep Learning, where the mere addition of layers to a neural network causes an “unreasonable” increase in the representation and learning of feature detectors. There are a lot of clues in all of this, but no complete theories (although some have suggested a “compression” type of explanation of DLs, due to successive recoding of information; while interesting, it is not really a theory of DL learning).
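The effect of adding layers can be seen at its smallest scale with the problem that sank the perceptron: insert one hidden layer and XOR becomes both representable and learnable by back-propagation. A minimal sketch in plain NumPy (weights, layer sizes and learning rate are illustrative assumptions, not anyone’s historical code):

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR targets

# Representation: one hidden layer suffices. Hidden unit 1 computes OR,
# hidden unit 2 computes AND; the output fires when OR is on but AND is off.
step = lambda z: (z > 0).astype(float)
hidden = step(X @ np.array([[1., 1.], [1., 1.]]) - np.array([0.5, 1.5]))
xor_by_hand = step(hidden @ np.array([[1.], [-1.]]) - 0.5)
print(xor_by_hand.ravel())  # [0. 1. 1. 0.]

# Learning: back-propagation adjusts random weights toward such a solution
# by passing the error gradient backwards through the layers.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward():
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

mse_before = np.mean((forward()[1] - y) ** 2)
for _ in range(5000):
    h, p = forward()                   # forward pass
    d_p = (p - y) * p * (1 - p)        # backward pass: chain rule,
    d_h = (d_p @ W2.T) * h * (1 - h)   # layer by layer
    W2 -= h.T @ d_p; b2 -= d_p.sum(0)
    W1 -= X.T @ d_h; b1 -= d_h.sum(0)
mse_after = np.mean((forward()[1] - y) ** 2)
```

How far any single run converges depends on the random initialization; what is certain is the gradient machinery itself, and that the added layer makes the solution expressible at all, which a single layer never can.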

There is already biologically and psychologically relevant science being done with Deep Learning. Recently, DeepMind researchers found that in merely learning navigation, DL networks would begin to represent “place cells”, as also shown in small mammals navigating a familiar area. Cognitive science is also starting to return to neural networks through recent events, such as Jay McClelland’s re-booting of the Contemporary Neural Networks in Psychology conference last year, and in Frontiers, where some of this work was recently published. So I predict this is just the beginning of the growth of neuroscience and cognitive science and their connections to neural information processing more specifically. This is a time of AI reconstruction and rebirth, rather than waiting for “intelligent infrastructure” to paddle itself up some well-navigated river.

Bio: Stephen José Hanson is Full Professor of Psychology at Rutgers University, Director of the Rutgers Brain Imaging Center (RUBIC), and an executive member of the Rutgers Cognitive Science Center. He has been Department Chair of Psychology at Rutgers, Head of the Learning Systems Department at SIEMENS Corporate Research, and a research scientist in the Cognitive Science Laboratory at Princeton University. He has held positions at AT&T Bell Laboratories, BELLCORE (AI and Information Sciences Department), SIEMENS Research, Indiana University and Princeton University. He has done modeling in a number of diverse areas, including animal learning theory (conditioning theory), human-computer interaction, behavioral genetics, complex skills learning, and neural network learning algorithms, and has more recently focused his research on model-based (computational) neuroimaging and the learning sciences. He has published over 100 papers and book chapters, as well as edited books on learning in humans, animals and machines. He was General Chair (1992) of the Neural Information Processing Systems conference and was elected to the NIPS Foundation board in 1993, where he remains on the Advisory Board. He was also a founding member of the McDonnell-Pew Cognitive Neuroscience Advisory Board, which for over a decade helped launch the fields of cognitive neuroscience and computational neuroimaging.
