Neurosymbolic AI to Give Us Machines With True Common Sense

Inside IBM Research
Published in The Startup · 15 min read · Jul 8, 2020


By Katia Moskvitch

“The dog hid under the bed. Again.”

At any other time, IBM computer scientist Danny Gutfreund, then at IBM’s Haifa lab in Israel, would probably have ignored his older brother’s comment. But this was the third year in a row that the dog, Bona, had hidden under the bed in the morning, before the start of the fireworks marking Israel’s Independence Day. Somehow the dog knew about the imminent loud bangs; something in her brain made her connect the dots. The year was 2012, and the odd yearly pattern in the dog’s behavior quickly became the subject of dinner conversations. The puzzle was too tempting: there’s still so much we don’t know about the brain, be it a dog’s (learning complex associations from just a few examples) or a human’s. Yet we’ve been trying to imitate it in machines for decades. Intrigued by Bona’s behavior, Danny started working in artificial intelligence (AI).

While researchers have been making great progress in AI, they still haven’t been able to give machines the special ingredient that makes us ‘us’: common sense. We just know, seeing a person walk in soaking wet, that it’s raining outside. Dogs have some basic common sense too, or rather what’s called rapid learning; Bona knew, perhaps from observing her owners set up a BBQ on that specific day, year after year, that fireworks were coming. No robot can do even that, at least not yet.

Today, machines translate languages and recognize objects and speech. But ask a smartphone assistant something more complex than a basic command, and it will struggle. Machines with common sense, which rely on an emerging AI technique known as neurosymbolic AI, could greatly increase the value of AI for businesses and society at large. Such AI would also require far less training data and manual annotation. Supervised learning consumes a lot of data and energy, to the point that if we keep on our current path of computing growth, by 2040 we’ll exceed the ‘power budget’ of the Earth. There’s simply not enough data or power to continue with today’s AI.

This is exactly what a new collaboration between IBM and MIT, Harvard and Stanford universities, financed by the US defense department’s research agency DARPA, aims to change. The idea is to get computers to learn like humans, by developing the same basic building blocks for learning as a six-month-old infant. The researchers want AI not just to recognize objects, but to understand what it sees and apply reasoning to act accordingly in a new situation.

“There is in principle no barrier to creating artificially intelligent systems that match or exceed our capabilities,” says David Cox, computer scientist and IBM director of the MIT-IBM Watson AI Lab in Cambridge, MA.

Early attempts to simulate the brain

go back to the mid-1950s, with the coining of the term “artificial intelligence” in 1955. The field really kicked off the year after. Back then, the approach to AI resorted to symbols representing objects and actions, similar to how humans process information. That’s the essence of the once-mainstream approach to AI called symbolic AI, which is still used today, albeit not very widely. It is based on the idea that humans make sense of the world by creating internal symbolic representations and rules for dealing with them, based on logic. These rules can be turned into a computer algorithm, capturing our daily knowledge: describing, for instance, that if a ball is thrown and there is no wall, it should keep going straight. But if there is a wall, it should bounce back. The computer uses these structured representations of knowledge and applies logic to manipulate them, gaining new knowledge to ‘reason’ somewhat as humans do.
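The ball-and-wall rules described above can be written down as a tiny program. The following is only an illustrative sketch, not code from any historical system: the names `Ball`, `step` and `simulate`, and the one-dimensional world, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ball:
    position: int  # location along a one-dimensional line (hypothetical world)
    velocity: int  # +1 moves right, -1 moves left


def step(ball, wall_at=None):
    """Advance the ball one time step using two hand-written rules."""
    next_position = ball.position + ball.velocity
    # Rule 1: if a wall occupies the next position, the ball bounces back.
    if wall_at is not None and next_position == wall_at:
        return Ball(ball.position, -ball.velocity)
    # Rule 2: otherwise the ball keeps going straight.
    return Ball(next_position, ball.velocity)


def simulate(ball, steps, wall_at=None):
    """Return every position visited, starting with the initial one."""
    positions = [ball.position]
    for _ in range(steps):
        ball = step(ball, wall_at)
        positions.append(ball.position)
    return positions


print(simulate(Ball(0, 1), 4))             # no wall: keeps going straight
print(simulate(Ball(0, 1), 5, wall_at=3))  # wall at 3: turns around
```

Every behavior here has to be spelled out by hand, which hints at why such systems needed constant updating as more situations were covered.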

The IBM 702: a computer used by the first generation of AI researchers.

The field looked promising for over a decade, with the US government in the 1960s pouring billions into symbolic AI research. But the technology wasn’t progressing as fast as expected, the systems were costly and needed constant updating, and would often become less precise when more rules were added to them. AI researchers themselves were getting pessimistic, with doubts spilling into the media. Government funding turned into a trickle and the research stalled around 1974, tossing us straight into the first AI winter, with overpromises followed by under-deliveries.

Symbolic AI simply could not cope with the messiness of the real world. For instance, to write code that would comb through hundreds of images and select those of a specific person, such systems had to compare any new images to the original one. But if in the new images the person was pictured from a different angle, the program floundered. There were some spikes of progress: in the early 1980s, a team led by computer scientist Terry Sejnowski decided to challenge symbolic AI and developed a data-fueled program that could learn how to pronounce words, just like babies do.

The 1990s saw major developments through really powerful tools of probabilistic models and statistical inference, the paradigm that gave us the modern field of machine learning. “People might not have called it AI, but it was as much as today’s AI,” says Josh Tenenbaum, an MIT professor of computational cognition. “It’s all about trying to make good guesses, building models that make inferences from patterns of observed data to the underlying causes.” This probabilistic approach led to advances in natural language processing and machine learning, and drove technologies we now take for granted such as scalable internet search at Google.

Then in 2009, Stanford University computer scientist Fei-Fei Li created ImageNet, giving a boost to a different approach to machine intelligence: deep learning. While deep learning has been around since the 1960s, it was the combination of large-scale datasets (such as ImageNet), increasingly powerful computing machinery (such as graphics processing units, or GPUs) and advances in algorithms and programming languages that created the perfect conditions for this revolution.

Deep learning relies on neural networks originally inspired by an attempt to replicate the nerve cells in the brain, the neurons, and all the complex interactions they have when you make a split-second decision like putting your hand out to catch a vase falling from a shelf. Machines can’t yet do tasks like this. Still, during the past decade deep learning-based AI has made huge progress, processing mountains of data, computing complex problems humans struggle with and creating models predicting future outcomes based on previous patterns.

The development of deep learning triggered an AI boom, and the field exploded around 2012, launching an era in which the bits a computer relies on to process information converged with synthetic neurons, the basic computing units of a neural net. Researchers and companies suddenly realized that data had much more value than they had ever imagined.

Dan Gutfreund, Computer Scientist, IBM Research

That was the year when Danny Gutfreund became the manager of one of the most ambitious AI initiatives yet: IBM’s Project Debater. His colleague at the Haifa lab, computer scientist Noam Slonim, decided to build it after the supercomputer IBM Watson, using a combination of symbolic AI and probabilistic inference, outwitted two humans in the TV quiz show Jeopardy! in February 2011. Fast-forward eight years, and Project Debater, a neural nets-based brain embodied by a black monolith with blinking blue lights, confronted debate champion Harish Natarajan on 11 February 2019. “I suspect you’ve never debated a machine,” Project Debater said to its rival. “Welcome to the future.” Natarajan chuckled, slightly uneasy at first, but quickly got used to speaking to his digital opponent as if it were human. Able to sift through hundreds of millions of articles and answer queries based on the data it acquired, the AI could be of use to businesses.

And, crucially, Project Debater’s digital brain follows processes similar to those humans go through, to an extent. Its neural nets are driven by data, learning from examples. That’s why neural networks are great at recognizing patterns, be it in language or imagery. But while we only need one or two examples to recognize an object or understand a sentence with an unfamiliar word, a neural net needs hundreds.

Harish Natarajan with his opponent, IBM Project Debater in San Diego, 2019

Still, deep learning has led to dramatic advances in many areas. In computer vision, instead of searching for specific pixel patterns, such as edges, like symbolic AI would, the neural net’s algorithm is first trained on many images over time. It then creates a model so that when faced with a new picture, it outputs a probability over all possible predictions, leading to accurate image recognition. Deep neural networks have also greatly improved natural language processing, enabling machines to perform complex translation to multiple languages. They help us find errors and inconsistencies in heaps of tax returns and assist us in the design of new materials by creating predictive models for unknown molecules — exactly the tasks where symbolic AI fails.
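To make “a probability over all possible predictions” concrete, here is a minimal sketch of one common way to turn a network’s raw output scores into probabilities, the softmax function. The article does not say which method any particular system uses, so softmax, the label names and the score values below are all assumptions for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracted for numerical stability
    exponentials = [math.exp(score - largest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def predict(labels, scores):
    """Pair each candidate label with its probability."""
    return dict(zip(labels, softmax(scores)))


# Hypothetical scores a trained image classifier might produce for one picture.
probabilities = predict(["cat", "dog", "vase"], [2.0, 0.5, -1.0])
best_guess = max(probabilities, key=probabilities.get)
print(probabilities, best_guess)
```

The label with the highest score always receives the highest probability, so the model’s best guess is simply the most probable entry.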

But deep learning isn’t without its limitations.

One significant challenge is that neural nets can’t explain how objects relate to each other. As they rely on available data, they can’t reason — they can’t have common sense. “Common sense is all of the implicit knowledge that we have that’s never written down anywhere,” says Cox. “I know that if I take a cup and put it on the table, the table will support it. And even if we ingest a giant corpus of natural language into a machine, we’re not going to find a lot of examples of somebody stating that fact.” For all their awesomeness, neural nets don’t work the way human brains do — and likely never will.

The yearning to solve this common-sense riddle

brought Danny Gutfreund from sunny Haifa to Cambridge, home to MIT and Harvard, on the eastern coast of Massachusetts Bay. He wanted to try something new.

To help machines reason like us, Gutfreund looked to mix the symbolic AI of the past with neural nets, fusing logic and learning. Neural nets, he reasoned, would enhance symbolic AI systems by splitting the world into symbols — recognizing images and translating pixels into a symbolic representation. And symbolic AI algorithms would inject into the neural nets common sense reasoning and domain knowledge. They would apply logic and semantic reasoning to describe relationships between objects, predict various interactions, answer questions and make decisions — just like a human would. He wanted to give neurosymbolic AI a try — a new field, with a handful of groups exploring it.

In Boston, Gutfreund encountered the babies.

Or rather, the data from lengthy research spanning several decades into how babies perceive the world. “Many people imagine young babies as passive recipients of environmental experience. They look passive because they can’t do anything,” says Rebecca Saxe, an MIT professor of cognitive neuroscience. Indeed, very young infants can’t yet sit, walk, or talk, and the way they seem to learn may resemble how researchers pre-train machine learning algorithms. Scientists feed vast banks of data into software as passive experience, and let machines extract statistics, patterns and structure.

But human infants are not passive, says Saxe. “Right from the very moment that they are born, they are making choices of what their experience is like,” she says. They learn by extracting structure from vast amounts of experience — but they are actively choosing what to look at and what to learn from.

Rebecca Saxe speaking at TEDxCambridge

For years, Saxe has been trying to understand human cognition, observing five-month-old infants and studying their gaze. Her work, as well as that of Harvard cognitive psychologist Elizabeth Spelke and others, has given crucial insights into the processes inside the still-growing brain, drawn from the way a baby looks at an object and for how long. For their experiments, the researchers resort to near-infrared spectroscopy (NIRS), studying neural activity with light. “You shine light through a baby’s scalp, and then use a detector to measure the amount of reflectance of two different wavelengths,” says Saxe. “That tells you the relative oxygenation of the blood in the brain, because when neurons are more active, they consume more oxygen.”

The measurements have helped her understand changes in the neural activity. A baby may be looking at something for longer because it’s a familiar object, or because he or she likes it, or finds it surprising, or perhaps scary. Different triggers lead to sparks of activity in different brain regions. “It surprises me that you can measure a baby’s cognition from their gaze,” says Saxe. “It surprises me that you can disentangle their different motivations using neuroimaging. It’s pretty wild, but it seems to be working.”

Intrigued by Saxe’s results, Gutfreund, Cox and their IBM AI colleagues in Boston decided to have a chat with her about a possible collaboration. What if we combine psychology and neuroscience with machine learning, they reckoned, to eventually try to apply theories about infant cognition to AI algorithms? “The hypothesis is that as we make AI more like babies in those ways, we will get insights that will push new AI away from doing just pattern classification, which it mostly is right now, and towards being actual reasoning and cognition,” says Saxe.

In addition to Saxe, the IBM team also joined forces with Elizabeth Spelke and Josh Tenenbaum, MIT professor of cognitive science and computation, along with Harvard psychology professor Tomer Ullman and MIT computer scientist Vikash Mansinghka. The MIT and Harvard researchers have an intriguing approach to AI, drawing on the insights of Spelke, Saxe and others about infant minds: They posit that humans are born with a pre-programmed rough understanding of the world, in some ways analogous to the game engines used to build interactive immersive video games. This “game engine in the head” provides the ability to simulate the world and our interactions with it, and serves as the target of perception and the world model that guides our planning.

Crucially, this game engine learns from data, starting in infancy, to be able to model the actual situations — the endless range of “games” — we find ourselves in. It is approximate yet gets more and more efficient — to the point that very quickly, humans make instant mental approximations that are good enough to thrive in the world. And, the researchers think, it’s possible to replicate this type of system in a machine by embedding ideas and tools from game engine design inside frameworks for neurosymbolic AI and probabilistic modeling and inference known as probabilistic programs.

In August 2019, the researchers got to work,

aiming to give machines true common sense — by reverse-engineering a child’s brain. Soon, more scientists joined, including other developmental psychologists, computational neuroscientists, computer scientists and cognitive scientists from MIT, Harvard and Stanford. With the blessing of DARPA, which awarded the collaboration several million dollars for a four-year project to research and build computational models mimicking core cognitive capabilities of babies, Gutfreund and colleagues embarked on an ambitious adventure. Because, according to DARPA, the absence of common sense is the most significant barrier between the narrowly-focused AI applications of today and the more general, human-like AI systems hoped for in the future.

David Cox

“I think this is where we have to go,” says Cox. “AI has gone in common waves of winters and springs: we overpromise, then underdeliver. We’re in an AI spring right now. And I think it’s existential that AI research moves in this direction — learning like babies do.”

Recently, researchers from the MIT and Harvard teams created an algorithm that combines neural networks and symbolic AI with a probabilistic physics inference model to track and react to objects as they move and sometimes suddenly become hidden from view. Babies already know by the time they are three months old that if this happens, the object they can no longer see will remain in place and not vanish.

To get the machine to learn this common-sense knowledge, the researchers relied on a deep neural network to identify the physical properties of the objects — their shape type, location and velocity. The model translated the pixels in the video to symbolic representations. Then, feeding on the symbols, a probabilistic physics-based reasoning model tracked how the scene unfolded, indicating any unexpected event — such as an object suddenly vanishing. The machine did well — when the cube in the simulation suddenly disappeared after the blocking object was removed, the software flagged it as an implausible event — just like a baby would look at the empty space for longer, surprised at the violation of physics.
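The pipeline above (pixels to symbols, then reasoning over the symbols) can be sketched in a few lines. This is a deliberately simplified, deterministic stand-in for the researchers’ probabilistic physics model, not their implementation: the frame format, the `occluder_present` flag and the function name `find_implausible_events` are assumptions, and the symbolic frames are written by hand rather than produced by a neural network.

```python
def find_implausible_events(frames):
    """Flag objects that vanish while nothing could be hiding them.

    Each frame is a dict with:
      'visible'          - names of objects the perception step reported
      'occluder_present' - True if a screen is currently blocking the view
    Returns a list of (frame_index, object_name) pairs.
    """
    tracked = set()  # objects the system believes still exist
    events = []
    for index, frame in enumerate(frames):
        visible = set(frame["visible"])
        missing = tracked - visible
        # An unseen object is fine while a screen is up; otherwise it is a
        # violation of object permanence.
        if missing and not frame["occluder_present"]:
            for name in sorted(missing):
                events.append((index, name))
            tracked -= missing  # report each disappearance only once
        tracked |= visible
    return events


vanishing_cube = [
    {"visible": ["cube"], "occluder_present": False},  # cube in plain view
    {"visible": [], "occluder_present": True},         # screen hides the cube
    {"visible": [], "occluder_present": False},        # screen removed, cube gone
]
print(find_implausible_events(vanishing_cube))
```

In the example, the cube is missing after the screen is removed, so the final frame is flagged, loosely mirroring the baby’s longer look at the empty space. A probabilistic model would instead assign a low likelihood to that frame rather than apply a hard rule.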

And it’s not just vision that makes us reason the way we do. It’s also language. “Our ability to bridge from our perceptual systems, including vision, to language is crucial to our intelligence: that we can talk about the things that we perceive and imagine scenes when we talk,” says Roger Levy, a cognitive scientist at MIT. “I can tell you about a silver dusted porcupine that’s living five miles underwater, on the ocean floor near a coral reef. That definitely doesn’t exist in the world, it’s absurd. But you probably have a very rich picture in your mind of what that’s like right now, because you can go from perception to mental representations to language and back.” We should be able to recreate all of this in a machine, he adds, and also include all the other sensory modalities that connect to language.

There are multiple groups looking into the language

side of machine intelligence, among them another team at IBM. Led by Alexander Gray, VP of IBM AI Science based in the company’s Yorktown lab near New York, the researchers are relying on recent advances in statistical AI for natural language processing. “Classical AI is not cool anymore; deep learning is cool. So we’re definitely in a minority — or you can look at it as we’re ahead of the game,” laughs Gray. “We think we’re ahead of the game.”

His aim is to gradually move from pure black box neural net models to models that can be understood as logic-like knowledge — but not necessarily the knowledge elicited from humans. “You can’t rely on a bunch of humans to write down all the knowledge in the world,” says Gray. “Instead, we’re going to learn that knowledge, to acquire it automatically from text.”

For the past few years, Gray and his team have been using so-called semantic parsing, translating a natural language sentence into a logic-like sentence — mapping the words to explicit symbolic concepts. “Take the phrase ‘Mary had a little lamb’ — we will identify the word Mary, and map it to the concept of a person within a knowledge graph, allowing the use of other rich information, such as the fact that a person is a kind of mammal, which is a kind of living thing, and so on. This allows us to apply common sense knowledge that the machine can use to perform more general tasks,” says Gray.
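Gray’s example of mapping “Mary” to a person, a mammal and a living thing can be illustrated with a toy knowledge graph. The sketch below is an assumption-laden illustration, not IBM’s system: the `IS_A` dictionary, the function names, and the extra “lamb” entries are hypothetical, and a real semantic parser would produce the word-to-concept mapping automatically.

```python
# A hand-written "is a kind of" hierarchy. The Mary/person/mammal/living thing
# chain follows the quoted example; the lamb entries are added for illustration.
IS_A = {
    "Mary": "person",
    "person": "mammal",
    "lamb": "sheep",
    "sheep": "mammal",
    "mammal": "living thing",
}


def ancestors(concept, is_a=IS_A):
    """Follow 'is a kind of' links upward and return the chain of concepts."""
    chain = []
    seen = {concept}
    while concept in is_a:
        concept = is_a[concept]
        if concept in seen:  # guard against accidental cycles in the graph
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def is_kind_of(concept, category, is_a=IS_A):
    """Answer questions such as 'Is Mary a living thing?'."""
    return category in ancestors(concept, is_a)


print(ancestors("Mary"))
print(is_kind_of("lamb", "living thing"))
```

None of the facts about living things needs to be stated for Mary directly; they follow from the links, which is the kind of inference a logic-like representation makes possible.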

Another part of the research program will automatically acquire that knowledge. “The advantage of using a model which has a logic-like form is that you can then perform reasoning to get the answer to more sophisticated questions,” says Gray. “This is a possible path to true natural language understanding.”

Another team at the MIT-IBM Watson AI Lab is also interested in combining vision and language. The researchers developed an algorithm called the Neuro-Symbolic Concept Learner, where an AI with two neural networks answers questions about objects in images. One network creates a table with characteristics of the objects such as color, location and size. The other one is trained on question-answer pairs, such as “What’s the color of the cube?” — “Red.” That neural net then transforms each question into a symbolic AI program that references the table to get an answer.
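The division of labor described above can be sketched as follows. In the real Neuro-Symbolic Concept Learner both the object table and the program come from neural networks; here they are written by hand, and the operation names (`filter`, `query`, `count`), the `SCENE` table and `run_program` are hypothetical choices made only to show how a symbolic program might reference such a table.

```python
# Stand-in for the table the first network would produce from an image.
SCENE = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "blue", "size": "small"},
    {"shape": "sphere", "color": "red", "size": "small"},
]


def run_program(program, scene):
    """Execute a list of (operation, argument) steps against the object table."""
    selected = list(scene)
    for operation, argument in program:
        if operation == "filter":
            attribute, value = argument
            selected = [obj for obj in selected if obj[attribute] == value]
        elif operation == "query":
            if len(selected) != 1:
                raise ValueError("query needs exactly one selected object")
            return selected[0][argument]
        elif operation == "count":
            return len(selected)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return selected


# "What's the color of the cube?" as a hand-written symbolic program.
color_of_cube = [("filter", ("shape", "cube")), ("query", "color")]
print(run_program(color_of_cube, SCENE))

# "How many red objects are there?"
count_red = [("filter", ("color", "red")), ("count", None)]
print(run_program(count_red, SCENE))
```

Because the program is explicit, each answer can be traced step by step back to rows in the table, which is harder to do with a purely neural model.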

That’s perception — the equivalent of a photon of light hitting our retina and streaming the visual data into the brain, which then translates it into something we can describe in language. Crucially, the researchers have been able to gradually relax the amount of innate knowledge that the system has to have. First, it had to know the different kinds of objects that were there, their colors and sizes. Then the system knew that there was something called color but it didn’t know that blue or red are colors — it had to figure it out from context and learn implicitly how it was tied to language. And finally, the system didn’t even know that color was a concept, it had to figure it out and learn what the color corresponds to. “They’ve been on this interesting progression where the system is given less and less and it has to learn — basically, developing common sense,” says Cox.

These experiments are just scratching the surface —

there’s a lot more to learn about the brain. Perhaps in a roundabout way, our progress in giving machines the ability to learn and reason like humans will also help us understand how babies know that objects don’t teleport. Perhaps it will even help Danny Gutfreund finally find out how his brother’s dog knew that every year at around the same time there would be fireworks.

But most importantly, this neurosymbolic AI research should help us build the machines of the future: autonomous systems able to accomplish tasks without external input, which is vital in critical situations such as natural disasters or industrial accidents.

We may be in an AI spring right now, but there’s a long way to get to an AI summer — and to stay there without overhyping the research and underdelivering. “We’re still working in a petri dish. But we are now finally starting to mimic the human ability to acquire common sense knowledge in an unsupervised way, by acting and being in an environment of learning,” says Cox. “Yes, AI research is still a very simple toy world. But a toy world that at last holds a lot of promise.”