Neuro-Symbolic AI or can we create an AI that is good at (almost) everything?

Clara Swaboda
8 min read · Feb 28, 2023


Neuro-symbolic AI is a strand of AI research that has been around for a while but has recently attracted more and more interest. It tackles interesting challenges in AI, like learning from less data, transferring knowledge to new tasks and creating interpretable models. Never heard of it? Well, it’s quite a niche field, so don’t worry. The term “neuro-symbolic AI” might sound mysterious and is definitely abstract. Let’s begin to understand the “neuro” and the “symbolic” by looking at our own way of thinking.

At the heart of neuro-symbolic AI is the question of how to get from sensory experience (“neuro”) to abstract thinking (“symbolic”). “Dogs are not cats” is a statement that we all understand and (likely) agree with. This statement might sound trivial, but when you think more about it, a few things become striking:

  • How do we learn to make the mapping between the dogs we see in the real world and the word “dog”?
  • Why do we all understand what a “dog” is? Well, maybe because dogs are physical things and we have all seen them, but what about more abstract things like “justice”?
  • How do we have a common understanding that a “dog” is not a “cat”? After all, they are both furry and cute animals.

What is more, when we make a general statement like “Dogs are not cats”, we don’t refer to any dog or cat in particular and we completely abstract away from all the specificities of dogs like their color, size, fur texture, etc. What matters are the defining features that dogs share and which make them different from cats, e.g. their distinct skull shapes. This is called a concept, and we use a symbol like the word “dog” to refer to it. Having symbols is a powerful tool because it allows for abstract thinking. Together with a grammar that defines how things relate to each other, e.g. “A is not B” or “A is part of B”, we can make infinite statements about our world and draw connections between things that are not connected in the physical world, for example through analogies. It also allows us to make inferences, which means we are able to draw conclusions: “All dogs are mammals” and “All mammals are cute” allow us to conclude “All dogs are cute” (as you can tell, I love furry animals).
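To make that last inference step concrete, here is a minimal sketch in Python of how a symbolic system can chain such statements mechanically. The facts and the single transitivity rule are purely illustrative, not a claim about how any particular system (let alone our brains) does it:

```python
# A tiny forward-chaining sketch over "all A are B" facts.
facts = {("dog", "mammal"), ("mammal", "cute")}  # "All dogs are mammals", "All mammals are cute"

def forward_chain(facts):
    """Apply transitivity (all A are B, all B are C => all A are C) until nothing new appears."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in list(derived):
            for c, d in list(derived):
                if b == c and (a, d) not in derived:
                    derived.add((a, d))  # e.g. ("dog", "cute")
                    changed = True
    return derived

print(forward_chain(facts))  # the derived set now also contains ("dog", "cute"): "All dogs are cute"
```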

Cognitive Scientists still don’t really understand how the patterns that dogs cast on our retina lead to the abstract concept “dog” that we can reason about in our language. Or, phrasing the problem the other way around: what is the neural correlate (e.g. a neuron, a group of neurons, a pattern of activity) of the concept “dog”? This is called the neuron-cognition gap and is, in my opinion, one of the most exciting frontiers in Cognitive Science. But one thing is clear: we humans are all able to achieve this.

How about artificial intelligence? When it comes to “seeing” and recognizing objects in the world, AI research has made impressive leaps in recent years. In Computer Vision, researchers have created systems that carve an image into areas with different meanings based on visual differences (semantic segmentation). But even if the AI “sees” the difference between a lawn and a dog, does it have a concept of those things? It is safe to say that the AI’s concept does not have the richness of our concepts of real-world things. Through our senses, we mix different modalities and create multi-modal concepts. What does the dog look like? How does it feel to touch its fur? How does it sound? How does it move around and interact with us? Some dog fans probably also have encyclopedic knowledge about dog breeds and other useful facts like life expectancy, the temperament of certain breeds and common diseases. The rich multi-modal sensory experiences and the factual knowledge make our concepts multifaceted to an extent that is not yet achieved in AI.

Let’s now assume that the AI has acquired a concept of “dog” by seeing a lot of images of dogs. How can it use its knowledge of dogs to form statements like “Dogs are not cats” based on its experience of cat and dog images? In other words: how is data translated into symbols? This is really the key question of neuro-symbolic AI.

Computer Vision models can distinguish between dogs and the background based on visual dissimilarity. Source: https://iq.opengenus.org/panoptic-segmentation/

Grounding symbols in data

In computer programming we use logic to express things like “if DOG then MAMMAL”. The concepts that a Machine Learning model learns, however, come in a different form. They are real-valued vectors with many dimensions that correspond, for example, to the pixel values of an image. They cannot be easily “plugged” into a logical formula. Researchers have experimented with different ways of grounding symbols in data, and geometry plays a key role. The vectors learned by the Machine Learning model can be represented in a coordinate system. When we map every vector representation for dogs into a coordinate system, a shape emerges that corresponds to the concept “dog”. The same is true for a higher-level concept like “mammal”, which corresponds to a bigger region that includes “dog”. We clearly see a dog-is-part-of-mammals relationship, which can then be expressed as the logical statement “All dogs are mammals”. Having symbols coupled to those vector representations means that whenever the concept of “dog” slightly shifts through learning more about dogs, the meaning of the symbol changes with it.

The vector representation of a dog image can be mapped into a coordinate system. Repeating the same for multiple data points, we see shapes emerge that correspond to concepts. The geometric shapes of the concepts “dogs” and “mammals” are semantically interpretable as a part-of relationship, which can then be expressed in logical terms.
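As a toy illustration of this idea, here is a minimal sketch of my own (not any specific published method) that models each concept as a region in embedding space, in this case just an axis-aligned box around the points assigned to it, and reads off a part-of relation whenever one box is contained in another:

```python
import numpy as np

# Toy 2-D "embeddings"; in a real system these vectors would come from a trained model.
dog_points    = np.array([[0.20, 0.30], [0.30, 0.35], [0.25, 0.40]])
cat_points    = np.array([[0.70, 0.30], [0.75, 0.35], [0.80, 0.25]])
mammal_points = np.vstack([dog_points, cat_points])  # "mammal" covers both dogs and cats

def box(points):
    """Axis-aligned bounding box of a point cloud: (min corner, max corner)."""
    return points.min(axis=0), points.max(axis=0)

def contained_in(inner, outer):
    """True if the bounding box of `inner` lies entirely inside the bounding box of `outer`."""
    (imin, imax), (omin, omax) = box(inner), box(outer)
    return bool(np.all(imin >= omin) and np.all(imax <= omax))

if contained_in(dog_points, mammal_points):
    print("All dogs are mammals")  # geometric containment read off as a logical statement
if not contained_in(dog_points, cat_points):
    print("Dogs are not cats")
```

In a real neuro-symbolic system the regions would be learned and far less tidy, but the principle of reading logical relations off geometry is the same.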

To sum it up: Neuro-symbolic AI tries to connect learning from sensory experience with reasoning over abstract symbols. There is, however, no single way to achieve this. Let’s look at some challenges that researchers in the field tackle.

Are large language models good thinkers?

With the rapid advances of Large Language Models (LLMs) one could easily get the impression that models with great language abilities also have great reasoning abilities. This, however, is not necessarily the case. Joshua Tenenbaum and his colleagues call this the “good at language -> good at thought” fallacy. We confound language abilities with good thinking because much of our thinking is consciously experienced in linguistic form. It gets even trickier: even if a model performs well on a reasoning task, that doesn’t mean it has learned to reason. Let me explain this quite confusing finding. Guy Van den Broeck and his colleagues did some interesting experiments training a BERT model on a simple logical reasoning task (forward chaining). They let several “BERTs” learn the same logical reasoning task, and all of them individually performed really well. Looking at the high performance, one could assume that BERT has learned the reasoning problem. But what they found was that a BERT model trained on one distribution fails to generalize to other distributions within the same problem space. This means that BERT was not able to generalize what it had learned, which is always a bad sign in Machine Learning (and in life). But if it did not learn logical rules, then what did it actually learn? It learned statistical features, because this is what BERT and most other ML models do. This finding can serve as a warning: if we see that a model produces the expected output, we often fall into the trap of believing that it “thinks” just like us. It is usually only when we change the context slightly that we realize the fundamental differences between how ML models “think” and how we do. And it seems like LLMs are not well equipped to solve all kinds of reasoning tasks.
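To give a feeling for what “the same problem space, different distribution” can mean, here is a hedged sketch of one way such a generalization test could be set up. This is my own toy construction, not the authors’ actual benchmark: generate chains of “all A are B” rules, train on short chains and test on longer ones.

```python
import random

# Build forward-chaining problems that need a chain of `depth` rule applications:
# "all A0 are A1", "all A1 are A2", ... and ask whether all A0 are A_depth.
def make_chain_problem(depth):
    names = [f"A{i}" for i in range(depth + 1)]
    rules = [f"all {a} are {b}" for a, b in zip(names, names[1:])]
    random.shuffle(rules)  # the order of the rules should not matter to a real reasoner
    query = f"are all {names[0]} {names[-1]}?"
    return {"rules": rules, "query": query, "answer": True}

# Train on short chains, test on longer ones: same problem space, different distribution.
train_set = [make_chain_problem(depth=random.randint(1, 3)) for _ in range(1000)]
test_set  = [make_chain_problem(depth=random.randint(4, 6)) for _ in range(200)]

# A model that truly learned the transitivity rule should handle test_set about as well
# as train_set; a model relying on shallow statistical features often does not.
print(train_set[0]["rules"], "->", train_set[0]["query"])
```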

We shouldn’t throw the baby out with the bathwater, though, because LLMs can, with a little bit of help, become better at commonsense reasoning. We use our common sense all the time to fill in information gaps, for example when we hear these sentences: “It’s going to snow. I’ll have to wake up 30 minutes earlier.” Through our experience and understanding of context, we can conclude that we need more time in the morning because we have to clear the snow off the car before leaving for work. Antoine Bosselut and his colleagues thought that enhancing LLMs with additional structured knowledge in the form of knowledge graphs might improve their ability to “fill in the gaps” with common sense. Since LLMs already encode a vast amount of knowledge from text corpora, what the researchers effectively taught the model was the structure in which they wanted it to represent that knowledge. They then analyzed which parameters of the model changed during fine-tuning, i.e. at which points in the model the learning took place. They found that most of the parameter changes happened in the decoder, and specifically in the attention heads where different representations get mixed, while the encoder and feedforward layers changed little. This indicates that the transformer model learned how to express and access previously learned information rather than learning many new relationships from the knowledge graph itself. Combining LLMs with knowledge graphs can also provide a factual grounding of knowledge and more stable and interpretable concepts, which addresses a major shortcoming of current LLMs.
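To make the “giving the model structured knowledge” part more tangible, here is a minimal sketch of how knowledge-graph triples might be serialized into text for fine-tuning a language model. The triples, relation names and templates below are invented for illustration and are not the format used in the work described:

```python
# Serialize knowledge-graph triples into prompt/completion pairs for fine-tuning.
triples = [
    ("snowfall overnight", "causes", "needing extra time in the morning"),
    ("dog", "is_a", "mammal"),
    ("dog", "capable_of", "barking"),
]

TEMPLATES = {
    "causes":     "{head} typically causes {tail}.",
    "is_a":       "A {head} is a kind of {tail}.",
    "capable_of": "A {head} is capable of {tail}.",
}

def triple_to_example(head, relation, tail):
    """Turn one (head, relation, tail) triple into a text pair the model can be tuned on."""
    prompt = f"{head} [{relation}]"                                # the model sees head + relation ...
    completion = TEMPLATES[relation].format(head=head, tail=tail)  # ... and learns to produce the tail
    return {"prompt": prompt, "completion": completion}

training_data = [triple_to_example(*t) for t in triples]
for example in training_data:
    print(example)
```

Fine-tuning on pairs like these nudges the model to express knowledge in a predictable, structured way, which fits the finding above that the model mainly learns how to access and express what it already knows.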

Taking Neuro-symbolic AI from the ivory tower into the real world

At this point you may wonder: what is all this neuro-symbolic AI research good for outside of the research cosmos? I was surprised to hear that some people are already turning research into products. The company Elemental Cognition builds natural language understanding solutions using neuro-symbolic AI. By “understanding”, Director of AI Research Adi Kalyanpur means the ability to fluently engage, simplify, inform and confirm understanding. They have developed different neuro-symbolic AI models that, at their core, all process input through neural networks that produce probabilistic outputs. These outputs are converted into a symbolic model, which performs logical reasoning. The output of the reasoning step is then passed back to an ML model to generate natural language output. One of their most interesting use cases is a virtual travel agent that helps customers book a world trip. The problem is quite complex when you think about it: there are hundreds of possible destinations the agent should consider, the duration of the trip can vary from a week up to a year, and there are millions of possible combinations of flights, layovers and schedules. What’s more, flight availability changes constantly, and customers’ preferences for a destination or schedule might change during the conversation as they better understand the implications of each decision. Having a neuro-symbolic system in place makes it possible to interact dynamically, processing natural language input and producing language output, while transforming natural language instructions into rules that a reasoning engine can process. This way, Elemental Cognition has managed to build a flexible conversational agent that relies on fact-based knowledge.
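Here is a deliberately simplified sketch of that general neural-to-symbolic-and-back pattern. It is my own illustration: the keyword “parser” stands in for a neural language model, and the flights, constraints and phrasing are all invented; none of this reflects Elemental Cognition’s actual system.

```python
# Neural step (stand-in) -> symbolic reasoning -> language generation (stand-in).
flights = [
    {"dest": "Tokyo", "price": 950,  "stops": 1},
    {"dest": "Tokyo", "price": 1400, "stops": 0},
    {"dest": "Lima",  "price": 700,  "stops": 2},
]

def parse_preferences(utterance):
    """Stand-in for the neural step: turn free-form text into explicit symbolic constraints."""
    constraints = []
    if "direct" in utterance:
        constraints.append(lambda f: f["stops"] == 0)
    if "under" in utterance:
        budget = int(utterance.split("under")[1].split()[0].strip("$"))
        constraints.append(lambda f, b=budget: f["price"] <= b)
    return constraints

def reason(options, constraints):
    """Symbolic step: keep only the options that satisfy every constraint."""
    return [f for f in options if all(c(f) for c in constraints)]

def respond(options):
    """Stand-in for the generation step: turn the symbolic result back into language."""
    if not options:
        return "I couldn't find a flight matching those preferences."
    best = min(options, key=lambda f: f["price"])
    return f"The best match is a flight to {best['dest']} for ${best['price']}."

print(respond(reason(flights, parse_preferences("I want a direct flight under $1500"))))
# -> The best match is a flight to Tokyo for $1400.
```

The appeal of this split is that the messy, ambiguous part (language) is handled by a learned component, while the part where mistakes are costly (which options actually satisfy the constraints) is handled by explicit, inspectable rules.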

What’s next?

Is neuro-symbolic AI the right way to solve the hardest problems in AI? I genuinely don’t know; as of now it is mostly applied to toy problems and is quite domain-specific. Wherever this research endeavor leads, for me the most important thing is that researchers in this field are asking crucial questions and thinking outside the box. It is quite refreshing to see critical minds amidst the hype around large language models and the “bigger is better” mentality. Let’s see what becomes the next big thing in AI, but I think structure and learning will both play an important role.

