Why is current deep learning technology a dead end for Artificial General Intelligence?

Maciej Wolski · Published in The Startup · Feb 6, 2020 · 15 min read

Introduction

To not question things is to agree to stay in the same place. During that questioning, your mind can wander in wrong directions. But you can still learn a lot while exploring the uncharted territories of human potential.

Excuse me for moving away from the main topic for a moment, but first I want to share something.

More than 10 years ago I started to learn, the hard way, the power of continuous effort. I suffered from an unusual health ailment that made basic things like walking, breathing and sleeping problematic. And believe me, that was just a small part of the whole experience.

I did everything I could to overcome it. But no one could help me, not even the most esteemed professors. The proposed treatments only made things worse; some negative effects of the physical interventions will stay with me until I die.

Left without any other choice, I decided to learn about human physiology and biochemistry myself. It took me 10 years to fully recover, but I saw incremental progress the whole time. And the effort was totally worth it.

That taught me a lot about questioning things and about my problem-solving potential. After thousands of hours of study, I found the most precious piece of information in a very obscure medical book written on the other side of the world. I connected the dots, and for the last 4 years I have enjoyed perfect health again. I also discovered how to extend my mental and physical performance.

Some people like doing risky things and experiencing an adrenaline rush, ignoring fear or not feeling it at all. I now feel no limitations in expanding my knowledge in any chosen area. What stands between me and my goal is just time.

Artificial Intelligence is one of those areas. Once again I have spent countless hours analyzing a hard subject. And after years of research and practical experiments, I am ready to share some of my conclusions. The rest forms the foundation of AGICortex, a realistic architecture for Artificial General Intelligence with an early-stage proof of concept available.

So before we start, one additional comment. If what you are about to read makes sense, it is because these conclusions were made possible by decades of hard work by all the researchers involved. And I have no doubt that each human generation can improve on the work of the previous one.

I don’t pretend to have all the answers, but I am quite confident about the direction.

A solid background in biochemistry, combined with working in the software development industry since 2007, helped a lot to open new perspectives of thought.

1) Backpropagation is perfect for Narrow AI, but terrible for AGI

Backpropagation is the fundamental algorithm behind most currently popular Deep Learning applications. It allows a neural network to improve and find its optimal shape, and right now I can't imagine a better solution for a single problem. But when we try to teach a network another task, or simply feed it a significant amount of new data, problems arise: trained abilities partially disappear in a process called "catastrophic forgetting". This makes continual learning really hard.

And isn’t intelligence about constant adaptation?

Of course, there are ideas for overcoming that problem, such as slowing the modification of the parts crucial for previously learned tasks. But to fit new data, the number of neurons must be increased.
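
One concrete version of that idea is an elastic-weight-consolidation-style penalty (after Kirkpatrick et al., 2017): a quadratic term that makes the weights important for an old task stiff, so a new task cannot easily overwrite them. A minimal numpy sketch, assuming a per-weight importance estimate is already available (all numbers here are invented):

```python
import numpy as np

# EWC-style penalty sketch: weights important for a previously learned
# task are anchored by a quadratic term, which slows their modification
# while unimportant weights remain free to change.

def ewc_penalty(weights, old_weights, importance, strength=1.0):
    """Penalty that grows when important old weights are moved."""
    return strength * np.sum(importance * (weights - old_weights) ** 2)

def penalized_gradient(grad_new_task, weights, old_weights,
                       importance, strength=1.0):
    """Gradient of (new-task loss + EWC penalty) w.r.t. the weights."""
    return grad_new_task + 2.0 * strength * importance * (weights - old_weights)
```

With high importance, the extra gradient term pulls a weight back toward its old value; with zero importance, training on the new task proceeds unhindered.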

The brain has many more neurons than our networks, so that is not a problem, right?

Well, yes, but the brain is also much more energy-efficient than our technology. On the order of hundreds of thousands of times.

You see, the brain is modular and can switch large parts of itself on and off. You have billions of neurons in your head, but you use only a small portion of them at a time.

Only those that are useful. Backpropagation, by contrast, uses all neurons at every iteration.

The brain's compartmentalized architecture allows it to learn different things while still exchanging information between regions.

So why not create multiple deep networks and connect them? There have already been such attempts, but advanced intelligence is much more than that, as you will at least partially discover in the rest of this article.

2) Unsupervised learning

The brain does not need thousands or millions of examples to learn something. Nor does it need a supervisor all the time, as our deep networks do.

This clearly signals that its method of learning is quite different: more incremental and more complex. You can learn something even when you have no clearly defined label or mapped outcome.

Of course, we did not build flying machines by imitating the flapping of wings. But birds served as proof that if you introduce a force strong enough to overcome gravity, objects heavier than air can fly.

Even if our AGI does not learn exactly the way the brain does, it will be able to learn in an unsupervised way.
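
To make the contrast concrete, here is a toy example of learning without any labels or supervisor: k-means, one of the simplest unsupervised algorithms, discovering structure in raw data. This is a deliberately minimal sketch of the idea, not how the brain or an AGI would do it:

```python
import numpy as np

# Unsupervised learning in miniature: k-means finds cluster structure
# in unlabeled data. No labels, no supervisor -- only the data itself.

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = np.argmin(
            np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2),
            axis=1)
        # Update step: move each center to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Two obvious groups, never labeled -- k-means separates them anyway.
data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers, labels = kmeans(data, k=2)
```

The algorithm never sees a "correct answer"; the structure it learns comes entirely from similarity within the data.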

3) Cognitive maps vs routes

Imagine for a moment a solution to your problem as a path from the input data to the desired output. That path is a cognitive route, and deep learning strives for mastery in this discipline.

Deep learning is a form of associative memory between inputs and outputs.

In the past, people found places by following directions provided by their friends or family.

"Move straight through the forest until you find a river. Cross it, turn left and stop near the weird curved tree. Then go to the hill with three big rocks on top."

That worked fine until somebody cut down the tree or somehow moved the rocks. The additional drawback was that this knowledge had a very narrow application: a single task.

What do we do now instead of relying on such hints? We create maps, which are reusable: we can find multiple routes from various start points to unlimited end points.

Now imagine building a knowledge representation that lets you reach any place in physical or mental space.

This is the difference between Narrow and General AI.

And it has serious implications for neural architectures as well.
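
The route/map distinction can be sketched in code: a memorized route is a single fixed path, while a map is a graph from which a route between any two points can be derived on demand. The node names below are invented for illustration:

```python
from collections import deque

# A "map" as a graph: any route between any two points can be derived
# from it on demand, unlike a single memorized route.
world_map = {
    "home": ["forest", "road"],
    "forest": ["river"],
    "road": ["river"],
    "river": ["hill"],
    "hill": [],
}

def find_route(graph, start, goal):
    """Breadth-first search: a shortest route between ANY two nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route exists
```

The same map answers `find_route(world_map, "home", "hill")` as easily as `find_route(world_map, "road", "hill")`; a memorized route answers only one question.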

4) Predictive processing

Our brain constantly and actively predicts what will happen in the next few seconds, and it constantly updates its model of reality.

If you have ever wondered why humanity is characterized by such significant curiosity, here is the answer.

Besides our urge to avoid pain and to earn rewards for fulfilling our basic needs, we have another driver.

When our basic needs are met, we want to improve our cognitive map: our mental model of the world. Whenever there is a gap in understanding, we want to know why something is the way it is.

Trust me, I know something about this, because I am a bit of an extreme case. I never stop until I find a satisfying explanation, so I feel this desire quite strongly.

Predictive processing saves mental energy, but it also motivates us to become better and better, expanding our knowledge and skills.

We improve our cognitive map of knowledge and skills so that we can use it at will in the future.

Predictive processing is also the source of our intuition. But more about that in a later point.
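
A minimal sketch of the predictive loop described above: keep a running prediction, compare it with what actually arrives, and update in proportion to the prediction error. This is a simple delta rule with an arbitrarily chosen learning rate, an illustration rather than a brain model:

```python
# Predictive processing in miniature: the prediction error ("surprise")
# drives learning, and as the world becomes predictable it shrinks.

def update_prediction(prediction, observation, learning_rate=0.3):
    error = observation - prediction          # the surprise signal
    return prediction + learning_rate * error, error

prediction = 0.0
for observation in [1.0, 1.0, 1.0, 1.0]:      # a steady, predictable input
    prediction, error = update_prediction(prediction, observation)
# The prediction approaches the input, and the error shrinks each step.
```

When the error stays large, the model keeps updating; when it approaches zero, processing becomes cheap, which is the energy saving the section describes.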

5) Energy-efficiency

Let's dig deeper into this topic. Our body is self-assembling biotechnology, and that includes the brain. Energy was truly scarce in past centuries and millennia: something you had to respect, not just spend however you wished.

Our brain says "no" to as many things as it can. When it processes a continuous data stream, it filters the inputs through the thalamus and sends to the neocortex only the parts of the information that are important.

When the result of unconscious mental processing is judged good enough, it does not bother your consciousness. When you have an important task to do, a large-scale network called the DMN (Default Mode Network) is partially disabled and the CEN (Central Executive Network) takes over.

You don't have full control over your mind. You can learn to override this, but that is part of a different story.

So, your brain has 86–100 billion neurons, depending on the study. Most of them are densely packed in a part called the cerebellum, responsible mostly for the movement of your body.

Around 16 billion are in the neocortex, which is uniquely developed in primates, and especially in humans, where it gives rise to our special capabilities.

Deep Learning uses all neurons all the time; that is why we don't have networks of such a size. Besides, it is so energy-inefficient that even processing units with operating frequencies measured in billions of Hz cannot compete with the brain, which usually operates at just 10–40 Hz.

Something is seriously wrong…

Intelligence is not only about introducing a stronger force, such as more raw computing power. It will not play out the same way it did with flying machines.

https://www.technologyreview.com/s/613630/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/

Image: Google's TPUs (Tensor Processing Units)

6) Multi-sensory data representation

Language is one of the hardest problems in AI. We can now create systems so good at text generation that their output can be mistaken for human writing. But is there any understanding behind it, or just heavy statistical relationships between words?

Before a child learns language, it has experienced a wealth of multi-sensory stimuli. A meaning is not characterized by neighboring words in some dataset, but by a rich set of experiences that include vision, sound, smell, taste, touch, emotions and developing common sense.

Only then, on top of those experiences, do we put the correct label, replacing whatever the child used to call eating, needing the bathroom, or anything else.

We can classify unknown objects by the similarity of their attributes to already known classes, even if we don't know the exact name.

A sound can easily induce an emotional reaction, and a smell can bring back memories.

Because it is all connected, even if processed separately.

We need to reflect that in the neural architectures.
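
One naive way to reflect it, sketched below: encode each modality separately, then fuse the per-modality embeddings into one shared vector, so that a concept is grounded in several senses at once. The modalities, dimensions and numbers are invented for illustration:

```python
import numpy as np

# Multi-sensory representation sketch: separate modality embeddings are
# concatenated into one shared vector, so similarity between concepts
# reflects ALL senses, not just one.

def fuse(modalities):
    """Concatenate per-modality embeddings (sorted for a stable order)."""
    return np.concatenate([modalities[k] for k in sorted(modalities)])

def similarity(a, b):
    """Cosine similarity between two fused representations."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# An "apple" grounded in more than one sense (toy 2-D embeddings):
apple = fuse({"vision": np.array([0.9, 0.1]),
              "taste":  np.array([0.8, 0.2])})
pear  = fuse({"vision": np.array([0.8, 0.2]),
              "taste":  np.array([0.7, 0.3])})
```

Real systems would learn the encoders and use richer fusion than concatenation, but the principle is the same: one representation, many sensory channels feeding it.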

7) Embodied experience

How do you explain the world to an AI that knows it only from fixed datasets of still images or text, without all the background context?

How do you teach it the meaning of the word "gravity" if it has never experienced it?

If we want machines with human-like abilities, we must recognize that at least our bodies are biological machines. But I am pretty sure the brain is one too.

One of the things that make us human is our rich sensory experience in a body. Our mind renders the world from colorless, soundless and tasteless atoms, interpreting the incoming signals while building a reality model that allows us to understand.

Only then do we know that whenever we throw something on Earth, it falls down. A dot moving downwards on a digital map, however, is not affected by gravity.

Because the digital and physical worlds may have very different rules.

8) The continuous stereo data stream

We have a duality in our bodies: two eyes, two ears, two hands, two legs, two brain hemispheres.

It helps us perceive and interact with the world by confronting one side with the other. Stereo vision helps us measure depth in the visual scene, stereo hearing lets us locate a sound's source, and the duality in the brain helps us confront different sides of our mental capacity: hard logic with creativity.

At the same time, continuity assures you that the person who was your friend a second ago still is, so you don't need to reprocess their visual or vocal attributes to recognize them.

You can easily separate the background from moving or newly introduced objects. The same goes for sounds.

Your brain actively predicts what is going on and saves energy, improving confidence and learning in real time.

9) Non-random initialization

Neural networks are randomly initialized because the gradient-based methods used to train them break down when all values are the same.

But do they need to be random? Our brain is not randomly wired.

Image: the human brain's semantic space

That non-random structure explains why we share a similar semantic space in our brains.

A neural network can be optimized and forced into a desired form. That is not a problem in a supervised environment. But random initialization can become one once we have autonomous AI in the real world, where we would like the adaptation process to unfold in a similar way each time, regardless of time and location.

I believe that in the future we will use Neuroevolution and pre-trained weights more.
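
The difference between the two strategies fits in a few lines: random initialization gives a different starting point on every run, while copying pre-trained weights gives the same known-good starting point every time. A toy sketch:

```python
import numpy as np

# Random vs. pre-trained initialization of a layer's weights.

def random_init(shape, seed=None):
    """Standard approach: small random weights, different on each run."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.1, size=shape)

def pretrained_init(pretrained_weights):
    """Start from a known-good state instead of noise: identical and
    reproducible every time, regardless of when or where it runs."""
    return np.array(pretrained_weights, copy=True)
```

Neuroevolution fits the same picture: instead of starting from noise, each generation starts from the best structures found so far.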

We dream of self-programming machines. But programming languages are made for humans; machines handle raw numbers as their native tongue.

10) The emotional state as a general rating system

We may think of emotions as unique to humans, and even as inferior to hard logic.

But emotion is a general rating system in the body. Through emotions we can quickly evaluate our state, distinguishing energized or even euphoric attitudes from depressed, low-energy states.

Whenever we need to make a decision, it is based on our feelings. Even if we think about it for a long time, evaluating different options, in the end we pick what "feels" best.

We have already tried to mimic this with reinforcement learning, but that is only part of the story.

Emotions can be quantified, because they are constructed from the varying levels of neurochemicals such as serotonin, dopamine and noradrenaline.

Our fight-or-flight response depends on a high level of dopamine, which motivates us to pick an action quickly. But it depends mostly on the level of noradrenaline: when it is low we experience fear and try to escape; when it is high we find the courage to fight, and at extreme levels we experience anger or even violent madness.

11) Digital neuromodulators

In the same manner, digital neurochemicals could lead to autonomous AI, allowing it to switch large-scale neural subnetworks on and off.

In the human brain, a high level of the neuromodulator acetylcholine increases the activity of neurons related to memory and internally directed cognition: thinking and reasoning.

Dopamine, in contrast, increases the weight of external cognition and of quickly picking a good-enough action.

Orexin modulates energy availability: it is elevated when we are most awake and drops when we are falling asleep or mounting an immune response. Digital orexin could help optimize energy expenditure in devices that are not connected to a power source all the time, such as autonomous robots.

Our body uses chemical substances to inform and regulate itself. I think there is great potential in doing similar things in Artificial Intelligence.
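
A sketch of what such digital neuromodulators might look like: global scalar levels that gate large subnetworks on and off. The module names, modulators and thresholds below are invented for illustration, not an established API:

```python
# "Digital neuromodulators": scalar levels that switch large-scale
# subnetworks on and off, loosely mirroring the roles the article
# assigns to acetylcholine, dopamine and orexin.

class ModularSystem:
    def __init__(self):
        # Global modulator levels (invented defaults).
        self.levels = {"acetylcholine": 0.5, "dopamine": 0.5, "orexin": 1.0}
        # Each subnetwork activates only when its modulator crosses a threshold.
        self.modules = {
            "memory_recall": ("acetylcholine", 0.7),  # internal cognition
            "fast_action":   ("dopamine", 0.7),       # external cognition
        }

    def active_modules(self):
        # Digital "orexin" gates overall energy: low level shuts everything off.
        if self.levels["orexin"] < 0.2:
            return []
        return [name for name, (mod, thr) in self.modules.items()
                if self.levels[mod] >= thr]

system = ModularSystem()
system.levels["acetylcholine"] = 0.9  # shift toward internally directed cognition
```

Raising one level activates the matching subnetwork while the others stay dark, which is exactly the kind of selective, energy-aware activation the brain sections above describe.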

12) Artificial intuition

Deep Learning has only one way to process data: from input to output, a very reactive approach. But we have something really powerful: intuition.

Our neural units can become pre-activated when all the conditions match, even though we have not yet experienced the final effect in reality. We just feel that it will happen.

The same mechanism fills our consciousness with thoughts that seem to come out of nowhere, but that the mind expects to be useful.

Intuition can help us prepare, but it also makes data processing easier. If the context is right, we can lower the thresholds for object or sound recognition.

This mechanism is sometimes wrong, and we make mistakes: recognizing strangers as friends, or hearing words that were never said.

But most of the time it saves a lot of energy, and it has probably saved many human lives by warning about potential danger.

13) Hidden brain — glial cells

Glial cells were for many years considered mere filler in the brain, but they may have a really strong impact on its operation. We have more glial cells than neurons. They support the neurons: providing nutrients, clearing away toxic metabolic byproducts, and responding to external dangers by triggering the immune response.

Current state-of-the-art neural networks completely ignore them. Yet glia literally control the neurons.

Besides doing maintenance, they influence neuronal spikes and possibly also measure prediction errors on incoming data.

Astrocytes are not electrically excited for a brief instant but chemically stimulated over a much longer period. And they can communicate the state of things globally among themselves: not only where more resources are needed, but how to improve the operation of the whole system.

14) Subcortical components

Current neural networks ignore not only astrocytes but also subcortical components.

Besides the neocortex, we have a rich supply of modules that support it in data processing: the thalamus, hippocampus, striatum and amygdala, to name a few.

And they fulfill really important roles.

Without the hippocampus and entorhinal cortex, we would have no memories, and it would be hard to orient ourselves in physical space. The thalamus filters data and relays it to the correct parts of the brain. The striatum and amygdala modulate responses to the input data.

And there is one more really interesting part: the claustrum. Scientists found that, when electrically stimulated, it serves as an on/off switch for consciousness. (https://www.newscientist.com/article/mg22329762-700-consciousness-on-off-switch-discovered-deep-in-brain/)

15) Causal reasoning

One of the capabilities that make us really powerful is causal reasoning. We can find the probable cause of an effect by running mental simulations: imagining or recalling the steps of the process.

Correlation does not imply causation, as the famous saying goes.

Statistics is not enough; we need relations, rich contextual information and multi-sensory experiences.

16) Mental simulator

As mentioned earlier in this article, our brain literally renders our reality from atoms that have no color, smell or sound.

Because it is so good at this, it can also simulate things that did not happen. This is why our imagination exists, and also why we dream.

It allows us to experience and learn without any damage in reality. The mental simulator is also a foundation of our conscious experience.

We have created abstract things that exist only in our minds. We imagined many inventions before making them real. This was truly a source of our evolutionary advantage.

It needs to be present in the AGI architecture.

17) Incremental learning

Incremental learning is the total opposite of training a neural network on a fixed dataset.

It allows a machine to learn new things all the time and to update its existing knowledge to improve itself.

Sure, we can always re-train a model with new data, but that is very energy-inefficient, and with larger neural architectures it will be impractical to do all the time.

Although intuition suggests that incremental learning will lead to so-called overfitting, there are techniques to overcome that. And it becomes much easier when enough data is incrementally added to memory.

Human beings are not masters of everything; they are good at the things they have had enough experience with. And they can update their mental models at any time to pick a better solution.
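
A minimal sketch of incremental learning: a nearest-centroid classifier whose class means are updated one example at a time with a running mean, so nothing is ever re-trained on a fixed dataset. A toy illustration, not a claim about how AGICortex works:

```python
import numpy as np

# Incremental learning sketch: each example updates the model in place
# via the running-mean identity  mean += (x - mean) / n,
# so no stored dataset and no re-training pass are ever needed.

class IncrementalCentroids:
    def __init__(self):
        self.means, self.counts = {}, {}

    def learn(self, x, label):
        """Update the model from a single new labeled example."""
        x = np.asarray(x, dtype=float)
        if label not in self.means:
            self.means[label], self.counts[label] = x.copy(), 1
        else:
            self.counts[label] += 1
            self.means[label] += (x - self.means[label]) / self.counts[label]

    def predict(self, x):
        """Classify by the nearest class centroid."""
        x = np.asarray(x, dtype=float)
        return min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))
```

New examples keep refining the centroids at any time, which is the "update your mental model whenever you like" property the paragraph describes.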

18) Master algorithm

I believe we will find a master algorithm that will unlock advanced Artificial Intelligence. The data pre-processing will differ, and the so-called hyperparameters (elements that are not subject to training) will differ from case to case, but the learning algorithm may be the same across the whole Artificial Cortex: deciding what is more and less relevant, what should be remembered and what forgotten.

Plus what should be remembered as a negative example.

19) Hardware — dedicated processing units

General-purpose processing units like CPUs are not as efficient as dedicated ones. To achieve the efficiency required by complex cognitive architectures, we absolutely need much more parallelism.

Because of the many layers of abstraction, whatever you do on your computer is translated across all of them: graphical user interface, frameworks and libraries, operating system, programming language, down to machine code. And that takes time.

Multiply that by billions of operations every second and you have a full picture of what is going on.

Programming languages are good for prototyping, experimenting and adapting a solution to your own needs. But the crucial part needs to live in the processing unit, just as every CPU has an Arithmetic Logic Unit inside. Because a computer computes.

We need to implement our key algorithms in hardware. Many companies are already doing that, but are they going in the right direction?

20) Self-assembling vs constructed

We don't fully understand the brain. It is very complicated, and there are contradictory research results about its structure. We cannot rely on neuroscience alone to build our AGI.

But we can understand its high-level functions: what it does, and more or less how.

Because the brain is a self-assembling processing unit, it does many things simply because of its biological and physical needs and limitations.

Many of its observed actions and attributes may exist because they are necessary for such a structure to work and survive, not because they are required for high-level intelligence.

The right balance between neuroscience, computer science and mathematics should finally allow us to build machines with human-like capabilities.

Summary

With all the technologies around us that have developed tremendously over our lifetimes, especially in recent years, it is easy to forget that in this field we are still in the STONE AGE compared to what humanity will achieve in the future.

Our computation era has just begun. And changing the perspective opens the mind.

What brought us this far is usually not what will push us equally far forward again.

Evolution usually makes giant leaps when someone does something slightly differently from the others, expanding the range of available options.

And I believe we can make great progress by looking at different ways of making Artificial Intelligence possible.

Maciej Wolski is the founder of AGICortex (a realistic AGI architecture with a proof of concept), a company creating a dedicated AI chip and visual tools for autonomous, explainable AI. Before that, he spent a few years working on AI R&D projects.
