Intuition Machines versus Algebraic Minds
I owe Gary Marcus, at the very least, a more detailed examination of each of his assertions about the limitations of Deep Learning. The problem with researchers who have not performed their due diligence in examining Deep Learning is that they do not understand the massive developments occurring in this field. In fact, they do a disservice by casting Fear, Uncertainty and Doubt (FUD) on a field of study that has the highest potential for achieving AGI. The bigger problem is not the scarcity of funding for other AGI research activities but the outsized public spending China has deployed relative to the US and Europe.
However, I digress, so I shall switch my focus to examining each of Marcus’ critiques. As I go through each critique, I will explain the path forward using the cognitive paradigm that I have been pursuing (i.e. Intuition Machines).
Any conversation about achieving AGI is incomplete without a mechanism for how an automation interacts with its environment:
Jeff Hawkins has a principle that intuitively makes a lot of sense, yet is something that Deep Learning research has…medium.com
Deep Learning networks are very good universal approximators (the best we currently have). Of course there’s more to AGI than just learning patterns. I’ve written about that in the article above. However, until someone conjures up a better pattern recognizer, advances in AGI will continue to be driven by Deep Learning. Many advanced methods have incorporated these approximators. GANs introduced the idea of competitive learning by pitting a recognizer (the discriminator) against a generator. RL + DL (see DeepMind Atari Game Play) showed that, by combining Q-learning with DL, learning to play by observing only pixels was possible. AlphaZero with MCTS + DL showed that more advanced strategies could be bootstrapped from scratch through self-play. None of these newer methods were envisioned when the term “Deep Learning” was invented. However, the primary reason for the success of any of these new methods is the use of Deep Learning as a component. Remove Deep Learning from the equation and you have nothing.
Now it’s time to pick apart Gary Marcus’ arguments and give you a glimpse of how they’re going to be solved by Deep Learning (or an Intuition Machine).
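To make the Q-learning ingredient mentioned above concrete, here is a minimal tabular sketch in Python. It is not DeepMind’s Atari system: there is no neural network and no pixel input. The five-state corridor, the `step` function and all constants are hypothetical choices for illustration, and the table is updated by sweeping every state-action pair rather than by playing episodes. What it shares with that work is the Q-learning update rule; in the deep variant, a network stands in for the table.

```python
GAMMA = 0.9      # discount factor (assumed value)
ALPHA = 0.5      # learning rate (assumed value)
N_STATES = 5     # states 0..4; state 4 is terminal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Hypothetical corridor: reward 1.0 only when the last state is reached."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(sweeps=200):
    """Apply the Q-learning update to every (state, action) pair, repeatedly."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            for a in (LEFT, RIGHT):
                s2, r, done = step(s, a)
                # Bootstrapped target: reward plus discounted best next value.
                target = r if done else r + GAMMA * max(q[s2])
                q[s][a] += ALPHA * (target - q[s][a])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

After training, `policy` chooses `RIGHT` in every non-terminal state, and the values fall off by a factor of `GAMMA` per step away from the goal. Nothing about the strategy was specified in advance beyond the reward.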
3.1. Deep learning thus far is data hungry
This is because most experimental setups are meant to begin from scratch, without any a priori knowledge about the environment. Human infants, in contrast, have eons of evolution to take advantage of.
However, if we examine the latest developments with AlphaZero, this limitation of requiring a lot of data seems to be removed. AlphaZero learned to play grandmaster-level chess without learning from recorded game play. Rather, it was given only the rules of chess. It learned chess-playing strategies (different opening moves, gambits, control of parts of the board, end game play, zugzwang etc.) all in just 4 hours. In other words, in an incomprehensibly short time it learned all that humanity has learned in the many centuries that chess has been played.
Conveniently ignore this development at your own peril.
3.2. Deep learning thus far is shallow and has limited capacity for transfer
Our understanding of transfer learning in Deep Learning is still nascent. We have seen advances in transfer learning when it is used across similar domains. For example, one can train faster with networks that are pre-trained. One of the most impressive developments in this area is the work done to generate high-resolution images. This builds up a network progressively, growing from smaller networks into bigger and more capable ones.
However, what we are unable to do is train the network to learn the appropriate invariants, such that what is learned is only what is important to learn. Deep Learning seems to learn everything, and we don’t understand how to disentangle what is unimportant in another domain. For example, if you change the dimensions of a video game, an RL + DL system trained to play this game will no longer be able to play it. The scaling just messes up the learned model.
This is an interesting problem, but it doesn’t appear to be an insurmountable one.
3.3. Deep learning so far has no natural way to deal with hierarchical structure
Fundamentally, Deep Learning builds representations in continuous vector spaces that don’t capture the concept of hierarchy. However, work on Capsule Networks, hyperbolic spaces and Graph Convolutional Networks provides mechanisms to capture these hierarchies.
There are thousands of papers in DL that are published every year. I don’t expect Marcus’ busy schedule to leave him time to keep up with the literature.
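As one illustration of why hyperbolic spaces suit hierarchies, the sketch below computes distances in the Poincaré ball, a standard model of hyperbolic space. The three points, and their roles as a root and two leaves, are assumptions made up for this example and are not taken from any particular paper.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


def euclidean_distance(u, v):
    """Ordinary straight-line distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical tree: a root at the origin and two leaves near the boundary.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

hyp_leaf_to_leaf = poincare_distance(leaf_a, leaf_b)
hyp_via_root = poincare_distance(leaf_a, root) + poincare_distance(root, leaf_b)
euc_leaf_to_leaf = euclidean_distance(leaf_a, leaf_b)
euc_via_root = euclidean_distance(leaf_a, root) + euclidean_distance(root, leaf_b)

# Ratio near 1.0 means the direct path is almost as long as going via the root.
hyp_ratio = hyp_leaf_to_leaf / hyp_via_root
euc_ratio = euc_leaf_to_leaf / euc_via_root
```

Here `hyp_ratio` comes out closer to 1 than `euc_ratio`: in the hyperbolic metric, travelling between the two leaves costs almost as much as going up to the root and back down, which is how distances behave in a tree. That property is what makes such spaces attractive for embedding hierarchical data.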
3.4. Deep learning thus far has struggled with open-ended inference
New research papers on non-stationary environments with multiple competing and cooperating neural networks are now being published. Open-endedness requires creating strategies with imperfect information. There is also significant research in semi-supervised learning, where some of the information is labeled while most of it is not. Deep Learning systems have been shown to do well in this area.
One example of an environment with imperfect information is the game of Poker. There have been significant advances in the use of Deep Learning to play competitive poker, and these systems are doing very well.
I can boldly say that the state-of-the-art in open-ended inference is ruled by Deep Learning based techniques.
3.5. Deep learning thus far is not sufficiently transparent
This is true. However, are we not discussing this in the context of Artificial General Intelligence? Are humans themselves sufficiently transparent?
The inferences humans make with their intuition are just as unexplainable as the inferences that a Deep Learning intuition machine makes. Please read my 2018 predictions for my take on explainable Deep Learning.
3.6. Deep learning thus far has not been well integrated with prior knowledge
This is also unrelated to AGI. If I were to give a caveman a semantic network or Newton’s equations of motion, he wouldn’t be able to incorporate that into his own knowledge base. Humans don’t have a mechanism like that found in the Matrix where you can simply download knowledge.
The way humans incorporate prior knowledge is through a K-12 school curriculum that covers years of teaching.
Nevertheless, let’s ignore the absurdity of the requirement for a moment. In NLP that employs Deep Learning, recognition engines that incorporate semantic networks have been shown to perform better than ones that do not.
3.7. Deep learning thus far cannot inherently distinguish causation from correlation
The average human can’t either. This is not an AGI requirement.
3.8. Deep learning presumes a largely stable world, in ways that may be problematic
One of the biggest unsolved problems of Deep Learning is learning how to forget. Forgetting is important in how internal mental models of the world are created. Once you have the kind of higher-level intelligence that can perform simulations and experiments, both in the world and in your imagination, you reach situations where you recognize that previous knowledge is incorrect and must be adjusted accordingly.
Flexibly assessing what has been learned, and what knowledge needs to change because it is inconsistent with what is observed in the real world, is a difficult skill.
Even the current president of the United States lacks this skill set.
3.9. Deep learning thus far works well as an approximation, but its answers often cannot fully be trusted
Trust is an emergent social behavior. (Unless it is encoded in a Blockchain)
How can we trust self-driving cars when underneath the covers they use Deep Learning?
We will trust them because the insurance companies will run the numbers and begin charging a premium to people who choose to drive their own cars manually.
Read this article about the complexities of human compatible AI.
3.10. Deep learning thus far is difficult to engineer with
Hell yeah! That’s why Deep Learning is sometimes known as “Deep Alchemy”. Not every revolution comes with little effort!
Ideally we would like to have systems with a more biological architecture. I address these issues in an article on “biologically inspired architecture”. If you really want to understand the complexities of building Deep Learning architectures, then perhaps you can start with this article.
Marcus appears to have conflated the requirements for a highly engineered cognitive system with the requirements for AGI. In some cases, the problems are related, but not every time.
However, there’s just one reason why Gary Marcus is wrong about Deep Learning and AGI. General intelligences, as we find them in humans, are not ‘algebraic minds’; rather, they are intuition machines. That makes the difference in how we believe the approach to AGI should proceed. Marcus’ Algebraic Mind approach reminds me of the “Intelligent Design” arguments made by creationists. According to them, there is no probable way that evolution can evolve an eyeball. Yet this is precisely what Deep Learning researchers assert: innate cognitive machinery can be grown from learning methods and not through design.
We will likely discover AGI before we understand how it works. In the history of science, theory rarely comes before discovery. What will likely happen is that Deep Learning methods will discover more advanced learning methods that lead us to AGI, and it will take a while before we understand these methods. This is not unlike the new game play discovered by AlphaGo or AlphaZero. We see a move for the first time and believe it to be a mistake, only to discover its brilliance after the game is won. However, even if we go back to re-examine the move, our understanding of the reasoning behind it remains limited. This is because insight that arises from intuition is always very difficult to explain.