The Boogeyman Argument that Deep Learning will be Stopped by a Wall
I am always seeking out arguments against my present beliefs (or models of reality). Gary Marcus has a new essay titled “Deep Learning: A Critical Appraisal” in which he points out the many flaws of Deep Learning. Marcus has a vested interest in seeing Deep Learning fail; after all, he wrote a book in 2001, of which he is still very proud, that disparaged the nascent Artificial Neural Network research of the time. He writes:
To understand human cognition we need to understand how basic computational components are integrated into more complex devices- such as parsers, language acquisition devices, modules for recognizing objects, and so forth.
Marcus is very motivated to point out the lack of success of neural networks at every opportunity. His latest essay is one of his many attempts to claim higher understanding through criticism.
Nevertheless, let’s explore Marcus’ newest arguments, because they may be valuable in pointing out flaws that our own bias keeps us from noticing. Marcus enumerates the following flaws in present-day Deep Learning:
Deep learning thus far is data hungry
Deep learning thus far is shallow and has limited capacity for transfer
Deep learning thus far has no natural way to deal with hierarchical structure
Deep learning thus far has struggled with open-ended inference
Deep learning thus far is not sufficiently transparent
Deep learning thus far has not been well integrated with prior knowledge
Deep learning thus far cannot inherently distinguish causation from correlation
Deep learning presumes a largely stable world, in ways that may be problematic
Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted
Deep learning thus far is difficult to engineer with
These are all valid arguments. They apply not just to Deep Learning but to any algorithm that gains knowledge by digesting data, which is to say all machine learning algorithms. Just replace the phrase “Deep Learning” with “Machine Learning” and Marcus’ arguments are equally valid. There is no deep insight here that researchers in the Deep Learning field are unaware of. These are all known unknowns: we all know the flaws, and we are seeking new algorithms to fix them. I am, of course, not the only researcher who is aware of the limitations of deep learning described in his essay. Marcus’ essay drew some immediate responses on Twitter.
The key question, of course, is whether Deep Learning is flawed enough to be the wrong approach for moving forward. If it is the wrong approach, then which other approaches are more promising?
To his credit, Marcus does make an effort to address these two questions.
To avoid being cast as the best-known skeptic of deep learning, Marcus points out that Deep Learning is one of many tools that may emerge. Marcus no longer says that Deep Learning is wrong (as he usually does), but takes the more conservative stance that it will be one useful tool in a toolbox of many. This argument highlights the fundamental flaw in Marcus’ thesis since 2001. Being a cognitive psychologist, he observes capabilities found in humans and then deduces that all kinds of cognitive machinery need to exist for each capability to work. However, he doesn’t have an explanation of (1) how each kind of machinery works and (2) how these many kinds of machinery coordinate to get anything accomplished.
Where Marcus is entirely wrong is that he fails to comprehend that Deep Learning is in fact the stepping-stone tool that other cognitive tools will leverage to achieve higher levels of cognition. We’ve already seen this in DeepMind’s AlphaZero game-playing systems, where conventional tree search is used in conjunction with Deep Learning. Deep Learning is the wheel of cognition. Just as the wheel enabled more effective transportation, so will Deep Learning enable effective artificial intelligence.
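To make the pairing of tree search and a learned network a little more concrete, here is a minimal sketch of the PUCT-style selection rule described in the AlphaGo Zero paper, where the search prefers moves that either have a high average value so far or that the network's prior favors but that have been visited little. The function name `puct_select`, the constant `c_puct=1.0`, the `+ 1` under the square root, and all the numbers are assumptions made for illustration; this is not DeepMind's code, and the priors here are hand-written stand-ins for what a policy network would output.

```python
import math

def puct_select(priors, visit_counts, value_sums, c_puct=1.0):
    """Pick the child move to explore next in a tree search.

    priors       -- move probabilities (a stand-in for a policy network's output)
    visit_counts -- how many times the search has tried each move
    value_sums   -- sum of evaluations backed up through each move
    """
    total_visits = sum(visit_counts)
    best_action, best_score = None, -math.inf
    for action, (p, n, w) in enumerate(zip(priors, visit_counts, value_sums)):
        # Average value of the move so far; unvisited moves default to 0.
        q = w / n if n > 0 else 0.0
        # Exploration bonus: large for moves the prior likes and the search
        # has rarely visited. The "+ 1" is an assumption of this sketch so
        # that the prior still matters before any move has been visited.
        u = c_puct * p * math.sqrt(total_visits + 1) / (1 + n)
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action

# Hypothetical priors for three candidate moves.
priors = [0.1, 0.7, 0.2]
first = puct_select(priors, [0, 0, 0], [0.0, 0.0, 0.0])
# After move 1 has been searched ten times with poor results,
# the search shifts its attention elsewhere.
later = puct_select(priors, [0, 10, 0], [0.0, -5.0, 0.0])
print(first, later)
```

The point of the sketch is the division of labor: the network supplies the intuition (the priors), and the search does the deliberate checking on top of it.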
We can have wheels made of stone, wheels crafted from wood and wheels with inflatable rubber tires, yet they are all round. There are of course alternatives to wheels for land transportation (e.g., skis, hovercraft, maglevs and hyperloops), but few have the practicality of the conventional wheel. Deep Learning is an instance of an intuition machine, and intuition is the wheel for higher-level cognition, not yet another tool. There is no cognitive mechanism that we are aware of, other than intuition, that can give us general intelligence (GOFAI has failed us for decades because it assumed that rational cognition was the basis of intelligence). Marcus’ criticisms are analogous to saying that wheels made of stone aren’t any good because they are difficult to create, aren’t perfectly round and don’t provide any cushion. Here’s the real problem: the human mind is not an “Algebraic Mind”, as the title of his book proclaims. Marcus will just have to get over himself and come to the realization that he’s been wrong since 2001. To build AGI, you work first from intuition and then work up the stack, not the other way around.
All of the innate cognitive machinery of the human brain consists of intuition-based components. There are no logical components of the kind found in our digital computers. Our rationality comes from learning through experience; it is not some hardwired, built-in machinery.
We will over time develop more advanced forms of Deep Learning. The learning algorithm will change to one that is meta-learning driven. The simplistic neurons will change into more complex kinds with multiple thresholds. There’s really no looking back here. The methods of Deep Learning are being established and continue to be refined. Knowledge discovery requires search, and search has two extremes: exploration and exploitation. The solution of the future will of course be an algorithm that understands the best balance between the two.
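The two extremes of search can be illustrated with a toy example. The sketch below uses epsilon-greedy selection on a multi-armed bandit, a textbook technique chosen here purely for illustration and not something the argument above depends on. The function name `epsilon_greedy`, the payoff values, and the `noise` parameter are all assumptions of this sketch.

```python
import random

def epsilon_greedy(true_means, epsilon, steps, noise=1.0, seed=0):
    """Run an epsilon-greedy agent on a bandit with the given mean payoffs.

    epsilon=0.0 is pure exploitation, epsilon=1.0 is pure exploration.
    Returns (pull counts per arm, estimated payoff per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm at random, whatever we currently believe.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = true_means[arm] + noise * rng.gauss(0.0, 1.0)
        counts[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, estimates, total

# Hypothetical payoffs: the third arm is the best one.
means = [0.1, 0.5, 0.9]
greedy_counts, _, _ = epsilon_greedy(means, epsilon=0.0, steps=100, noise=0.0)
mixed_counts, _, _ = epsilon_greedy(means, epsilon=0.1, steps=2000, noise=0.0)
print(greedy_counts, mixed_counts)
```

With noise-free payoffs, the purely greedy agent locks onto the first arm it tries and never finds the better ones, while a small amount of exploration is enough to discover the best arm and then spend most of its pulls there. Choosing that balance well is the hard part.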
From my own perspective, the path toward Artificial General Intelligence (AGI) is clear as day. I acknowledge all of the shortcomings that Marcus points out; however, without a doubt, the methodology and techniques being invented by the Deep Learning research community are slowly chipping away at the problem. To quote the stonecutter credo:
When nothing seems to help, I go and look at a stonecutter hammering away at his rock perhaps a hundred times without as much as a crack showing in it. Yet at the hundred and first blow it will split in two, and I know it was not that blow that did it, but all that had gone before.
The true game being played here by Gary Marcus (which he successfully parlayed into the acquisition of his firm Geometric Intelligence by Uber) is criticizing the dominant AI paradigm of today. By pointing out its flaws, he’s able to convince less knowledgeable investors of an alternative and perhaps more profitable path. Investors take great pride in having contrarian investment strategies. Investors, like most humans, would like to believe that their success was based on their own individuality and not just plain luck. For the majority of investors, it just happens to be the latter.
Politics in science has always been present, and it’s not going to disappear any time soon. We are familiar with the feud between Nikola Tesla and Thomas Edison. Edison died a wealthy man, in stark contrast to Tesla, who died penniless. Yet the scientific contributions of Tesla arguably surpass Edison’s. Even so, Edison is famous today, and Tesla is likely well known only because an electric car company is named after him. The Canadian conspirators have parlayed their Deep Learning meme to great effect. Jürgen Schmidhuber was justified in feeling like Tesla, in that the world seemed to have overlooked his own contributions.
DeepMind plays the game too. If you read the AlphaGo Zero paper (the most significant development in AI since DL), you will find that DeepMind never uses the term “Deep Learning”. That is because they intentionally want to change the narrative. DeepMind discovered something extremely significant that differs from Deep Learning in its original conception. Unfortunately, DeepMind has not figured out an appropriate term for the “self-play” they discovered, so they awkwardly call it “Reinforcement Learning”. Yann LeCun is a little more savvy with branding; that’s why he came up with “Predictive Learning” to describe a yet-to-be-discovered solution to unsupervised learning. (Note: LeCun has more recently said that he wants to rebrand Deep Learning as Differentiable Programming.)
We all strive not to be just another brick in the wall in the invention of “the last invention of man”. I just finished watching the AlphaGo film on Netflix. It’s amazing that DeepMind had the foresight to ensure that this event would be captured on film. The stars of this movie are of course Demis Hassabis and David Silver. However, just as Stan Lee has a cameo role in every Marvel film, some grainy footage of Sergey Brin had to be spliced in. Eric Schmidt had his own cameo role, but it didn’t look contrived. ;-)
Has Deep Learning hit a wall? Far from it. 2018, as predicted, will be a banner year. The notion of a wall that will stop Deep Learning progress is at best a boogeyman argument: it is not only imaginary, but its main purpose is to mislead. You have the choice to agree with Marcus’ arguments and wait for some unknown that is better, or you can recognize that the path to AGI is clearer than it has ever been and use the methods that have led to remarkable success in recent years.