Thoughts on Gary Marcus’ Critique of Deep Learning


Gary Marcus has recently published a detailed, rather extensive critique of Deep Learning. While many of Dr. Marcus’ points are well known among those deeply familiar with the field and have been fairly well publicized for years, these discussions haven’t yet reached many who are newly involved in decision-making in this space. Overall, the discussion the critique has generated seems clarifying and useful.

I have decided to write up my thoughts because, while I think Dr. Marcus’ critique is thoughtful, necessary and often justified, I disagree with some of the conclusions.

To start, Dr. Marcus’ assessment that Deep Learning, as originally defined, is merely a statistical technique for classifying patterns is spot on in my opinion. I also concur with his assessment that Deep Learning techniques will be a stepping stone towards and perhaps a component of future Artificial Intelligence systems, but, contrary to popular expectation, will not lead directly to Artificial Intelligence per se.

Why is that?

I believe it’s important to talk about this rather fundamental issue, because it doesn’t seem to be exposed well, either in Dr. Marcus’ paper or in much of the public discussion about Artificial Intelligence:

One cannot talk productively about Artificial Intelligence without a clear definition of intelligence. The Oxford English Dictionary defines intelligence as “the ability to acquire and apply knowledge and skills”.

The definition does not include who or what does said acquisition and application, but it’s somewhat obvious that there is some kind of “agent”, and that agent is capable of (skillful) action in addition to acquiring knowledge. One could perhaps furthermore argue that all acquisition of knowledge is in service of the agent’s ability to decide how to act sometime in the future.

So it seems that action is a fundamental concept in this context. Let’s look into this a bit deeper.

What does a (perhaps artificially) intelligent agent do, and why does it do what it does? It seems to me that the field has been shying away from this question, perhaps because answering it clearly is difficult.

I believe the correct answer is both profound and seemingly trivial.

Autonomous intelligent agents act to survive. Or in other words, they act to avoid dispersion by the forces of their environment that either, by way of natural laws, point towards increasing entropy, or even actively conspire against them in a competition for scarce resources (e.g. predators). Furthermore, one could argue that the very reason for the existence of intelligence is the need of complex agents to survive in a hostile world, i.e. to maintain their form against the odds. To go even further, from another angle one could claim that any (intelligent) action is ultimately in service of survival of the agent.

There are very interesting points to be made about how an agent knows how to act in order to survive, and Karl Friston makes these throughout his recent work on Active Inference, by way of physics and information theory. I won’t go into any detail here, and just reiterate that action is a fundamental and required concept for intelligence. It is how an agent interacts with the world. It’s how it bends the odds in its favor, towards survival.

This leads us back to the point I’m trying to make.

Deep Learning, per se, does not have a concept of action. It doesn’t have to. It is merely, as Dr. Marcus points out, a technique for building pattern recognizers that is sometimes used as a component in Reinforcement Learning, a field of study that does include agents and action and is therefore inherently conceptually closer to Artificial Intelligence than Deep Learning.

[Please note that I’m using the term “Reinforcement Learning” because it’s the most well-known and accepted term for the type of learning that goes on in autonomous agents. However, I believe that Reinforcement Learning, as strictly defined, has fundamental issues that will need to be overcome on the path towards Artificial Intelligence. For one, the requirement of reward/value functions seems to be causing rather difficult complications that are avoidable by framing the problem differently.]

I think there is an opportunity to write a similar critique of the state of the art in Reinforcement Learning. It would be a better basis for reasoning about the path towards Artificial Intelligence and whether we are on the right track.

We could have a great discussion about, e.g., what survival could mean in the context of artificially intelligent agents, and how different techniques in Reinforcement Learning do or do not capture its nature properly.

But not all the points Dr. Marcus makes in his current critique would apply, because one can do Reinforcement Learning just fine without Deep Learning.
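To make that last claim concrete, here is a minimal sketch of tabular Q-learning in Python, with no neural network anywhere: the learned "model" is just a table of numbers. The corridor environment, the names `step`, `train` and `greedy_policy`, and the hyperparameter values are all made up for illustration; none of them come from Dr. Marcus’ paper or from any particular library.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor of 5 cells. The agent starts in cell 0 and receives a
# reward of 1.0 when it reaches cell 4. Actions: 0 = left, 1 = right.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)


def step(state, action):
    """Move one cell left or right; the episode ends at the goal cell."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: one table entry per (state, action) pair."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice; ties are broken randomly so the
            # untrained agent wanders instead of getting stuck.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal cell according to the table."""
    return [0 if row[0] > row[1] else 1 for row in q[:GOAL]]
```

On this toy problem, `greedy_policy(train())` picks "move right" (action 1) in every non-terminal cell. The agent acts, gets feedback from its environment and improves, all without Deep Learning.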

With that, let’s look at Dr. Marcus’ “ten challenges”:

3.1. Deep learning thus far is data hungry
I was expecting a point about Deep Learning requiring large amounts of labeled data to train models because the learning process is so inefficient in its use of information, and that a more principled approach could be more efficient. 
However, Dr. Marcus took it in a different direction, first touching on human infants’ ability to generalize, then towards an argument that models should be able to learn directly from high-level concepts expressed in language. 
There’s no doubt that language is essential for true human-level intelligence. In fact, one could make an argument that language facilitates the abstraction and generalization that is required for learning and, ultimately, problem solving at human level. 
However, autonomous agents without sophisticated language (e.g. cats) can act quite intelligently shortly after birth as well, and we would be quite happy to be able to build such an agent. So I believe it’s a fair point, but I don’t agree with the conclusion.
3.2. Deep learning thus far is shallow and has limited capacity for transfer
I guess this is a good clarification of terminology for people who had incorrect assumptions about what “deep” means in the term “Deep Learning”. The rest of the section points out Deep Learning’s challenges with model overfitting, leading to insufficient generalization. Fair point. There’s some overlap with the next argument about hierarchical structure as some of the generalization could be implemented through it.
3.3. Deep learning so far has no natural way to deal with hierarchical structure
A valid point. This especially resonates when thinking about the representation and execution of (complex) action, which seems to be naturally hierarchical. I suspect there is progress to be made around framing action differently, e.g. as a result of top-down, hierarchical inference of the model (in particular, Karl Friston’s Active Inference, which seems to solve or even dispense with a number of difficult problems of current techniques).
3.4. Deep learning thus far has struggled with open-ended inference
From the title, I expected the author to make a (IMHO valid) point about Deep Learning’s rather fundamental inability to do continuous, iterative training. But he just appears to rehash his earlier points about Deep Learning not having an inherent concept of language connected to high-level representations of the model, and therefore its inability to make complex logical inferences based on abstract rules expressed through language. As a rather fascinating side note on this point, it turns out that even humans of normal intelligence who don’t have language can’t make these inferences either. So the conclusion doesn’t resonate.
3.5. Deep learning thus far is not sufficiently transparent
A fair point. I guess in the long term, this will come down to optimizing the models to find a balance between model accuracy and complexity (leading to simpler, optimal models), plus teaching the models high-level representations as a joint distribution with the corresponding terms in some kind of descriptive language so they can explain their decisions as they go.
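One common way to make this accuracy/complexity balance precise, assuming a Bayesian view of the model (my assumption here, not something Dr. Marcus’ paper proposes), is the variational free energy that appears in Karl Friston’s work. The symbols are illustrative: y is the observed data, θ the model parameters, p(θ) a prior over them and q(θ) the approximate posterior.

```latex
F[q]
  \;=\; \underbrace{D_{\mathrm{KL}}\!\left[\, q(\theta) \,\|\, p(\theta) \,\right]}_{\text{complexity}}
  \;-\; \underbrace{\mathbb{E}_{q(\theta)}\!\left[\, \ln p(y \mid \theta) \,\right]}_{\text{accuracy}}
  \;\ge\; -\ln p(y)
```

Minimizing F rewards models that explain the data well (high accuracy) while straying as little as possible from the prior (low complexity), which is the kind of balance meant above.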
3.6. Deep learning thus far has not been well integrated with prior knowledge
I don’t understand why integration with explicit prior knowledge should be required. Cats arguably have no explicit knowledge of Newton’s laws, yet apply them expertly to solve difficult kinetic problems, based purely on playful experience. On the other hand, perfectly intelligent adult humans often have trouble making fairly basic logical inferences based on language. In my opinion, artificially intelligent agents should, like biological intelligent agents, be able to learn about the world from the ground up from experience, perhaps optionally acquiring contextualization with prior knowledge as they go.
3.7. Deep learning thus far cannot inherently distinguish causation from correlation
It is not at all clear that agents have to be able to do that in order to be considered “intelligent”. Humans often can’t do it (superstition). It appears that the ability to form hypotheses about plausible causation, based on sequences of observations, is very often good enough.
3.8. Deep learning presumes a largely stable world, in ways that may be problematic
A great point, but not to the extent Dr. Marcus seems to be making it. Humans have trouble overriding previously learned, strong beliefs, too. But it’s true, vanilla Deep Learning has no mechanism to override learnings or re-learn selectively. There’s a lot of promising work around model structure learning and structure optimization that may address this issue in the future.
3.9. Deep learning thus far works well as an approximation, but its answers often cannot fully be trusted
Neither can humans’ answers. The net error rate does not seem to be the problem here. The fact that adversarial examples can be constructed is a real problem, though, and does point towards fundamental problems with overfitting, so overall a fair point.
3.10. Deep learning thus far is difficult to engineer with
Again, a fair point. Deep Learning tools have evolved a lot, and the amount of work required to train and use rather straightforward models isn’t daunting anymore. But there are indeed inherent challenges with debuggability.

Dr. Marcus doesn’t touch on a number of other issues with Deep Learning that are, in my opinion, important and that make it challenging to use even as a component of Reinforcement Learning systems:

  • No first-class representation of uncertainty, and therefore no explicit processing thereof. However, representation of uncertainty is obviously crucially important for building agents that need to decide between actions.
  • No concept of a “belief” or a “hypothesis”. If one accepts that agents make inferences about their environment, one needs to have these concepts covered.
  • No clear measure of model complexity, no clear process of model optimization, and therefore difficulty in directly and efficiently addressing the accuracy/complexity tradeoff.
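To illustrate what a first-class representation of a belief and its uncertainty could look like, here is a minimal Python sketch using a Beta distribution over an unknown success probability. This is a textbook Bayesian construction chosen for illustration, not a proposal from Dr. Marcus’ paper; the names `BetaBelief` and `observe` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Belief about an unknown success probability, held as Beta(a, b)."""
    a: float = 1.0  # 1 + number of observed successes (uniform prior)
    b: float = 1.0  # 1 + number of observed failures (uniform prior)

    def update(self, success: bool) -> None:
        """Bayesian update after one observed outcome."""
        if success:
            self.a += 1.0
        else:
            self.b += 1.0

    @property
    def mean(self) -> float:
        """Current best estimate of the success probability."""
        return self.a / (self.a + self.b)

    @property
    def variance(self) -> float:
        """Explicit uncertainty about that estimate."""
        n = self.a + self.b
        return (self.a * self.b) / (n * n * (n + 1.0))


def observe(belief: BetaBelief, outcomes) -> BetaBelief:
    """Fold a sequence of True/False outcomes into the belief."""
    for outcome in outcomes:
        belief.update(bool(outcome))
    return belief
```

An agent choosing between two actions could keep one such belief per action. Two actions with similar means but very different variances are then visibly different, which is exactly the information a plain point estimate throws away.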

I do agree with Dr. Marcus about the dangers of overhyping the state of the art, and the necessity to develop techniques for robust unsupervised learning, in particular in the context of Reinforcement Learning.

I also agree, per my earlier point, that models that can act upon the environment are essential. These models should probably be constructed bottom up, from basic movements and kinetics on up. In my opinion, a lot of human knowledge can be mapped onto concepts of location and kinetics (derivative(s) of location), and those can ultimately be mapped onto proprioception.

Model structure learning and model optimization are IMHO required components here.

I hope these thoughts were useful. Comments are welcome and appreciated.