4 Counterpoints for Dr. Gary Marcus

Gary Marcus, founder of the mysterious AI startup Geometric Intelligence, recently wrote an opinion piece for the NY Times on why AI is stuck. He argues that the field is hiding a “dirty little secret” because domain-specific ML models are not built with human-level intelligence, or Artificial General Intelligence (AGI). His plan to escape this rut is to increase the size of AI research labs by thousands of scientists, to initiate large international collaboration, and to raise billions of dollars of funding.

Contrarian views are well appreciated in AI, given the high level of hype. However, I think his argument is ignorant of recent research and his solution is already underway. So firstly, I would like to address a few of his open jabs at current AI research.

1. AI Cannot Understand A Car Chase

Marcus criticizes current deep learning limits by posing questions that canonical deep networks (CNN/RNNs) are unprepared to answer.

Such systems can neither comprehend what is going on in complex visual scenes (“Who is chasing whom and why?”) nor follow simple instructions (“Read this story and summarize what it means”).

Here I point to a paper on relational reasoning from DeepMind. In the paper, researchers combine a CNN for image processing and an LSTM for encoding questions with a module they call a Relation Network (RN). The RN is capable of handling questions such as “There is a tiny rubber thing that is the same colour as the large cylinder; what shape is it?” The RN achieved superhuman accuracy on the CLEVR dataset, beating humans 95.5% to 92.6%.

An example of the images the RN was asked about. This one comes from the CLEVR dataset. CLEVR is described as a “Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning.”
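
To make the idea concrete, here is a minimal sketch of the pairwise computation at the heart of a Relation Network: a small function g scores every ordered pair of objects together with the question, the results are summed, and a second function f turns the sum into answer scores. Everything below is an assumption for illustration: the weights are random and untrained, the “objects” are hand-made feature vectors standing in for CNN output, the “question” is a hand-made vector standing in for an LSTM state, and all names are hypothetical. It is not DeepMind’s implementation.

```python
import random

def dense_relu(weights, inputs):
    """One dense layer followed by ReLU; `weights` is a list of rows."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]

def dense(weights, inputs):
    """One dense layer without an activation."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def relation_network(objects, question, g_weights, f_weights):
    """Apply g to every ordered pair (o_i, o_j) plus the question, sum, then apply f."""
    pooled = [0.0] * len(g_weights)
    for o_i in objects:
        for o_j in objects:
            relation = dense_relu(g_weights, o_i + o_j + question)
            pooled = [p + r for p, r in zip(pooled, relation)]
    return dense(f_weights, pooled)

rng = random.Random(0)
# Three toy "objects" (3 features each) and a toy "question" (2 features).
objects = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.9], [1.0, 1.0, 0.5]]
question = [0.0, 1.0]
g_weights = random_matrix(4, 3 + 3 + 2, rng)  # g: pair + question -> 4 hidden units
f_weights = random_matrix(2, 4, rng)          # f: pooled relations -> 2 answer scores

scores = relation_network(objects, question, g_weights, f_weights)
reordered = relation_network(objects[::-1], question, g_weights, f_weights)
print(scores)
print(reordered)
```

Because the pairwise results are summed, the answer scores do not depend on the order in which the objects are listed (up to floating-point rounding), which is one reason this design suits questions about relations between objects.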

The Relation Network is not designed to answer questions about car chases, but given the right tinkering it may be able to! DeepMind’s paper is evidence that this area is still growing.

This research into relational reasoning, combined with expanding work in physical/spatial reasoning (also from DeepMind), gives hope that we will soon see more breakthroughs in this field.

2. AI Cannot Follow Directions

In describing what machine learning models ought to be able to do, Marcus basically describes Siri/Google Assistant/Alexa.

Such systems can neither comprehend what is going on in complex visual scenes (“Who is chasing whom and why?”) nor follow simple instructions (“Read this story and summarize what it means”).

If you were to ask a voice assistant “Who is Tony Robbins”, you would get a fairly decent summary of who he is. His face, height, and Wikipedia page all pop up, and his description is even read aloud to you by Google’s speech system.

The query and the summarization are following simple instructions. It’s just that it’s not streamlined into a single “end-to-end” deep learning model.

My problem is not with Marcus’s request to have a system that follows simple instructions; it’s that his specifications are not specific enough. To many readers of his article, it would seem that AI is further behind than it actually is.

Granted, some may see this example as Google embedding “cheap tricks” to make it seem like this whole process is fancy AI. But there’s a blurred line between what qualifies as “artificial intelligence” and “cheap tricks.” After all, spam detectors used to be state-of-the-art AI, and now they’re just mundane systems of cheap tricks. However, this example is composed of AI deep learning systems: a voice recognition system, a search engine and filtering system, as well as a parser to read his Wikipedia summary.

But according to Marcus’ specifications, current AI systems can follow simple directions.

3. AI Cannot Invent Ideas Without Trial And Error

Marcus’s daughter, however, can.

Not long ago, for example, while sitting with me in a cafe, my 3-year-old daughter spontaneously realized that she could climb out of her chair in a new way: backward, by sliding through the gap between the back and the seat of the chair. My daughter had never seen anyone else disembark in quite this way; she invented it on her own — and without the benefit of trial and error, or the need for terabytes of labeled data.

My problem here is that Marcus is glorifying the capabilities of the human mind and downplaying advances in one-shot learning.

To address the glorification of the human mind: the human mind always has some trial and error before insight. I’d be willing to bet that Marcus’s daughter has been in a chair before. I’ll go further and assume that she has tried to get out of her chair before, something I have witnessed my own 2-year-old cousin do incessantly. Therefore, Marcus’s daughter must have conducted experiments via trial and error before she miraculously left her chair. However, I totally agree that she probably did not need terabytes of data to make her discovery.

To address recent advances in this area: we have models that only need a few training examples to learn novel movements. UC Berkeley’s Artificial Intelligence Research lab (BAIR) has led efforts on this. Their work has produced models that can be optimized for a wide range of simulated movement tasks within a few training steps. Their blog post is pretty cool, and their paper advances more than just reinforcement learning. What this means is that we are getting closer to building models that learn as fast as Marcus’s daughter. It may not sound like much, but it’s a breakthrough that challenges the current paradigm of how machine learning models should be trained.
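
The general idea of learning a starting point that adapts to a new task in a few gradient steps can be sketched on a toy problem. This is my own illustration, not BAIR’s code: each “task” is just a target number, the “model” is a single parameter w, and the loss is the squared distance to the target. Meta-training searches for an initial w that does well after one adaptation step on tasks drawn from a range. The learning rates, the task range, and all names are assumptions.

```python
import random

INNER_LR = 0.1   # step size for adapting to a single task
META_LR = 0.05   # step size for updating the shared initialization

def loss(w, target):
    return (w - target) ** 2

def grad(w, target):
    return 2.0 * (w - target)

def adapt(w, target, steps=1):
    """Take a few plain gradient steps on one task, starting from w."""
    for _ in range(steps):
        w = w - INNER_LR * grad(w, target)
    return w

def meta_train(w_init, iterations, rng):
    """Tune the initialization so that one adaptation step works well on sampled tasks."""
    w = w_init
    for _ in range(iterations):
        target = rng.uniform(2.0, 4.0)   # sample a training task
        adapted = adapt(w, target)       # one inner step
        # Chain rule through the inner step: d(adapted)/dw = 1 - 2 * INNER_LR.
        meta_grad = grad(adapted, target) * (1.0 - 2.0 * INNER_LR)
        w = w - META_LR * meta_grad
    return w

rng = random.Random(0)
naive_init = -5.0
meta_init = meta_train(naive_init, 2000, rng)

new_task = 3.5  # a task not used during meta-training
loss_naive = loss(adapt(naive_init, new_task), new_task)
loss_meta = loss(adapt(meta_init, new_task), new_task)
print(meta_init, loss_naive, loss_meta)
```

In this toy setting the best initialization simply sits near the middle of the task range, so one step from it lands much closer to a new target than one step from the naive start. Real systems apply the same recipe to neural networks with many parameters, which is where the heavy compute comes in.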

Marcus is right. Current AI systems cannot learn without trial and error. However, neither can humans. But(!), we may soon have systems that can learn novel techniques just as fast as us.

4. AI Systems Don’t Have General Intelligence(?)

I just don’t understand what his point is here. In this part of the article, Marcus compares his daughter to a deep-learning system.

If my daughter sees her reflection in a bowl of water, she knows the image is illusory; she knows she is not actually in the bowl. To a deep-learning system, though, there is no difference between the reflection and the real thing, because the system lacks a theory of the world and how it works.

Here, specifics on what “deep-learning system” he is referring to would be helpful. If we are talking about an image recognition system or an image captioning system, then Marcus’ point is nonsensical. A) Deep learning systems don’t have a reflection and B) these models don’t interact with the real world.

If we are talking about a deep learning system embedded in some autonomous vehicle (car, robot, drone) being confused by a reflection, that’s much more reasonable. Yes, a self-driving car mistaking a reflection of another car for the real thing could lead to erratic and possibly dangerous behavior. However, these systems often have other sensing equipment (e.g. lidar) to detect nearby targets and avoid collisions or, in this case, to realize that the reflections are just that.

But do AI systems have a theory of the world and their place in it? Yes, because autonomous vehicles, by necessity, know where they are, where they need to go, and how to get there. But do all image captioning systems have such a model? No, because that would probably be overkill for an image captioning system.

A Rebuttal To My Arguments, And A Nod To Marcus

Deep learning still has many challenges to overcome before fully fleshing out even the most basic features of its applications. Images with noise in them can completely throw off image recognition systems; some images can even be manipulated with noise so that they are classified as a specific category other than what the image actually shows. (See this paper by Ian Goodfellow @ Google Brain, or search for “targeted adversarial attacks.”)
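
A targeted attack of this kind can be sketched in a few lines. The example below is a toy under stated assumptions, not necessarily the method of the paper linked above: the “image” is four numbers, the classifier is a hand-picked linear softmax model, and the attack is a single step against the sign of the gradient of the target-class loss. The weights, labels, epsilon, and function names are all hypothetical.

```python
import math

# Hand-picked weights for a toy linear classifier: one row per class.
WEIGHTS = [
    [1.0, -1.0, 0.5, 0.0],    # class 0
    [-1.0, 1.0, 0.0, 0.5],    # class 1
    [0.5, 0.5, -1.0, -1.0],   # class 2
]
LABELS = ["dog", "cat", "school bus"]
EPSILON = 0.3  # largest change allowed to any single input value

def logits(x):
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x):
    z = logits(x)
    return z.index(max(z))

def targeted_perturbation(x, target):
    """Nudge every input value by EPSILON in the direction that favors `target`."""
    probs = softmax(logits(x))
    # Gradient of the target-class cross-entropy w.r.t. the logits is (probs - one_hot).
    error = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # Back through the linear layer: gradient w.r.t. each input value.
    grad = [sum(WEIGHTS[k][i] * error[k] for k in range(len(WEIGHTS)))
            for i in range(len(x))]
    # Step against the gradient sign (copysign treats a zero gradient as positive).
    return [v - EPSILON * math.copysign(1.0, g) for v, g in zip(x, grad)]

x = [0.6, 0.2, 0.3, 0.1]
x_adv = targeted_perturbation(x, target=2)
print(LABELS[predict(x)], "->", LABELS[predict(x_adv)])
```

Here every input value moves by at most EPSILON, yet the predicted label changes to the one the attacker chose. On real images, similar perturbations can be spread across so many pixels that they are hard to see.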

Marcus points this out:

Some of the best image-recognition systems, for example, can successfully distinguish dog breeds, yet remain capable of major blunders, like mistaking a simple pattern of yellow and black stripes for a school bus.

But I don’t think this means that research is stuck when it comes to approaching this problem. In fact, the current research being done on targeted adversarial attacks implies that we are expanding our understanding of how these flaws arise. And in this process, I’m sure defenses against these flaws will also advance.

Marcus’s Solution for Artificial General Intelligence

Is not new. It’s simply to add more people and add more money. We are already seeing billions of dollars being thrown at AI startups and millions in funding to AI labs across the world (see Canada’s initiative).

And we have some extremely experienced and innovative researchers working on these problems every day, as well as a flood of amateurs (like myself) looking for simple ways to solve these problems. Just look at Mikel Bober-Irizar for an example of how fast this field is moving. At 16 years old, he has already published a paper helping to advance applications of deep learning systems in video captioning.


While I think Marcus is asking for too much generality in current machine learning systems, I think his general direction is worth understanding. Yes, we don’t have foolproof image recognition systems. Yes, we don’t have machine learning models that streamline natural language processing with being able to perfectly summarize articles. Yes, we still need GPUs and a crap ton of data to train good models and meta models (see BAIR’s paper mentioned above).

But this field is still fast moving, and I believe in AI researchers. Thus, I believe that current state-of-the-art AI will continue to advance and meet most of Marcus’ criticisms. Except the reflection vs. reality criticism, I don’t understand that. But, I still have a lot to learn, and my understanding of the limits of this field may be quite incorrect. Time will tell.

Until then, I hope that the audience of the NY Times article will continue to practice healthy skepticism not only of AI optimists, but also of AI contrarians.

UPDATE: Marcus’s Response

Dr. Gary Marcus read my article and personally gave me his feedback via Twitter. Specifically, he had 2 counter points, a criticism of a paper, and a recommendation for anyone interested in further reading.

Counter point 1 — Marcus’s solution for AGI includes top-down processing

In my response to Dr. Marcus, I neglected to address his comments about the future of artificial general intelligence. He believes that more powerful AI models will need to be built to focus on reasoning and conceptual relationships, something that current deep learning techniques do not do. Machine learning models are widely criticized for learning how to solve problems without actually “understanding” the problem they are approaching. Understanding a problem usually entails the ability to communicate the relationships between concepts while solving it. Such a paradigm is usually referred to as “top-down” processing.

For more technical details on why deep learning systems do not have top-down processing, see another AI expert’s explanation of what neural networks actually do.

Counter point 2 — AI summarization is still years away

I have to concede this point. Marcus is right that my example of Google “summarizing” information about Tony Robbins is a bit of a stretch. Wikipedia has extremely well-structured data curated and edited by humans, so that example is not true summarization. What Marcus is looking for is a system that can summarize open-ended, unstructured textual information without human intervention. We may still be far away from this. If you’re interested in research on this topic, feel free to google “Natural Language Processing.”

A criticism of Relation Networks

Marcus points out that DeepMind’s system relies on lots of data in order to build relationships between a limited number of objects. By comparison, humans can understand relationships between an arbitrary number of objects using very little data. In layman’s terms, deep learning systems are not as good at generalizing relationships as humans. Humans perceive chases through cars, politics, and even in pursuit of loved ones; breakthrough AI systems are still learning how to compare the sizes of differently colored balls. Although AI is far from solving this problem, I think that there is still a lot of room for growth.

Lastly, a recommendation

For anyone interested in further reading on the topic of AI from a cognitive science point of view, read “The Algebraic Mind” by Dr. Gary Marcus. It approaches understanding the brain in a computational way that has now become the standard in Cognitive Science.

I look forward to reading this book and more from Dr. Marcus in the future! Thanks to everyone who responded to this article with comments and feedback.

Thanks to Dr. Gary Marcus for his feedback and Mikel Bober-Irizar for his support!

That’s all folks! If you enjoyed, feel free to ❤ this article. If you have comments, message me on twitter @ngundotra. Feel free to respond and share.