A research program is not a set of techniques: A brief response to Yoshua Bengio’s December 26 reply to me

Gary Marcus
Dec 27, 2019


Dear Yoshua,

Thank you for your speedy response to my Medium post on defining deep learning. Although your reply is strongly worded, you inadvertently confirmed my article’s central point: a distinction between deep-learning-as-open-ended-methodology and what I called core deep learning is desperately needed.

To refresh your memory, I defined core deep learning as follows:

let’s call the central set of techniques that characterized early deep learning (and in fact the great majority of what has been published so far) — multilayer perceptrons, convolutional nets, and so forth — core deep learning.

Most of your reply talks about deep-learning-as-open-ended-methodology, essentially defining deep learning in terms of an ongoing research program, quite apart from the techniques that made that research program well-known. I am glad that you have clarified that this is your preferred sense of the word going forward; that is your undisputed right, and clarity about how you intend the term can only help.

But, crucially, at the end of your essay — the only place in which you consider your NeurIPS “state of deep learning” slide that my essay had called attention to — you briefly set aside the research program meaning, in order to fall back on a different way of talking about things.

There you write:

“The limitations I pointed out in my NeurIPS talk (and many earlier) refer to a set of commonly used deep learning techniques…” [emphasis added].

The emphasized phrase is pretty much a paraphrase of what I said you would need: a term for the “central set of techniques that characterized early deep learning” that is distinct from the term you use for your research program going forward.

Case closed: deep learning is now being used in two rather different senses, often without any sort of clarification. One sense is aspirational and methodological; the other (I proposed core deep learning for this) is far more concrete, referring to things that already exist. The former sense that you prefer is becoming increasingly common, but the latter remains widespread. (For example, when people have said that deep learning has improved results on image labeling, they have typically been referring primarily to a specific set of extant techniques, not a future research program.)

It’s sloppy to mix the two interchangeably — a research program is not a set of techniques — and it can only create confusion, both within the field and in the general public. Clarity here is important.

For example, from a scientific perspective, when people ask whether deep learning makes a good model of the brain, that question is only coherent if it is made with respect to specific models or classes of models. One can’t meaningfully ask, for example, whether an in-progress research program is a good model of the brain; one can only ask whether a particular class of models (e.g., CNNs) is a good model of the brain. Likewise, some have wondered whether current techniques could suffice for general intelligence; whether future, unknown techniques might suffice is a different question. The two are quite distinct.

Clarity about future research vs. current techniques matters for engineering, too. When executives ask, “can deep learning help solve problem X?”, they mostly mean, “can we use the tools of deep learning as they are currently available to help solve problem X?”, rather than “might we someday be able to use some set of tools that have not yet been invented, that are being pursued by people following a particular research program?” Academics care about the latter, but engineers aiming to put something into production need to know about the former.

That said, I have no wish whatsoever to hold you or anyone else to the current techniques. That’s the very opposite of what I want. My conclusion, to remind you, was not to “freeze” your existing techniques (as you seem to think) but to urge the field to move beyond them:

The real question, now, is this: what do we have to add to the core, to get a system that is genuinely capable of reasoning, language, planning and so forth?

There is no reason at all to freeze anything. Indeed your personal keenness to extend beyond current techniques is exactly what I love about your recent work.

Where a large, visible contingent of the field is still focused primarily on scaling existing techniques with minor modification — bigger networks, bigger databases, and bigger clusters, often in the service of proof-of-concept demos that work for games but perhaps not in the real world — you are working on inventing a new suite of tools. More power to you; that’s exactly what we need, and all I have ever wanted the field to do.

Best regards,

Gary


Gary Marcus

CEO & Founder, Robust.AI; co-author (with Ernest Davis) of Rebooting AI. Also proud dad, Founder of Geometric Intelligence (acquired by Uber), & Emeritus Prof., NYU.