A Response to Yann LeCun’s Response.

I appreciate the interest and debate around my post, and Yann’s response on Facebook. Let me respond to the response.

[I chose to have it here and not on Facebook, because, while I have an old and inactive Facebook account, I would rather not use it. I already spend tons of time on one social network and try to not be dragged into another. Also, here I have better formatting options, and better control over the content over time.]

Yann referred to my previous clarification post as back-pedaling. I do not think this is correct. It elaborated on some points in the original post and changed the tone, but the message itself did not change. Anyway, here are some more back-pedaling clarifications in response to Yann’s response:

I am not against the use of deep learning methods on language tasks.

I mean, come on. I am a co-author on many papers that use deep learning for language. I give a talk called “Doing Stuff with LSTMs”. I recently published a book about neural network methods for NLP. Deep learning methods have been transformative for NLP; I think this part is well established by now.

What I am against is a tendency of the “deep-learning community” to enter into fields (NLP included) in which they have only a very superficial understanding, and make broad and unsubstantiated claims without taking the time to learn a bit about the problem domain. This is not about “not yet establishing a common language”. It is about not taking time and effort to familiarize yourself with the domain in which you are working. Not necessarily with all the previous work, but with basic definitions. With basic evaluation metrics. Claiming “state of the art results on Chinese Poetry Generation” (from the paper’s abstract) is absurd. Saying “we evaluate using a CFG” without even looking at what the CFG represents is beyond sloppy. Using the likelihood assigned by a PCFG as a measure that “captures the grammaticality of a sentence” is just plain wrong (in the sense of being incorrect, not of being immoral).

[and writing that a matrix of 1-hot encoded vectors is visually similar to Braille code and therefore “an inspiration to why our approach could work” (Zhang and LeCun, 2015, arxiv versions 1 through 4 out of 5) is just silly.]
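To make the PCFG point above concrete, here is a minimal sketch with a toy grammar. The grammar, its rule probabilities, and the helper name `tree_prob` are all made up for illustration; none of them come from the paper under discussion. Each of the two sentences has exactly one parse under this toy grammar, so the probability of the tree is also the probability of the sentence. Both sentences are perfectly grammatical, yet the longer one gets a much lower likelihood simply because more rules are multiplied together.

```python
from math import prod

# Hypothetical toy PCFG: (left-hand side, right-hand side) -> rule probability.
# Probabilities for each left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("NP", "PP")): 0.3,
    ("VP", ("V", "NP")): 1.0,
    ("PP", ("P", "NP")): 1.0,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("saw",)): 1.0,
    ("P", ("near",)): 1.0,
}


def tree_prob(tree):
    """Probability of a parse tree: the product of the probabilities of its rules."""
    label, *children = tree
    if isinstance(children[0], str):
        # Preterminal node, e.g. ("N", "dog").
        return RULES[(label, (children[0],))]
    rhs = tuple(child[0] for child in children)
    return RULES[(label, rhs)] * prod(tree_prob(child) for child in children)


def np(noun):
    """Build the tree for a simple noun phrase such as 'the dog'."""
    return ("NP", ("Det", "the"), ("N", noun))


# "the dog saw the cat"
short_tree = ("S", np("dog"), ("VP", ("V", "saw"), np("cat")))

# "the dog saw the cat near the dog"
long_tree = (
    "S",
    np("dog"),
    ("VP", ("V", "saw"), ("NP", np("cat"), ("PP", ("P", "near"), np("dog")))),
)

p_short = tree_prob(short_tree)
p_long = tree_prob(long_tree)
print(f"short sentence: {p_short:.6f}")
print(f"long sentence:  {p_long:.6f}")
```

The likelihood tracks sentence length and word frequency at least as much as it tracks well-formedness, so ranking outputs by it says little about whether they are grammatical.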


When I say that “you should respect language” I am not saying that you should respect others’ previous efforts and methodologies (though that could work well for you also), but that you should pay attention to the nuances of the problem you are trying to solve. And at least learn enough so that your evaluations are meaningful.

Some “core deep-learning” researchers have made the switch nicely, and are making very good contributions. Kyunghyun Cho is perhaps the most prominent of these.

Now, to the arxiv part:

I think Yann’s response really missed the point on this one. 
I do not mind posting papers quickly on arxiv. I recognize the obvious benefits of arxiv publishing and fast turnarounds. But one should also acknowledge its shortcomings. In particular, I am concerned about the conflation of science and PR that arxiv facilitates; the rich-get-richer effects and abuse of power; and some of the current arxiv publishing dynamics in the DL community.

It is OK to post early on arxiv. It is NOT OK to misrepresent and over-claim what you did. Sloppy papers with broad titles such as “Adversarial Generation of Natural Language” are harmful. It is exactly the difference between the patent system (which is overall a reasonable idea) and patent trolling (which is a harmful abuse).

It is OK to claim the idea of using the softmax instead of the one-hot outputs in WGANs for discrete sequences. 
It is NOT OK to flag-plant on the idea of applying adversarial training to NLG, as this paper does.

Yann’s argument may be: “but people can read the paper and see what the actual contribution was, and this will correct over time”. Things may indeed correct over time, but in the short and medium terms these broad overclaiming papers from famous groups are still very harmful. Most people don’t read the papers in depth but only the title and sometimes the abstract and sometimes the intro. And when the papers come from established groups, people tend to trust the claims without verification. “Serious researchers” might not fall for this, but the general population sure does get misled. And by the general population I mean people who are not actively working in this exact sub-field. This includes practitioners in industry, colleagues, prospective students, prospective reviewers of papers and grants. In the short time since this paper came out, I already heard, on several occasions, “oh, you are interested in generation? have you tried using GANs? I saw this recent paper in which they get cool results with adversarial learning for NLG”. This will be extremely harmful and annoying for NLG researchers who apply for grants in the coming year (remember, many grants are reviewed by a panel of capable but non-specialized experts), as they will have to either waste precious space and effort in dealing with this paper and with Hu et al. and explaining why they are irrelevant, or be dismissed as working on this “already solved problem”, despite the fact that neither the paper in question nor Hu et al. actually did very much, and despite the fact that both papers have terrible evaluations.

The fast pace of arxiv can have a very positive effect on the field, but “with great power comes great responsibility”, and we have to be careful not to abuse the power. We can make arxiv publishing even more powerful by acting responsibly and pushing towards a more scientific publication culture, in which we value and encourage proper evaluation and precise representation of results, and discourage (and develop a system for penalizing!) populist narratives, over-claiming and exaggerations.