Ten Deserving Deep Learning Papers that were Rejected at ICLR 2017

Carlos E. Perez
Published in Intuition Machine
Feb 8, 2017
Credit: https://unsplash.com/search/reject?photo=tnxRFtXI9dI

I first wrote about the deluge of papers that were submitted to ICLR 2017. I also described one of those papers in my post “Rethinking Generalization”. It is very curious to note that the papers described in my posts just happened to mysteriously jump to the top of the list: http://prlz77.github.io/iclr2017-stats/ ¯\_(ツ)_/¯.

Now that the list has been culled down to a few “deserving” submissions, I will take this opportunity to highlight some excellent papers that for one reason or another did not make the cut. There’s a lot of subjectivity that goes on in judging papers, and a lot of times the outcome depends on the present world view of the reviewers. In the cut-throat research environment, not every piece of research can make the cut. That’s just the unfortunate reality. However, I would like to mention some good papers that deserved to make the cut.

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks by Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher. One of the really novel papers out there, showing how to incrementally grow a neural network. Absolutely shocking that this paper was rejected. The reason why this paper is important is that it shows how a network can grow through the use of transfer learning and domain adaptation. There are not many papers that explore this area.

Hierarchical Memory Networks by Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio. Another NLP paper, and it is very surprising that this gets rejected considering the all-star author list. This is one of the first papers out there that explores the notion of a hierarchy of memory. Most memory-augmented networks tend to have flat memory structures. The paper should not have been dismissed so lightly.

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning by Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel. The reviewers must be smoking something to not be convinced that this is groundbreaking research! I guess they were not impressed with the “RL²” naming convention. Anything about meta-learning should be selling like hotcakes, yet this paper, despite having prominent authors, gets slammed. Unimaginable!

Demystifying ResNet by Sihan Li, Jiantao Jiao, Yanjun Han, Tsachy Weissman. I liked this paper because it gave some insightful rules of thumb on how to make use of residual or skip connections. This was 2016’s hottest innovation, and some folks make an attempt at deconstructing the technique, yet get slammed for their efforts. One complaint was that simplified models were used in this study. This complaint is absurd: of course you want to use simplified models to characterize an otherwise complex system.

A Neural Knowledge Language Model by Sungjin Ahn, Heeyoul Choi, Tanel Parnamaa, Yoshua Bengio. Yet another NLP paper that gets rejected. Fusing knowledge bases with Deep Learning should be a very big thing, yet this paper gets dismissed for lack of novelty. The main complaint was the writing style, which is just unfortunate.

Knowledge Adaptation: Teaching to Adapt by Sebastian Ruder, Parsa Ghaffari, John G. Breslin. I did not notice this paper until I made a second pass through the rejection list. I’m a bit biased here in that I’m always seeking out work on Domain Adaptation and Transfer Learning. This paper has some very nice ideas. Unfortunately, it did not make the cut with the esteemed reviewers.

Tensorial Mixture Models by Or Sharir, Ronen Tamari, Nadav Cohen, Amnon Shashua. I’m a big fan of this paper, see “Holographic Models”. Unfortunately, the skepticism of the reviewers was too high to overcome.

On the Expressive Power of Deep Neural Networks by Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein. Research is really going to the dogs if fundamental theoretical and experimental papers like this get tossed aside in favor of the usual alchemy that stands in for “Deep Learning research”.

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond by Levent Sagun, Leon Bottou, Yann LeCun. Wow, an egg-in-the-face for these prominent authors. I guess fundamental experimental data just isn’t sexy enough to make the cut. The comment was, “interesting empirical data, but no theory”. Talk about completely unrealistic expectations.

An Empirical Analysis of Deep Network Loss Surfaces by Daniel Jiwoong Im, Michael Tao, Kristin Branson. This is a shocker in that I mentioned it in my Rethinking Generalization post as additional evidence that SGD acts as an implicit regularizer. What I had failed to do in that post was place a hyperlink to the paper. But it is indeed surprising that researchers who have dug up some very impressive data are left with nothing but the humiliation of a rejected paper.

Research that boldly tries to improve our understanding and experience should not be penalized because of writing style or not having exhaustive enough data. At the bleeding edge, that kind of data and those kinds of experiments are more difficult to come by. I see one of the problems of research that’s novel and innovative is that it’s unfamiliar to the reviewers. This unfamiliarity demands that the paper live up to higher standards, and unfortunately, due to the newness of the approach, the authors are unable to deliver in a timely manner.

There’s just too little credit given to research work that performs experimental studies on the nature of Deep Learning. In these situations, it comes with the territory that simplified models are used so as to be able to perform tractable analysis. One should not always expect a strong theoretical result; rather, the experimental results in themselves are valuable enough in that they give a characterization of how these machines behave. In the absence of this kind of research, we will be mostly in the dark with our alchemy.

In the end though, there’s a lot of subjectivity involved in determining the merits of a research paper. It is too easy to forget that there was a time in the recent past when papers on Convolutional Networks were very commonly rejected at Computer Vision conferences. See Yann LeCun (in 2012):

This followed a long string of rejected papers about convolutional nets. Given this negative bias, I decided to no longer send convnet-based papers to CVPR, because it was a complete waste of time, energy, and good will.

One major concern is that the current research environment is going to get a lot worse for Deep Learning researchers. The field is moving too rapidly and it is very easy to find reviewers who have a world view that isn’t current with the latest research. So you really end up with reviews that criticize style rather than substance. The number of good papers that have been rejected is a reflection of this knowledge (or perhaps cultural) gap.

P.S. LipNet: End-to-End Sentence-level Lipreading is one other paper that received a lot of heat for its rejection. I’m unfortunately not in a position to jump into that fracas, considering it’s not a domain that I’m familiar enough with.

Update: Andrej Karpathy has run some analysis based on statistics gathered from his Arxiv-Sanity site: “ICLR 2017 vs arxiv-sanity”. I checked his numbers and it appears that Arxiv-Sanity popularity numbers are skewed towards implementation ideas (crazy ideas are more popular) and not fundamental research. The last 7 papers don’t reach the Arxiv-Sanity popularity cutoffs.

