2019 in Review: 10 AI Papers That Made an Impact

Published Dec 31, 2019 · 10 min read


The volume of peer-reviewed AI research papers has grown by more than 300 percent over the past three decades (Stanford AI Index 2019), and the top AI conferences in 2019 saw a deluge of papers. CVPR submissions spiked to 5,165, a 56 percent increase over 2018; ICLR received 1,591 main conference paper submissions, up 60 percent over last year; ACL reported a record-breaking 2,906 submissions, almost doubling last year’s 1,544; and ICCV 2019 received 4,303 submissions, more than twice the 2017 total.

As part of our year-end series, Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019.

AAAI Outstanding Paper:

How to Combine Tree-Search Methods in Reinforcement Learning

Authors: Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Institutions: Technion, INRIA Villers-lès-Nancy

Abstract: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero). Referring to the planning problem as tree search, a reasonable practice in these implementations is to back up the value only at the leaves while the information obtained at the root is not leveraged other than for updating the policy. Here, we question the potency of this approach. Namely, the latter procedure is non-contractive in general, and its convergence is not guaranteed. Our proposed enhancement is straightforward and simple: use the return from the optimal tree path to back up the values at the descendants of the root. This leads to a γ^h-contracting procedure, where γ is the discount factor and h is the tree depth. To establish our results, we first introduce a notion called multiple-step greedy consistency. We then provide convergence rates for two algorithmic instantiations of the above enhancement in the presence of noise injected to both the tree search stage and value estimation stage.
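The enhancement is easy to state concretely. In the toy deterministic MDP below (this sketch and its `transitions` encoding are illustrative, not the paper's implementation), a depth-h exhaustive search finds the optimal tree path's discounted return, which is then backed up at the root's descendant rather than discarded:

```python
def lookahead(transitions, state, h, V, gamma=0.9):
    """Exhaustive depth-h tree search on a toy deterministic MDP.

    transitions: dict mapping (state, action) -> (next_state, reward).
    V: dict of value estimates used at the leaves.
    Returns (best_action, best_return), where best_return is the
    discounted return of the optimal tree path.
    """
    def actions(s):
        return [a for (s_, a) in transitions if s_ == s]

    def best_return(s, depth):
        acts = actions(s)
        if depth == 0 or not acts:
            return V.get(s, 0.0)
        return max(transitions[(s, a)][1]
                   + gamma * best_return(transitions[(s, a)][0], depth - 1)
                   for a in acts)

    scored = {a: transitions[(state, a)][1]
                 + gamma * best_return(transitions[(state, a)][0], h - 1)
              for a in actions(state)}
    best = max(scored, key=scored.get)
    return best, scored[best]


def backup_optimal_path(transitions, state, h, V, gamma=0.9):
    """The paper's enhancement, sketched: back up the optimal path's
    return at the root's descendant instead of discarding it."""
    best, ret = lookahead(transitions, state, h, V, gamma)
    child, r = transitions[(state, best)]
    V[child] = (ret - r) / gamma  # value the optimal path implies for the child
    return best, ret
```

In the naive scheme, only the leaf values feed the backup; here the root-level information (the optimal path's return) updates the value table, which is what yields the γ^h contraction.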

ICLR Best Papers:

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Authors: Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville
Institutions: Montreal Institute for Learning Algorithms (MILA), Microsoft Research

Abstract: Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
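The ordering mechanism rests on a cumulative-softmax ("cumax") activation that produces monotone gates, so low-ranked neurons forget before high-ranked ones. A minimal NumPy sketch of one step's master gates (toy dimension, random pre-activations; not the full ON-LSTM cell):

```python
import numpy as np

def cumax(x):
    """Cumulative softmax: a monotonically non-decreasing gate in [0, 1]."""
    e = np.exp(x - x.max())
    return np.cumsum(e / e.sum())

rng = np.random.default_rng(0)
d = 8
master_forget = cumax(rng.normal(size=d))        # rises from ~0 toward 1
master_input = 1.0 - cumax(rng.normal(size=d))   # falls from ~1 toward 0
# Where both gates are open, old memory and new input are mixed; neurons
# ranked above the forget frontier keep long-term (higher-level) information.
overlap = master_forget * master_input
```

Because the gates are monotone, erasing a low-ranked neuron forces all neurons below it in the ordering to be erased too, which is exactly the nesting constraint the abstract describes.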


The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Authors: Jonathan Frankle, Michael Carbin
Institution: Massachusetts Institute of Technology

Abstract: Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the “lottery ticket hypothesis:” dense, randomly-initialized, feed-forward networks contain subnetworks (“winning tickets”) that — when trained in isolation — reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10–20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
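The ticket-finding recipe is iterative magnitude pruning: train, prune the smallest-magnitude surviving weights, rewind the survivors to their original initialization, and repeat. A minimal sketch with a stand-in `train_fn` (real usage would train the network; the names and pruning fraction here are illustrative):

```python
import numpy as np

def find_winning_ticket(init_weights, train_fn, prune_frac=0.2, rounds=5):
    """Iterative magnitude pruning: train, prune the smallest-magnitude
    surviving weights, rewind survivors to their original initialization,
    and repeat. `train_fn` maps weights to trained weights and stands in
    for a full training run in this sketch."""
    mask = np.ones_like(init_weights)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask) * mask
        alive = np.flatnonzero(mask)
        k = int(len(alive) * prune_frac)  # weights to prune this round
        if k == 0:
            break
        order = np.argsort(np.abs(trained.ravel()[alive]))
        mask.ravel()[alive[order[:k]]] = 0.0
    # The "winning ticket": surviving connections at their ORIGINAL init values.
    return init_weights * mask, mask
```

The rewind step is the crux: the returned subnetwork keeps its original initial weights, not the trained ones, and the hypothesis is that this sparse, rewound network trains to comparable accuracy.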

ICML 2019 Best Paper Awards:

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

Authors: Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem
Institutions: ETH Zurich, Max Planck Institute for Intelligent Systems, Google Research

Abstract: The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12,000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on seven different data sets. We observe that while the different methods successfully enforce properties “encouraged” by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.

Rates of Convergence for Sparse Variational Gaussian Process Regression

Authors: David R. Burt, Carl Edward Rasmussen, Mark van der Wilk
Institutions: University of Cambridge, PROWLER.io.

Abstract: Excellent variational approximations to Gaussian process posteriors have been developed which avoid the O(N³) scaling with dataset size N. They reduce the computational cost to O(NM²), with M ≤ N the number of inducing variables, which summarise the process. While the computational cost seems to be linear in N, the true complexity of the algorithm depends on how M must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing M more slowly than N. A particular case is that for regression with normally distributed inputs in D dimensions with the Squared Exponential kernel, M = O(log^D N) suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase M in continual learning scenarios.
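For the Squared Exponential, Gaussian-input case, the result suggests a concrete growth schedule for the number of inducing points. The constant below is a hypothetical tuning parameter; the point is only that M can grow polylogarithmically while N grows by orders of magnitude:

```python
import math

def inducing_schedule(N, D, c=5.0):
    """Hypothetical rule M = ceil(c * log(N)**D), illustrating the
    M = O(log^D N) growth rate from the paper's SE-kernel result."""
    return math.ceil(c * math.log(N) ** D)

# The inducing set stays tiny relative to the dataset as N grows.
for N in (10**3, 10**5, 10**7):
    M = inducing_schedule(N, D=1)
    print(f"N={N:>10,}  M={M:>4}  M/N={M / N:.2e}")
```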

CVPR Best Paper Award:

A Theory of Fermat Paths for Non-Line-of-Sight Shape Reconstruction

Authors: Shumian Xin, Sotiris Nousias, Kiriakos N. Kutulakos, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan, Ioannis Gkioulekas
Institutions: Carnegie Mellon University, University of Toronto, University College London

Abstract: We present a novel theory of Fermat paths of light between a known visible scene and an unknown object not in the line of sight of a transient camera. These light paths either obey specular reflection or are reflected by the object’s boundary, and hence encode the shape of the hidden object. We prove that Fermat paths correspond to discontinuities in the transient measurements. We then derive a novel constraint that relates the spatial derivatives of the path lengths at these discontinuities to the surface normal. Based on this theory, we present an algorithm, called Fermat Flow, to estimate the shape of the non-line-of-sight object. Our method allows, for the first time, accurate shape recovery of complex objects, ranging from diffuse to specular, that are hidden around the corner as well as hidden behind a diffuser. Finally, our approach is agnostic to the particular technology used for transient imaging. As such, we demonstrate mm-scale shape recovery from pico-second scale transients using a SPAD and ultrafast laser, as well as micron-scale reconstruction from femto-second scale transients using interferometry. We believe our work is a significant advance over the state-of-the-art in non-line-of-sight imaging.

ICCV Best Paper Award (Marr Prize):

SinGAN: Learning a Generative Model from a Single Natural Image

Authors: Tamar Rott Shaham, Tali Dekel, Tomer Michaeli
Institutions: Technion, Google Research

Abstract: We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused to be real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.
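The sampling procedure is a coarse-to-fine pass through the pyramid: the coarsest GAN maps pure noise to a low-resolution image, and each finer GAN adds residual detail to the upsampled output of the previous scale. A schematic NumPy sketch (the callables stand in for trained fully convolutional generators; nearest-neighbour upsampling is an illustrative choice):

```python
import numpy as np

def upsample(x, factor=2):
    """Nearest-neighbour upsampling (an illustrative stand-in)."""
    return np.kron(x, np.ones((factor, factor)))

def generate(generators, noise_maps):
    """Coarse-to-fine sampling through a pyramid of generators: the
    coarsest one maps pure noise to an image; each finer one adds
    residual detail to the upsampled output of the previous scale.
    The callables stand in for trained fully convolutional GANs."""
    x = generators[0](noise_maps[0])
    for G, z in zip(generators[1:], noise_maps[1:]):
        x_up = upsample(x)
        x = x_up + G(z + x_up)  # residual refinement at this scale
    return x
```

Because every generator is fully convolutional, changing the spatial size of the noise maps changes the output size, which is how SinGAN produces samples of arbitrary size and aspect ratio.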

CoRL Best Paper Award:

A Divergence Minimization Perspective on Imitation Learning Methods

Authors: Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shixiang Gu
Institutions: University of Toronto, Vector Institute

Abstract: In many settings, it is desirable to learn decision-making and control policies through learning or bootstrapping from expert demonstrations. The most common approaches under this Imitation Learning (IL) framework are Behavioural Cloning (BC) and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, due to multiple factors of variation, directly comparing these methods does not provide adequate intuition for understanding this difference in performance. In this work, we present a unified probabilistic perspective on IL algorithms based on divergence minimization. We present f-MAX, an f-divergence generalization of AIRL, a state-of-the-art IRL method. f-MAX enables us to relate prior IRL methods such as GAIL and AIRL, and understand their algorithmic properties. Through the lens of divergence minimization we tease apart the differences between BC and successful IRL approaches, and empirically evaluate these nuances on simulated high-dimensional continuous control domains. Our findings conclusively identify that IRL’s state-marginal matching objective contributes most to its superior performance. Lastly, we apply our new understanding of IL methods to the problem of state-marginal matching, where we demonstrate that in simulated arm pushing environments we can teach agents a diverse range of behaviours using simply hand-specified state distributions and no reward functions or expert demonstrations.
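The unifying object is the f-divergence D_f(P‖Q) = Σ_x q(x) f(p(x)/q(x)): different convex generators f yield different matching objectives (e.g., a Jensen-Shannon generator relates to GAIL and a reverse-KL generator to AIRL; the correspondences are simplified here). A small sketch for discrete distributions, using standard generator definitions:

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete p, q > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

# Standard convex generators (each satisfies f(1) = 0).
forward_kl = lambda t: t * np.log(t)                                      # KL(P || Q)
reverse_kl = lambda t: -np.log(t)                                         # KL(Q || P)
jensen_shannon = lambda t: t * np.log(t) - (t + 1) * np.log((t + 1) / 2)  # JS
```

Swapping the generator while holding the rest of the algorithm fixed is what lets the paper compare IRL variants on equal footing.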

ACL 2019 Best Long Paper:

Bridging the Gap between Training and Inference for Neural Machine Translation

Authors: Wen Zhang, Yang Feng, Fandong Meng, Di You, Qun Liu
Institutions: Chinese Academy of Sciences, Tencent, Worcester Polytechnic Institute, Huawei Noah’s Ark Lab

Abstract: Neural Machine Translation (NMT) generates target words sequentially in the way of predicting the next word conditioned on the context words. At training time, it predicts with the ground truth words as context while at inference it has to generate the entire sequence from scratch. This discrepancy of the fed context leads to error accumulation along the way. Furthermore, word-level training requires strict matching between the generated sequence and the ground truth sequence which leads to overcorrection over different but reasonable translations. In this paper, we address these issues by sampling context words not only from the ground truth sequence but also from the predicted sequence by the model during training, where the predicted sequence is selected with a sentence-level optimum. Experiment results on Chinese→English and WMT’14 English→German translation tasks demonstrate that our approach can achieve significant improvements on multiple datasets.
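The core training change can be sketched simply: at each decoding step, the context token is drawn from the ground truth with a probability that decays over training, otherwise from the model's own (sentence-level selected) prediction. The inverse-sigmoid decay below is a common choice for this kind of schedule; the constant `mu` is an illustrative hyperparameter:

```python
import math
import random

def mix_context(gold, predicted, p_gold, rng=random):
    """Sample each context token from the ground truth with probability
    p_gold, otherwise from the model's own (sentence-level selected)
    predicted sequence."""
    return [g if rng.random() < p_gold else p
            for g, p in zip(gold, predicted)]

def gold_prob(epoch, mu=12.0):
    """Inverse-sigmoid decay of the ground-truth probability: starts
    near 1 (teacher forcing) and falls toward 0 as training proceeds,
    gradually exposing the model to its own errors."""
    return mu / (mu + math.exp(epoch / mu))
```

Early in training the model sees mostly gold context (stable learning); later it increasingly conditions on its own outputs, narrowing the train/inference gap the abstract describes.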

NeurIPS Outstanding Paper Award:

Distribution-Independent PAC Learning of Halfspaces with Massart Noise

Authors: Ilias Diakonikolas, Themis Gouleakis, Christos Tzamos
Institutions: University of Wisconsin-Madison, Max Planck Institute for Informatics

Abstract: We study the problem of distribution-independent PAC learning of halfspaces in the presence of Massart noise. Specifically, we are given a set of labeled examples (x, y) drawn from a distribution D on R^{d+1} such that the marginal distribution on the unlabeled points x is arbitrary and the labels y are generated by an unknown halfspace corrupted with Massart noise at noise rate η < 1/2. The goal is to find a hypothesis h that minimizes the misclassification error Pr_{(x,y)∼D}[h(x) ≠ y].
We give a poly(d, 1/ε) time algorithm for this problem with misclassification error η + ε. We also provide evidence that improving on the error guarantee of our algorithm might be computationally hard. Prior to our work, no efficient weak (distribution-independent) learner was known in this model, even for the class of disjunctions. The existence of such an algorithm for halfspaces (or even disjunctions) has been posed as an open question in various works, starting with Sloan (1988) and Cohen (1997), and was most recently highlighted in Avrim Blum’s FOCS 2003 tutorial.
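The Massart model is easy to simulate, which makes the setting concrete: each label is flipped independently with a point-dependent probability η(x) bounded by η < 1/2. A sketch of a data generator (the particular η(x) below is an arbitrary illustrative choice):

```python
import numpy as np

def massart_halfspace_sample(w, n, eta_max=0.3, rng=None):
    """Draw (x, y) pairs where y = sign(w . x) is flipped independently
    with probability eta(x) <= eta_max < 1/2 -- the Massart noise model.
    The particular eta(x) below is an arbitrary illustrative choice."""
    if rng is None:
        rng = np.random.default_rng(0)
    X = rng.normal(size=(n, len(w)))
    clean = np.sign(X @ w)
    eta = eta_max * np.abs(np.sin(X[:, 0]))  # any function bounded by eta_max
    flips = rng.random(n) < eta
    y = np.where(flips, -clean, clean)
    return X, y
```

The hardness comes from the adversary choosing η(x) per point; the paper's algorithm still guarantees error η + ε in poly(d, 1/ε) time.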

Journalist: Yuan Yuan | Editor: Michael Sarazen

AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global