8 Petuum Papers Accepted to NIPS 2018

We are happy to share the news that eight papers with Petuum authors have been accepted to NIPS 2018, the 32nd Conference on Neural Information Processing Systems, and will be presented at the event in December.

NIPS is one of the top events in the artificial intelligence field, and almost 5,000 papers were submitted this year, an unprecedented number. The event’s acceptance rate was around 20% (in line with last year), for a total of 1,011 accepted papers. Petuum is among the top eight corporate research institutions with the most accepted papers, alongside Google Research, Microsoft, DeepMind, Facebook, and IBM Research. Additionally, Petuum CEO and founder Eric Xing is among the top four authors with the most accepted papers.

We are very proud of our researchers and the work they are doing to push the artificial intelligence and machine learning fields forward in pursuit of our goal to industrialize artificial intelligence. Over the course of the next several months, we plan to publish blog posts describing the details of each paper. For now, though, here’s an overview of the papers that NIPS has accepted.

Symbolic Graph Reasoning Meets Convolutions

Authors: Xiaodan Liang, Zhiting Hu, Eric Xing

Beyond local convolution networks, we explore how to harness various kinds of external human knowledge to endow networks with the capability of global semantic reasoning. Rather than using separate graphical models (e.g., CRFs) or constraints to model broader dependencies, we propose a new Symbolic Graph Reasoning (SGR) layer, which performs reasoning over a group of symbolic nodes whose outputs explicitly represent different properties of each semantic entity in a prior knowledge graph. To cooperate with local convolutions, each SGR layer is composed of three modules: a) a primal local-to-semantic voting module, where the features of all symbolic nodes are generated by voting from local representations; b) a graph reasoning module that propagates information over a knowledge graph to achieve global semantic coherence; and c) a dual semantic-to-local mapping module that learns new associations of the evolved symbolic nodes with local representations and accordingly enhances local features. The SGR layer can be injected between any convolution layers and instantiated with distinct prior graphs. Extensive experiments show that incorporating SGR significantly improves plain ConvNets on three semantic segmentation tasks and one image classification task. Further analyses show that the SGR layer learns shared symbolic representations for domains/datasets with different label sets given a universal knowledge graph, demonstrating its superior generalization capability.
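As a rough illustration of the three modules, here is a minimal numpy sketch in which random weights and made-up dimensions stand in for the learned parameters and a real prior graph; the actual SGR layer is, of course, trained end-to-end inside a ConvNet.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, d = 6, 4   # N local features (e.g. spatial positions), d channels (illustrative sizes)
S = 3         # S symbolic nodes in the toy knowledge graph

H = rng.normal(size=(N, d))          # local convolutional features
W_vote = rng.normal(size=(d, S))     # "learned" voting weights (random here)

# a) local-to-semantic voting: each symbolic node aggregates local features
votes = softmax(H @ W_vote, axis=0)        # (N, S) voting weights per node
X = votes.T @ H                            # (S, d) symbolic node features

# b) graph reasoning: propagate over the (row-normalized) prior graph
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)     # toy prior graph over 3 symbols
A_hat = (A + np.eye(S)) / (A + np.eye(S)).sum(axis=1, keepdims=True)
W_g = rng.normal(size=(d, d))
X = np.tanh(A_hat @ X @ W_g)               # evolved symbolic representations

# c) semantic-to-local mapping: distribute symbolic info back to locals
assoc = softmax(H @ X.T, axis=1)           # (N, S) new associations
H_out = H + assoc @ X                      # residual enhancement of local features
print(H_out.shape)
```

The output has the same shape as the input local features, which is what lets the layer be dropped between any two convolution layers.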

Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation

Authors: Christy Y. Li, Xiaodan Liang, Zhiting Hu, and Eric Xing

Generating long and coherent reports to describe medical images poses challenges to bridging visual patterns with informative human linguistic descriptions. We propose a novel Hybrid Retrieval-Generation Reinforced Agent (HRGR-Agent) which reconciles traditional retrieval-based approaches populated with human prior knowledge, with modern learning-based approaches, to achieve structured, robust, and diverse report generation. HRGR-Agent employs a hierarchical decision-making procedure. For each sentence, a high-level retrieval policy module chooses to either retrieve a template sentence from an off-the-shelf template database or invoke a low-level generation module to generate a new sentence. HRGR-Agent is updated via reinforcement learning, guided by sentence-level and word-level rewards. Experiments show that our approach achieves state-of-the-art results on two medical report datasets, generating well-balanced, structured sentences with robust coverage of heterogeneous medical report contents. In addition, our model achieves the highest detection accuracy for medical terminology and improved human evaluation performance.
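The hierarchical procedure can be sketched as a toy loop in which a hand-coded policy and template set stand in for the learned retrieval policy and a real template database; every name and sentence below is hypothetical, and the real agent learns both modules via reinforcement learning.

```python
# hypothetical template database indexed by topic (stands in for the
# off-the-shelf templates the retrieval module selects from)
TEMPLATES = {
    "heart": "The cardiac silhouette is within normal limits.",
    "lungs": "The lungs are clear without focal consolidation.",
}

def generation_module(topic):
    # stand-in for the learned low-level generator
    return f"Findings related to {topic} are described in detail."

def retrieval_policy(topic):
    # stand-in for the learned high-level policy: retrieve when a
    # template covers the topic, otherwise invoke the generator
    return "retrieve" if topic in TEMPLATES else "generate"

def hrgr_report(topics):
    """Build a report sentence by sentence, choosing per sentence
    between template retrieval and free generation."""
    report = []
    for topic in topics:
        if retrieval_policy(topic) == "retrieve":
            report.append(TEMPLATES[topic])
        else:
            report.append(generation_module(topic))
    return " ".join(report)

print(hrgr_report(["heart", "nodule"]))
```

The per-sentence choice is what gives the hybrid its structure: routine findings reuse templates, while unusual ones trigger generation.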

Deep Generative Models with Learnable Knowledge Constraints

Authors: Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, and Eric Xing

Deep generative models (DGMs) have achieved remarkable advances across a broad range of tasks. However, it is often difficult to incorporate rich, structured domain knowledge into end-to-end DGMs. Posterior regularization (PR) offers a principled framework for imposing structured constraints on probabilistic models, but it has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even an explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable, uncertain parts. In this paper, we establish a mathematical correspondence between PR and reinforcement learning (RL) and, based on this connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic, applying to any DGM, and flexibly adapts arbitrary constraints jointly with the model. Experiments on human image generation and templated sentence generation show that models with knowledge constraints learned by our algorithm greatly improve over the base generative models.
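The shape of the correspondence can be sketched schematically (the notation here is ours, simplified, and not the paper's exact objective). Posterior regularization seeks an auxiliary distribution q close to the model p_θ while rewarding satisfaction of a constraint function f:

```latex
\min_{\theta,\, q}\;\; \mathrm{KL}\big(q(x)\,\|\,p_\theta(x)\big)\;-\;\alpha\,\mathbb{E}_{q}\big[f(x)\big]
```

Reading q as a policy and f as a reward, this takes the form of entropy-regularized policy optimization, which is what allows f itself to be learned in the way extrinsic rewards are learned in RL.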

Unsupervised Text Style Transfer using Language Models as Discriminators

Authors: Zichao Yang, Zhiting Hu, Chris Dyer, Eric Xing, and Taylor Berg-Kirkpatrick

Binary classifiers are often employed as discriminators in GAN-based unsupervised style transfer systems to ensure that transferred sentences are similar to sentences in the target domain. One difficulty with this approach is that the error signal provided by the discriminator can be unstable and is sometimes insufficient to train the generator to produce fluent language. In this paper, we propose a new technique that uses a target domain language model as the discriminator, providing richer and more stable token-level feedback during the learning process. We train the generator to minimize the negative log likelihood (NLL) of generated sentences evaluated by the language model. By using a continuous approximation of discrete sampling under the generator, our model can be trained using back-propagation in an end-to-end fashion. Moreover, our empirical results show that when using a language model as a structured discriminator, it is possible to forgo adversarial steps during training, making the process more stable. We compare our model with previous work using convolutional neural networks (CNNs) as discriminators and show that our approach leads to improved performance on three tasks: word substitution decipherment, sentiment modification, and related language translation.
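To make the idea of scoring generated text with a target-domain language model concrete, here is a toy sketch using a count-based bigram LM over a tiny invented corpus; the paper instead uses a learned neural LM together with a continuous relaxation of sampling so the signal can be back-propagated.

```python
import math
from collections import Counter

# tiny invented target-domain corpus; a real system trains a neural LM
corpus = [["the", "food", "was", "great"],
          ["the", "service", "was", "great"],
          ["the", "food", "was", "fine"]]

bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

V = len(unigrams)  # vocabulary size for add-one smoothing

def nll(sentence):
    """Per-token negative log likelihood under an add-one-smoothed
    bigram LM; this is the kind of token-level signal the generator
    would be trained to minimize."""
    toks = ["<s>"] + sentence
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + V)
        total -= math.log(p)
    return total / len(sentence)

# a fluent in-domain sentence scores lower (better) than a scrambled one
print(nll(["the", "food", "was", "great"]),
      nll(["great", "the", "was", "food"]))
```

Because every token contributes its own log-probability term, the feedback is richer than a single real/fake bit from a binary discriminator.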

Neural Architecture Search with Bayesian Optimisation and Optimal Transport

Authors: Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, and Eric Xing

Bayesian Optimisation (BO) refers to a class of methods for global optimization of a function f that is only accessible via point evaluations. It is typically used in settings where f is expensive to evaluate. A common use case for BO in machine learning is model selection, where it is not possible to analytically model the generalization performance of a statistical model, and we resort to noisy and expensive training and validation procedures to choose the best model. Conventional BO methods have focused on Euclidean and categorical domains, which, in the context of model selection, only permit the tuning of scalar hyper-parameters of machine learning algorithms. However, with the surge of interest in deep learning, there is an increasing demand to tune neural network architectures. In this work, we develop NASBOT, a Gaussian process-based BO framework for neural architecture search. To accomplish this, we develop a distance metric in the space of neural network architectures that can be computed efficiently via an optimal transport program. This distance might be of independent interest to the deep learning community as it may find applications outside of BO. We demonstrate that NASBOT outperforms other alternatives for architecture search in several cross validation-based model selection tasks on multi-layer perceptrons and convolutional neural networks.
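As a toy illustration of an optimal-transport distance over architectures, the sketch below represents a feedforward net as a distribution of computational mass over normalized depth and computes the 1-D Wasserstein distance between two such profiles. The layer widths are made up, and the paper's actual metric solves a richer transport program over layer types and connectivity; only the flavor of "compare networks by optimally moving mass between them" is preserved.

```python
def depth_profile(widths):
    """Represent an architecture as unit mass over normalized depth:
    layer i of n sits at position (i + 0.5) / n with mass
    proportional to its width."""
    n, total = len(widths), sum(widths)
    return [((i + 0.5) / n, w / total) for i, w in enumerate(widths)]

def w1(profile_a, profile_b):
    """1-D Wasserstein (earth mover's) distance, computed as the
    integral of the absolute CDF difference."""
    points = sorted({x for x, _ in profile_a} | {x for x, _ in profile_b})
    dist, cdf_a, cdf_b, prev = 0.0, 0.0, 0.0, 0.0
    for x in points:
        dist += abs(cdf_a - cdf_b) * (x - prev)
        cdf_a += sum(m for p, m in profile_a if p == x)
        cdf_b += sum(m for p, m in profile_b if p == x)
        prev = x
    dist += abs(cdf_a - cdf_b) * (1.0 - prev)
    return dist

mlp_a = [64, 64, 32]   # hypothetical layer widths
mlp_b = [64, 64, 32]
mlp_c = [256, 16]
print(w1(depth_profile(mlp_a), depth_profile(mlp_b)))  # identical nets: 0
print(w1(depth_profile(mlp_a), depth_profile(mlp_c)))  # different shape: > 0
```

A well-behaved distance like this is what lets a Gaussian process define similarity between architectures and drive the BO search.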

DAGs with NO TEARS: Continuous Optimization for Structure Learning

Authors: Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric Xing

Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales super-exponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this paper, we introduce a fundamentally different strategy: we formulate the structure learning problem as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, which also makes implementation effortless. The proposed method outperforms existing ones, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree.
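The characterization at the heart of the method is h(W) = tr(e^{W∘W}) − d, which is zero exactly when the weighted adjacency matrix W encodes a DAG, and is smooth in W. A small numpy sketch (using a truncated power series in place of a proper matrix exponential, which is adequate for tiny matrices; a production version would use scipy.linalg.expm):

```python
import numpy as np

def h(W, terms=20):
    """NO TEARS acyclicity function: h(W) = tr(exp(W ∘ W)) - d.
    The Hadamard square W ∘ W has nonnegative entries, so every cycle
    contributes positively to some diagonal of its powers; the trace of
    the matrix exponential therefore exceeds d iff a cycle exists."""
    d = W.shape[0]
    A = W * W                 # Hadamard (elementwise) square
    E = np.eye(d)
    term = np.eye(d)
    for k in range(1, terms):  # truncated series for exp(A)
        term = term @ A / k
        E = E + term
    return np.trace(E) - d

dag = np.array([[0.0, 0.9, 0.0],   # 0 -> 1 -> 2: acyclic
                [0.0, 0.0, 0.5],
                [0.0, 0.0, 0.0]])
cyc = dag.copy()
cyc[2, 0] = 0.7                    # adds 2 -> 0, closing a cycle

print(h(dag), h(cyc))              # ~0 for the DAG, > 0 for the cycle
```

Because h is smooth, it can be used as an equality constraint in standard continuous solvers, which is what replaces the combinatorial search over DAGs.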

Learning Pipelines with Limited Data and Domain Knowledge: A Study in Parsing Physics Problems

Authors: Mrinmaya Sachan, Avinava Dubey, Tom Mitchell, Dan Roth, and Eric Xing

As machine learning becomes more widely used in practice, we need new methods to build complex intelligent systems that integrate learning with existing software and with domain knowledge encoded as rules. As a case study, we present such a system that learns to parse Newtonian physics problems in textbooks. This system, Nuts&Bolts, learns a pipeline process that incorporates existing code, pre-learned machine learning models, and human-engineered rules. It jointly trains the entire pipeline to prevent propagation of errors, using a combination of labeled and unlabeled data. Our approach achieves good performance on the parsing task, outperforming the simple pipeline and its variants. We further use Nuts&Bolts to show improvements on the end task of answering these problems.

Sample Complexity of Nonparametric Semi-Supervised Learning

Authors: Chen Dan, Leqi Liu, Bryon Aragam, Pradeep Ravikumar, Eric Xing

We study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class-conditional distributions. Under these assumptions, we establish an Ω(K log K) labeled sample complexity bound without imposing parametric assumptions, where K is the number of classes. Our results suggest that even in nonparametric settings it is possible to learn a near-optimal classifier using only a few labeled samples. Unlike previous theoretical work which focuses on binary classification, we consider general multiclass classification (K > 2), which requires solving a difficult permutation learning problem. This permutation defines a classifier whose classification error is controlled by the Wasserstein distance between mixing measures, and we provide finite-sample results characterizing the behavior of the excess risk of this classifier. Finally, we describe three algorithms for computing these estimators based on a connection to bipartite graph matching and perform experiments to illustrate the superiority of the MLE over the majority vote estimator.
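The permutation-learning step can be illustrated with a toy example: an unsupervised stage assigns each point a mixture-component id, and a handful of labeled points are used to find the component-to-class permutation with the best agreement. For small K a brute-force search over permutations suffices (the paper's algorithms exploit the bipartite-matching connection instead); all the data below is invented.

```python
from itertools import permutations

# hypothetical component assignments from unlabeled data, plus the true
# class labels of those same (few) labeled points
component_ids = [0, 0, 1, 1, 2, 2, 0, 1]
true_labels   = [2, 2, 0, 0, 1, 1, 2, 0]
K = 3

def best_permutation(components, labels, k):
    """Brute-force the component -> class permutation that maximizes
    agreement on the labeled sample (a small bipartite matching)."""
    def score(perm):
        return sum(perm[c] == y for c, y in zip(components, labels))
    return max(permutations(range(k)), key=score)

perm = best_permutation(component_ids, true_labels, K)
print(perm)   # maps component 0 -> class 2, 1 -> class 0, 2 -> class 1
```

Once the permutation is fixed, the mixture components become class predictors, which is why only a few labeled points per class are needed.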