Reinforcement Learning in ICML 2019

Yuxi Li
May 18, 2019

ICML 2019 is approaching. I have collected the invited talks, tutorials, and workshops on reinforcement learning (RL) and related deep learning, machine learning, and AI topics, along with the RL papers. Comments are welcome.

Table of Contents

Invited Talks

Tutorials

Workshops

Papers

In the following, I collect (probably) all papers directly related to RL and group them by topic. Comments are welcome, e.g., about the categorization, or if I missed some important papers. My email: yuxili@gmail.com. Thanks!

Papers: Value Function

Diagnosing Bottlenecks in Deep Q-learning Algorithms
Justin Fu (University of California, Berkeley) · Aviral Kumar (University of California Berkeley) · Matthew Soh (UC Berkeley) · Sergey Levine (Berkeley)

The Value Function Polytope in Reinforcement Learning
Robert Dadashi (Google AI Residency Program) · Marc Bellemare (Google Brain) · Adrien Ali Taiga (Université de Montréal) · Nicolas Le Roux (Google) · Dale Schuurmans (Google / University of Alberta)

Statistics and Samples in Distributional Reinforcement Learning
Mark Rowland (DeepMind) · Robert Dadashi (Google AI Residency Program) · Saurabh Kumar (Google) · Remi Munos (DeepMind) · Marc Bellemare (Google Brain) · Will Dabney (DeepMind)

Nonlinear Distributional Gradient Temporal-Difference Learning
Chao Qu (Ant Financial Service Group) · Shie Mannor (Technion) · Huan Xu (Georgia Tech)

Sample-Optimal Parametric Q-Learning with Linear Transition Models
Lin Yang (Princeton) · Mengdi Wang (Princeton University)

Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
Michael Oberst (MIT) · David Sontag (Massachusetts Institute of Technology)

Composing Value Functions in Reinforcement Learning
Benjamin van Niekerk (University of the Witwatersrand) · Steven James (University of the Witwatersrand) · Adam Earle (University of the Witwatersrand) · Benjamin Rosman (Council for Scientific and Industrial Research)

Making Deep Q-learning methods robust to time discretization
Corentin Tallec (Univ. Paris-Sud) · Leonard Blier (Université Paris Sud and Facebook) · Yann Ollivier (Facebook Artificial Intelligence Research)

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
Lin Yang (Princeton) · Mengdi Wang (Princeton University)

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette (Stanford University) · Emma Brunskill (Stanford University)

Revisiting the Softmax Bellman Operator: New Benefits and New Perspective
Zhao Song (Baidu Research) · Ron Parr (Duke University) · Lawrence Carin (Duke University)

Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen (University of Illinois at Urbana-Champaign) · Nan Jiang (University of Illinois at Urbana-Champaign)

Dynamic Weights in Multi-Objective Deep Reinforcement Learning
Axel Abels (Université Libre de Bruxelles) · Diederik Roijers (VUB) · Tom Lenaerts (Vrije Universiteit Brussel) · Ann Nowé (Vrije Universiteit Brussel) · Denis Steckelmacher (Vrije Universiteit Brussel)

Papers: Policy

Understanding the Impact of Entropy on Policy Optimization
Zafarali Ahmed (Mila — McGill University) · Nicolas Le Roux (Google) · Mohammad Norouzi (Google Brain) · Dale Schuurmans (Google / University of Alberta)

Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann (Carnegie Mellon University) · Lihong Li (Google Inc.) · Wei Wei (Google) · Emma Brunskill (Stanford University)

Quantifying Generalization in Reinforcement Learning
Karl Cobbe (OpenAI) · Oleg Klimov (OpenAI) · Chris Hesse (OpenAI) · Taehoon Kim (OpenAI) · John Schulman (OpenAI)

Off-Policy Deep Reinforcement Learning without Exploration
Scott Fujimoto (McGill University) · David Meger (McGill University) · Doina Precup (McGill University / DeepMind)

Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
Casey Chu (Stanford University) · Jose Blanchet (Stanford University) · Peter Glynn (Stanford University)

POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
Nevena Lazic (Google) · Yasin Abbasi-Yadkori (Adobe Research) · Kush Bhatia (UC Berkeley) · Gellért Weisz (DeepMind) · Peter Bartlett (University of California, Berkeley) · Csaba Szepesvari (DeepMind/University of Alberta)

Collaborative Evolutionary Reinforcement Learning
Shauharda Khadka (Intel AI) · Somdeb Majumdar (Intel AI Lab) · Tarek Nassar (Intel AI Lab) · Zach Dwiel (Intel AI Lab) · Evren Tumer (Intel Corporation) · Santiago Miret (Intel AI Products Group) · Yinyin Liu (Intel AI Lab) · Kagan Tumer (Oregon State University US)

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules
Daniel Ho (UC Berkeley) · Eric Liang (UC Berkeley) · Xi Chen (UC Berkeley) · Ion Stoica (UC Berkeley) · Pieter Abbeel (UC Berkeley)

Safe Policy Improvement with Baseline Bootstrapping
Romain Laroche (Microsoft Research) · Paul Trichelair (Mila, Quebec AI Institute/McGill University) · Remi Tachet des Combes (Microsoft Research Montreal)

Fingerprint Policy Optimisation for Robust Reinforcement Learning
Supratik Paul (University of Oxford) · Michael A Osborne (U Oxford) · Shimon Whiteson (University of Oxford)

Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN
Dror Freirich (Technion) · Tzahi Shimkin (Technion Israeli Institute of Technology) · Ron Meir (Technion Israeli Institute of Technology) · Aviv Tamar (Technion Israeli Institute of Technology)

Predictor-Corrector Policy Optimization
Ching-An Cheng (Georgia Tech) · Xinyan Yan (Georgia Tech) · Nathan Ratliff (NVIDIA) · Byron Boots (Georgia Tech)

Optimistic Policy Optimization via Multiple Importance Sampling
Matteo Papini (Politecnico di Milano) · Alberto Maria Metelli (Politecnico di Milano) · Lorenzo Lupo (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)

Projections for Approximate Policy Iteration Algorithms
Riad Akrour (TU Darmstadt) · Joni Pajarinen (TU Darmstadt) · Jan Peters (TU Darmstadt + Max Planck Institute for Intelligent Systems) · Gerhard Neumann (University of Lincoln)

Transfer of Samples in Policy Search via Multiple Importance Sampling
Andrea Tirinzoni (Politecnico di Milano) · Mattia Salvini (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)

Hessian Aided Policy Gradient
Zebang Shen (Zhejiang University) · Alejandro Ribeiro (University of Pennsylvania) · Hamed Hassani (University of Pennsylvania) · Hui Qian (Zhejiang University) · Chao Mi (Zhejiang University)

Policy Consolidation for Continual Reinforcement Learning
Christos Kaplanis (Imperial College London) · Murray Shanahan (DeepMind / Imperial College London) · Claudia Clopath (Imperial College London)

Importance Sampling Policy Evaluation with an Estimated Behavior Policy
Josiah Hanna (UT Austin) · Scott Niekum (University of Texas at Austin) · Peter Stone (University of Texas at Austin)

Trajectory-Based Off-Policy Deep Reinforcement Learning
Andreas Doerr (Bosch Center for Artificial Intelligence, Max Planck Institute for Intelligent Systems) · Michael Volpp (Bosch Center for AI) · Marc Toussaint (University Stuttgart) · Sebastian Trimpe (Max Planck Institute for Intelligent Systems) · Christian Daniel (Bosch Center for Artificial Intelligence)

CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
Yi Su (Cornell University) · Lequn Wang (Cornell University) · Michele Santacatterina (TRIPODS Center of Data Science — Cornell University) · Thorsten Joachims (Cornell)

More Efficient Policy Value Evaluation through Regularized Targeted Learning
Aurelien Bibaut (UC Berkeley) · Ivana Malenica (UC Berkeley) · Nikos Vlassis (Netflix) · Mark van der Laan (UC Berkeley)

Learning Novel Policies For Tasks
Yunbo Zhang (Georgia Institute of Technology) · Wenhao Yu (Georgia Institute of Technology) · Greg Turk (Georgia Institute of Technology)

Remember and Forget for Experience Replay
Guido Novati (ETH Zurich) · Petros Koumoutsakos (ETH Zurich)

Online Control with Adversarial Disturbances
Naman Agarwal (Google AI Princeton) · Brian Bullins (Princeton University) · Elad Hazan (Google Brain and Princeton University) · Sham Kakade (University of Washington) · Karan Singh (Princeton University)

Action Robust Reinforcement Learning and Applications in Continuous Control
Chen Tessler (Technion) · Yonathan Efroni (Technion) · Shie Mannor (Technion)

Control Regularization for Reduced Variance Reinforcement Learning
Richard Cheng (California Institute of Technology) · Abhinav Verma (Rice University) · Gabor Orosz (University of Michigan) · Swarat Chaudhuri (Rice University) · Yisong Yue (Caltech) · Joel Burdick (Caltech)

Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
Seungyul Han (KAIST) · Youngchul Sung (KAIST)

Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
Shiau Hong Lim (IBM Research) · Arnaud Autef (Ecole Polytechnique)

A Theory of Regularized Markov Decision Processes
Matthieu Geist (Google) · Bruno Scherrer (INRIA) · Olivier Pietquin (Google Brain)

Online Convex Optimization in Adversarial Markov Decision Processes
Aviv Rosenberg (Tel Aviv University) · Yishay Mansour (Google and Tel Aviv University)

Batch Policy Learning under Constraints
Hoang Le (Caltech) · Cameron Voloshin (Caltech) · Yisong Yue (Caltech)

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
Rui Zhao (Siemens & Ludwig Maximilian University of Munich) · Xudong Sun (Ludwig Maximilian University of Munich) · Volker Tresp (Siemens AG and University of Munich)

Reinforcement Learning in Configurable Continuous Environments
Alberto Maria Metelli (Politecnico di Milano) · Emanuele Ghelfi (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)

On the Generalization Gap in Reparameterizable Reinforcement Learning
Huan Wang (Salesforce Research) · Stephan Zheng (Salesforce Research) · Caiming Xiong (Salesforce) · Richard Socher (Salesforce)

Papers: Reward

Provably Efficient Imitation Learning from Observation Alone
Wen Sun (Carnegie Mellon University) · Anirudh Vemula (CMU) · Byron Boots (Georgia Tech) · Drew Bagnell (Carnegie Mellon University)

Imitating Latent Policies from Observation
Ashley Edwards (Georgia Institute of Technology) · Himanshu Sahni (Georgia Institute of Technology) · Yannick Schroecker (Georgia Institute of Technology) · Charles Isbell (Georgia Institute of Technology)

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
Daniel Brown (University of Texas at Austin) · Wonjoon Goo (University of Texas at Austin) · Prabhat Nagarajan (Preferred Networks) · Scott Niekum (University of Texas at Austin)

Imitation Learning from Imperfect Demonstration
Yueh-Hua Wu (National Taiwan University) · Nontawat Charoenphakdee (The University of Tokyo / RIKEN) · Han Bao (The University of Tokyo / RIKEN) · Voot Tangkaratt (RIKEN AIP) · Masashi Sugiyama (RIKEN / The University of Tokyo)

Papers: Model

An investigation of model-free planning
Arthur Guez (Google DeepMind) · Mehdi Mirza (DeepMind) · Karol Gregor (DeepMind) · Rishabh Kabra (DeepMind) · Sebastien Racaniere (DeepMind) · Theophane Weber (DeepMind) · David Raposo (DeepMind) · Adam Santoro (DeepMind) · Laurent Orseau (DeepMind) · Tom Eccles (DeepMind) · Greg Wayne (DeepMind) · David Silver (Google DeepMind) · Timothy Lillicrap (Google DeepMind)

Calibrated Model-Based Deep Reinforcement Learning
Ali Malik (Stanford University) · Volodymyr Kuleshov (Stanford University) · Jiaming Song (Stanford) · Danny Nemer (Afresh Technologies) · Harlan Seymour (Afresh Technologies) · Stefano Ermon (Stanford University)

Learning Latent Dynamics for Planning from Pixels
Danijar Hafner (Google Brain & University of Toronto) · Timothy Lillicrap (Google DeepMind) · Ian Fischer (Google) · Ruben Villegas (University of Michigan) · David Ha (Google) · Honglak Lee (Google / U. Michigan) · James Davidson (Google Brain)

Papers: Exploration

Distributional Reinforcement Learning for Efficient Exploration
Borislav Mavrin (University of Alberta) · Hengshuai Yao (Huawei Technologies) · Linglong Kong (University of Alberta) · Kaiwen Wu (University of Waterloo) · Yaoliang Yu (University of Waterloo)

Exploration Conscious Reinforcement Learning Revisited
Lior Shani (Technion) · Yonathan Efroni (Technion) · Shie Mannor (Technion)

Dead-ends and Secure Exploration in Reinforcement Learning
Mehdi Fatemi (Microsoft Research) · Shikhar Sharma (Microsoft Research) · Harm van Seijen (Microsoft Research) · Samira Ebrahimi Kahou (Microsoft Research)

Learning to Explore via Disagreement
Deepak Pathak (UC Berkeley) · Dhiraj Gandhi (Carnegie Mellon University Robotics Institute) · Abhinav Gupta (Carnegie Mellon University)

Model-Based Active Exploration
Pranav Shyam (NNAISENSE) · Wojciech Jaskowski (NNAISENSE) · Faustino Gomez (NNAISENSE SA)

Papers: Exploration: Bandits

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
Chicheng Zhang (Microsoft Research) · Alekh Agarwal (Microsoft Research) · Hal Daume (Microsoft Research) · John Langford (Microsoft Research) · Sahand Negahban (Yale)

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Branislav Kveton (Google Research) · Csaba Szepesvari (DeepMind/University of Alberta) · Sharan Vaswani (Mila, University of Montreal) · Zheng Wen (Adobe Research) · Tor Lattimore (DeepMind) · Mohammad Ghavamzadeh (Facebook AI Research)

Decentralized Exploration in Multi-Armed Bandits
Raphael Feraud (Orange Labs) · Reda Alami (Orange Labs, Paris Saclay University, INRIA) · Romain Laroche (Microsoft Research)

Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
Martin Zhang (Stanford University) · James Zou (Stanford) · David Tse (Stanford University)

Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model
Gi-Soo Kim (Seoul National University) · Myunghee Cho Paik (Seoul National University)

Bilinear Bandits with Low-rank Structure
Kwang-Sung Jun (Boston University) · Rebecca Willett (U Chicago) · Stephen Wright (University of Wisconsin-Madison) · Robert Nowak (University of Wisconsin-Madison)

Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards
Shiyin Lu (Nanjing University) · Guanghui Wang (Nanjing University) · Yao Hu (Alibaba Youku Cognitive and Intelligent Lab) · Lijun Zhang (Nanjing University)

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
Julian Zimmert (University of Copenhagen) · Haipeng Luo (University of Southern California) · Chen-Yu Wei (University of Southern California)

Exploiting structure of uncertainty for efficient combinatorial semi-bandits
Pierre Perrault (Inria Lille — Nord Europe) · Vianney Perchet (ENS Paris Saclay & Criteo AI Lab) · Michal Valko (DeepMind)

Correlated bandits or: How to minimize mean-squared error online
Vinay Praneeth Boda (LinkedIn Corp.) · Prashanth L.A. (IIT Madras)

PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits
Arghya Roy Chaudhuri (Indian Institute of Technology Bombay) · Shivaram Kalyanakrishnan (IIT Bombay)

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
Ping-Chun Hsieh (Texas A&M University) · Xi Liu (Texas A&M University) · Anirban Bhattacharya (Texas A&M University) · P R Kumar (Texas A&M University)

Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem
Junyu Cao (University of California Berkeley) · Wei Sun (IBM Research)

Data Poisoning Attacks on Stochastic Bandits
Fang Liu (The Ohio State University) · Ness Shroff (The Ohio State University)

On the design of estimators for bandit off-policy evaluation
Nikos Vlassis (Netflix) · Aurelien Bibaut (UC Berkeley) · Maria Dimakopoulou (Stanford) · Tony Jebara (Netflix)

An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule
Touqir Sajed (University of Alberta) · Or Sheffet (University of Alberta)

Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
Alina Beygelzimer (Yahoo Research) · David Pal (Expedia) · Balazs Szorenyi (Yahoo Research) · Devanathan Thiruvenkatachari (New York University) · Chen-Yu Wei (University of Southern California) · Chicheng Zhang (Microsoft Research)

Papers: Representation

Learning Action Representations for Reinforcement Learning
Yash Chandak (University of Massachusetts Amherst) · Georgios Theocharous (Adobe Research) · James Kostas (UMass Amherst) · Scott Jordan (University of Massachusetts Amherst) · Philip Thomas (University of Massachusetts Amherst)

Provably efficient RL with Rich Observations via Latent State Decoding
Simon Du (Carnegie Mellon University) · Akshay Krishnamurthy (Microsoft Research) · Nan Jiang (University of Illinois at Urbana-Champaign) · Alekh Agarwal (Microsoft Research) · Miroslav Dudik (Microsoft Research) · John Langford (Microsoft Research)

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
Yilun Du (MIT) · Karthik Narasimhan (Princeton)

The Natural Language of Actions
Guy Tennenholtz (Technion) · Shie Mannor (Technion)

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
Marvin Zhang (UC Berkeley) · Sharad Vikram (UCSD) · Laura Smith (UC Berkeley) · Pieter Abbeel (OpenAI / UC Berkeley) · Matthew Johnson (Google Brain) · Sergey Levine (Berkeley)

DeepMDP: Learning Continuous Latent Space Models with Theoretical Guarantees
Carles Gelada (Google Brain) · Saurabh Kumar (Google Brain) · Jacob Buckman (Johns Hopkins University) · Ofir Nachum (Google Brain) · Marc Bellemare (Google Brain)

Papers: Hierarchical RL

Finding Options that Minimize Planning Time
Yuu Jinnai (Brown University) · David Abel (Brown University) · David Hershkowitz (Carnegie Mellon University) · Michael L. Littman (Brown University) · George Konidaris (Brown)

Option Discovery for Solving Sparse Reward Reinforcement Learning Problems
Yuu Jinnai (Brown University) · Jee Won Park (Brown University) · David Abel (Brown University) · George Konidaris (Brown)

Per-Decision Option Discounting
Anna Harutyunyan (DeepMind) · Peter Vrancx (PROWLER.io) · Philippe Hamel (DeepMind) · Ann Nowe (VU Brussel) · Doina Precup (DeepMind)

Papers: Multi-agent RL

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Jakob Foerster (Facebook AI Research) · Francis Song (DeepMind) · Edward Hughes (DeepMind) · Neil Burch (DeepMind) · Iain Dunning (DeepMind) · Shimon Whiteson (University of Oxford) · Matthew Botvinick (DeepMind) · Michael Bowling (DeepMind)

Multi-Agent Adversarial Inverse Reinforcement Learning
Lantao Yu (Stanford University) · Jiaming Song (Stanford) · Stefano Ermon (Stanford University)

Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Shariq Iqbal (University of Southern California) · Fei Sha (University of Southern California)

Learning to Collaborate in Markov Decision Processes
Goran Radanovic (Harvard University) · Rati Devidze (Max Planck Institute for Software Systems) · David Parkes (Harvard University) · Adish Singla (Max Planck Institute (MPI-SWS))

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Natasha Jaques (MIT) · Angeliki Lazaridou (DeepMind) · Edward Hughes (DeepMind) · Caglar Gulcehre (DeepMind) · Pedro Ortega (DeepMind) · DJ Strouse (Princeton University) · Joel Z Leibo (DeepMind) · Nando de Freitas (DeepMind)

TarMAC: Targeted Multi-Agent Communication
Abhishek Das (Georgia Tech) · Theophile Gervet (Carnegie Mellon University) · Joshua Romoff (McGill University) · Dhruv Batra (Georgia Institute of Technology / Facebook AI Research) · Devi Parikh (Georgia Tech & Facebook AI Research) · Michael Rabbat (Facebook) · Joelle Pineau (Facebook)

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning
Thinh Doan (Georgia Institute of Technology) · Siva Maguluri (Georgia Tech) · Justin Romberg (Georgia Tech)

Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI
Lei Han (Tencent AI Lab) · Peng Sun (Tencent AI Lab) · Yali Du (University of Technology Sydney) · Jiechao Xiong (Tencent AI Lab) · Qing Wang () · Xinghai Sun (Tencent AI Lab) · Han Liu (Northwestern) · Tong Zhang (Tencent AI Lab)

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
Kyunghwan Son (KAIST) · Daewoo Kim (KAIST) · Wan Ju Kang (KAIST) · David Earl Hostallero (KAIST) · Yung Yi (KAIST)

A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
Jingkai Mao (Man AHL) · Jakob Foerster (Facebook AI Research) · Tim Rocktäschel (University of Oxford) · Maruan Al-Shedivat (Carnegie Mellon University) · Gregory Farquhar (University of Oxford) · Shimon Whiteson (University of Oxford)

Open-ended learning in zero-sum games
David Balduzzi (DeepMind) · Marta Garnelo (DeepMind) · Yoram Bachrach () · Wojciech Czarnecki (DeepMind) · Julien Perolat (DeepMind) · Max Jaderberg (DeepMind) · Thore Graepel (DeepMind)

Papers: Relational RL

Neural Logic Reinforcement Learning
Zhengyao Jiang (University of Liverpool) · Shan Luo (University of Liverpool)

Papers: Learning to Learn

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Kate Rakelly (UC Berkeley) · Aurick Zhou (UC Berkeley) · Chelsea Finn (Stanford, Google, UC Berkeley) · Sergey Levine (Berkeley) · Deirdre Quillen (UC Berkeley)

CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
Cédric Colas (Inria) · Pierre-Yves Oudeyer (Inria) · Olivier Sigaud (Sorbonne University) · Pierre Fournier (UPMC) · Mohamed Chetouani (UPMC)

Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
Shani Gamrian (Bar-Ilan University) · Yoav Goldberg ()

Few-Shot Intent Inference via Meta-Inverse Reinforcement Learning
Kelvin Xu (University of California, Berkeley) · Ellis Ratner (University of California, Berkeley) · Anca Dragan (EECS Department, University of California, Berkeley) · Sergey Levine (Berkeley) · Chelsea Finn (Stanford, Google, UC Berkeley)

Taming MAML: Control variates for unbiased meta-reinforcement learning gradient estimation
Hao Liu (Salesforce) · Richard Socher (Salesforce) · Caiming Xiong (Salesforce)

TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning
Tameem Adel (University of Cambridge) · Adrian Weller (University of Cambridge, Alan Turing Institute)

Papers: Applications

ELF OpenGo: an analysis and open reimplementation of AlphaZero
Yuandong Tian (Facebook AI Research) · Jerry Ma (Facebook AI Research) · Qucheng Gong (Facebook AI Research) · Shubho Sengupta (Facebook AI Research) · Zhuoyuan Chen (Facebook) · James Pinkerton (Facebook AI Research) · Larry Zitnick (Facebook AI Research)

Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems
Timothy Mann (DeepMind) · Sven Gowal (DeepMind) · Huiyi Hu (DeepMind) · Ray Jiang (Google DeepMind) · Balaji Lakshminarayanan (Google DeepMind) · Andras Gyorgy (DeepMind) · Prav Srinivasan (DeepMind)

Dynamic Measurement Scheduling for Event Forecasting using Deep RL
Chun-Hao Chang (University of Toronto) · Mingjie Mai (University of Toronto) · Anna Goldenberg (University of Toronto)

Generative Adversarial User Model for Reinforcement Learning Based Recommendation System
Xinshi Chen (Georgia Institute of Technology) · Shuang Li (Georgia Tech) · Hui Li (Ant Financial) · Shaohua Jiang (Ant Financial) · Yuan Qi (Ant Financial Services Group) · Le Song (Georgia Institute of Technology)

A Deep Reinforcement Learning Perspective on Internet Congestion Control
Nathan Jay (University of Illinois Urbana-Champaign) · Noga H. Rotman (Hebrew University of Jerusalem) · Brighten Godfrey (University of Illinois Urbana-Champaign) · Michael Schapira (Hebrew University of Jerusalem) · Aviv Tamar (Technion Israeli Institute of Technology)

Target Tracking for Contextual Bandits: Application to Demand Side Management
Margaux Brégère (CNRS Université Paris-Sud, Inria Paris, EDF R&D) · Pierre Gaillard (INRIA Paris) · Yannig Goude (EDF Lab Paris-Saclay) · Gilles Stoltz (Université Paris-Sud)

Greedy Sequential Subset Selection via Sequential Facility Location
Ehsan Elhamifar (Northeastern University)

Hiring Under Uncertainty
Manish Purohit (Google) · Sreenivas Gollapudi (Google Research) · Manish Raghavan (Cornell)

Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments
Kirthevasan Kandasamy (Carnegie Mellon University) · Willie Neiswanger (CMU) · Reed Zhang (Carnegie Mellon University) · Akshay Krishnamurthy (Microsoft Research) · Jeff Schneider (Uber/CMU) · Barnabás Póczos (CMU)

A Control-Theoretic Perspective on Nesterov’s Accelerated Gradient Method
Michael Muehlebach (UC Berkeley) · Michael Jordan (UC Berkeley)
