ICML 2019 is approaching. Here I collect the invited talks, tutorials, and workshops related to reinforcement learning (RL) and to adjacent deep learning, machine learning, and AI topics, as well as the RL papers at the conference. Comments are welcome.
Table of Contents
- Invited Talks
- Tutorials
- Workshops
- Papers
- Papers: Value Function
- Papers: Policy
- Papers: Reward
- Papers: Model
- Papers: Exploration
- Papers: Exploration: Bandits
- Papers: Representation
- Papers: Hierarchical RL
- Papers: Multi-agent RL
- Papers: Relational RL
- Papers: Learning to Learn
- Papers: Applications
Invited Talks
- Machine learning for robots to think fast, Aude Billard
- What 4 year olds can do and AI can’t (yet), Alison Gopnik
Tutorials
- Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning, Chelsea Finn · Sergey Levine
- Never-Ending Learning, Tom Mitchell · Partha Talukdar
- AI Safety, Silvia Chiappa · Jan Leike
- Neural Approaches to Conversational AI, Michel Galley · Jianfeng Gao
- Algorithm configuration: learning in the space of algorithm designs, Kevin Leyton-Brown · Frank Hutter
- A Tutorial on Attention in Deep Learning, Alex Smola · Aston Zhang
- Causal Inference and Stable Learning, Tong Zhang · Peng Cui
Workshops
- Reinforcement Learning for Real Life
- Real World Reinforcement Learning Workshop (June 9, 2 pm, Room 104)
- Generative Modeling and Model-Based Reasoning for Robotics and AI
- Exploration in Reinforcement Learning
- Multi-Task and Lifelong Reinforcement Learning
- Real-world Sequential Decision Making: Reinforcement Learning and Beyond (What is BEYOND reinforcement learning? That’s a question.)
- AI in Finance: Applications and Infrastructure for Multi-Agent Learning
- Automated Machine Learning
- Robustness and Uncertainty Estimation in Deep Learning
- AI for Autonomous Driving
- Imitation, Intent, and Interaction (I3)
- Self-Supervised Learning
- Learning and Reasoning with Graph-Structured Representations
- Adaptive and Multitask Learning: Algorithms & Systems
Papers
In the following, I collect (probably) all papers directly related to RL and group them by topic. Comments are welcome, e.g., about the categorization or about any (important) papers I may have missed. My email: yuxili@gmail.com. Thanks!
Papers: Value Function
Diagnosing Bottlenecks in Deep Q-learning Algorithms
Justin Fu (University of California, Berkeley) · Aviral Kumar (University of California Berkeley) · Matthew Soh (UC Berkeley) · Sergey Levine (Berkeley)
The Value Function Polytope in Reinforcement Learning
Robert Dadashi (Google AI Residency Program) · Marc Bellemare (Google Brain) · Adrien Ali Taiga (Université de Montréal) · Nicolas Le Roux (Google) · Dale Schuurmans (Google / University of Alberta)
Statistics and Samples in Distributional Reinforcement Learning
Mark Rowland (DeepMind) · Robert Dadashi (Google AI Residency Program) · Saurabh Kumar (Google) · Remi Munos (DeepMind) · Marc Bellemare (Google Brain) · Will Dabney (DeepMind)
Nonlinear Distributional Gradient Temporal-Difference Learning
Chao Qu (Ant Financial Services Group) · Shie Mannor (Technion) · Huan Xu (Georgia Tech)
Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
Michael Oberst (MIT) · David Sontag (Massachusetts Institute of Technology)
Composing Value Functions in Reinforcement Learning
Benjamin van Niekerk (University of the Witwatersrand) · Steven James (University of the Witwatersrand) · Adam Earle (University of the Witwatersrand) · Benjamin Rosman (Council for Scientific and Industrial Research)
Making Deep Q-learning methods robust to time discretization
Corentin Tallec (Univ. Paris-Sud) · Leonard Blier (Université Paris Sud and Facebook) · Yann Ollivier (Facebook Artificial Intelligence Research)
Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
Lin Yang (Princeton) · Mengdi Wang (Princeton University)
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
Andrea Zanette (Stanford University) · Emma Brunskill (Stanford University)
Revisiting the Softmax Bellman Operator: New Benefits and New Perspective
Zhao Song (Baidu Research) · Ron Parr (Duke University) · Lawrence Carin (Duke University)
Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen (University of Illinois at Urbana-Champaign) · Nan Jiang (University of Illinois at Urbana-Champaign)
Dynamic Weights in Multi-Objective Deep Reinforcement Learning
Axel Abels (Université Libre de Bruxelles) · Diederik Roijers (VUB) · Tom Lenaerts (Vrije Universiteit Brussel) · Ann Nowé (Vrije Universiteit Brussel) · Denis Steckelmacher (Vrije Universiteit Brussel)
Papers: Policy
Understanding the Impact of Entropy on Policy Optimization
Zafarali Ahmed (Mila — McGill University) · Nicolas Le Roux (Google) · Mohammad Norouzi (Google Brain) · Dale Schuurmans (Google / University of Alberta)
Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann (Carnegie Mellon University) · Lihong Li (Google Inc.) · Wei Wei (Google) · Emma Brunskill (Stanford University)
Quantifying Generalization in Reinforcement Learning
Karl Cobbe (OpenAI) · Oleg Klimov (OpenAI) · Chris Hesse (OpenAI) · Taehoon Kim (OpenAI) · John Schulman (OpenAI)
Off-Policy Deep Reinforcement Learning without Exploration
Scott Fujimoto (McGill University) · David Meger (McGill University) · Doina Precup (McGill University / DeepMind)
Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
Casey Chu (Stanford University) · Jose Blanchet (Stanford University) · Peter Glynn (Stanford University)
POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
Nevena Lazic (Google) · Yasin Abbasi-Yadkori (Adobe Research) · Kush Bhatia (UC Berkeley) · Gellért Weisz (DeepMind) · Peter Bartlett (University of California, Berkeley) · Csaba Szepesvari (DeepMind/University of Alberta)
Collaborative Evolutionary Reinforcement Learning
Shauharda Khadka (Intel AI) · Somdeb Majumdar (Intel AI Lab) · Tarek Nassar (Intel AI Lab) · Zach Dwiel (Intel AI Lab) · Evren Tumer (Intel Corporation) · Santiago Miret (Intel AI Products Group) · Yinyin Liu (Intel AI Lab) · Kagan Tumer (Oregon State University)
Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules
Daniel Ho (UC Berkeley) · Eric Liang (UC Berkeley) · Xi Chen (UC Berkeley) · Ion Stoica (UC Berkeley) · Pieter Abbeel (UC Berkeley)
Safe Policy Improvement with Baseline Bootstrapping
Romain Laroche (Microsoft Research) · Paul Trichelair (Mila - Quebec AI Institute / McGill University) · Remi Tachet des Combes (Microsoft Research Montreal)
Fingerprint Policy Optimisation for Robust Reinforcement Learning
Supratik Paul (University of Oxford) · Michael A Osborne (U Oxford) · Shimon Whiteson (University of Oxford)
Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN
Dror Freirich (Technion) · Tzahi Shimkin (Technion - Israel Institute of Technology) · Ron Meir (Technion - Israel Institute of Technology) · Aviv Tamar (Technion - Israel Institute of Technology)
Predictor-Corrector Policy Optimization
Ching-An Cheng (Georgia Tech) · Xinyan Yan (Georgia Tech) · Nathan Ratliff (NVIDIA) · Byron Boots (Georgia Tech)
Optimistic Policy Optimization via Multiple Importance Sampling
Matteo Papini (Politecnico di Milano) · Alberto Maria Metelli (Politecnico di Milano) · Lorenzo Lupo (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)
Projections for Approximate Policy Iteration Algorithms
Riad Akrour (TU Darmstadt) · Joni Pajarinen (TU Darmstadt) · Jan Peters (TU Darmstadt + Max Planck Institute for Intelligent Systems) · Gerhard Neumann (University of Lincoln)
Transfer of Samples in Policy Search via Multiple Importance Sampling
Andrea Tirinzoni (Politecnico di Milano) · Mattia Salvini (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)
Hessian Aided Policy Gradient
Zebang Shen (Zhejiang University) · Alejandro Ribeiro (University of Pennsylvania) · Hamed Hassani (University of Pennsylvania) · Hui Qian (Zhejiang University) · Chao Mi (Zhejiang University)
Policy Consolidation for Continual Reinforcement Learning
Christos Kaplanis (Imperial College London) · Murray Shanahan (DeepMind / Imperial College London) · Claudia Clopath (Imperial College London)
Importance Sampling Policy Evaluation with an Estimated Behavior Policy
Josiah Hanna (UT Austin) · Scott Niekum (University of Texas at Austin) · Peter Stone (University of Texas at Austin)
Trajectory-Based Off-Policy Deep Reinforcement Learning
Andreas Doerr (Bosch Center for Artificial Intelligence, Max Planck Institute for Intelligent Systems) · Michael Volpp (Bosch Center for AI) · Marc Toussaint (University of Stuttgart) · Sebastian Trimpe (Max Planck Institute for Intelligent Systems) · Christian Daniel (Bosch Center for Artificial Intelligence)
CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
Yi Su (Cornell University) · Lequn Wang (Cornell University) · Michele Santacatterina (TRIPODS Center of Data Science, Cornell University) · Thorsten Joachims (Cornell)
More Efficient Policy Value Evaluation through Regularized Targeted Learning
Aurelien Bibaut (UC Berkeley) · Ivana Malenica (UC Berkeley) · Nikos Vlassis (Netflix) · Mark van der Laan (UC Berkeley)
Learning Novel Policies For Tasks
Yunbo Zhang (Georgia Institute of Technology) · Wenhao Yu (Georgia Institute of Technology) · Greg Turk (Georgia Institute of Technology)
Remember and Forget for Experience Replay
Guido Novati (ETH Zurich) · Petros Koumoutsakos (ETH Zurich)
Online Control with Adversarial Disturbances
Naman Agarwal (Google AI Princeton) · Brian Bullins (Princeton University) · Elad Hazan (Google Brain and Princeton University) · Sham Kakade (University of Washington) · Karan Singh (Princeton University)
Action Robust Reinforcement Learning and Applications in Continuous Control
Chen Tessler (Technion) · Yonathan Efroni (Technion) · Shie Mannor (Technion)
Control Regularization for Reduced Variance Reinforcement Learning
Richard Cheng (California Institute of Technology) · Abhinav Verma (Rice University) · Gabor Orosz (University of Michigan) · Swarat Chaudhuri (Rice University) · Yisong Yue (Caltech) · Joel Burdick (Caltech)
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
Seungyul Han (KAIST) · Youngchul Sung (KAIST)
Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
Shiau Hong Lim (IBM Research) · Arnaud Autef (Ecole Polytechnique)
A Theory of Regularized Markov Decision Processes
Matthieu Geist (Google) · Bruno Scherrer (INRIA) · Olivier Pietquin (Google Brain)
Online Convex Optimization in Adversarial Markov Decision Processes
Aviv Rosenberg (Tel Aviv University) · Yishay Mansour (Google and Tel Aviv University)
Batch Policy Learning under Constraints
Hoang Le (Caltech) · Cameron Voloshin (Caltech) · Yisong Yue (Caltech)
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
Rui Zhao (Siemens & Ludwig Maximilian University of Munich) · Xudong Sun (Ludwig Maximilian University of Munich) · Volker Tresp (Siemens AG and University of Munich)
Reinforcement Learning in Configurable Continuous Environments
Alberto Maria Metelli (Politecnico di Milano) · Emanuele Ghelfi (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)
On the Generalization Gap in Reparameterizable Reinforcement Learning
Huan Wang (Salesforce Research) · Stephan Zheng (Salesforce Research) · Caiming Xiong (Salesforce) · Richard Socher (Salesforce)
Papers: Reward
Provably Efficient Imitation Learning from Observation Alone
Wen Sun (Carnegie Mellon University) · Anirudh Vemula (CMU) · Byron Boots (Georgia Tech) · Drew Bagnell (Carnegie Mellon University)
Imitating Latent Policies from Observation
Ashley Edwards (Georgia Institute of Technology) · Himanshu Sahni (Georgia Institute of Technology) · Yannick Schroecker (Georgia Institute of Technology) · Charles Isbell (Georgia Institute of Technology)
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
Daniel Brown (University of Texas at Austin) · Wonjoon Goo (University of Texas at Austin) · Prabhat Nagarajan (Preferred Networks) · Scott Niekum (University of Texas at Austin)
Imitation Learning from Imperfect Demonstration
Yueh-Hua Wu (National Taiwan University) · Nontawat Charoenphakdee (The University of Tokyo / RIKEN) · Han Bao (The University of Tokyo / RIKEN) · Voot Tangkaratt (RIKEN AIP) · Masashi Sugiyama (RIKEN / The University of Tokyo)
Papers: Model
An investigation of model-free planning
Arthur Guez (Google DeepMind) · Mehdi Mirza (DeepMind) · Karol Gregor (DeepMind) · Rishabh Kabra (DeepMind) · Sebastien Racaniere (DeepMind) · Theophane Weber (DeepMind) · David Raposo (DeepMind) · Adam Santoro (DeepMind) · Laurent Orseau (DeepMind) · Tom Eccles (DeepMind) · Greg Wayne (DeepMind) · David Silver (Google DeepMind) · Timothy Lillicrap (Google DeepMind)
Calibrated Model-Based Deep Reinforcement Learning
Ali Malik (Stanford University) · Volodymyr Kuleshov (Stanford University) · Jiaming Song (Stanford) · Danny Nemer (Afresh Technologies) · Harlan Seymour (Afresh Technologies) · Stefano Ermon (Stanford University)
Learning Latent Dynamics for Planning from Pixels
Danijar Hafner (Google Brain & University of Toronto) · Timothy Lillicrap (Google DeepMind) · Ian Fischer (Google) · Ruben Villegas (University of Michigan) · David Ha (Google) · Honglak Lee (Google / U. Michigan) · James Davidson (Google Brain)
Papers: Exploration
Distributional Reinforcement Learning for Efficient Exploration
Borislav Mavrin (University of Alberta) · Hengshuai Yao (Huawei Technologies) · Linglong Kong (University of Alberta) · Kaiwen Wu (University of Waterloo) · Yaoliang Yu (University of Waterloo)
Exploration Conscious Reinforcement Learning Revisited
Lior Shani (Technion) · Yonathan Efroni (Technion) · Shie Mannor (Technion)
Dead-ends and Secure Exploration in Reinforcement Learning
Mehdi Fatemi (Microsoft Research) · Shikhar Sharma (Microsoft Research) · Harm van Seijen (Microsoft Research) · Samira Ebrahimi Kahou (Microsoft Research)
Learning to Explore via Disagreement
Deepak Pathak (UC Berkeley) · Dhiraj Gandhi (Carnegie Mellon University Robotics Institute) · Abhinav Gupta (Carnegie Mellon University)
Model-Based Active Exploration
Pranav Shyam (NNAISENSE) · Wojciech Jaskowski (NNAISENSE) · Faustino Gomez (NNAISENSE SA)
Papers: Exploration: Bandits
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
Chicheng Zhang (Microsoft Research) · Alekh Agarwal (Microsoft Research) · Hal Daume (Microsoft Research) · John Langford (Microsoft Research) · Sahand Negahban (Yale)
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Branislav Kveton (Google Research) · Csaba Szepesvari (DeepMind/University of Alberta) · Sharan Vaswani (Mila, University of Montreal) · Zheng Wen (Adobe Research) · Tor Lattimore (DeepMind) · Mohammad Ghavamzadeh (Facebook AI Research)
Decentralized Exploration in Multi-Armed Bandits
Raphael Feraud (Orange Labs) · Reda Alami (Orange Labs, Paris-Saclay University, INRIA) · Romain Laroche (Microsoft Research)
Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
Martin Zhang (Stanford University) · James Zou (Stanford) · David Tse (Stanford University)
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model
Gi-Soo Kim (Seoul National University) · Myunghee Cho Paik (Seoul National University)
Bilinear Bandits with Low-rank Structure
Kwang-Sung Jun (Boston University) · Rebecca Willett (U Chicago) · Stephen Wright (University of Wisconsin-Madison) · Robert Nowak (University of Wisconsin-Madison)
Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards
Shiyin Lu (Nanjing University) · Guanghui Wang (Nanjing University) · Yao Hu (Alibaba Youku Cognitive and Intelligent Lab) · Lijun Zhang (Nanjing University)
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
Julian Zimmert (University of Copenhagen) · Haipeng Luo (University of Southern California) · Chen-Yu Wei (University of Southern California)
Exploiting structure of uncertainty for efficient combinatorial semi-bandits
Pierre Perrault (Inria Lille — Nord Europe) · Vianney Perchet (ENS Paris Saclay & Criteo AI Lab) · Michal Valko (DeepMind)
Correlated bandits or: How to minimize mean-squared error online
Vinay Praneeth Boda (LinkedIn Corp.) · Prashanth L.A. (IIT Madras)
PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits
Arghya Roy Chaudhuri (Indian Institute of Technology Bombay) · Shivaram Kalyanakrishnan (IIT Bombay)
Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
Ping-Chun Hsieh (Texas A&M University) · Xi Liu (Texas A&M University) · Anirban Bhattacharya (Texas A&M University) · P. R. Kumar (Texas A&M University)
Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem
Junyu Cao (University of California Berkeley) · Wei Sun (IBM Research)
Data Poisoning Attacks on Stochastic Bandits
Fang Liu (The Ohio State University) · Ness Shroff (The Ohio State University)
On the design of estimators for bandit off-policy evaluation
Nikos Vlassis (Netflix) · Aurelien Bibaut (UC Berkeley) · Maria Dimakopoulou (Stanford) · Tony Jebara (Netflix)
An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule
Touqir Sajed (University of Alberta) · Or Sheffet (University of Alberta)
Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
Alina Beygelzimer (Yahoo Research) · David Pal (Expedia) · Balazs Szorenyi (Yahoo Research) · Devanathan Thiruvenkatachari (New York University) · Chen-Yu Wei (University of Southern California) · Chicheng Zhang (Microsoft Research)
Papers: Representation
Learning Action Representations for Reinforcement Learning
Yash Chandak (University of Massachusetts Amherst) · Georgios Theocharous (Adobe Research) · James Kostas (UMass Amherst) · Scott Jordan (University of Massachusetts Amherst) · Philip Thomas (University of Massachusetts Amherst)
Provably efficient RL with Rich Observations via Latent State Decoding
Simon Du (Carnegie Mellon University) · Akshay Krishnamurthy (Microsoft Research) · Nan Jiang (University of Illinois at Urbana-Champaign) · Alekh Agarwal (Microsoft Research) · Miroslav Dudik (Microsoft Research) · John Langford (Microsoft Research)
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
Yilun Du (MIT) · Karthik Narasimhan (Princeton)
The Natural Language of Actions
Guy Tennenholtz (Technion) · Shie Mannor (Technion)
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
Marvin Zhang (UC Berkeley) · Sharad Vikram (UCSD) · Laura Smith (UC Berkeley) · Pieter Abbeel (OpenAI / UC Berkeley) · Matthew Johnson (Google Brain) · Sergey Levine (Berkeley)
DeepMDP: Learning Continuous Latent Space Models with Theoretical Guarantees
Carles Gelada (Google Brain) · Saurabh Kumar (Google Brain) · Jacob Buckman (Johns Hopkins University) · Ofir Nachum (Google Brain) · Marc Bellemare (Google Brain)
Papers: Hierarchical RL
Finding Options that Minimize Planning Time
Yuu Jinnai (Brown University) · David Abel (Brown University) · David Hershkowitz (Carnegie Mellon University) · Michael L. Littman (Brown University) · George Konidaris (Brown)
Option Discovery for Solving Sparse Reward Reinforcement Learning Problems
Yuu Jinnai (Brown University) · Jee Won Park (Brown University) · David Abel (Brown University) · George Konidaris (Brown)
Per-Decision Option Discounting
Anna Harutyunyan (DeepMind) · Peter Vrancx (PROWLER.io) · Philippe Hamel (DeepMind) · Ann Nowe (VU Brussel) · Doina Precup (DeepMind)
Papers: Multi-agent RL
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Jakob Foerster (Facebook AI Research) · Francis Song (DeepMind) · Edward Hughes (DeepMind) · Neil Burch (DeepMind) · Iain Dunning (DeepMind) · Shimon Whiteson (University of Oxford) · Matthew Botvinick (DeepMind) · Michael Bowling (DeepMind)
Multi-Agent Adversarial Inverse Reinforcement Learning
Lantao Yu (Stanford University) · Jiaming Song (Stanford) · Stefano Ermon (Stanford University)
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Shariq Iqbal (University of Southern California) · Fei Sha (University of Southern California)
Learning to Collaborate in Markov Decision Processes
Goran Radanovic (Harvard University) · Rati Devidze (Max Planck Institute for Software Systems) · David Parkes (Harvard University) · Adish Singla (Max Planck Institute (MPI-SWS))
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Natasha Jaques (MIT) · Angeliki Lazaridou (DeepMind) · Edward Hughes (DeepMind) · Caglar Gulcehre (DeepMind) · Pedro Ortega (DeepMind) · DJ Strouse (Princeton University) · Joel Z Leibo (DeepMind) · Nando de Freitas (DeepMind)
TarMAC: Targeted Multi-Agent Communication
Abhishek Das (Georgia Tech) · Theophile Gervet (Carnegie Mellon University) · Joshua Romoff (McGill University) · Dhruv Batra (Georgia Institute of Technology / Facebook AI Research) · Devi Parikh (Georgia Tech & Facebook AI Research) · Michael Rabbat (Facebook) · Joelle Pineau (Facebook)
Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning
Thinh Doan (Georgia Institute of Technology) · Siva Maguluri (Georgia Tech) · Justin Romberg (Georgia Tech)
Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI
Lei Han (Tencent AI Lab) · Peng Sun (Tencent AI Lab) · Yali Du (University of Technology Sydney) · Jiechao Xiong (Tencent AI Lab) · Qing Wang · Xinghai Sun (Tencent AI Lab) · Han Liu (Northwestern) · Tong Zhang (Tencent AI Lab)
QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
Kyunghwan Son (KAIST) · Daewoo Kim (KAIST) · Wan Ju Kang (KAIST) · David Earl Hostallero (KAIST) · Yung Yi (KAIST)
A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
Jingkai Mao (Man AHL) · Jakob Foerster (Facebook AI Research) · Tim Rocktäschel (University of Oxford) · Maruan Al-Shedivat (Carnegie Mellon University) · Gregory Farquhar (University of Oxford) · Shimon Whiteson (University of Oxford)
Open-ended learning in zero-sum games
David Balduzzi (DeepMind) · Marta Garnelo (DeepMind) · Yoram Bachrach · Wojciech Czarnecki (DeepMind) · Julien Perolat (DeepMind) · Max Jaderberg (DeepMind) · Thore Graepel (DeepMind)
Papers: Relational RL
Neural Logic Reinforcement Learning
Zhengyao Jiang (University of Liverpool) · Shan Luo (University of Liverpool)
Papers: Learning to Learn
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Kate Rakelly (UC Berkeley) · Aurick Zhou (UC Berkeley) · Chelsea Finn (Stanford, Google, UC Berkeley) · Sergey Levine (Berkeley) · Deirdre Quillen (UC Berkeley)
CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
Cédric Colas (Inria) · Pierre-Yves Oudeyer (Inria) · Olivier Sigaud (Sorbonne University) · Pierre Fournier (UPMC) · Mohamed Chetouani (UPMC)
Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
Shani Gamrian (Bar-Ilan University) · Yoav Goldberg
Few-Shot Intent Inference via Meta-Inverse Reinforcement Learning
Kelvin Xu (University of California, Berkeley) · Ellis Ratner (University of California, Berkeley) · Anca Dragan (EECS Department, University of California, Berkeley) · Sergey Levine (Berkeley) · Chelsea Finn (Stanford, Google, UC Berkeley)
Taming MAML: Control variates for unbiased meta-reinforcement learning gradient estimation
Hao Liu (Salesforce) · Richard Socher (Salesforce) · Caiming Xiong (Salesforce)
TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning
Tameem Adel (University of Cambridge) · Adrian Weller (University of Cambridge, Alan Turing Institute)
Papers: Applications
ELF OpenGo: an analysis and open reimplementation of AlphaZero
Yuandong Tian (Facebook AI Research) · Jerry Ma (Facebook AI Research) · Qucheng Gong (Facebook AI Research) · Shubho Sengupta (Facebook AI Research) · Zhuoyuan Chen (Facebook) · James Pinkerton (Facebook AI Research) · Larry Zitnick (Facebook AI Research)
Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems
Timothy Mann (DeepMind) · Sven Gowal (DeepMind) · Huiyi Hu (DeepMind) · Ray Jiang (Google DeepMind) · Balaji Lakshminarayanan (Google DeepMind) · Andras Gyorgy (DeepMind) · Prav Srinivasan (DeepMind)
Dynamic Measurement Scheduling for Event Forecasting using Deep RL
Chun-Hao Chang (University of Toronto) · Mingjie Mai (University of Toronto) · Anna Goldenberg (University of Toronto)
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System
Xinshi Chen (Georgia Institute of Technology) · Shuang Li (Georgia Tech) · Hui Li (Ant Financial) · Shaohua Jiang (Ant Financial) · Yuan Qi (Ant Financial Services Group) · Le Song (Georgia Institute of Technology)
A Deep Reinforcement Learning Perspective on Internet Congestion Control
Nathan Jay (University of Illinois Urbana-Champaign) · Noga H. Rotman (Hebrew University of Jerusalem) · Brighten Godfrey (University of Illinois Urbana-Champaign) · Michael Schapira (Hebrew University of Jerusalem) · Aviv Tamar (Technion - Israel Institute of Technology)
Target Tracking for Contextual Bandits: Application to Demand Side Management
Margaux Brégère (CNRS Université Paris-Sud, Inria Paris, EDF R&D) · Pierre Gaillard (INRIA Paris) · Yannig Goude (EDF Lab Paris-Saclay) · Gilles Stoltz (Université Paris-Sud)
Greedy Sequential Subset Selection via Sequential Facility Location
Ehsan Elhamifar (Northeastern University)
Hiring Under Uncertainty
Manish Purohit (Google) · Sreenivas Gollapudi (Google Research) · Manish Raghavan (Cornell)
Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments
Kirthevasan Kandasamy (Carnegie Mellon University) · Willie Neiswanger (CMU) · Reed Zhang (Carnegie Mellon University) · Akshay Krishnamurthy (Microsoft Research) · Jeff Schneider (Uber/CMU) · Barnabás Póczos (CMU)
A Control-Theoretic Perspective on Nesterov’s Accelerated Gradient Method
Michael Muehlebach (UC Berkeley) · Michael Jordan (UC Berkeley)