Explore, Exploit, and Explode — The Time for Reinforcement Learning is Coming

A Brief Introduction

Recent Achievements

Issues

Research Directions

Applications

Discussions

  • Broadly applicable methodology: can address a broad range of challenging problems. Deterministic, stochastic, and dynamic; discrete and continuous; games; etc.
  • There are no methods that are guaranteed to work for all or even most problems
  • There are enough methods to try with a reasonable chance of success for most types of optimization problems
  • Role of the theory: Guide the art, delineate the sound ideas
  • There are challenging implementation issues in all approaches, and no fool-proof methods
  • Problem approximation and feature selection require domain-specific knowledge
  • Training algorithms are not as reliable as you might think by reading the literature
  • Approximate policy iteration (PI) involves oscillations
  • Recognizing success or failure can be a challenge!
  • The RL successes in games are spectacular, but they have benefited from perfectly known and stable models and a small number of controls (per state)
  • Problems with partial state observation remain a big challenge
  • Massive computational power, together with distributed computation, is a source of hope
  • Silver lining: We can begin to address practical problems of unimaginable difficulty!
  • There is an exciting journey ahead!
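
The bullet on approximate PI refers to policy iteration, which alternates between evaluating the current policy and improving it greedily. As a minimal sketch, here is the exact tabular version in Python on a made-up three-state MDP. The transition table `P`, the discount `GAMMA`, and all function names are illustrative assumptions, not taken from the article.

```python
# Hypothetical toy MDP (not from the article): 3 states, 2 actions.
# P[state][action] is a list of (probability, next_state, reward) tuples.
GAMMA = 0.9  # assumed discount factor

P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing state
}


def q_value(V, s, a):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def evaluate(policy, tol=1e-10):
    """Policy evaluation: sweep the Bellman equation for a fixed policy until stable."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def improve(V):
    """Policy improvement: act greedily with respect to V (ties go to the lower action)."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


def policy_iteration():
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = {s: 0 for s in P}
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration()
print(policy, V)
```

With an exact tabular evaluation step, this loop terminates at an optimal policy for a finite MDP. The oscillations mentioned in the list can arise when the evaluation step is replaced by an approximation (for example, a fitted value function), in which case successive "improved" policies may cycle rather than settle.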

Guest Editor, MLJ Special Issue, https://bit.ly/2B4hWCB; Co-Chair, ICML19 Workshop, https://bit.ly/2MxKHNr; DeepRL Overview, https://bit.ly/2lp5h3y; PhD UofA

Yuxi Li
