Resources for Deep Reinforcement Learning

Yuxi Li
13 min read · Sep 16, 2018


Last updated: December 28, 2018

This is a collection of resources for deep reinforcement learning, organized into the following sections: Books; Surveys and Reports; Courses; Tutorials and Talks; Conferences, Journals and Workshops; Blogs; and Benchmarks and Testbeds. The blog is long, with lots of resources; see the Table of Contents.

This blog is based on Deep Reinforcement Learning: An Overview. The resources cover reinforcement learning core elements, important mechanisms, and applications, as in the overview, and also include topics in deep learning, reinforcement learning, machine learning, and AI. I compiled this blog to complement the book draft above, so it can be updated flexibly.

If I were to pick three study materials:

Two new ones came out recently:

If I were to pick three survey papers:

  • LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521:436–444.
  • Jordan, M. I. and Mitchell, T. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260.
  • Littman, M. L. (2015). Reinforcement learning improves behaviour from evaluative feedback. Nature, 521:445–451.

There are excellent invited talks, tutorials, and workshops at recent conferences such as NIPS, ICML, ICLR, ACL, CVPR, AAAI, and IJCAI. Many of them are not included here.

Books

Reinforcement Learning:

  • Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd Edition). MIT Press. The definitive and intuitive reinforcement learning book, with accompanying lectures and Python code.
  • Szepesvári, C. (2010). Algorithms for Reinforcement Learning. Morgan & Claypool.
  • Bertsekas, D. P. (2019). Reinforcement Learning and Optimal Control (draft). Athena Scientific.
  • Bertsekas, D. P. (2012). Dynamic programming and optimal control (Vol. II, 4th Edition: Approximate Dynamic Programming). Athena Scientific.
  • Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
  • Powell, W. B. (2011). Approximate Dynamic Programming: Solving the curses of dimensionality (2nd Edition). John Wiley and Sons.
  • Wiering, M. and van Otterlo, M., editors (2012). Reinforcement Learning: State-of-the-Art. Springer.
  • Puterman, M. L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience.
  • Lattimore, T. and Szepesvári, C. (2018). Bandit Algorithms. Cambridge University Press.

Deep Learning

  • Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

Machine Learning

  • Bishop, C. (2011). Pattern Recognition and Machine Learning. Springer.
  • Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
  • Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.
  • Zhou, Z.-H. (2016). Machine Learning (in Chinese). Tsinghua University Press, Beijing, China.
  • Mitchell, T. (1997). Machine Learning. McGraw Hill.
  • James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.
  • Kuhn, M. and Johnson, K. (2013). Applied Predictive Modeling. Springer.
  • Provost, F. and Fawcett, T. (2013). Data Science for Business. O’Reilly.
  • Simeone, O. (2017). A Brief Introduction to Machine Learning for Engineers. ArXiv.
  • Vapnik, V. N. (1998). Statistical Learning Theory. Wiley.
  • Haykin, S. (2008). Neural Networks and Learning Machines (third edition). Prentice Hall.

Causality

  • Pearl, J. (2009). Causality. Cambridge University Press.
  • Pearl, J., Glymour, M., and Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. Wiley.
  • Pearl, J. and Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
  • Peters, J., Janzing, D., and Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.

Natural Language Processing (NLP)

  • Jurafsky, D. and Martin, J. H. (2017). Speech and Language Processing (3rd ed. draft). Prentice Hall.
  • Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. Morgan & Claypool.
  • Deng, L. and Liu, Y., editors (2018). Deep Learning in Natural Language Processing. Springer.

Semi-supervised Learning

  • Zhu, X. and Goldberg, A. B. (2009). Introduction to semi-supervised learning. Morgan & Claypool.

Learning to learn

  • Hutter, F., Kotthoff, L., and Vanschoren, J., editors (2018). Automatic Machine Learning: Methods, Systems, Challenges. Springer. In press, available at http://automl.org/book.
  • Chen, Z. and Liu, B. (2016). Lifelong Machine Learning. Morgan & Claypool.

Game Theory

  • Leyton-Brown, K. and Shoham, Y. (2008). Essentials of Game Theory: A Concise, Multidisciplinary Introduction. Morgan & Claypool.

Finance

  • Hull, J. C. Options, Futures and Other Derivatives. Prentice Hall.

Transportation

  • Bazzan, A. L. and Klügl, F. (2014). Introduction to Intelligent Systems in Traffic and Transportation. Morgan & Claypool.

Artificial Intelligence

  • Russell, S. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd edition). Pearson.

Go to Table of Contents

Surveys and Reports

Reinforcement Learning

  • Littman, M. L. (2015). Reinforcement learning improves behaviour from evaluative feedback. Nature, 521:445–451.
  • Kaelbling, L. P., Littman, M. L., and Moore, A. (1996). Reinforcement learning: A survey. JAIR, 4:237–285.
  • Li, Y. (2017). Deep Reinforcement Learning: An Overview. ArXiv.
  • Levine, S. (2018). Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. ArXiv.
  • Recht, B. (2018). A Tour of Reinforcement Learning: The View from Continuous Control. ArXiv.
  • Geramifard, A., Walsh, T. J., Tellex, S., Chowdhary, G., Roy, N., and How, J. P. (2013). A tutorial on linear function approximators for dynamic programming and reinforcement learning. Foundations and Trends® in Machine Learning, 6(4):375–451.
  • Grondman, I., Busoniu, L., Lopes, G. A., and Babuška, R. (2012). A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6):1291–1307.
  • Roijers, D. M., Vamplew, P., Whiteson, S., and Dazeley, R. (2013). A survey of multi-objective sequential decision-making. JAIR, 48:67–113.

Deep Learning

  • LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521:436–444.
  • Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., and Liao, Q. (2017). Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review. International Journal of Automation and Computing, 14(5):503–519.
  • Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. TPAMI, 35(8):1798–1828.
  • Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127.
  • Deng, L. and Yu, D. (2014). Deep learning: Methods and applications. Foundations and Trends® in Signal Processing, 7(3–4):197–387.
  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61:85–117.
  • Wang, H. and Raj, B. (2017). On the Origin of Deep Learning. ArXiv.
  • Sze, V., Chen, Y.-H., Yang, T.-J., and Emer, J. (2017). Efficient Processing of Deep Neural Networks: A Tutorial and Survey. ArXiv.

Machine Learning

  • Jordan, M. I. and Mitchell, T. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260.
  • Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87.
  • Bottou, L., Curtis, F. E., and Nocedal, J. (2018). Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311.
  • Ng, A. (2018). Machine Learning Yearning (draft). deeplearning.ai.
  • Zinkevich, M. (2017). Rules of Machine Learning: Best Practices for ML Engineering.
  • Andrieu, C., de Freitas, N., Doucet, A., and Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50(1–2):5–43.

Causality

  • Pearl, J. (2018). The seven pillars of causal reasoning with reflections on machine learning. UCLA Technical Report R-481.
  • Guo, R., Cheng, L., Li, J., Hahn, P. R., and Liu, H. (2018). A Survey of Learning Causality with Data: Problems and Methods. ArXiv e-prints.

Graph Neural Networks

  • Battaglia, P. W., Hamrick, J. B., Bapst, V., et al. (2018). Relational inductive biases, deep learning, and graph networks. ArXiv.
  • Zhang, Z., Cui, P., and Zhu, W. (2018). Deep learning on graphs: A survey. ArXiv.
  • Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., and Sun, M. (2018). Graph neural networks: A review of methods and applications. ArXiv.

Exploration

  • Li, L. (2012). Sample complexity bounds of exploration. In Wiering, M. and van Otterlo, M., editors, Reinforcement Learning: State-of-the-Art, pages 175–204. Springer-Verlag Berlin Heidelberg.

Transfer Learning

  • Taylor, M. E. and Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. JMLR, 10:1633–1685.
  • Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359.
  • Weiss, K., Khoshgoftaar, T. M., and Wang, D. (2016). A survey of transfer learning. Journal of Big Data, 3(9).

Multi-task Learning

  • Zhang, Y. and Yang, Q. (2018). An overview of multi-task learning. National Science Review, 5:30–43.
  • Ruder, S. (2017). An Overview of Multi-Task Learning in Deep Neural Networks. ArXiv.

Neural Architecture Search

  • Elsken, T., Metzen, J. H., and Hutter, F. (2018). Neural Architecture Search: A Survey. ArXiv.

Learning to Learn

Successor Representation

  • Gershman, S. J. (2018). The successor representation: Its computational logic and neural substrates. Journal of Neuroscience, 38(33):7193–7200.

Bayesian RL

  • Ghavamzadeh, M., Mannor, S., Pineau, J., and Tamar, A. (2015). Bayesian reinforcement learning: a survey. Foundations and Trends in Machine Learning, 8(5–6):359–483.

Monte Carlo tree search (MCTS)

  • Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43.
  • Gelly, S., Schoenauer, M., Sebag, M., Teytaud, O., Kocsis, L., Silver, D., and Szepesvári, C. (2012). The grand challenge of computer Go: Monte Carlo tree search and extensions. Communications of the ACM, 55(3):106–113.

Attention and Memory

Intrinsic Motivation

  • Barto, A. (2013). Intrinsic motivation and reinforcement learning. In Baldassarre, G. and Mirolli, M., editors, Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin, Heidelberg.
  • Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3):230–247.
  • Oudeyer, P.-Y. and Kaplan, F. (2007). What is intrinsic motivation? a typology of computational approaches. Frontiers in neurorobotics, 1(6).

Evolution Strategy

  • Hansen, N. (2016). The CMA Evolution Strategy: A Tutorial. ArXiv.

Robotics

  • Kober, J., Bagnell, J. A., and Peters, J. (2013). Reinforcement learning in robotics: A survey. International Journal of Robotics Research, 32(11):1238–1278.
  • Deisenroth, M. P., Neumann, G., and Peters, J. (2013). A survey on policy search for robotics. Foundations and Trends in Robotics, 2:1–142.
  • Argall, B. D., Chernova, S., Veloso, M., and Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469–483.

Natural Language Processing (NLP)

  • Hirschberg, J. and Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245):261–266.
  • Cho, K. (2015). Natural Language Understanding with Distributed Representation. ArXiv.
  • Young, T., Hazarika, D., Poria, S., and Cambria, E. (2017). Recent Trends in Deep Learning Based Natural Language Processing. ArXiv.

Dialogue Systems

  • Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., and Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6):82–97.
  • Deng, L. and Li, X. (2013). Machine learning paradigms for speech recognition: An overview. IEEE Transactions on Audio, Speech, and Language Processing, 21(5):1060–1089.
  • Gao, J., Galley, M., and Li, L. (2018). Neural approaches to Conversational AI. Foundations and Trends in Information Retrieval. To appear.
  • He, X. and Deng, L. (2013). Speech-centric information processing: An optimization-oriented approach. Proceedings of the IEEE, 101(5):1116–1135.
  • Young, S., Gašić, M., Thomson, B., and Williams, J. D. (2013). POMDP-based statistical spoken dialogue systems: a review. Proceedings of the IEEE, 101(5):1160–1179.

Computer Vision

  • Zhang, Q. and Zhu, S.-C. (2018). Visual interpretability for deep learning: a survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39.
  • Bohg, J., Hausman, K., Sankaran, B., Brock, O., Kragic, D., Schaal, S., and Sukhatme, G. S. (2017). Interactive perception: Leveraging action in perception and perception in action. IEEE Transactions on Robotics, 33(6):1273–1291.

Recommender System

  • Zhang, S., Yao, L., Sun, A., and Tay, Y. (2017). Deep Learning based Recommender System: A Survey and New Perspectives. ArXiv e-prints.

Healthcare

  • Chakraborty, B. and Murphy, S. A. (2014). Dynamic treatment regimes. Annual Review of Statistics and Its Application, 1:447–464.

Energy

  • Anderson, R. N., Boulanger, A., Powell, W. B., and Scott, W. (2011). Adaptive stochastic control for the smart grid. Proceedings of the IEEE, 99(6):1098–1115.

Collection of Applications

AI Safety

  • Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mané, D. (2016). Concrete Problems in AI Safety. ArXiv.
  • García, J. and Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. JMLR, 16:1437–1480.

Go to Table of Contents

Courses

Reinforcement Learning

Deep Learning

Machine Learning

Robotics

Computer Vision

NLP

Healthcare

AI

Go to Table of Contents

Tutorials and Talks

Reinforcement Learning

Deep Learning

Robotics

Computer Vision

NLP

Finance & Economics

Healthcare

Education

Security

Transportation

Go to Table of Contents

Conferences, Journals and Workshops

  • NIPS: Neural Information Processing Systems
  • ICML: International Conference on Machine Learning
  • ICLR: International Conference on Learning Representations
  • RLDM: Multidisciplinary Conference on Reinforcement Learning and Decision Making
  • EWRL: European Workshop on Reinforcement Learning
  • Deep Reinforcement Learning Workshop, NIPS 2018, 2017 (Symposium), 2016, 2015; IJCAI 2016
  • AAAI, IJCAI, ACL, EMNLP, NAACL, CVPR, ICCV, ECCV, ICRA, IROS, RSS, SIGDIAL, KDD, SIGIR, WWW, etc.
  • AI Frontiers Conference
  • JMLR, MLJ, AIJ, JAIR, TPAMI, etc.
  • Nature Machine Intelligence, Science Robotics
  • Nature May 2015, Science July 2015, survey papers on machine learning/AI
  • Science, July 7, 2017 issue, The Cyberscientist, a special issue about AI
  • http://distill.pub

Go to Table of Contents

Benchmarks and Testbeds

I list some RL testbeds below. Common testbeds for general RL algorithms are Atari games, e.g., in the Arcade Learning Environment (ALE), for discrete control, and simulated robots, e.g., using MuJoCo in OpenAI Gym, for continuous control.
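To make the agent-environment interface concrete, here is a minimal sketch of the standard interaction loop in OpenAI Gym. It is my own illustration, not code from the original post or any library documentation; it uses a random policy, and the environment id and episode length are only example choices. Swapping in an Atari or MuJoCo environment id gives the discrete- or continuous-control benchmarks mentioned above.

```python
# Minimal OpenAI Gym interaction loop (illustrative sketch).
# Assumes the classic gym API, where step() returns (obs, reward, done, info).
import gym

env = gym.make("CartPole-v0")             # example environment id (classic control)
obs = env.reset()                         # initial observation
episode_return = 0.0

for _ in range(200):
    action = env.action_space.sample()    # random action, standing in for a learned policy
    obs, reward, done, info = env.step(action)
    episode_return += reward
    if done:                              # episode terminated (failure or time limit)
        break

env.close()
print("episode return:", episode_return)
```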

  • The Arcade Learning Environment (ALE) is a framework composed of Atari 2600 games to develop and evaluate AI agents.
  • OpenAI Gym is a toolkit for developing RL algorithms, consisting of environments, e.g., Atari games and simulated robots, and a site for comparing and reproducing results. OpenAI Gym includes the following environment families: algorithmic, Atari, Box2D, classic control, MuJoCo, robotics, and toy text.
  • MuJoCo, Multi-Joint dynamics with Contact, a physics engine.
  • DeepMind Control Suite
  • DeepMind Lab, DeepMind first-person 3D game platform
  • DeepMind PySC2: StarCraft II Learning Environment
  • Dopamine, a TensorFlow-based RL framework from Google AI
  • TRFL: Reinforcement Learning Building Blocks
  • David Churchill, CommandCenter: StarCraft 2 AI Bot
  • ELF, an extensive, lightweight, and flexible platform for RL research;
    ELF OpenGo is a reimplementation of AlphaGoZero/AlphaZero using ELF.
  • FAIR TorchCraft is a library for Real-Time Strategy (RTS) games such as StarCraft: Brood War.
  • FAIR Detectron, for computer vision.
  • Ray RLlib: A Composable and Scalable Reinforcement Learning Library
  • ParlAI is a framework for dialogue research, implemented in Python, open-sourced by Facebook.
  • Natural language decathlon (decaNLP), an NLP benchmark suitable for multitask, transfer, and continual learning.
  • Project Malmo, from Microsoft, is an AI research and experimentation platform built on top of Minecraft.
  • Twitter open-sources torch-twrl, a framework for RL development.
  • ViZDoom is a Doom-based AI research platform for visual RL.
  • Baidu Apollo Project, self-driving open-source
  • TORCS is a car racing simulator.
  • CoQA, a large-scale dataset for building conversational QA systems
  • WebNav Challenge for Wikipedia links navigation
  • Psychlab: A Psychology Laboratory for Deep RL Agents
  • RL-Glue is language-independent software for RL experiments.
  • RLPy is a value-function-based reinforcement learning framework for education and research.

Go to Table of Contents
