Reinforcement Learning Applications

Yuxi Li
Oct 15, 2018 · 20 min read

Last updated: February 28, 2019.

Reinforcement learning (RL) has made tremendous achievements, e.g., AlphaGo. Here I list (deep) RL applications in the following categories: production, autonomous driving, business management, computer systems, computer vision, education, energy, finance, games, healthcare, natural language processing (NLP), robotics, “science, engineering and arts”, and transportation. See Table of Contents.

I list “production” first to highlight RL’s real-life deployments, and list the rest alphabetically. I include only a few papers for games and robotics, since these are traditional RL application areas that many people are already familiar with.

There are actually too many RL application papers for me to keep updating this blog in a reasonable way. (I may update it once more in a month or so.) Some general suggestions for keeping up:

  • check references and citations of papers here
  • check Google/Google Scholar
  • keep track of recent papers, e.g., on arXiv and at conferences like ICML, NeurIPS, KDD, WWW, RecSys, etc.
  • keep track of some authors
  • check ICML 2019 RL for Real Life Workshop

I focus on recent work. This blog is complementary to Deep Reinforcement Learning: An Overview.


Previously, RL applications were discussed/listed in my earlier blog, Resources for Deep Reinforcement Learning, which also contains a section on Benchmarks and Testbeds.

Table of Contents

Production Systems

Go to Table of Contents

Autonomous Driving

  • Lange, S., Riedmiller, M., and Voigtländer, A. (2012). Autonomous reinforcement learning on raw visual input data in a real world application. In International Joint Conference on Neural Networks (IJCNN).
  • Li, D., Zhao, D., Zhang, Q., and Chen, Y. (2019a). Reinforcement learning and deep learning based lateral control for autonomous driving. IEEE Computational Intelligence Magazine (to appear).

Go to Table of Contents

Business Management

  • Nazari, M., Oroojlooy, A., Snyder, L., and Takáč, M. (2018). Reinforcement learning for solving the vehicle routing problem. In NIPS.
  • Shi, J.-C., Yu, Y., Da, Q., Chen, S.-Y., and Zeng, A.-X. (2018). Virtual-Taobao: Virtualizing real-world online retail environment for reinforcement learning. ArXiv.
  • Silver, D., Newnham, L., Barker, D., Weller, S., and McFall, J. (2013). Concurrent reinforcement learning from customer interactions. In ICML.

Business Management: Recommender Systems

  • Zhang, S., Yao, L., Sun, A., and Tay, Y. (2017). Deep Learning based Recommender System: A Survey and New Perspectives. ArXiv.
  • Chen, S.-Y., Yu, Y., Da, Q., Tan, J., Huang, H.-K., and Tang, H.-H. (2018). Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In KDD.
  • Hu, Y., Da, Q., Zeng, A., Yu, Y., and Xu, Y. (2018). Reinforcement learning to rank in e-commerce search engine: Formalization, analysis, and application. In KDD.
  • Hu, Z., Liang, Y., Liu, Y., and Zhang, J. (2018b). Inference aided reinforcement learning for incentive mechanism design in crowdsourcing. In NIPS.
  • Lattimore, T., Kveton, B., Li, S., and Szepesvári, C. (2018). Toprank: A practical algorithm for online stochastic ranking. In NIPS.
  • Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In WWW.
  • Jiang, N. and Li, L. (2016). Doubly robust off-policy value evaluation for reinforcement learning. In ICML.
  • Theocharous, G., Thomas, P. S., and Ghavamzadeh, M. (2015). Personalized ad recommendation systems for life-time value optimization with guarantees. In IJCAI.
  • Zhao, X., Xia, L., Zhang, L., Ding, Z., Yin, D., and Tang, J. (2018). Deep reinforcement learning for page-wise recommendations. In ACM RecSys.
  • Zhao, X., Zhang, L., Ding, Z., Xia, L., Tang, J., and Yin, D. (2018). Recommendations with negative feedback via pairwise deep reinforcement learning. In KDD.
  • Zheng, G., Zhang, F., Zheng, Z., Xiang, Y., Yuan, N. J., Xie, X., and Li, Z. (2018a). DRN: A deep reinforcement learning framework for news recommendation. In WWW.
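
Several of the papers above, e.g., Li et al. (2010), frame recommendation as a (contextual) bandit problem: repeatedly pick an item, observe a click/reward, and balance exploration against exploitation. Here is a minimal epsilon-greedy sketch with made-up synthetic rewards and the context dropped for brevity; a contextual method such as LinUCB would instead fit a per-arm model of user/article features:

```python
import random

def epsilon_greedy(rewards, n_arms, epsilon=0.1, seed=0):
    """Run epsilon-greedy over simulated rounds; rewards[t][a] is the
    (synthetic) payoff arm a would yield at round t. Only a running
    mean reward per arm is kept, with no context features."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms              # running mean reward per arm
    total = 0.0
    for r in rewards:
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        counts[arm] += 1
        values[arm] += (r[arm] - values[arm]) / counts[arm]    # incremental mean
        total += r[arm]
    return values, total

# Synthetic rounds: arm 1 pays off on two rounds out of three.
rewards = [(0.0, 1.0) if t % 3 else (1.0, 0.0) for t in range(2000)]
values, total = epsilon_greedy(rewards, n_arms=2)
```

With these synthetic rewards, the learned values should rank arm 1 above arm 0; the exploration rate epsilon is the knob that trades clicks now against learning for later.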

Go to Table of Contents

Computer Systems

CS: Database

CS: Hardware

CS: Machine Learning

  • Hsu, K., Levine, S., and Finn, C. (2018). Unsupervised Learning via Meta-Learning. ArXiv e-prints.
  • Hu, Z., Yang, Z., Salakhutdinov, R., Qin, L., Liang, X., Dong, H., and Xing, E. (2018). Deep generative models with learnable knowledge constraints. In NIPS. (learn knowledge constraints with inverse RL)
  • Negrinho, R., Gormley, M., and Gordon, G. (2018). Learning beam search policies via imitation learning. In NIPS.
  • He, Y., Lin, J., Liu, Z., Wang, H., Li, L.-J., and Han, S. (2018). AMC: AutoML for model compression and acceleration on mobile devices. In ECCV.

CS: Networking

  • Mao, H., Schwarzkopf, M., Venkatakrishnan, S. B., Meng, Z., and Alizadeh, M. (2019). Learning scheduling algorithms for data processing clusters. In ACM SIGCOMM.
  • Mao, H., et al. (2019). Park: An open platform for learning augmented computer systems. In ICML 2019 RL for Real Life Workshop.

CS: Program Synthesis

  • Faust, A., Aimone, J. B., James, C. D., and Tapia, L. (2018). Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting. ArXiv e-prints.
  • Liang, C., Berant, J., Le, Q., Forbus, K. D., and Lao, N. (2017). Neural symbolic machines: Learning semantic parsers on freebase with weak supervision. In ACL.
  • Liang, C., Norouzi, M., Berant, J., Le, Q. V., and Lao, N. (2018). Memory augmented policy optimization for program synthesis with generalization. In NIPS.
  • Nachum, O., Norouzi, M., and Schuurmans, D. (2017). Improving policy gradient by exploring under-appreciated rewards. In ICLR.
  • Parisotto, E., Mohamed, A.-r., Singh, R., Li, L., Zhou, D., and Kohli, P. (2017). Neuro-symbolic program synthesis. In ICLR.
  • Reed, S. and de Freitas, N. (2016). Neural programmer-interpreters. In ICLR.
  • Vinyals, O., Fortunato, M., and Jaitly, N. (2015). Pointer networks. In NIPS.
  • Zaremba, W. and Sutskever, I. (2015). Reinforcement Learning Neural Turing Machines — Revised. ArXiv.
  • Zhang, L., Rosenblatt, G., Fetaya, E., Liao, R., Byrd, W., Might, M., Urtasun, R., and Zemel, R. (2018). Neural guided constraint logic programming for program synthesis. In NIPS.

CS: Scheduling

  • Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., Kumar, N., Norouzi, M., Bengio, S., and Dean, J. (2017). Device placement optimization with reinforcement learning. In ICML.

CS: Security

  • Havens, A., Jiang, Z., and Sarkar, S. (2018). Online robust policy learning in the presence of unknown adversaries. In NIPS.
  • Huang, S., Papernot, N., Goodfellow, I., Duan, Y., and Abbeel, P. (2017). Adversarial attacks on neural network policies. In ICLR Workshop Track.
  • Jun, K.-S., Li, L., Ma, Y., and Zhu, X. (2018). Adversarial attacks on stochastic bandits. In NIPS.

CS: Software Testing

CS: More Topics

  • Tamar, A., Wu, Y., Thomas, G., Levine, S., and Abbeel, P. (2016). Value iteration networks. In NIPS. (Wikipedia links navigation)

Go to Table of Contents

Computer Vision

CV: Recognition

  • Jie, Z., Liang, X., Feng, J., Jin, X., Lu, W. F., and Yan, S. (2016). Tree-structured reinforcement learning for sequential object localization. In NIPS.
  • Kong, X., Xin, B., Wang, Y., and Hua, G. (2017). Collaborative deep reinforcement learning for joint object search. In CVPR.
  • Krull, A., Brachmann, E., Nowozin, S., Michel, F., Shotton, J., and Rother, C. (2017). Poseagent: Budget-constrained 6d object pose estimation via reinforcement learning. In CVPR.
  • Mathe, S., Pirinen, A., and Sminchisescu, C. (2016). Reinforcement learning for visual object detection. In CVPR.
  • Mnih, V., Heess, N., Graves, A., and Kavukcuoglu, K. (2014). Recurrent models of visual attention. In NIPS.
  • Rao, Y., Lu, J., and Zhou, J. (2017). Attention-aware deep reinforcement learning for video face recognition. In ICCV.
  • Welleck, S., Mao, J., Cho, K., and Zhang, Z. (2017). Saliency-based sequential image attention with multiset prediction. In NIPS.
  • Rhinehart, N. and Kitani, K. M. (2017). First-person activity forecasting with online inverse reinforcement learning. In ICCV.
  • Supancic, III, J. and Ramanan, D. (2017). Tracking as online decision-making: Learning a policy from streaming videos with reinforcement learning. In ICCV.
  • Xu, M., Song, Y., Wang, J., Qiao, M., Huo, L., and Wang, Z. (2018). Predicting head movement in panoramic video: A deep reinforcement learning approach. TPAMI.
  • Yun, S., Choi, J., Yoo, Y., Yun, K., and Young Choi, J. (2017). Action-decision networks for visual tracking with deep reinforcement learning. In CVPR.

CV: Scene Understanding

  • Wu, J., Lu, E., Kohli, P., Freeman, B., and Tenenbaum, J. (2017). Learning to see physics via visual de-animation. In NIPS.
  • Wu, J., Tenenbaum, J. B., and Kohli, P. (2017). Neural scene de-rendering. In CVPR.

CV: Interactive Perception

  • Jayaraman, D. and Grauman, K. (2018). Learning to look around: Intelligently exploring unseen environments for unknown tasks. In CVPR.

CV: More Topics

  • Brunner, G., Richter, O., Wang, Y., and Wattenhofer, R. (2018). Teaching a machine to read maps with deep reinforcement learning. In AAAI.
  • Cao, Q., Lin, L., Shi, Y., Liang, X., and Li, G. (2017). Attention-aware face hallucination via deep reinforcement learning. In CVPR.
  • Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q. V. (2018). AutoAugment: Learning Augmentation Policies from Data. ArXiv.
  • Devrim Kaba, M., Gokhan Uzunbas, M., and Nam Lim, S. (2017). A reinforcement learning approach to the view planning problem. In CVPR.
  • Liu, F., Li, S., Zhang, L., Zhou, C., Ye, R., Wang, Y., and Lu, J. (2017). 3DCNN-DQN-RNN: A deep reinforcement learning framework for semantic parsing of large-scale 3d point clouds. In ICCV.
  • see captioning, visual dialog, visual QA, etc. in the section of NLP
  • see games and robotics for computer vision issues

Go to Table of Contents

Education

  • Mandel, T., Liu, Y. E., Levine, S., Brunskill, E., and Popović, Z. (2014). Offline policy evaluation across representations with applications to educational games. In AAMAS.
  • Upadhyay, U., De, A., and Rodriguez, M. G. (2018). Deep reinforcement learning of marked temporal point processes. In NIPS.

Go to Table of Contents

Energy

  • Glavic, M., Fonteneau, R., and Ernst, D. (2017). Reinforcement learning for electric power system decision and control: Past considerations and perspectives. In The 20th World Congress of the International Federation of Automatic Control.
  • Lazic, N., Boutilier, C., Lu, T., Wong, E., Roy, B., Ryu, M., and Imwalle, G. (2018). Data center cooling using model-predictive control. In NIPS.
  • Ruelens, F., Claessens, B. J., Vandael, S., Schutter, B. D., Babuška, R., and Belmans, R. (2016). Residential demand response of thermostatically controlled loads using batch reinforcement learning. IEEE Transactions on Smart Grid, PP(99):1–11.
  • Wen, Z., O’Neill, D., and Maei, H. (2015). Optimal demand response using device-based reinforcement learning. IEEE Transactions on Smart Grid, 6(5):2312–2324.

Go to Table of Contents

Finance

  • Deng, Y., Bao, F., Kong, Y., Ren, Z., and Dai, Q. (2016). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems.
  • Li, Y., Szepesvári, C., and Schuurmans, D. (2009). Learning exercise policies for American options. In AISTATS.
  • Longstaff, F. A. and Schwartz, E. S. (2001). Valuing American options by simulation: a simple least-squares approach. The Review of Financial Studies, 14(1):113–147.
  • Moody, J. and Saffell, M. (2001). Learning to trade via direct reinforcement. TNN, 12(4):875–889.
  • Nevmyvaka, Y., Feng, Y., and Kearns, M. (2006). Reinforcement learning for optimized trade execution. In ICML.
  • Prashanth, L., Jie, C., Fu, M., Marcus, S., and Szepesvári, C. (2016). Cumulative prospect theory meets reinforcement learning: Prediction and control. In ICML.
  • Tsitsiklis, J. N. and Van Roy, B. (2001). Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12(4):694–703.
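
To make the least-squares flavor of Longstaff and Schwartz (2001) concrete, here is a small pure-Python sketch that prices an American put by simulating GBM paths and regressing continuation values on a quadratic in the stock price. The parameter values are illustrative only; a production pricer would use a richer basis, more paths, and variance reduction:

```python
import math
import random

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a + b*x + c*x^2 via the normal equations,
    solved by Gaussian elimination with partial pivoting."""
    g = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, y in zip(xs, ys):
        basis = (1.0, x, x * x)
        for i in range(3):
            rhs[i] += basis[i] * y
            for j in range(3):
                g[i][j] += basis[i] * basis[j]
    for col in range(3):
        piv = max(range(col, 3), key=lambda row: abs(g[row][col]))
        g[col], g[piv] = g[piv], g[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for row in range(col + 1, 3):
            f = g[row][col] / g[col][col]
            for j in range(col, 3):
                g[row][j] -= f * g[col][j]
            rhs[row] -= f * rhs[col]
    coef = [0.0] * 3
    for i in (2, 1, 0):
        coef[i] = (rhs[i] - sum(g[i][j] * coef[j]
                                for j in range(i + 1, 3))) / g[i][i]
    return coef

def american_put_lsm(s0=36.0, k=40.0, r=0.06, sigma=0.2, t=1.0,
                     steps=50, paths=5000, seed=0):
    """Least-squares Monte Carlo: simulate GBM paths, step backwards,
    regress continuation values on a quadratic in the stock price over
    in-the-money paths, and exercise when immediate payoff wins."""
    rng = random.Random(seed)
    dt = t / steps
    drift = (r - 0.5 * sigma * sigma) * dt
    vol = sigma * math.sqrt(dt)
    s = []
    for _ in range(paths):
        path, x = [], s0
        for _ in range(steps):
            x *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(x)
        s.append(path)
    cashflow = [max(k - path[-1], 0.0) for path in s]   # exercise at expiry
    disc = math.exp(-r * dt)
    for i in range(steps - 2, -1, -1):
        cashflow = [c * disc for c in cashflow]         # discount one step
        itm = [p for p in range(paths) if k - s[p][i] > 0.0]
        if len(itm) > 10:                               # regress ITM paths only
            coef = fit_quadratic([s[p][i] for p in itm],
                                 [cashflow[p] for p in itm])
            for p in itm:
                x = s[p][i]
                continuation = coef[0] + coef[1] * x + coef[2] * x * x
                if k - x > continuation:
                    cashflow[p] = k - x                 # exercise now
    return disc * sum(cashflow) / paths                 # discount to time 0

price = american_put_lsm()
```

For these parameters (s0=36, k=40, r=0.06, sigma=0.2, T=1, 50 exercise dates) the estimate should land in the neighborhood of the commonly quoted benchmark value of about 4.47.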

Go to Table of Contents

Games

  • Justesen, N., Bontrager, P., Togelius, J., and Risi, S. (2017). Deep Learning for Video Game Playing. ArXiv.
  • Yannakakis, G. N. and Togelius, J. (2018). Artificial Intelligence and Games. Springer.
  • AlphaStar: Mastering the Real-Time Strategy Game StarCraft II
  • Aytar, Y., Pfaff, T., Budden, D., Paine, T., Wang, Z., and de Freitas, N. (2018). Playing hard exploration games by watching YouTube. In NIPS.
  • Bowling, M., Burch, N., Johanson, M., and Tammelin, O. (2015). Heads-up limit hold’em poker is solved. Science, 347(6218):145–149.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
  • Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., and Bowling, M. (2017). Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science.
  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489.
  • Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., and Hassabis, D. (2017). Mastering the game of go without human knowledge. Nature, 550:354–359.
  • Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362(6419):1140–1144.
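
The value-based line behind DQN (Mnih et al., 2015) starts from tabular Q-learning. Here is a minimal sketch on a made-up corridor environment (states in a row, reward only at the right end; hyperparameters are illustrative):

```python
import random

def q_learning_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a corridor: states 0..n-1, actions
    0 = left and 1 = right, reward 1 only on reaching the right end."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy behavior policy.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = int(q[s][1] >= q[s][0])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Off-policy TD update toward the greedy one-step target.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning_corridor()
```

After training, moving right (q[s][1]) dominates moving left in every non-terminal state. DQN scales this idea by replacing the table with a neural network, stabilized by experience replay and a target network, which is what makes it work from Atari pixels.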

Go to Table of Contents

Healthcare

  • Deep reinforcement learning for medical imaging, tutorial at MICCAI 2018
  • Goldberg, Y. and Kosorok, M. R. (2012). Q-learning with censored data. Annals of Statistics, 40(1):529–560.
  • Kallus, N. and Zhou, A. (2018). Confounding-robust policy improvement. In NIPS.
  • Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C., and Faisal, A. A. (2018). The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nature Medicine, 24:1716–1720.
  • Li, Y., Liang, X., Hu, Z., and Xing, E. (2018). Hybrid retrieval-generation reinforced agent for medical image report generation. In NIPS.
  • Ling, Y., Hasan, S. A., Datla, V., Qadir, A., Lee, K., Liu, J., and Farri, O. (2017). Diagnostic inferencing via improving clinical concept extraction with deep reinforcement learning: A preliminary study. In Machine Learning for Healthcare.
  • Liu, Y., Gottesman, O., Raghu, A., Komorowski, M., Faisal, A. A., Doshi-Velez, F., and Brunskill, E. (2018b). Representation balancing MDPs for off-policy policy evaluation. In NIPS.
  • Peng, Y.-S., Tang, K., Lin, H.-T., and Chang, E. (2018). Exploring sparse features in deep reinforcement learning towards fast disease diagnosis. In NIPS.
  • Shortreed, S. M., Laber, E., Lizotte, D. J., Stroup, T. S., Pineau, J., and Murphy, S. A. (2011). Informing sequential clinical decision-making through reinforcement learning: an empirical study. MLJ, 84:109–136.
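
Several entries above (e.g., Liu et al. (2018b)) concern off-policy evaluation: estimating how well a new treatment policy would do from data logged under clinicians’ behavior, without ever deploying it. The simplest estimator is ordinary importance sampling; here is a toy bandit-style sketch on synthetic data (all names and numbers made up for illustration):

```python
import random

def importance_sampling_ope(logs, target_probs):
    """Estimate a target policy's value from logged
    (action, reward, behavior_prob) tuples by ordinary importance
    sampling: reweight each logged reward by the likelihood ratio
    between the target and behavior policies."""
    total = 0.0
    for action, reward, behavior_prob in logs:
        weight = target_probs[action] / behavior_prob
        total += weight * reward
    return total / len(logs)

# Synthetic log: behavior policy is uniform over two treatments;
# treatment 0 succeeds with probability 0.2, treatment 1 with 0.8.
rng = random.Random(0)
logs = []
for _ in range(20000):
    a = rng.randrange(2)
    success = rng.random() < (0.2 if a == 0 else 0.8)
    logs.append((a, 1.0 if success else 0.0, 0.5))

# Target policy always chooses treatment 1; its true value is 0.8.
estimate = importance_sampling_ope(logs, target_probs={0: 0.0, 1: 1.0})
```

On this synthetic log the estimate should recover the target policy’s true value of 0.8. In sequential settings the importance weights multiply across time steps and their variance explodes, which is exactly what the doubly robust and representation-balancing estimators cited above try to tame.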

Go to Table of Contents

Natural Language Processing (NLP)

NLP: Captioning

  • Ren, Z., Wang, X., Zhang, N., Lv, X., and Li, L.-J. (2017). Deep reinforcement learning-based image captioning with embedding reward. In CVPR.
  • Wang, X., Chen, W., Wu, J., Wang, Y.-F., and Wang, W. Y. (2018). Video captioning via hierarchical reinforcement learning. In CVPR.
  • Xu, K., Ba, J. L., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R. S., and Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In ICML.

NLP: Dialogue

  • Dhingra, B., Li, L., Li, X., Gao, J., Chen, Y.-N., Ahmed, F., and Deng, L. (2017). End-to-end reinforcement learning of dialogue agents for information access. In ACL.
  • Li, J., Monroe, W., Ritter, A., Galley, M., Gao, J., and Jurafsky, D. (2016). Deep reinforcement learning for dialogue generation. In EMNLP.
  • Su, P.-H., Gašić, M., Mrkšić, N., Rojas-Barahona, L., Ultes, S., Vandyke, D., Wen, T.-H., and Young, S. (2016). On-line active reward learning for policy optimisation in spoken dialogue systems. In ACL.
  • Many more, check Deep Reinforcement Learning

NLP: Information Extraction

NLP: Information Retrieval

NLP: Linguistics

  • Liang, C., Berant, J., Le, Q., Forbus, K. D., and Lao, N. (2017). Neural symbolic machines: Learning semantic parsers on freebase with weak supervision. In ACL.
  • Yogatama, D., Blunsom, P., Dyer, C., Grefenstette, E., and Ling, W. (2017). Learning to compose words into sentences with reinforcement learning. In ICLR.

NLP: Question Answering (QA)

  • Shen, Y., Huang, P.-S., Gao, J., and Chen, W. (2017). Reasonet: Learning to stop reading in machine comprehension. In KDD.

NLP: Reasoning

  • Liang, X., Lee, L., and Xing, E. P. (2017). Deep variation-structured reinforcement learning for visual relationship and attribute detection. In CVPR.
  • Xiong, W., Hoang, T., and Wang, W. Y. (2017c). Deeppath: A reinforcement learning method for knowledge graph reasoning. In EMNLP.

NLP: Sequence Generation

  • Jaques, N., Gu, S., Bahdanau, D., Hernández-Lobato, J. M., Turner, R. E., and Eck, D. (2017). Sequence tutor: Conservative fine-tuning of sequence generation models with KL-control. In ICML.
  • Ranzato, M., Chopra, S., Auli, M., and Zaremba, W. (2016). Sequence level training with recurrent neural networks. In ICLR.
  • Yu, L., Zhang, W., Wang, J., and Yu, Y. (2017). SeqGAN: Sequence generative adversarial nets with policy gradient. In AAAI.

NLP: Summarization

  • Zhang, X. and Lapata, M. (2017). Sentence simplification with deep reinforcement learning. In EMNLP.

NLP: Text Games

  • He, J., Chen, J., He, X., Gao, J., Li, L., Deng, L., and Ostendorf, M. (2016b). Deep reinforcement learning with a natural language action space. In ACL.
  • Narasimhan, K., Kulkarni, T., and Barzilay, R. (2015). Language understanding for text-based games using deep reinforcement learning. In EMNLP.

NLP: Translation

NLP: Visual Dialog

  • Strub, F., de Vries, H., Mary, J., Piot, B., Courville, A., and Pietquin, O. (2017). End-to-end optimization of goal-driven and visually grounded dialogue systems. In IJCAI.

NLP: More Topics

  • Wu, J., Li, L., and Wang, W. Y. (2018). Reinforced co-training. In NAACL.

Go to Table of Contents

Robotics

  • Argall, B. D., Chernova, S., Veloso, M., and Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469–483.
  • Deisenroth, M. P., Neumann, G., and Peters, J. (2013). A survey on policy search for robotics. Foundations and Trends in Robotics, 2:1–142.
  • Kober, J., Bagnell, J. A., and Peters, J. (2013). Reinforcement learning in robotics: A survey. International Journal of Robotics Research, 32(11):1238–1278.

Robotics: Dexterity

Robotics: Sim-to-Real

  • Chen, T., Murali, A., and Gupta, A. (2018b). Hardware conditioned policies for multi-robot transfer learning. In NIPS.
  • Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., and Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26).
  • Peng, X. B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2017). Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. ArXiv.
  • Rusu, A. A., Vecerik, M., Rothörl, T., Heess, N., Pascanu, R., and Hadsell, R. (2017). Sim-to-real robot learning from pixels with progressive nets. In Conference on Robot Learning (CoRL).
  • Sadeghi, F., Toshev, A., Jang, E., and Levine, S. (2018). Sim2real view invariant visual servoing by recurrent control. In CVPR.

Robotics: Value-based Learning

  • Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2016). End-to-end training of deep visuomotor policies. JMLR, 17:1–40.
  • Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Li, F.-F., and Farhadi, A. (2017). Target-driven visual navigation in indoor scenes using deep reinforcement learning. In ICRA.

Robotics: Policy-based Learning

  • Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2017). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In ICRA.

Go to Table of Contents

Science, Engineering, and Art

Science: Biology

  • Neftci, E. O. and Averbeck, B. B. (2019). Reinforcement learning in artificial and biological systems. Nature Machine Intelligence.

Science: Chemistry

  • You, J., Liu, B., Ying, R., Pande, V., and Leskovec, J. (2018). Graph convolutional policy network for goal-directed molecular graph generation. In NIPS.

Science: Drug Design

Science: Earth Science

Science: Mathematics

  • Urban, J., Kaliszyk, C., Michalewski, H., and Olšák, M. (2018). Reinforcement learning of theorem proving. In NIPS.

Science: Physics

Art: Animation

  • Peng, X. B., Kanazawa, A., Malik, J., Abbeel, P., and Levine, S. (2018b). Sfv: Reinforcement learning of physical skills from videos. ACM Transaction on Graphics, 37(6).

Art: Drawing

Art: Music

Art: Story Telling

Go to Table of Contents

Transportation

  • Belletti, F., Haziza, D., Gomes, G., and Bayen, A. M. (2018). Expert level control of ramp metering based on multi-task deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 19(4):1198–1207.
  • El-Tantawy, S., Abdulhai, B., and Abdelgawad, H. (2013). Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto. IEEE Transactions on Intelligent Transportation Systems, 14(3):1140–1150.
  • Khadilkar, H. (2019). A scalable reinforcement learning algorithm for scheduling railway lines. IEEE Transactions on Intelligent Transportation Systems, 20(2):727–736.
  • Li, M., Qin, Z. (Tony), Jiao, Y., Yang, Y., Gong, Z., Wang, J., Wang, C., Wu, G., and Ye, J. (2019). Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning. In WWW.
  • Mannion, P., Duggan, J., and Howley, E. (2016). An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In McCluskey, T., Kotsialos, A., Müller, J., Klügl, F., Rana, O., and R., S., editors, Autonomic Road Transport Support Systems, pages 47–66. Springer.
  • Simao, H. P., Day, J., George, A. P., Gifford, T., Nienow, J., and Powell, W. B. (2009). An approximate dynamic programming algorithm for large-scale fleet management: A case application. Transportation Science, 43(2):178–197.
  • Wei, H., Zheng, G., Yao, H., and Li, Z. (2018). Intellilight: A reinforcement learning approach for intelligent traffic light control. In KDD.

Go to Table of Contents
