Side Notes: From Zero to SOTA in Reinforcement Learning

May 31 · 4 min read

In addition to our serialised blogs, ‘AI Distillery’ & ‘Cups to Consciousness’, we’ll occasionally release pieces & materials as stand-alones or short-run series. Speaking to the off-topic and somewhat ambiguous nature of these materials compared with our other publications, we’ve elected to call it ‘Side Notes’.

‘Side Notes’ will be just that: ancillary pieces related to MTank & AI that we think our readers may like. Formats will vary, but will probably include a bit of everything. Think of it as a catch-all for extra projects, talks & ideas.

With that said, over the last year or so we’ve spent a considerable amount of time reading, returning to and distilling a favourite field of ours: Reinforcement Learning (RL). For those interested in RL as a branch of AI, we’re open-sourcing an RL course that we built last year as an introduction for engineers & researchers.

What is reinforcement learning?

Reinforcement Learning is one of the most promising techniques of recent years, enabling major progress in video-game and board-game playing, providing a framework for general sequential decision-making problems, and paving the way towards more intelligent machines. We’ve been fascinated with this field for a while and slowly started studying the core materials and papers from across the web.
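At its core, RL is this sequential decision-making loop: an agent observes a state, picks an action, and the environment responds with a new state and a reward. A minimal sketch of that loop (the `ToyWalk` environment and the policy here are our own hypothetical stand-ins for illustration, not from the course):

```python
def run_episode(env, policy, max_steps=100):
    """Generic agent-environment loop: the core abstraction of RL."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward                   # accumulate the return
        if done:
            break
    return total_reward

class ToyWalk:
    """Hypothetical toy environment: walk right from state 0 to reach state 3."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                      # action: +1 (right) or -1 (left)
        self.pos = max(0, self.pos + action)
        done = self.pos >= 3
        return self.pos, (1.0 if done else 0.0), done

reward = run_episode(ToyWalk(), policy=lambda s: 1)
# the always-go-right policy reaches the goal, so the return is 1.0
```

Every algorithm in the course, from Dynamic Programming to Deep Q-Learning, is ultimately a different way of choosing the `policy` in this loop.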

While we felt that there was a lot of material available ― personally we recommend Richard Sutton’s book (Link) and David Silver’s UCL RL course (Link) ― generally it was scattered, too introductory or too advanced. Ultimately, despite looking, we found no good sub-10-hour primer for people interested in the space. So we built one.

The thing we made

We created the RL course in two parts: Intro to RL and Intro to Deep RL. The first covers some of the theoretical foundations of RL ― policies, rewards, equations, all that good stuff. The second quickly brings readers through some of the State-of-the-art (SOTA) approaches that keep cropping up in the media when AI companies demolish human gamers with their intelligent RL agents.

Link to slides

Tile-based view of the first 20 slides out of 125. There are many GIFs inside, so the initial loading of the slides can take some time. This is a price worth paying for awesome GIFs.

Where the course appeared/Who we made it for

The course was originally assembled for a 4-hour RL workshop that Fernando gave internally at Bosch in 2018, and we’re scheduled to deliver an updated version this summer too. He also presented an abridged version of the course at the StuttgartAI meetup in February [link].

Ben delivered a 1-hour talk for the Open Data Science Conference (ODSC) [Link to site] covering part one and a bit of part two. A video of the talk is available from their site:

Users will have to register to watch, but it is free of charge and takes less than a minute!

The course was made for the total beginner in RL, and for anyone who wants to ‘catch up’ on some of the latest techniques in RL and its subfields. We made it because we believed there was a lack of comprehensive, succinct resources on RL that prioritised real-world research examples as well as modern approaches.

Quite frankly, we built it for ourselves. But in doing so, we created something that may have value for other people too. The slides are not finalised and will continue to be updated as time goes on and techniques change.

Breakdown at a glance

Part one covers the core concepts and fundamentals of RL: the main applications, inspiration and vision behind the field; the ‘credit assignment’ problem; and the exploration vs exploitation trade-off. On the algorithmic side, we covered Dynamic Programming (policy and value iteration) and Monte Carlo vs Temporal Difference methods.
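To give a taste of the Dynamic Programming side, here is a minimal value-iteration sketch on a tiny deterministic chain MDP. The environment, states and rewards are invented purely for illustration (the course’s own examples differ); the backup itself is the standard Bellman optimality update:

```python
# Value iteration on a tiny deterministic chain MDP (hypothetical example).
# States 0..3; actions move left/right; reaching terminal state 3 yields reward 1.
GAMMA = 0.9
N_STATES = 4
ACTIONS = (-1, +1)

def step(s, a):
    """Deterministic transition: next state and reward for taking action a in state s."""
    s2 = min(N_STATES - 1, max(0, s + a))
    r = 1.0 if s2 == N_STATES - 1 and s != N_STATES - 1 else 0.0
    return s2, r

V = [0.0] * N_STATES
for _ in range(100):  # repeatedly apply the Bellman optimality backup
    V = [0.0 if s == N_STATES - 1 else          # terminal state has value 0
         max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS)
         for s in range(N_STATES)]

# Optimal values decay by a factor of gamma with distance from the goal:
# V ≈ [0.81, 0.9, 1.0, 0.0]
```

The same loop with an argmax instead of a max over stored values recovers the greedy policy, which is how value iteration connects back to policy iteration.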

Once readers have a handle on part one, part two should be conceptually straightforward, as it builds directly on those foundations. As can be seen below, we added the latest approaches in RL, which make heavy use of Deep Learning. These can be roughly categorised as Policy Gradient methods and Deep Q-Learning based methods. Happy studying!
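As a bridge between the two parts: Deep Q-Learning replaces a table of action-values with a neural network, but the underlying update is the tabular Q-learning rule from part one. A sketch on the same kind of toy chain environment (environment and hyperparameters are hypothetical choices for illustration):

```python
import random

random.seed(0)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
N_STATES, ACTIONS = 4, (0, 1)            # action 0 = left, 1 = right

def step(s, a):
    """Toy deterministic chain: reward 1 on reaching terminal state 3."""
    s2 = min(N_STATES - 1, max(0, s + (1 if a == 1 else -1)))
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                     # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: explore with probability EPSILON, else exploit
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        # TD target: r + gamma * max_a' Q(s', a'), zero beyond a terminal state
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

greedy = [max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES - 1)]
# the learned greedy policy moves right, toward the goal, in every state
```

In Deep Q-Learning the table `Q` becomes a network and the same TD target drives a regression loss; Policy Gradient methods instead adjust the policy’s parameters directly in the direction of higher return.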

Wrapping up

We hope that some of our readers, or future readers, will find this blitz course on Reinforcement Learning valuable, link here. If you do, or know someone who might, then pass it on. We also plan to continually change, refine and update these materials, so don’t be shy about dropping a comment or an email, and help us make it even better!

If you have any questions, need a reference, or want a clarification from us, you know where to find us. Always happy to help!

We promised many GIFs


Written by


The MTank team creates unique resources on Artificial Intelligence for free and for everyone. Find us @