RL: Where and how to begin your hands-on journey

Branavan Selvasingham
Learning Reinforcement Learning
4 min read · Oct 20, 2022


It’s time to get hands-on. You’ve done your reading and understand the concepts. The math is fine. But where and how to begin?

Here is the short-list of approachable platforms:

  1. OpenAI Gym (TL;DR: Selected option)
  2. Unity with ML-Agents extension
  3. Deepmind Lab
  4. Amazon SageMaker RL

I’ve personally looked into and used each of these as part of my initial trials. I ultimately chose OpenAI Gym but it was definitely not the first platform I started with.

Let me quickly explain the completely informal and organic selection process, along with my observations for each platform. I looked at the following dimensions:

  • Friction of platform setup: OpenAI Gym was super easy; Unity ML-Agents setup was horrible.
  • Depth of the RL learning roadmap available: all of them showed great roadmaps, but OpenAI Gym's seemed the most accessible to a novice.
  • Fidelity of the simulated environment: Unity looked awesome here, and I was able to get some of the template agents with pre-trained models working.
  • Community support: OpenAI Gym was by far the most widely used platform in the community.

You’ve probably noticed I completely ignored scalability, performance, and tuning capabilities of the platforms in my decision process. Why? Because I want to learn the nuts and bolts of agent-action-state-env-reward interactions at a micro level, really well, and will perform a second round of platform selection to optimize on the macro level.
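That micro-level loop is the same on every platform: the agent observes a state, picks an action, and the environment returns the next state and a reward. Here's a plain-Python sketch of that loop (my own illustration, not any platform's actual API; `CoinFlipEnv` and `RandomAgent` are made-up stand-ins):

```python
import random

class CoinFlipEnv:
    """Stand-in environment: guess a coin flip for 5 steps."""
    def reset(self):
        self._steps = 0
        return 0  # the initial state carries no information here

    def step(self, action):
        self._steps += 1
        reward = 1.0 if action == random.randint(0, 1) else 0.0
        done = self._steps >= 5
        return 0, reward, done

class RandomAgent:
    """Stand-in agent: acts randomly and learns nothing (yet)."""
    def act(self, state):
        return random.randint(0, 1)
    def learn(self, state, reward):
        pass

def run_episode(env, agent):
    """One pass of the agent-action-state-env-reward loop."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)               # agent: state -> action
        state, reward, done = env.step(action)  # env: action -> next state, reward
        agent.learn(state, reward)              # update from the feedback signal
        total_reward += reward
    return total_reward

print(run_episode(CoinFlipEnv(), RandomAgent()))
```

Everything that follows — platform choice included — is really about how easy it is to see and tinker with this loop.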

Unity ML Agents

My trials (read: trials and tribulations…) with the Unity ML-Agents setup put a major damper on my dreams of using that as my first platform. I was able to get it running to a basic extent, but was still unsure whether I had just hacked it together or whether the setup was actually complete.

The “Hello world” example on Unity ML Agent Extension: balance a ball on a block head. (Image by author)

Getting to this point took me some time… and a lot of digging… and a lot of incompatible packages… and outdated support. But from that point on, I was able to see the power that lay ahead.

A live snippet of the 3D Ball example where a fully trained block-head is able to balance a ball. (Video by author)

This gave me a tantalizing glimpse into where I need to eventually go for environment and agent simulations as this looked pretty promising. But the path to achieve these out-of-the-box examples felt a bit too unpaved for a novice. So I decided to park Unity for now.

Amazon SageMaker RL

I’ve actually been using this with my team at work and though it’s only a side project (mostly worked on by interns) I was able to get a sense of it. It seemed pretty good from a robotics simulation (via RoboMaker and Gazebo) perspective. The use case we were investigating was the ability to simulate Boston Dynamics’ Spot Robot in an environment that we create.

A screenshot of Spot walking through a custom environment.
A screenshot of Spot simulated in Gazebo walking through a custom environment (a poorly scanned hallway). (Image by author)

The process was choppy, to say the least. We struggled through it and got to something resembling a completed job, but eventually needed to use Paperspace servers instead. It's probably a platform to come back to when considering larger-scale collaboration and high-fidelity industrial environment simulations. Parked.

DeepMind Lab

DeepMind Lab's setup and usage seemed to sit somewhere between OpenAI Gym's simplicity and Unity's power as a game engine (Lab is built on ioquake3) — though maybe closer to the former.

DeepMind Lab Maze Level Video

Two things kept me biased against using DeepMind Lab as my starting point: OpenAI Gym sits at a slightly more accessible entry point (and is more widely used), and if I wanted a full-on game engine I'd probably use Unity at some point anyway. I do think Lab is probably my next platform after getting up to speed via Gym. Sorry if you were looking for a more in-depth exploration of Lab at this point, but right now I'm eager to get hands-on rather than compare further. But I'll be back. 😎

OpenAI Gym

Ah… finally. The search is over. The eagle has landed.

A screenshot of the lunar lander environment showing a high reward episode
A successful, high reward, episode of the Lunar Lander example environment (image by author)

I’ve mentioned many of the positive aspects of OpenAI Gym throughout this article, but I’ll summarize once more in case you skipped right to this section:

  • The Python of the RL world, in my opinion. Just as Python has made programming way more accessible to a wide spectrum of users, OpenAI Gym (I think) has made RL way more accessible. It’s written in Python, and you just pip install it and you’re ready to go. The environment classes are easy to understand, and even as a novice you get a sense that you could build a new environment of your own here. Which is to say, the ramp to deeper RL understanding looked very promising.
  • Easy setup process. There were no package version compatibility issues, and you can bring in your familiar Python packages and use them accordingly.
  • I did have some questions about how to get from the standard heuristic examples they provide (like the lunar lander) to actually training the agent one episode at a time. Thankfully, the community support was fantastic. Something I hope to pay back at some point.
  • Though the environment simulation wasn’t high fidelity, it was good enough to get a sense of what is going on. And the nostalgia of the provided environments helped a lot as well.
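To give a flavor of how approachable those environment classes are, here's a toy environment written in the shape of a classic Gym `Env` — entirely my own illustration, not code from Gym itself; `GuessNumberEnv` is a made-up name, and the `reset`/`step` signatures follow the pre-0.26 convention where `step` returns `(obs, reward, done, info)`:

```python
import random

class GuessNumberEnv:
    """Toy Gym-style environment: guess a hidden integer in [0, 9].

    Observations: 0 = last guess was too low, 1 = correct, 2 = too high.
    """

    def reset(self, seed=None):
        self._rng = random.Random(seed)
        self._target = self._rng.randint(0, 9)
        return 1  # neutral first observation

    def step(self, action):
        if action == self._target:
            return 1, 1.0, True, {}          # obs, reward, done, info
        obs = 0 if action < self._target else 2
        return obs, -0.1, False, {}          # small penalty per wrong guess

# One episode with a random "policy", the same way you'd drive a built-in env:
env = GuessNumberEnv()
obs = env.reset(seed=0)
done, total = False, 0.0
while not done:
    obs, reward, done, info = env.step(random.randrange(10))
    total += reward
print("episode return:", total)
```

If you can read that, you can read (and eventually write) Gym environments — which is exactly the on-ramp I was looking for.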

So that was my choice. Get hands-on with OpenAI Gym. My next articles will focus solely on the journey of understanding the basic concepts of RL (via Gym).


