Discovering Reinforcement Learning Through Racing

Yann Boisclair-Roy
Published in SSENSE-TECH
Nov 28, 2019

I’ve always been attracted to AI and machine learning. It probably started with me watching science fiction movies. The idea that machines can learn by themselves and surpass us has always intrigued me. It got me wondering to what extent we might one day rely on AI, and when it might surpass us in most domains. Sure, it will challenge us ethically on many different levels, but I can’t help but be curious and excited about all the possibilities that lie in the years to come. Recently, I’ve been most interested in self-driving cars and how they stand to completely revolutionize our commutes, and by extension our lives.

My day-to-day work isn't related to AI and machine learning at the moment; I'm just a curious beginner in the domain. I've only tinkered a little with Supervised Learning in a workshop, and read a bit about the concepts of Unsupervised Learning and Reinforcement Learning (with more interest in the latter). That said, I had never encountered an occasion to put these theories into practice. Until last year!

It all started at the last AWS Re:Invent, in 2018, when they announced the release of the DeepRacer League. The DeepRacer is a small (1/18th scale) autonomous vehicle, powered by Reinforcement Learning, that you can train from the comfort of your living room. In my opinion, it's one of the easiest ways to learn how Reinforcement Learning works and to actually put your learnings into practice by competing against other people globally.

What is Reinforcement Learning?

The Wikipedia definition is as follows:
“Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.”

There are a couple of key concepts here: an agent taking actions in an environment, trying to maximize a cumulative reward. Let's break down these elements in the context of the DeepRacer:

Agent

The agent is responsible for taking action in a given environment. In our example, it’s the small racing car.

Action

Actions represent what the agent can do in an environment. In our case, you control the granularity of the throttle (the speeds at which the agent can go) and of the steering (how sharply it turns left or right). The more actions available for the agent to choose from, the longer the training period before we start seeing results.

Environment

The environment, in this case, is simply the track on which the agent is trying to complete laps.

State

The state is a snapshot of the environment taken by the agent. From the state, the reward function has access to different characteristics, such as the car's position, speed, and distance from the center of the track.

Cumulative reward

The reward function is where the magic happens. Given the current state, this is where you assign points to the agent's behaviour: the more desirable the outcome, the bigger the reward. The cumulative reward is the total the agent accumulates over an episode, and it's what the training tries to maximize.
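To tie these pieces together, here's a minimal toy sketch of the loop in Python (purely illustrative, not the DeepRacer API): an untrained agent picks actions at random, the environment updates the state, and the per-step rewards add up into the cumulative reward that training tries to maximize.

import random

# Toy agent-environment loop (illustrative only, not the DeepRacer API).
actions = ["left", "straight", "right"]   # a tiny discrete action space
distance_from_center = 0.0                # our whole "state" in this toy example
cumulative_reward = 0.0

for step in range(100):                   # one episode of 100 steps
    action = random.choice(actions)       # an untrained agent acts at random
    # Environment dynamics: turning moves the car away from or toward the center line.
    if action == "left":
        distance_from_center -= 0.1
    elif action == "right":
        distance_from_center += 0.1
    # Reward: more points the closer the car stays to the center of the track.
    reward = max(0.0, 1.0 - abs(distance_from_center))
    cumulative_reward += reward

print(cumulative_reward)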

How to Get Started with DeepRacer

I have to say that AWS couldn't have made it easier to get started. Assuming that you have an AWS account up and running, all you need to do is go to the Services -> DeepRacer section and click on the 'Get Started' button. From there, click on the 'Create Resources' button; the setup takes around 5 minutes.

This is what you should see when everything is ready:

While you’re waiting, they offer a handy presentation explaining what Reinforcement Learning is, and how the DeepRacer works. They also cover all the different parameters that are available in the reward function. I highly recommend going over these slides.

Once you’re ready to get started, just click on ‘Create Model’ in the following box to make your first Reinforcement Learning model.

Create Your First Model

The first thing you need to provide is a name for your model. Then, you need to select the track on which to train it. If your goal is simply to test how a race works, select the first track. Otherwise, if you want a model that generalizes to any type of track, I suggest choosing a more complex one, like the "AWS Track" for example.

Now comes the 'Action space', where you choose the maximum steering angle and maximum speed, along with their granularities; together these define the discrete set of actions your agent can choose from. The resulting combinations appear in the 'Action list' below the inputs, so you can see every action that will be available to the agent based on your choices.
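As a rough illustration of how that list is built (a sketch of the idea, not the exact code the console runs; the numbers are made up):

# Sketch: the action list is the cross product of evenly spaced steering angles
# and speeds. The values below are illustrative, not the console defaults.
max_steering_angle = 30.0   # degrees, to each side
steering_granularity = 5    # number of steering levels from full left to full right
max_speed = 3.0             # meters per second
speed_granularity = 2       # number of speed levels

steering_angles = [
    -max_steering_angle + i * (2 * max_steering_angle) / (steering_granularity - 1)
    for i in range(steering_granularity)
]                           # [-30.0, -15.0, 0.0, 15.0, 30.0]
speeds = [max_speed * (i + 1) / speed_granularity for i in range(speed_granularity)]  # [1.5, 3.0]

# Every (steering, speed) pair becomes one discrete action the agent can choose.
action_list = [(angle, speed) for angle in steering_angles for speed in speeds]
print(len(action_list))     # 5 steering levels x 2 speed levels = 10 actions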

Then comes the most important part: the reward function. This function, written in Python, receives pre-calculated values describing the current state in its “params” argument. It's then your job to return a reward that reflects how desirable you judge the agent's behaviour to be. You can always use the default function, or any other sample reward function provided by AWS, if you just want to see how it works.

The available parameters are as follows (from the AWS DeepRacer documentation):

{
    "all_wheels_on_track": Boolean,      # flag to indicate if the vehicle is on the track
    "x": float,                          # vehicle's x-coordinate in meters
    "y": float,                          # vehicle's y-coordinate in meters
    "distance_from_center": float,       # distance in meters from the track center
    "is_left_of_center": Boolean,        # flag to indicate if the vehicle is on the left side of the track center or not
    "heading": float,                    # vehicle's yaw in degrees
    "progress": float,                   # percentage of track completed
    "steps": int,                        # number of steps completed
    "speed": float,                      # vehicle's speed in meters per second (m/s)
    "steering_angle": float,             # vehicle's steering angle in degrees
    "track_width": float,                # width of the track
    "waypoints": [[float, float], ... ], # list of [x, y] milestones along the track center
    "closest_waypoints": [int, int]      # indices of the two nearest waypoints
}
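To give an idea of how these parameters come together, here is a small reward function in the spirit of the "follow the center line" sample that AWS provides (a sketch, not the official default): it gives more points the closer the car stays to the center of the track, and almost nothing when it leaves it.

def reward_function(params):
    # Read the pre-calculated values describing the current state
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    all_wheels_on_track = params['all_wheels_on_track']

    # Heavily penalize leaving the track
    if not all_wheels_on_track:
        return 1e-3

    # Give more points the closer the car stays to the center line
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3  # probably about to go off track

    return float(reward)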

Next come the ‘Hyperparameters’, where you can configure the training parameters. Unless you really know what you’re doing, I would suggest keeping the default settings.
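For reference, the kind of knobs you'll find there looks roughly like this (the names and values below are an approximation from memory, not an authoritative list; check the console for the actual defaults):

# Approximate sketch of the DeepRacer training hyperparameters; the values
# shown are placeholders, not the authoritative defaults.
hyperparameters = {
    "gradient_descent_batch_size": 64,    # experiences used per gradient update
    "entropy": 0.01,                      # extra randomness to encourage exploration
    "discount_factor": 0.999,             # how much future rewards count vs. immediate ones
    "loss_type": "huber",
    "learning_rate": 0.0003,
    "number_of_experience_episodes": 20,  # episodes collected between policy updates
    "number_of_epochs": 10,               # passes over the collected data per update
}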

Finally, the last parameter is the 'Stop condition': the duration of the training, in minutes. I suggest training your model for around 4 hours (240 minutes) for improved results.

Ready? Click 'Start training' and relax. But don't forget that training a model isn't free: the estimated cost is roughly $3.36 USD per hour, so a 4-hour run comes to roughly $13.44.

While the model is training, you can explore the ‘Reward’ graph to see how many points the agent made for each episode or lap. If you see that there isn’t any ascending trend after a certain amount of time, I’d suggest stopping your training and working on your reward function.

Evaluate Your Model

When the training is done, a new section appears on the same page as your training: the 'Evaluation' section. The goal of the evaluation is to try your model in a virtual race on a selected track. This is how you see whether your model works, and evaluate its performance. At a high level, a performant model is one that consistently finishes its laps without straying off course, while maintaining an optimal speed.

Simply choose a track and how many laps to complete, and wait. Note that the evaluation doesn't cost you anything extra.

A sample evaluation of my model against the Re:Invent 2018 track

The Virtual League

Once you have a good model in hand, you can submit it to race in the Virtual League, where you compete against other people around the world. Each submission records your model's best lap time on the given track, and as you improve your model, it should climb up the leaderboard. Prizes can be won once you've reached the top 3.

The Actual Race

I had the opportunity to represent SSENSE at the Fintech Conference in Montreal last November, which gave me the chance to participate in an actual race. If you want to participate in a real race, you have to register with AWS. Once registered, you’ll be assigned a time slot of 30 minutes, during which you’ll be able to test your different models and officially record your laps.

At the venue, you hand over a USB key containing all your competition models to the AWS agent. You can try as many models as you want, but remember that you have a limited amount of time on the track. The agent then hands the DeepRacer car to a colleague on the track and gives you a tablet.

Tablet in hand, it's now time to hit the circuit and try your different models. As the car goes around the track, you can increase and decrease its speed in near real time (note that there's a delay of 1–2 seconds). You have 25 minutes to test your models and speed settings and find the highest speed at which the car still holds its line. Depending on your model and its training, setting the speed to 100% might just send it off the track every time, so test carefully.

The AWS agent on the track has to put the car back on the track whenever all four wheels end up in the "grass" (green zone), and during a race, after 3 such incidents, the agent resets the racer to the starting line. When your 25 minutes are up, it's time for the actual race.

The AWS agent will change the batteries of the car, and you will now have 4 minutes to complete laps as fast as possible. They then time each of your completed laps and record the fastest one as your final time.

Surprisingly, my model actually performed better in reality than in the virtual simulations! I was able to finish in second place with a lap time of 9.61 seconds. Based on the scoreboard for the two-day event I was attending, I would have finished 2nd out of the 9 participating teams. Not bad for a first time!

I strongly suggest participating in a real race to see your model on an actual circuit. The car doesn't behave exactly as it does in the virtual evaluation, and the differences can give you hints on how to tune your model.

See you in the virtual league, or at the next DeepRacer event in Montreal ;)

Editorial reviews by Deanna Chow, Liela Touré & Prateek Sanyal.

Want to work with us? Click here to see all open positions at SSENSE!
