AWS DeepRacer — How We Got Rolling with Reinforcement Learning…Literally!
If you haven’t heard of AWS DeepRacer, it’s an autonomous car that’s powered by reinforcement learning (RL). The beauty of it all is that the model is trained in a virtual environment and then deployed to a physical car to drive around a track. The motivation behind this initiative by AWS is to get reinforcement learning into the hands of more developers and help them use it to solve challenging problems in AI. And boy did they achieve that.
As a result of this initiative, they have launched the first ever international autonomous car race, dubbed the AWS DeepRacer League. Competitors take to the track to try to set the best lap times, and the races come in two flavors: virtual and physical. The virtual races run over an entire month, while the physical ones usually take place at the AWS summits around the world, which typically last a day but can run longer (e.g. Amazon re:MARS). The winners from the summits then compete at re:Invent in a knockout-style competition.
We were lucky enough to have the last summit of the year in our city, Toronto, Canada, and were able to attend an official race in person. They had an IndyCar race commentator setting the mood, which made the whole event super exciting! By the end of it, we managed to land 2 of the top 10 positions and 4 of the top 20 for Slalom! All top 10 times were under 10s, which is rare across other summits! Our best time of 8.9s landed us in 7th position on the board; the winning time was 7.8s, just 0.4s shy of the world record at the time of writing, which is 7.4s! Our times were better than some first-place times at other summits, which was pretty cool as well. It was a very competitive race, and people flew in from all over the world just to attend!
Reinforcement Learning?
So, what is reinforcement learning you might ask? It’s a branch of machine learning that deals with training an autonomous agent to achieve a task in its environment based on a reward mechanism or reward function. Let’s take a step back and see where this fits into the broader machine learning landscape.
There are three main branches of machine learning: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning is a branch of machine learning where the algorithm knows the expected output for a given input. Think of a problem where we need to detect a cat in a picture. We would need enough images labelled “cat” for the model to learn what a cat looks like. We might also need a bunch of images labelled “not cat” so that it can distinguish things that merely look like cats from actual cats.
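To make that concrete, here is a minimal sketch of supervised learning using scikit-learn. The data is a randomly generated stand-in for image features and the labels are invented for illustration; a real cat detector would use actual images and a deep network.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for image feature vectors: 100 samples, 64 features each.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 64))

# Labels we already know: 1 = "cat", 0 = "not cat".
# Having this ground truth is what makes the problem *supervised*.
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Predict on a new, unseen "image".
print(model.predict(rng.normal(size=(1, 64))))  # e.g. [1] -> "cat"
```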
With unsupervised learning, we do not necessarily know the output for a given input. An example would be finding users who are similar to each other based on some event stream, such as likes or purchases. There is no ground truth as there is with supervised learning; we need to define what “similar” means in order to let the model find users that resemble each other.
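A minimal sketch of that idea, assuming we summarize each user’s event stream as a vector of counts (the numbers and cluster count here are made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is a user; each column counts an event type
# (e.g. likes, purchases, page views). Note: no labels anywhere.
users = np.array([
    [20,  1,  5],   # heavy liker
    [22,  0,  4],
    [ 1, 15,  2],   # heavy buyer
    [ 0, 14,  3],
])

# Ask the algorithm to discover 2 groups of "similar" users on its own.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(users)
print(clusters)  # e.g. [0 0 1 1] -> the likers and the buyers
```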
Then comes reinforcement learning. Reinforcement learning involves an autonomous agent that interacts with its environment with the goal of gaining the maximum reward from it. At any given point, the agent is in a specific state, which returns some reward. The agent then chooses from a predefined set of actions to maximize any future gain. Let’s use an analogy: training a dog. If the dog is rewarded for good behavior but not for bad, then over time the dog will start to try to maximize the reward (treats or petting) by only doing the behavior that rewards it. The dog will likely listen to a command if it is expecting a treat afterward, and likely won’t if it is not.
Unlike supervised and unsupervised models, this branch of machine learning involves complex simulations in order to give the agent as much time as possible to experience the environment.
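To see that reward-maximizing loop in code, here is a minimal tabular Q-learning sketch on a toy one-dimensional track. The environment, states, and reward values are all invented for illustration; DeepRacer itself trains a neural network with a policy-gradient method (PPO) rather than a table, but the idea is the same.

```python
import random

# Toy environment: states 0..4 on a line; reaching state 4 ends an episode
# with a reward of +1. Actions: 0 = step left, 1 = step right.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

for episode in range(500):
    state = 0
    while state != GOAL:
        # Explore occasionally; otherwise exploit the best known action.
        if random.random() < EPSILON:
            action = random.choice([0, 1])
        else:
            action = max((0, 1), key=lambda a: Q[state][a])

        next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
        reward = 1.0 if next_state == GOAL else 0.0

        # Q-learning update: nudge the estimate toward
        # the reward plus the discounted future value.
        Q[state][action] += ALPHA * (
            reward + GAMMA * max(Q[next_state]) - Q[state][action]
        )
        state = next_state

# After training, the greedy policy is "always step right".
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)])  # e.g. [1, 1, 1, 1]
```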
Reinforcement Learning in the Context of DeepRacer
To put this into the context of DeepRacer, let’s define some terms. The agent is the car. The environment is the track and everything about it, like friction and lighting. The state of the car is the image the car sees from the onboard camera, along with some other metadata available to the simulation to help optimize the model. Note that the metadata is not available to the physical car; it only sees the camera feed as a signal, so the final model is based on that alone. An action is a possible move the car can take at any given time, such as max speed straight ahead or a slow left. The reward is a user-defined function that rewards the agent for certain behavior, for example finishing a lap. An episode is one trial run around the track until the car completes the lap or goes off track. Now that we understand the jargon, let’s get into the nitty-gritty of DeepRacer.
Training a Model – optional section meant for developers
To train a model and compete in the race, there are three things a user has to define: the action space, the reward function, and the hyperparameters. These are all available through the DeepRacer console on AWS.
The action space is the list of actions the car can pick from in any given state. For DeepRacer it is discrete, since a continuous action space could take much longer to train than a small set of discrete actions. Let’s assume the car can go 5 m/s at a steering angle of 0°, 30°, or -30°. These three actions would constitute the action space for this example, as sketched below.
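Here is a hedged sketch of what that three-action space might look like as data; DeepRacer stores something similar in the trained model’s metadata file, though the exact keys there may differ.

```python
# A discrete action space: one (steering angle, speed) pair per allowed action.
# Positive steering angles turn left, negative ones turn right.
action_space = [
    {"steering_angle": 0.0,   "speed": 5.0},  # full speed, straight ahead
    {"steering_angle": 30.0,  "speed": 5.0},  # full speed, left turn
    {"steering_angle": -30.0, "speed": 5.0},  # full speed, right turn
]

# At every time step, the policy network outputs the index of one of these
# actions, and the car executes it.
```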
The reward function is a function, written in Python, that defines how the car perceives the reward from its environment. It receives a dictionary of parameters describing the car’s current state; here is a sketch of a subset of them (the full list in the AWS documentation is longer, and the values below are made up for illustration):
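```python
# A subset of the "params" dictionary passed to the reward function.
params = {
    "all_wheels_on_track": True,   # is the car fully on the track?
    "x": 2.5,                      # car position on the track (meters)
    "y": 0.7,
    "distance_from_center": 0.05,  # distance from the center line (meters)
    "is_left_of_center": False,    # which side of the center line we are on
    "heading": 12.0,               # yaw of the car (degrees)
    "progress": 42.0,              # percentage of the lap completed
    "steps": 120,                  # steps taken so far this episode
    "speed": 3.0,                  # current speed (m/s)
    "steering_angle": -5.0,        # current steering angle (degrees)
    "track_width": 0.6,            # width of the track (meters)
}
```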
The above shows some of the input to the reward function at any given point in time. An example reward function, along the lines of the center-line example AWS provides, would look something like the following:
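```python
def reward_function(params):
    """Reward the car for staying close to the center line."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Markers at increasing distances from the center line.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0   # hugging the center line
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off the track or crashed

    return float(reward)
```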
This reward function rewards the car for staying as close to the center line as possible, using the track_width and distance_from_center input parameters. If the distance is too large, the car is likely completely off the track or has crashed.
The hyperparameters mainly control the neural network side of the model, and unless you are familiar with what they mean, the default settings are the best bet. For the curious, here is a sketch of the knobs the console exposes, with values in the ballpark of the defaults (check the console for the exact list and current defaults):
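```python
# Approximate hyperparameters exposed in the DeepRacer console.
hyperparameters = {
    "batch_size": 64,              # gradient descent batch size
    "num_epochs": 10,              # passes over each batch of experience
    "learning_rate": 0.0003,       # step size for the network update
    "entropy": 0.01,               # encourages exploration
    "discount_factor": 0.999,      # how much future rewards matter
    "loss_type": "huber",          # alternative: mean squared error
    "episodes_between_training": 20,  # experience gathered per policy update
}
```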
The last piece to define before we can train the model is the stopping condition, which in this case is time. It can range from 5 minutes to 8 hours, but the model can be trained for longer by cloning it if need be.
Once we have the model set up, we can begin training. Training will likely take anywhere from 1 to 4 hours to produce a stable model. Anything beyond that tends to overfit and might not perform well on the physical track at all (trust me, we’ve tried).
Testing the Model
When the model is done training, it can be evaluated on the virtual track and then submitted to one of the virtual races as an official time. If you are lucky enough to have an AWS summit near you, or are willing to fly out to one, you can also try your luck on the physical track and meet other racers in person. The rules of the race are pretty straightforward: every racer gets 4 minutes to try whatever they want, and as long as 1 wheel is on the track the car will not be reset. If the car goes completely off track, it must be reset either to the last good position or to the start line, and after three off-track resets the lap is considered void. The only controls the racer gets are the choice of model and the car’s maximum throttle, as a percentage.
We were not content to wait until the race to test our models, so we had some physical cars shipped to us from our Chicago office and printed a track on vinyl to test our algorithms on. As the track was too big to fit in the office, we had to find a workaround, which in our case ended up being a basketball court in a condo building.
Early on, we started to notice discrepancies between online and offline model performance and began tweaking our subsequent models to adjust for them. We also found that models that performed near perfectly in the virtual world did horribly on the physical track, as they were too accustomed to the ideal virtual world and could not account for random bumps or shiny surfaces, for example.
Once we were confident that we had something competition-worthy, we showed up to the race with models in hand, ready to go. On my first attempt I managed to land third on the board! I couldn’t have done this without the diligent testing the team had put in. That, however, was short-lived, as I started seeing my time slide down the board soon after, but I didn’t give up trying to get a better time. We were also cheered on by the slew of Slalom employees manning our booth at the summit. The race started at 7am sharp, and I managed to get my first time on the board by around 8am. The first 8-second time was set at just 7:10am, setting the bar really high for everyone else competing. By the end of the day we each got in around 4 tries, with the average wait between tries being about an hour and a half due to the length of the queue. There was no limit on how many tries we got, which gave everyone a better chance at winning.
The experience at the AWS summit was exhilarating, to say the least. There were very dedicated people who had spent weeks training models leading up to the event and brought their all to the race. It was great getting to network with other machine learning and artificial intelligence enthusiasts and professionals and to learn from their experiences. We landed 2 times in the top 10, winning us two DeepRacer cars to use for our own races and testing!
If you would like to learn more about DeepRacer, you can check out this page. If you are interested in joining an online community of DeepRacer enthusiasts, you can check out https://deepracing.io/ and apply to join the Slack community. Big shoutout to the people who put this together; it has been a valuable resource for us! They also have guides for setting up training in your local environment if you want to try that out.
Slalom is a modern consulting firm focused on strategy, technology, and business transformation. In over 30 markets across the US, UK, and Canada, Slalom’s teams have autonomy to move fast and do what’s right. They’re backed by regional innovation hubs, a global culture of collaboration, and partnerships with the world’s top technology providers. Founded in 2001 and headquartered in Seattle, Slalom has organically grown to over 7,000 employees. Slalom was named one of Fortune’s 100 Best Companies to Work For in 2019 and is regularly recognized by employees as a best place to work. Learn more at slalom.com.
If you want to learn more about machine learning or artificial intelligence and how it can impact your company or organization, please reach out to us directly at dev.mishra@slalom.com or danny.farah@slalom.com.