AWS DeepRacer at re:Invent 2018

Energetiq’s experience with Amazon’s quirky new machine-learning offering

Tom Wright
Energetiq
6 min read · Dec 21, 2018


A couple of other members of the Energetiq team and I recently made the long journey from Melbourne to Las Vegas to attend AWS re:Invent 2018. One odd little announcement worth spending some time on is AWS DeepRacer and the associated AWS DeepRacer League.

AWS DeepRacer, a new machine-learning offering for… fun?

DeepRacer?

AWS DeepRacer (“deep” being a reference, I imagine, to deep learning, the family of machine learning behind modern computer vision) consists of an autonomous “toy” car and a collection of associated cloud tools for training models to drive it using simulations and reinforcement learning. Or as the product page puts it:

“AWS DeepRacer is the fastest way to get rolling with machine learning, literally. Get hands-on with a fully autonomous 1/18th scale race car driven by reinforcement learning, 3D racing simulator, and global racing league.”

Boiling it down: the car is a remote-control car without the remote, with a camera and sensors for observing its surroundings and a CPU for running inference on a machine-learning model that controls the car’s throttle and steering.

DeepRacer specs — from https://aws.amazon.com/deepracer/

On the cloud side, AWS provides a simulation and training suite for the models that power the DeepRacer. Upload a reinforcement learning reward function written in Python and off it goes, learning through trial and error in a 3D simulation.

One of our DeepRacer models training in the cloud simulator (ft. Guy Fieri’s Vegas Kitchen)

Lastly, AWS has announced an AWS DeepRacer League starting in 2019, where developers can bring their best and brightest DeepRacer models to the track at AWS Summit events and battle it out for places in championship races. The inaugural AWS DeepRacer races took place right after the announcement at re:Invent 2018.

“The first global autonomous racing league, open to everyone”

… Reinforcement Learning?

Reinforcement Learning differs from the branches of supervised machine learning that have been the hot topics in the mainstream eye over the last few years. Whereas supervised learning (as the name suggests) is ideally suited to problems with known, labelled answers (e.g. “is there a face in this image?”), reinforcement learning excels in problem spaces where “the right answer” is difficult to quantify or is otherwise unknown.

What does that mean?

Consider the problem of training a robot to walk. It’s very difficult to label the problem with precisely the actions that would constitute “correct walking” (that would be a supervised learning approach). Instead, it makes more sense to produce a way to evaluate numerically how well the robot is doing, and then simply let it explore all the things it can try to do, guided by your evaluations. This can be simpler than it might sound, for example:

At each time step, if:

  • the robot is standing up, it gets 1 point
  • the robot is moving forward, it gets 1 point
  • the robot has fallen over, it gets -100 points
Me on Thursday night in Las Vegas — I mean, an impressive example of bipedal robotics (Boston Dynamics)

This set of rules constitutes our reward function, which is the core of reinforcement learning. The goal of the agent (our robot, or our AWS DeepRacer) is to explore the ways it can move about the world and try to determine what to do to maximise the reward it accumulates over time. It’s a bit like life, really.
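To make that concrete, here’s a minimal sketch of what such a reward function could look like in Python. The fields is_standing, forward_velocity and has_fallen are hypothetical names, standing in for whatever readings the simulator actually exposes:

```python
def walking_reward(state):
    """Score one time step of the walking-robot example above.

    `state` is a hypothetical object exposing a few readings from
    the simulator at the current time step.
    """
    reward = 0.0
    if state.is_standing:           # standing up: +1 point
        reward += 1.0
    if state.forward_velocity > 0:  # moving forward: +1 point
        reward += 1.0
    if state.has_fallen:            # fallen over: -100 points
        reward -= 100.0
    return reward
```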


The thing is… this can take a lot of time, and a lot of failed attempts. That’s why we simulate: compute time is cheap, and simulations can run as fast as we can compute them. Not like that lousy, real-world time.

The Energetiq DeepRacer

Or something like that, anyway…

This DeepRacer all sounded like a bit of fun, so we decided to spend some conference time giving it a shot. Naturally, we had no chance of getting a seat in one of the workshops; I don’t think I’ve ever seen slots fill so fast. AWS DeepRacer was off and racing.

We learnt that if we headed down to “MGM Speedway” at the MGM Grand hotel we could check out the qualifying tracks AWS had set up, and also visit the “garage” and secure ourselves some throw-away AWS accounts with DeepRacer available.

MGM Speedway, in all its autonomous racing splendour

Once underway, it was simply a matter of iteration. Produce a Python reward function matching the given signature, upload it to the DeepRacer training simulation, let it bake at 220°C for 1 hour, then check the evaluation laps. If the model looked good, get in line to upload it to a test car and take it for a spin around the track.

Here’s one of the reward functions the Energetiq team produced. You can see DeepRacer provides a bunch of different inputs to use in your calculation of a reward (whether you’re on the track or not, how far you are from the center of the track, the car’s current throttle, etc.). Our logic was simple:

  • give a sliding reward based on how close we are to the center of the track — keep to that line, baby
  • apply a sliding penalty based on how low the throttle is — go fast is good
  • apply a sliding penalty based on how much we’re steering — in general, steering is bad. At corners, this penalty will be outweighed by the reward of staying on the track
  • two constants THROTTLE_PENALTY_FACTOR and STEERING_PENALTY_FACTOR allowed us to tune the severity of the penalties we applied
One of the code revisions for our reward function: middle of track, good; low throttle and steering, bad
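Since the code screenshot doesn’t reproduce well here, the sketch below captures the same logic in Python. The input names (on_track, distance_from_center, track_width, throttle, steering) and their value ranges are assumptions based on the descriptions above, not necessarily the exact signature the DeepRacer console expects:

```python
THROTTLE_PENALTY_FACTOR = 0.5  # tuned by (very rough) trial and error
STEERING_PENALTY_FACTOR = 0.5  # tuned by (very rough) trial and error

def reward_function(on_track, distance_from_center, track_width,
                    throttle, steering):
    # Off the track? Next to no reward.
    if not on_track:
        return 1e-3

    # Sliding reward: 1.0 on the center line, approaching 0.0 at the track edge
    reward = 1.0 - (distance_from_center / (track_width / 2.0))

    # Sliding penalty for low throttle (assuming throttle in [0, 1]):
    # going fast is good
    reward -= THROTTLE_PENALTY_FACTOR * (1.0 - throttle)

    # Sliding penalty for steering (assuming steering in [-1, 1]):
    # in general, steering is bad
    reward -= STEERING_PENALTY_FACTOR * abs(steering)

    return reward
```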

So how did the Energetiq DeepRacer go? After a decent handful of 1-hour training sessions, and a decidedly poor man’s hyperparameter search to find good values for THROTTLE_PENALTY_FACTOR and STEERING_PENALTY_FACTOR, we had a model that could zip around the simulation track 5/5 times at a decent pace.

“She may not look like much, but she’s got it where it counts, kid.” — Scott Brown, on the Energetiq DeepRacer

But, unfortunately, the real car was often a different story. Good performance in the simulation was one thing; debugging the behaviour of your model in the real world was entirely another. After a couple of time-consuming full iterations, each ending in a queue to run our model on one of the real cars, we realised there was a lot more conference to be done, and the Energetiq DeepRacer was parked indefinitely.

Closing thoughts

I’m not going to pretend that I completely understand AWS’s reasoning behind the AWS DeepRacer offering, but I do see the value. While in its current form it is mostly just a bit of fun, it is also an excellent tool for learning about reinforcement learning and an incredible showcase of the powerful tools that AWS has under the hood: SageMaker RL and RoboMaker. Were there some teething issues with the simulator, the car, and so on? Of course, but here at Energetiq we’re excited to see where this will go as it matures. We might even pick up a DeepRacer for the office, if Amazon ever ships them to Oz…

The Energetiq reinforcement learning laboratory

Tom is Infrastructure Lead at Energetiq, where he spends most of his time trying to automate away his job.

Scott is a frontend lead developer at Energetiq, where he is passionate about crafting modern React-powered web products.

Energetiq would also like to give a shout-out to our other Melbourne race-team collaborator, Jules.
