Liberty DeepRacer League and Reinforcement Machine Learning

Darren Broderick
Oct 10 · 5 min read
Outrun won Belfast with a time of 13.11 seconds.

Welcome to a light read about DeepRacer and Reinforcement Learning.

The team above is ‘Outrun’ from the Quantum Unit.
Working left to right we have:
- Maths guru (David Fyffe)
- Without the Sir or the Wings (Paul McCartney)
- Team DK (David Kelly)
- Me (DBro)
- Dr. DeepRacer (Dr. Glenn Horan)

We are passionate about trying new technologies and winning.
Liberty Mutual’s AWS DeepRacer League was a great opportunity to achieve both.

AWS DeepRacer

Multiple agents brought for each event

The DeepRacer car, or ‘agent’ as it’s also known, is a fully autonomous race car. We program it in Python and train it across many iterations in AWS SageMaker, on a simulation environment spun up by AWS RoboMaker.

We don’t provide the training data upfront as in supervised and unsupervised machine learning, nor do we apply any labels initially.

Instead, the agent supplies its own time-delayed label, known as the ‘reward’.

The data is gathered by the agent’s camera as images of the simulated track, which are converted to greyscale. The agent tries an ‘action’ (JSON format with properties of speed and angle; you set these before training), analyses the reward it receives for the attempt, and repeats the process with different actions to look for greater rewards. The reward is a float returned by your reward function (discussed later).
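As a rough illustration of what such a list of actions might look like, here is a minimal sketch that builds every combination of a few steering angles and speeds and prints the first entries as JSON. The helper `build_action_space`, the field names (`index`, `steering_angle`, `speed`) and the example values are my assumptions for illustration, not a confirmed AWS schema.

```python
import json


def build_action_space(steering_angles, speeds):
    """Return one action per (angle, speed) pair, numbered from 0.

    Hypothetical helper: the field names are illustrative assumptions.
    """
    actions = []
    for angle in steering_angles:
        for speed in speeds:
            actions.append(
                {"index": len(actions), "steering_angle": angle, "speed": speed}
            )
    return actions


# Example values only: 5 angles (degrees) x 2 speeds = 10 discrete actions.
action_space = build_action_space([-30, -15, 0, 15, 30], [0.5, 1.0])
print(json.dumps(action_space[:2], indent=2))
```

The agent can only ever pick from this fixed menu, so fewer actions usually means less to explore during training, at the cost of less fine-grained control.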

In short, the agent’s single focus is to return the maximum reward possible.

The Liberty League and Rules

The rules are simple. Each team gets 4 minutes to achieve their best lap time on the re:Invent track. The car may come off the track a maximum of 3 times for a lap to still qualify, but each “off course” must be fixed by manually placing the car back on the track, which eats into your lap time.

Liberty IT ran its own practice day, where Outrun took 1st place

Outrun’s fastest lap time of 9.26 seconds is currently 3rd place in the league.

Love that first corner

The top 8 teams go on to the semi-finals and the top 4 to the finals, on 29th and 30th October.

The winning team goes to re:Invent 2019 to race for the Vegas cup.

Winner of Vegas ends the year!

Reinforcement Learning

Reinforcement Learning (RL) is a machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences.

Unlike supervised learning where feedback provided to the agent is a correct set of actions for performing a task, reinforcement learning uses rewards and punishment as signals for positive and negative behaviour.

Compared to unsupervised learning, reinforcement learning is different in terms of goals. While the goal in unsupervised learning is to find similarities and differences between data points, in reinforcement learning the goal is to find a suitable action model that would maximise the total cumulative reward of the agent. The figure to the left represents the basic idea and elements involved in a reinforcement learning model.

Environment: Physical world in which the agent operates.
State: Current situation of the agent.
Reward: Feedback from the environment.
Policy: Method to map agent’s state to actions.
Value: Future reward that an agent would receive by taking an action in a particular state.
SageMaker: With each batch of experiences from RoboMaker, SageMaker updates the neural network, “and hopefully your model has improved.”
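To make the trial-and-error idea concrete, here is a toy sketch of an agent that keeps a running estimate of the reward for each action and mostly picks the best one, while occasionally exploring. This is a simple epsilon-greedy bandit with no state at all, so it is far simpler than what DeepRacer trains; the function name `run_bandit`, the reward values and the noise level are all made up for illustration.

```python
import random


def run_bandit(true_rewards, episodes=2000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy agent: learn which action pays best by trying them."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)  # learned 'value' of each action
    counts = [0] * len(true_rewards)       # how often each action was tried

    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(true_rewards))
        else:
            # Exploit: the policy picks the action with the best estimate so far.
            action = max(range(len(true_rewards)), key=lambda a: estimates[a])

        # The environment returns a noisy reward for the chosen action.
        reward = true_rewards[action] + rng.gauss(0, 0.1)

        # Update the running average for that action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8, 0.5])
print("estimates:", [round(e, 2) for e in estimates])
print("times chosen:", counts)
```

After enough episodes the agent should be choosing the second action most of the time, purely from the rewards it has seen.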

RoboMaker Simulation Example

What the agent trains on, how it takes iterations based on your set actions

Tips and Tricks

3 default models are easily available
  1. Keep your models simple. The model above focuses on keeping the car on the centre line; it is a great place to start, and as a beginner I wouldn’t change a thing in it.
  2. Don’t reward with a negative float. It can push the car to finish laps early to avoid the negative rewards, which cuts valuable training episodes short. Punish instead by multiplying the reward by a decimal value.
  3. Take a look at your training logs. They are also on CloudWatch, but not in a very readable form.
The reward graph seen after training is complete. Purple = progress.

4. Train your models for 1–2 hours. You can clone them to continue training further, but 1–2 hours gives a good indication of whether you are making progress on the track (see left).
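Tips 1 and 2 can be sketched together in one reward function. This is a minimal example modelled on the centre-line idea, not our race-winning model: it assumes the `params` dictionary provides `track_width`, `distance_from_center` and `all_wheels_on_track`, and the band widths and the 0.5 penalty factor are values I picked for illustration.

```python
def reward_function(params):
    """Reward staying near the centre line; punish by scaling, never by going negative."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    all_wheels_on_track = params.get("all_wheels_on_track", True)

    # Three bands around the centre line (fractions are illustrative assumptions).
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track: tiny, but still positive

    # Tip 2: punish by multiplying with a decimal instead of subtracting.
    if not all_wheels_on_track:
        reward *= 0.5

    return float(reward)


# Quick check with made-up values: near the centre, all wheels on track.
print(reward_function(
    {"track_width": 1.0, "distance_from_center": 0.05, "all_wheels_on_track": True}
))
```

Because the penalty is a multiplier, the reward shrinks for bad behaviour but never drops below zero, so the car has no incentive to cut an episode short.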

My Experience

I’ve really enjoyed the DeepRacer experience as a fun competition, but even more as a way into understanding RL and machine learning in general. It has taken a lot of time to get through the vast material, but the learning journey has been well worth it. If you are interested in learning more, just let me know!
The best way to get involved is to race a model you’ve made yourself. Then you’re hooked, which is a good thing.
I plan to write an enhanced DeepRacer guide in the future, focusing on ways to be competitive and efficient with your time. Hopefully they work for us at the end of October :)

Thank you!
