Liberty DeepRacer League and Reinforcement Machine Learning

Outrun won Belfast with a time of 13.11 seconds.

Welcome to a light read about DeepRacer and Reinforcement Learning.

The team above is ‘Outrun’ from the Quantum Unit.
Working left to right, we have:
- Maths guru (David Fyffe)
- Without the Sir or the Wings (Paul McCartney)
- Team DK (David Kelly)
- Me (DBro)
- Dr. DeepRacer (Dr. Glenn Horan)

We are passionate about trying new technologies and winning.
Liberty Mutual’s AWS DeepRacer League was a great opportunity to achieve both.

AWS DeepRacer

Multiple agents brought for each event

The DeepRacer car, or ‘agent’ as it’s also known, is a fully autonomous race car, programmed by us in Python and trained over many iterations in AWS SageMaker, in a simulation environment spun up by AWS RoboMaker.

We don’t provide the training data up front as in supervised and unsupervised machine learning, nor do we apply any labels initially.

Instead, the agent generates its own time-delayed label, known as the ‘reward’.

The agent gathers its data through its camera, which captures images of the simulated track and converts them to greyscale. It tries an ‘action’ (JSON format with speed and angle properties, which you set before training), analyses the reward it receives for the attempt, and repeats the process with different actions in search of greater rewards. The reward is the float returned by your reward function (more on this later).
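To make the ‘action’ idea concrete, here is a minimal sketch of how a discrete set of actions could be built as JSON, one entry per speed and angle combination. The key names (`index`, `steering_angle`, `speed`) and the specific values are assumptions for illustration, not the settings Outrun used.

```python
import json
from itertools import product

# Illustrative values only: speeds in m/s, steering angles in degrees.
speeds = [1.0, 2.0, 3.0]
steering_angles = [-30, -15, 0, 15, 30]

# One action per (angle, speed) pair; the agent picks one of these each step.
action_space = [
    {"index": i, "steering_angle": angle, "speed": speed}
    for i, (angle, speed) in enumerate(product(steering_angles, speeds))
]

# Show the first two actions as JSON.
print(json.dumps(action_space[:2], indent=2))
```

A larger list gives the agent finer control but also more combinations to explore during training, so there is a trade-off between precision and training time.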

In short, the agent’s single focus is to return the maximum reward possible.
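As a rough illustration of what a reward function can look like, the sketch below returns a float based on a few of the values DeepRacer passes in through its `params` dictionary. This is a toy example, not Outrun’s reward function: the centre-line scoring and the 0.1 speed weight are arbitrary assumptions.

```python
def reward_function(params):
    """Toy reward: stay on the track, stay near the centre line, keep moving."""
    all_wheels_on_track = params["all_wheels_on_track"]
    distance_from_center = params["distance_from_center"]
    track_width = params["track_width"]
    speed = params["speed"]

    # Near-zero reward if any wheel has left the track.
    if not all_wheels_on_track:
        return 1e-3

    # 1.0 on the centre line, falling linearly to 0.0 at the track edge.
    centre_score = max(0.0, 1.0 - distance_from_center / (0.5 * track_width))

    # Small bonus for speed; the 0.1 weight is an arbitrary illustrative choice.
    return float(centre_score + 0.1 * speed)
```

Because the agent only ever chases this number, anything the function rewards by accident (for example, zig-zagging that still stays near the centre) is something the agent may learn to do.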

The Liberty League and Rules

The rules are simple. Each team gets 4 minutes to achieve their best lap time on the re:Invent track. You’re allowed to come off the track a maximum of 3 times for a lap to qualify, but each “off course” must be fixed by manually placing the car back on the track, which eats into your lap time.

Liberty IT ran its own practice day, where Outrun took 1st place
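The qualifying rule can be sketched in a few lines: a lap only counts if the car left the track 3 times or fewer, and the best time among those laps is the one that matters. The `Lap` class, the function name and the lap times below are hypothetical, and the 4-minute session limit is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_OFF_TRACK = 3  # maximum "off course" events for a lap to qualify


@dataclass
class Lap:
    seconds: float
    off_track_count: int


def best_qualifying_lap(laps: List[Lap]) -> Optional[float]:
    """Return the fastest lap time among qualifying laps, or None if none qualify."""
    qualifying = [lap.seconds for lap in laps if lap.off_track_count <= MAX_OFF_TRACK]
    return min(qualifying) if qualifying else None


# Made-up session: the 11.8 s lap is fastest but had 4 off-tracks, so it does not count.
session = [Lap(14.2, 1), Lap(11.8, 4), Lap(13.5, 0)]
print(best_qualifying_lap(session))
```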

Outrun’s fastest lap time of 9.26 seconds is currently 3rd place in the league.

Love that first corner

The top 8 teams go on to the semi-finals and the top 4 to the finals, on 29/30th October.

The winning team goes to re:Invent 2019 to race for the Vegas cup.

Winner of Vegas ends the year!

Reinforcement Learning

Reinforcement Learning (RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences.

Unlike supervised learning, where the feedback provided to the agent is the correct set of actions for performing a task, reinforcement learning uses rewards and punishments as signals for positive and negative behaviour.

Compared to unsupervised learning, reinforcement learning is different in terms of goals. While the goal in unsupervised learning is to find similarities and differences between data points, in reinforcement learning the goal is to find a suitable action model that would maximise the total cumulative reward of the agent. The figure to the left represents the basic idea and elements involved in a reinforcement learning model.

Environment: Physical world in which the agent operates.
State: Current situation of the agent.
Reward: Feedback from the environment.
Policy: Method to map agent’s state to actions.
Value: Future reward that an agent would receive by taking an action in a particular state.
SageMaker: With each batch of experiences from RoboMaker, SageMaker updates the neural network, “and hopefully your model has improved.”
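The terms above can be seen working together in a toy example. The sketch below is a tiny tabular Q-learning loop on a five-position straight line, not what DeepRacer runs (DeepRacer trains a neural network on camera images). Here the environment is the `step` function, the state is the agent’s position, the reward is 1.0 for reaching the finish, the value estimates live in the `q` table, and the policy is “pick the action with the highest value”. All names and hyperparameters are assumptions chosen for illustration.

```python
import random

N_STATES = 5          # positions 0..4 on a straight line; 4 is the finish
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # illustrative values, not tuned


def step(state, action):
    """Environment: move the agent and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=300, seed=0):
    rng = random.Random(seed)
    # Value table: q[state][action] estimates the discounted future reward.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap the episode length
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)  # explore: try a random action
            else:
                # Exploit: take the best-valued action, breaking ties randomly.
                action = max(ACTIONS, key=lambda a: (q[state][a], rng.random()))
            next_state, reward, done = step(state, action)
            # Reward now plus discounted value of the best next action.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Policy: for each non-finish position, the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The discount factor `GAMMA` is what makes the agent prefer reaching the finish sooner: a reward several steps away is worth less than one right in front of it.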

RoboMaker Simulation Example

What the agent trains on, and how it iterates based on the actions you set

Tips and Tricks

3 default models are easily available
The reward graph seen after training is complete. Purple = progress.

4. Train your models for 1–2 hours. You can clone them to continue training further, but 1–2 hours gives a good indication of whether you are making progress on the track (see left).

My Experience

I’ve really enjoyed the DeepRacer experience, as a fun competition but even more as a way into understanding RL and machine learning in general. It’s taken a lot of time to get through the vast amount of material, but it has been well worth the learning journey. If you are interested in learning more, just let me know!
The best way to get involved is to race a model you’ve made yourself; then you’re hooked, which is a good thing.
I plan to write an enhanced DeepRacer guide in the future, focusing on ways to be competitive and efficient with your time. Hopefully they work for us at the end of October :)

Thank you!

LibertyIT

Liberty IT thoughts and tech stories on technical…
