Welcome to a light read about DeepRacer and Reinforcement Learning.
The team above is ‘Outrun’ from the Quantum Unit.
Working left to right we have;
- Maths guru (David Fyffe)
- Without the Sir or the Wings (Paul McCartney)
- Team DK (David Kelly)
- Me (DBro)
- Dr. DeepRacer (Dr. Glenn Horan)
We are passionate about trying new technologies and winning.
Liberty Mutual’s AWS DeepRacer League was a great opportunity to achieve both.
The DeepRacer car or ‘agent’ as it’s also referred to is a fully autonomous race car, programmed by us in python and trained across many iterations in AWS SageMaker on a simulation environment spun up by AWS RoboMaker.
We don’t provide the training data upfront like in supervised and unsupervised machine learning, neither do we apply any labels initially.
Instead the agent supplies its own timed delay label, known as the ‘reward’.
The data is gathered by the agent’s photo lens which are turned to greyscale. These images are off the simulated track. It tries an ‘action’ (JSON format with properties of speed and angle), you set these before training. Then analyses rewards received for its attempts, and repeats the process with different actions to look for greater rewards, being a returned as a float number in your reward function (will discuss later).
In short, the agent’s single focus is; return the maximum rewards possible.
The Liberty League and Rules
The rules are simple. Each team gets 4 minutes to achieve their best lap time on the re:Invent track. You’re allowed to come off the track a maximum of 3 times in order to qualify a lap. But each “off course” must be fixed by manually re-plotting the car back on the track, eating into your lap time.
Outrun’s fastest lap time of 9.26 seconds is currently 3rd place in the league.
The top 8 teams go on to the semi-finals and the top 4 to the finals, on 29/30th October.
The winning team goes to re:Invent 2019 to race for the Vegas cup.
Winner of Vegas ends the year!
Reinforcement Learning(RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences.
Unlike supervised learning where feedback provided to the agent is a correct set of actions for performing a task, reinforcement learning uses rewards and punishment as signals for positive and negative behaviour.
Compared to unsupervised learning, reinforcement learning is different in terms of goals. While the goal in unsupervised learning is to find similarities and differences between data points, in reinforcement learning the goal is to find a suitable action model that would maximise the total cumulative reward of the agent. The figure to the left represents the basic idea and elements involved in a reinforcement learning model.
Environment: Physical world in which the agent operates.
State: Current situation of the agent.
Reward: Feedback from the environment.
Policy: Method to map agent’s state to actions.
Value: Future reward that an agent would receive by taking an action in a particular state.
SageMaker: With each batch of experiences from RoboMaker, SageMaker updates the neural network, “and hopefully your model has improved.”
RoboMaker Simulation Example
Tips and Tricks
- Keep your models simple, the model above focuses on keeping the car on the centre line, it is a great place to start with, wouldn’t change a thing above as a beginner.
- Don’t reward with a negative float, it can force the car to finish laps early to avoid them and cuts valuable training episodes. Punish instead with multiplying by decimal values.
- Get a look at the logs of your training. They are also on cloudWatch but not very readable.
4. Train your models for 1–2 hours, you can clone them to continue further training, but 1-2 hours gives a good indication if you are making progress on the track. (See on left).
I’ve really enjoyed the DeepRacer experience as a fun competition but more of a way into understanding RL and machine learning in general. It’s taken a lot of time to get through the vast material but well worth the learning journey, if you are interested in learning more just let me know!
The best way to get involved is to race a model you’ve made yourself, then you’re hooked, which is a good thing.
I plan to write an enhanced deep racer guide in the future to focus on ways to be competitive and efficient with your time, hopefully they work for us at the end of October :)
- Getting Started
- AWS Guide
- Build your own track https://medium.com/@autonomousracecarclub/guide-to-creating-a-full-re-invent-2018-deepracer-track-in-7-steps-979aff28a6f5
- AWS Training