DeepRacer: our journey to the top ten!

beSharp
Jun 21 · 13 min read

How AWS DeepRacer and Reinforcement Learning work

Before we talk about racing and record times, it is worth taking a look at the interface of the AWS DeepRacer service, which is the model training tool. It seems silly to point this out, but it is essential.

    import math

    def reward_function(params):
        '''Use square root for center line'''
        track_width = params['track_width']
        distance_from_center = params['distance_from_center']

        reward = 1 - math.sqrt(distance_from_center / (track_width/2))
        if reward < 0:
            reward = 0

        return float(reward)

    {
        "all_wheels_on_track": Boolean,    # flag to indicate if the vehicle is on the track
        "x": float,                        # vehicle's x-coordinate in meters
        "y": float,                        # vehicle's y-coordinate in meters
        "distance_from_center": float,     # distance in meters from the track center
        "is_left_of_center": Boolean,      # flag to indicate if the vehicle is on the left side of the track center or not
        "heading": float,                  # vehicle's yaw in degrees
        "progress": float,                 # percentage of track completed
        "steps": int,                      # number of steps completed
        "speed": float,                    # vehicle's speed in meters per second (m/s)
        "steering_angle": float,           # vehicle's steering angle in degrees
        "track_width": float,              # width of the track
        "waypoints": [[float, float], … ], # list of [x,y] as milestones along the track center
        "closest_waypoints": [int, int]    # indices of the two nearest waypoints
    }

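
Since the reward function is plain Python that receives this dictionary, it can be exercised locally before spending training hours on it. The following is a minimal sketch, not part of the DeepRacer console: the `sample_params` values are hypothetical numbers chosen for illustration and do not come from a real track.

```python
import math


def reward_function(params):
    """Square-root falloff from the center line, as in the first example."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    reward = 1 - math.sqrt(distance_from_center / (track_width / 2))
    return float(max(reward, 0.0))


# Hypothetical input values, for illustration only.
sample_params = {
    'all_wheels_on_track': True,
    'x': 1.0,
    'y': 2.0,
    'distance_from_center': 0.075,
    'is_left_of_center': True,
    'heading': 0.0,
    'progress': 10.0,
    'steps': 20,
    'speed': 1.5,
    'steering_angle': 0.0,
    'track_width': 0.6,
    'waypoints': [[0.0, 0.0], [1.0, 0.0]],
    'closest_waypoints': [0, 1],
}

# Same car, placed on the center line and at the edge of the track.
centered = dict(sample_params, distance_from_center=0.0)
at_edge = dict(sample_params, distance_from_center=0.3)

for name, p in (('centered', centered), ('sample', sample_params), ('edge', at_edge)):
    print(name, round(reward_function(p), 3))
```

Calling the function with a few hand-built dictionaries like these is a quick way to catch a misspelled key or a reward that never leaves zero.
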
    import math

    def reward_function(params):
        '''Use a fourth-power falloff from the center line'''
        track_width = params['track_width']
        distance_from_center = params['distance_from_center']
        steering = abs(params['steering_angle'])
        speed = params['speed']
        all_wheels_on_track = params['all_wheels_on_track']

        ABS_STEERING_THRESHOLD = 15

        # Reward stays close to 1 near the center line and drops sharply toward the edge
        reward = 1 - (distance_from_center / (track_width/2))**(4)
        if reward < 0:
            reward = 0

        # Penalize sharp steering
        if steering > ABS_STEERING_THRESHOLD:
            reward *= 0.8

        # No reward when the car leaves the track
        if not (all_wheels_on_track):
            reward = 0

        return float(reward)

  • If the car goes off the track, the reward is 0.
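
To see how the two center-line formulas used above differ, it helps to evaluate them side by side. This is a small sketch with hypothetical helper names (`sqrt_reward`, `quartic_reward`); `ratio` stands for `distance_from_center / (track_width/2)`, so 0 is the center line and 1 is the edge of the track.

```python
import math


def sqrt_reward(ratio):
    """Center-line reward with a square-root falloff."""
    return max(1 - math.sqrt(ratio), 0.0)


def quartic_reward(ratio):
    """Center-line reward with a fourth-power falloff."""
    return max(1 - ratio ** 4, 0.0)


# ratio = distance_from_center / (track_width / 2)
for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"ratio={ratio:.2f}  sqrt={sqrt_reward(ratio):.3f}  quartic={quartic_reward(ratio):.3f}")
```

The square root punishes even small deviations from the center, while the fourth power leaves the reward almost untouched until the car gets close to the edge, which gives the model more room to pick its own line.
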

1st day: warm-up

We arrive at the circuit and join a large group of people who are preparing to compete. Assistants provide us with a USB key to load our model. Here is how to do it from the AWS console:

  • Turn off vehicle;
  • Set the speed: a numeric input that lets you increase the speed of the DeepRacer car by a percentage at any point during the lap.
    import math

    def reward_function(params):
        '''Use a fourth-power falloff from the center line'''
        track_width = params['track_width']
        distance_from_center = params['distance_from_center']
        speed = params['speed']
        progress = params['progress']
        all_wheels_on_track = params['all_wheels_on_track']

        SPEED_THRESHOLD = 6

        # Reward stays close to 1 near the center line and drops sharply toward the edge
        reward = 1 - (distance_from_center / (track_width/2))**(4)
        if reward < 0:
            reward = 0

        # Penalize speeds above the threshold
        if speed > SPEED_THRESHOLD:
            reward *= 0.8

        # No reward when the car leaves the track
        if not (all_wheels_on_track):
            reward = 0

        # Large bonus for completing the lap
        if progress == 100:
            reward += 100

        return float(reward)

2nd and 3rd day: some progress on the times, but the competition is getting tougher…

Thanks to the improved algorithm, our lap times drop and our cars settle at around 11–12 seconds. It is not much of an improvement, but in these races even a fraction of a second can make the difference. Meanwhile, the better-prepared teams begin to climb the rankings: we are witnessing the first 8-second laps!

4th day: the turning point!

On the last day, we present the newly trained model that uses waypoints:

    import math

    def reward_function(params):
        # Several of these values are read but not used in this version
        track_width = params['track_width']
        distance_from_center = params['distance_from_center']
        steering = abs(params['steering_angle'])
        direction_steering = params['steering_angle']
        speed = params['speed']
        steps = params['steps']
        progress = params['progress']
        all_wheels_on_track = params['all_wheels_on_track']

        ABS_STEERING_THRESHOLD = 15
        SPEED_THRESHOLD = 5
        TOTAL_NUM_STEPS = 85

        # Read input variables
        waypoints = params['waypoints']
        closest_waypoints = params['closest_waypoints']
        heading = params['heading']

        reward = 1.0

        # Large bonus for completing the lap
        if progress == 100:
            reward += 100

        # Calculate the direction of the center line based on the closest waypoints
        next_point = waypoints[closest_waypoints[1]]
        prev_point = waypoints[closest_waypoints[0]]

        # Calculate the direction in radians; atan2(dy, dx) returns a value between -pi and pi
        track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
        # Convert to degrees
        track_direction = math.degrees(track_direction)

        # Calculate the difference between the track direction and the heading direction of the car
        direction_diff = abs(track_direction - heading)
        # Fold into 0-180 so that directions on either side of +/-180 degrees compare correctly
        if direction_diff > 180:
            direction_diff = 360 - direction_diff

        # Penalize the reward if the difference is too large
        DIRECTION_THRESHOLD = 10.0
        malus = 1
        if direction_diff > DIRECTION_THRESHOLD:
            malus = 1 - (direction_diff/50)
        if malus < 0 or malus > 1:
            malus = 0
        reward *= malus

        return float(reward)

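
The heading comparison above subtracts two angles in degrees, each between -180 and 180. The sketch below isolates that step to show how the difference behaves near the ±180° seam. The helper names `heading_error` and `direction_malus` and their default arguments are assumptions made for this illustration; the numbers 10 and 50 mirror the threshold and divisor used in the function above.

```python
def heading_error(track_direction, heading):
    """Smallest absolute angle in degrees between two directions in [-180, 180]."""
    diff = abs(track_direction - heading)
    if diff > 180:
        diff = 360 - diff
    return diff


def direction_malus(direction_diff, threshold=10.0, scale=50.0):
    """Reward multiplier: 1 within the threshold, linear falloff beyond it, 0 when out of range."""
    malus = 1.0
    if direction_diff > threshold:
        malus = 1 - direction_diff / scale
    if malus < 0 or malus > 1:
        malus = 0.0
    return malus


# Track pointing at 179 degrees, car heading at -179 degrees: almost aligned.
raw = abs(179.0 - (-179.0))             # naive difference across the seam
folded = heading_error(179.0, -179.0)   # actual angular distance

print(raw, direction_malus(raw))
print(folded, direction_malus(folded))
```

Without the fold, a car that is nearly aligned with the track but sits on the other side of the seam would be treated as if it were driving in the wrong direction and would lose its whole reward.
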
  • It is not strictly necessary to train a model for more than eight consecutive hours but, to obtain record times, it becomes essential;
  • It is always possible to increase confidence by changing the car’s degrees of freedom;
  • Using Waypoints allows you to outline the ideal path;
  • To gain those thousandths of a second that make the difference, you can manually vary the speed of the car during the laps.

Faun

The Must-Read Publication for Aspiring Developers & DevOps Enthusiasts

Written by

beSharp

AWS Cloud Experts!
