Results and Lessons: DeepRacer Student League March 2023

Aleksander Berezowski
Apr 2, 2023


Introduction

This is a new series I'm going to be doing, detailing my month-by-month progress in the DeepRacer Student League. The track and leaderboard reset every month, which is why the updates will be monthly. In each of these guides I will go over the track, my overall results, some model-by-model analysis, and my plan for next month.

I'm doing this because I'm finding it frustratingly hard to learn more about DeepRacer due to the lack of people writing articles (most people just want to race more!).

The Track

This month’s track was “Hot Rod Super Speedway”.

A picture of the track

It is one of the longest tracks so far, and it also has some brutal hairpin turns. However, the model also has to realize when to speed up, as the long stretches of track are ripe for going full throttle and cutting some time.

Overall Results

During March 2023, my school's club and I ran a total of 15 models, and the results are shown here:

Look at my really cool graph; it's very colorful

The best model ended up being speed-model, which put in a dominant showing with a sub-200-second time, smashing all my previous and following models.

Please note that models with a long plateau in performance might have done worse, but the DeepRacer Student League did not allow viewing of the last-submitted time until part-way through the season.

Something interesting to note is that a lot of the models run into what I believe to be "catastrophic forgetting": a sharp improvement in the model's performance (a downward slope), then a big decrease in performance (an upward slope), then what looks like the model slowly and periodically getting better and worse (an inverse logarithmic curve), but never getting as good as the initial improvement.

However, it does not look like the models ever reach Stanford University's definition of convergence (a model converges when it achieves a state during training in which loss settles to within an error range around the final value; in other words, when additional training will not improve the model), because some models do seem to keep getting better as training goes on, although slowly.
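As a rough illustration of how I'd check that definition against my own training graphs (this is my own sketch, not anything from Stanford or AWS; the function name and thresholds are made up), I'd run something like this over a list of per-iteration rewards or lap times:

def has_converged(values, window=10, tolerance=0.05):
    # True if the last `window` values all sit within `tolerance`
    # (as a fraction of the final value) of that final value.
    if len(values) < window or values[-1] == 0:
        return False
    final = values[-1]
    return all(abs(v - final) <= tolerance * abs(final) for v in values[-window:])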

I found it interesting that all my best models trained for no more than 300 minutes, especially when I originally thought that overfitting to the track would be my best option.

Model-By-Model Results

For the model-by-model results, I’m only going through the first and third best models (models that ranked first and second were pretty much the same) because it’s currently 10:50pm and I really want to go to sleep.

speed_model: 193.795s, trained for 300min

def reward_function(params):

    # Huge penalties
    if not params['all_wheels_on_track']:
        return float(1e-3)

    # Base Reward
    reward = 0
    if params['speed'] > 0.9:
        reward = 1 + params['speed']
    else:
        reward = 1e-3

    # Center the car
    track_width_half = params['track_width'] / 2
    quadratic_center = (params['distance_from_center'] / track_width_half) ** 4
    distance_from_center_reward = 1 - quadratic_center
    reward += distance_from_center_reward

    # Give a higher reward if the car is making progress towards the finish line
    reward += (params['progress'] / params['steps']) * 100

    # Give a higher reward if the car completes the track in the allotted time
    if params['progress'] == 100:
        reward += 2

    return float(reward)

This model is titled “speed-model” because of its emphasis on speed.

The first huge penalty is pretty standard, and it’s to ensure the model does not go off the track, because an off the track model is a bad model.

The base reward is the most interesting part: it is a piecewise function that is either effectively 0 (1e-3) when under the speed threshold, or in the range of roughly 1.9 to 2 above it. This massive gap strongly encourages the model to maintain a speed above 0.9, because unlike just setting "reward = speed", this change almost doubles how much the model cares about speed.

The center-the-car reward is not too interesting, and is what I usually use to ensure the car generally stays near the center. Instead of being linear (reward += 1 - (params['distance_from_center'] / track_width_half)), it is raised to the fourth power. This gives the model a little more freedom in the center of the track (the reward doesn't change much until params['distance_from_center'] reaches about track_width_half / 3), but it is harsher as the model starts to deviate further. This is visualized below, with x being the normalized distance from the center and y being the reward.
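To make that concrete, here's a quick sketch (mine, not part of the model) of the centering term at a few normalized distances, assuming a track width of 2 so the half-width is 1:

def centering_reward(distance_from_center, track_width):
    # Same fourth-power term used in speed_model above
    track_width_half = track_width / 2
    return 1 - (distance_from_center / track_width_half) ** 4

for d in (0.0, 0.33, 0.5, 0.75, 1.0):
    print(f"distance {d:.2f} -> reward {centering_reward(d, 2):.3f}")
# distance 0.00 -> reward 1.000
# distance 0.33 -> reward 0.988
# distance 0.50 -> reward 0.938
# distance 0.75 -> reward 0.684
# distance 1.00 -> reward 0.000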

This is here mostly because over and over I've read that using "(params['progress'] / params['steps']) * 100" is usually good. However, I have no idea what its range is. For example, if I do reward = ((params['progress'] / params['steps']) * 100) + params['speed'], then depending on the range of ((params['progress'] / params['steps']) * 100), the reward function might either not factor in speed at all or only care about speed. I have no idea how to calculate the impact of this term, but it seems to help and I have no idea why.
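Since writing that, my best guess at the range goes something like this (my own back-of-the-envelope sketch; the steps-per-lap values are assumptions): if steps and progress reset each lap and progress is a percentage, then on a steady pace of steps_per_lap steps per lap, progress ≈ (steps / steps_per_lap) * 100, and the whole term collapses to roughly 10000 / steps_per_lap.

def progress_step_term(steps_per_lap):
    # (progress / steps) * 100 on a steady pace simplifies to a near-constant
    return 10000 / steps_per_lap

for pace in (500, 750, 1000):
    print(f"{pace} steps/lap -> term ~ {progress_step_term(pace):.1f}")
# 500 steps/lap -> term ~ 20.0
# 750 steps/lap -> term ~ 13.3
# 1000 steps/lap -> term ~ 10.0

If that's right, this term can easily be ten or twenty times larger than the speed term (which is capped at 1 m/s in the Student League), which would explain my worry about speed being drowned out.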

Finally, I give the model a lot of reward for finishing a lap. It seems like a good idea and again, I’ve read a lot of good functions do this. However, I am unsure of the impact it actually has.

ResearchProductModel2: 208.856s, trained for 90min

import math

def reward_function(params):
    # Reward for completing the track
    if params['progress'] == 100:
        reward = 100

    # Penalize for driving off track
    elif not params['all_wheels_on_track']:
        reward = 0

    else:
        # Get params
        progress = params['progress']
        steps = params['steps']
        speed = params['speed']
        track_width = params['track_width']
        distance_from_center = params['distance_from_center']

        # {0 < x < 1} Reward for target number of steps
        STEP_GOAL_PER_LAP = 55
        step_goal_right_now = progress * STEP_GOAL_PER_LAP
        if progress != 0:
            step_reward = steps / step_goal_right_now
        else:
            step_reward = (1e-3)

        # {0 < x < 1} Reward for high speed
        if speed > 0.8:
            speed_reward = speed * 2
        else:
            speed_reward = (1e-3)

        # {0 < x < 1} Penalize for getting too far from the center
        track_width_half = track_width / 2
        quadratic_center = (distance_from_center / track_width_half) ** 4
        distance_reward = 1 - quadratic_center

        # Standard process steps reward
        standard_reward = progress / steps

        # Combine Rewards
        reward = 0
        reward += step_reward
        reward += speed_reward
        reward += distance_reward
        reward += standard_reward

        # Speed is really good
        reward *= math.exp(2 * speed)

    return float(reward)

The idea behind this model was to combine everything I had learnt over the month, in addition to hand-holding the model a tad more. Because of the 10-hour maximum training time and the effects of catastrophic forgetting, I tried to get the model to do well earlier in its journey (which did seem to work) by putting in more guard rails for it, which shows in the longer reward function.

Similar to the first model, I had heavy rewards and punishments for completing a lap and going off the track, respectively.

For the step goal reward, I calculated the number of steps the best model completed the lap in (using information from my last blog post), then calculated how many steps the best model would have taken at the current amount of progress, and divided my current number of steps by that to get a measure of how well my model is doing based on steps (1 being the best).

The reward for high speed is very similar to the model above, except the speed threshold is more generous and speed is emphasized even more. The lower speed threshold was supposed to let the model go slower around corners, while the "* 2" on the reward encourages it to go faster; in hindsight, these two changes cancel each other out.

The distance-from-center reward is exactly the same as above, and so is the standard reward (except not scaled by 100x).

Finally, I encouraged high speed even further by multiplying the reward by e^(2*speed). I chose raising e to 2*speed because it gives the sharpest reward, as seen below:
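For reference, here's the kind of comparison I had in mind (a quick sketch of mine; the alternative multipliers are just the ones I considered), evaluated at the speeds the Student League allows (0 to 1 m/s):

import math

for speed in (0.5, 0.8, 1.0):
    print(f"speed={speed}: 1+speed={1 + speed:.2f}, "
          f"e^speed={math.exp(speed):.2f}, "
          f"e^(2*speed)={math.exp(2 * speed):.2f}")
# speed=0.5: 1+speed=1.50, e^speed=1.65, e^(2*speed)=2.72
# speed=0.8: 1+speed=1.80, e^speed=2.23, e^(2*speed)=4.95
# speed=1.0: 1+speed=2.00, e^speed=2.72, e^(2*speed)=7.39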

What I Learnt

Catastrophic forgetting seems like a much bigger problem than I thought. When the model gets so good at not going off the track, it no longer needs to spend valuable memory on the “don’t go off the track” stuff because it’s a non-issue. However, this quickly becomes an issue because the model then starts going off the track and freaks out because it has no idea what to do and has to relearn not going off the track while still min-maxing other parameters, which I suspect leads to the learning pattern that emerges. Keep in mind this is all a theory (a game theory!!!) and I have no way to prove or disprove this currently.

I learnt that steps and progress reset each lap. This made looking at other reward functions and programming my own so much easier, because things finally made sense. I put a lot of these realizations in my last blog post, like how speed in the Student League is capped at 1 m/s.

Speed is really important, but the racing line is also super important. I focused most of my efforts on high speed, which by the end I was hitting with no problem. However, because I didn't have a good racing line I was losing a lot of time, which I suspect top competitors minimized through better use of either waypoints or (x, y) coordinates.

When the model resets itself on the track, the reward defaults to 0. Thus, if there is a chance to get negative rewards, the car will do its best to hurl itself off the track as fast as possible to avoid those super bad rewards. Because it is so focused on going off track, it seems like it doesn't even improve. Thus, you should never use negative rewards anywhere for any reason. This also explains why the lowest reward you give should always be above 0, or the model will be indifferent to improving versus just resetting itself.
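Going forward, the simplest guard I can think of is clamping the final reward to a small positive floor (this is my own sketch, not something either model above does):

def safe_reward(raw_reward, floor=1e-3):
    # Never hand back 0 or a negative number, so driving badly is still
    # strictly better for the model than resetting (which defaults the reward to 0)
    return float(max(raw_reward, floor))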

Local training is a thing, and the top Student League racers use it! I didn't know this, and it now makes sense how the top models are lightyears ahead of me: they're iterating on strategies I can't match from the console alone. I really need to use local training to test models if I want to have any hope of winning.

My Gameplan for Next Season

1. Set up local training.
All the top racers that I've spoken to told me that they tested and trained things on their home computers before retraining them in the Student League. This gives them a big advantage, as they have a lot more data about their models, and it's something I'm looking forward to doing.

2. Plan out the different types of models I want to try, some of which include:
a) Waypoints Model: According to other racers, the waypoints on this last track were actually messed up, which would explain why the waypoints model did so incredibly poorly
b) Current Models That Worked: I want to try out the models that worked well this season to see if it was a fluke, or if they actually worked
c) Current Models That Didn't Work: I want to try out the models that did terribly this season to also see if it was a fluke, or if they are built on bad principles

3. Test out my best models locally with different time intervals and hyperparameters.
When training locally, I want to figure out the best amount of time to train a model, to see if it is catastrophic forgetting creating the problems above or something completely different. Further, when I say hyperparameters, I don't mean stuff like RNN size (hyperparameters refer to a very specific set of things in the professional league). I mean stuff like adjusting constants, and the automated testing of said constants (a rough sketch of what I mean is below, after this list).

4. When I have a good model, do it on my main account (DRC-ProcessingModel)
If you want to follow along with my progress, you can check the leaderboards for “DRC-ProcessingModel”, and check out the rest of the people in my club as well (they all have DRC-______ as their name).
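For point 3, here's a hypothetical sketch of what I mean by automated testing of constants (the constant names and ranges here are made up for illustration): generate a reward-function variant for each combination of constants, then train each one locally for the same amount of time and compare lap times.

import itertools

SPEED_THRESHOLDS = [0.7, 0.8, 0.9]   # assumed values to sweep
CENTER_EXPONENTS = [2, 4, 6]         # assumed values to sweep

def make_reward_function(speed_threshold, center_exponent):
    def reward_function(params):
        # Piecewise speed reward, as in speed_model
        reward = 1 + params['speed'] if params['speed'] > speed_threshold else 1e-3
        # Centering reward with a configurable exponent
        half_width = params['track_width'] / 2
        reward += 1 - (params['distance_from_center'] / half_width) ** center_exponent
        return float(reward)
    return reward_function

candidates = [make_reward_function(t, e)
              for t, e in itertools.product(SPEED_THRESHOLDS, CENTER_EXPONENTS)]
# Each candidate would then get the same fixed local training time and be
# compared on its best lap time.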
