Telkomsel Tech Fest 2024 — AWS DeepRacer

Muhammad Hanif · Published in Life at Telkomsel · Jan 29, 2024
Telkomsel Tech Fest Poster
Telkomsel AI Deep Racer League
My DeepRacer result for Telkomsel AI Deep Racer League (Virtual Race)

I think everyone knows about AWS, but do you know about DeepRacer?

AWS DeepRacer Introduction

AWS DeepRacer is a fully autonomous 1/18th scale race car driven by reinforcement learning. It is built on the following concepts:

Reinforcement Learning

In reinforcement learning, an agent, such as a physical or virtual AWS DeepRacer vehicle, with an objective to achieve an intended goal interacts with an environment to maximize the agent’s total reward. The agent takes an action, guided by a strategy referred to as a policy, at a given environment state and reaches a new state. There is an immediate reward associated with any action. The reward is a measure of the desirability of the action. This immediate reward is considered to be returned by the environment.

The goal of reinforcement learning in AWS DeepRacer is to learn the optimal policy in a given environment. Learning is an iterative process of trial and error. The agent takes a random initial action to arrive at a new state, then iterates, stepping from the new state to the next one. Over time, the agent discovers the actions that lead to the maximum long-term rewards. The interaction of the agent from an initial state to a terminal state is called an episode.

Reinforcement Learning

The agent embodies a neural network that represents a function to approximate the agent’s policy. The image from the vehicle’s front camera is the environment state and the agent action is defined by the agent’s speed and steering angles.

The agent receives positive rewards if it stays on-track to finish the race and negative rewards for going off-track. An episode starts with the agent somewhere on the race track and finishes when the agent either goes off-track or completes a lap.
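To make the idea of an episode concrete, here is a minimal, schematic interaction loop in Python. The env and agent objects are hypothetical placeholders, not DeepRacer's actual training code:

```python
# Schematic RL episode loop (hypothetical env/agent objects, not DeepRacer internals).
def run_episode(env, agent):
    state = env.reset()                          # agent starts somewhere on the track
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(state)                # policy picks a speed + steering angle
        state, reward, done = env.step(action)   # new camera frame, reward, off-track/lap-complete flag
        total_reward += reward                   # long-term reward the agent tries to maximize
    return total_reward
```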

Action Space and Reward Function

Action space

In reinforcement learning, the set of all valid actions, or choices, available to an agent as it interacts with an environment is called an action space. In the AWS DeepRacer console, you can train agents in either a discrete or continuous action space.

Discrete action space

A discrete action space represents all of an agent’s possible actions for each state in a finite set. For AWS DeepRacer, this means that for every incrementally different environmental situation, the agent’s neural network selects a speed and direction for the car based on input from its camera(s) and (optional) LiDAR sensor. The choice is limited to a grouping of predefined steering angle and throttle value combinations.

An AWS DeepRacer car in a discrete action space approaching a turn can choose to accelerate or brake and turn left, right, or go straight. These actions are defined as a combination of steering angle and speed creating a menu of options, 0–9, for the agent. For example, 0 could represent -30 degrees and 0.4 m/s, 1 could represent -30 degrees and 0.8 m/s, 2 could represent -15 degrees and 0.4 m/s, 3 could represent -15 degrees and 0.8 m/s and so on through 9. Negative degrees turn the car right, positive degrees turn the car left and 0 keeps the wheels straight.

Default configuration for discrete action space
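As a sketch, the 10-option menu described above could be written out like this in Python. The values simply mirror the example combinations; the console's actual default configuration may differ:

```python
# Sketch of the example 10-action menu (indices 0-9); not necessarily the console default.
STEERING_ANGLES = (-30, -15, 0, 15, 30)   # degrees; negative = right, positive = left, 0 = straight
SPEEDS = (0.4, 0.8)                       # m/s

DISCRETE_ACTION_SPACE = [
    {"index": i, "steering_angle": angle, "speed": speed}
    for i, (angle, speed) in enumerate(
        (angle, speed) for angle in STEERING_ANGLES for speed in SPEEDS
    )
]

for action in DISCRETE_ACTION_SPACE:
    print(action)   # e.g. {'index': 0, 'steering_angle': -30, 'speed': 0.4}
```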

Continuous Action Space

A continuous action space allows the agent to select an action from a range of values for each state. Just as with a discrete action space, this means for every incrementally different environmental situation, the agent’s neural network selects a speed and direction for the car based on input from its camera(s) and (optional) LiDAR sensor. However, in a continuous action space, you can define the range of options the agent picks its action from.

In this example, the AWS DeepRacer car in a continuous action space approaching a turn can choose a speed from 0.1 m/s to 4 m/s and turn left, right, or go straight by choosing a steering angle from -30 to 30 degrees.

Discrete vs Continuous Action Space

The benefit of using a continuous action space is that you can write reward functions that train models to incentivize speed/steering actions at specific points on a track that optimize performance. Picking from a range of actions also creates the potential for smooth changes in speed and steering values that, in a well trained model, may produce better results in real-life conditions.
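A minimal sketch of how the continuous action space from the example above could be represented, assuming the 0.1–4 m/s speed range and ±30 degree steering range:

```python
# Sketch of a continuous action space: ranges only, the agent picks any value inside them.
CONTINUOUS_ACTION_SPACE = {
    "speed": {"low": 0.1, "high": 4.0},             # m/s
    "steering_angle": {"low": -30.0, "high": 30.0}  # degrees
}

def clip_action(steering_angle, speed, space=CONTINUOUS_ACTION_SPACE):
    """Clamp a proposed (steering, speed) pair into the allowed ranges."""
    sa, sp = space["steering_angle"], space["speed"]
    return (
        max(sa["low"], min(sa["high"], steering_angle)),
        max(sp["low"], min(sp["high"], speed)),
    )

print(clip_action(45.0, 5.0))  # -> (30.0, 4.0)
```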

Reward Function

As the agent explores the environment, the agent learns a value function. The value function helps your agent judge how good an action taken is, after observing the environment. The value function uses the reward function that you write in the AWS DeepRacer console to score the action. For example, in the follow the center line sample reward function in the AWS DeepRacer console, a good action would keep the agent near the center of the track and be scored higher than a bad action, which would move the agent away from the center of the track.
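For reference, here is a reward function along the lines of the follow the center line sample; the exact version in the AWS DeepRacer console may differ slightly, but the idea is the same: the closer the agent stays to the center line, the higher the score.

```python
def reward_function(params):
    '''Reward the agent for staying close to the center line.'''
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three markers at increasing distances from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Higher reward the closer the car is to the center line
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely close to going off track

    return float(reward)
```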

Training Algorithms

Proximal Policy Optimization (PPO) versus Soft Actor Critic (SAC)

PPO vs SAC differences

Stable vs Data Hungry

The information learned by the PPO and SAC algorithms’ policies while exploring an environment is utilized differently. PPO uses on-policy learning which means that it learns its value function from observations made by the current policy exploring the environment. SAC uses off-policy learning which means that it can use observations made by previous policies’ exploration of the environment. The trade-off between off-policy and on-policy learning is often stability vs. data efficiency. On-policy algorithms tend to be more stable but data hungry, whereas off-policy algorithms tend to be the opposite.
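A schematic sketch of that difference (hypothetical collect/update callables, not the real PPO or SAC implementations): on-policy learning gathers fresh data with the current policy and throws it away after each update, while off-policy learning keeps old experience in a replay buffer and reuses it.

```python
from collections import deque
import random

# On-policy (PPO-style): use only data gathered by the current policy, then discard it.
def on_policy_iteration(policy, collect_rollout, update):
    rollout = collect_rollout(policy)   # fresh rollout every iteration (data hungry)
    update(policy, rollout)             # data always matches the policy being updated (stable)

# Off-policy (SAC-style): store experience and reuse it across many updates.
replay_buffer = deque(maxlen=100_000)

def off_policy_iteration(policy, collect_step, update, batch_size=64):
    replay_buffer.append(collect_step(policy))                     # add one new transition
    batch = random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
    update(policy, batch)                                          # reuse old data (data efficient)
```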

Exploration vs Exploitation

Exploration vs. exploitation is a key challenge in RL. An algorithm should exploit known information from previous experiences to achieve higher cumulative rewards, but it also needs to explore to gain new experiences that can be used in finding the optimum policy in the future. As a policy is trained over multiple iterations and learns more about an environment, it becomes more certain about choosing an action for a given observation. However, if the policy doesn’t explore enough, it will likely stick to information already learned even if it’s not at an optimum. The PPO algorithm encourages exploration by using entropy regularization, which prevents agents from converging to local optima. The SAC algorithm strikes an exceptional balance between exploration and exploitation by adding entropy to its maximization objective.

Entropy

In this context, “entropy” is a measure of the uncertainty in the policy, so it can be interpreted as a measure of how confident a policy is at choosing an action for a given state. A policy with low entropy is very confident at choosing an action, whereas a policy with high entropy is unsure of which action to choose.
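A small numeric illustration of that idea (just standard Shannon entropy, not DeepRacer-specific code):

```python
import math

def policy_entropy(action_probs):
    """Shannon entropy of a discrete action distribution (in nats)."""
    return -sum(p * math.log(p) for p in action_probs if p > 0)

confident_policy = [0.97, 0.01, 0.01, 0.01]   # very sure of one action -> low entropy
uncertain_policy = [0.25, 0.25, 0.25, 0.25]   # no idea which action -> high entropy

print(policy_entropy(confident_policy))  # ~0.17
print(policy_entropy(uncertain_policy))  # ~1.39 (= ln 4, the maximum for 4 actions)
```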

Let’s Get Started!

1. Open the AWS console (go to aws.amazon.com, then click “Sign in to Console” at the top-right of the page) and log in to your AWS account.

2. Fill in the account ID, username, and password, then click “Sign in”.

Login AWS console

3. After you’ve signed in to the AWS console, search for AWS DeepRacer in the search bar.

AWS DeepRacer in the search bar

4. If you encounter a Region Unsupported message, just switch to US East (N. Virginia).

5. Next, you need to create your own DeepRacer account. Click New to DeepRacer? Create an account.

DeepRacer registration

6. Fill in all the fields. You can use the same email as your AWS account or register a different one. Then, click Create your account.

Create account for AWS DeepRacer

7. Enter the verification code that was sent to your email, then click Confirm registration. Voilà, your account is registered, and you will be redirected to the Sign in page.

Verification code for AWS DeepRacer registration

8. Enter your email and password on the AWS DeepRacer sign-in page, then click Sign in.

Sign in AWS DeepRacer

9. The AWS DeepRacer home page will appear, and you can go through the tutorials there first.

AWS DeepRacer home page

10. Before joining the race, make sure to create your racer profile. Go to Racing League, then click Your racer profile. If you are joining a community race, follow its rules, such as the racer name format (nickname-employee_id), so your results can be tracked easily.

My racer profile

11. Next, prepare your car and model. Go to Reinforcement learning, then click Your models, and click Create model at the top-right of the list.

12. There are 5 steps to create a model:

The overview of create model
  • Specify the model name and environment
left: fill training details, right: choose the track and direction

tip: For the track and direction, follow the race rules so that training produces the best result. If the environment differs between your training and the actual race, it will be hard for your model to achieve its best result.

  • Choose race type and training algorithm
left: choose race type, middle: choose algorithm, right: fill all the hyperparameters

tip: Read this documentation to get familiar with the hyperparameters.

  • Define action space
Action space overview

tip: Choose the discrete action space so you can customize each steering angle and speed combination; for this purpose it is more flexible than the continuous action space.

  • Choose vehicle
Vehicle for the model
  • Customize reward function
Reward function overview

tip: If you are familiar with Python, it is better to write the reward function yourself, like the example shown here (see also the sketch just after these steps). If you are not familiar with it, just click Reward function examples and choose the one that suits you best.

Reward function examples

For the stop condition, just set a maximum time; there is no standard value. Training stops once the specified condition is met, so choose it carefully and leave yourself time to train the next model. You can ignore Submit this model to the following race after training completion. Then, click Create model.

Stop conditions and automatically submit to the race
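As mentioned in the reward-function tip above, you can write your own function in Python. The sketch below is only a hypothetical starting point (it is not the official sample code and not the model I used): it penalizes going off track, rewards staying near the center line, and adds a bonus for speed.

```python
def reward_function(params):
    """Hypothetical sketch: stay on track, hug the center line, and prefer higher speed."""
    if not params['all_wheels_on_track']:
        return 1e-3  # heavily penalize going off track

    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    speed = params['speed']  # m/s

    # Base reward for staying near the center line (1.0 at the center, ~0 at the edge)
    reward = max(1.0 - distance_from_center / (0.5 * track_width), 1e-3)

    # Bonus that grows with speed (assumes a maximum speed of 4 m/s in the action space)
    reward += 0.5 * (speed / 4.0)

    return float(reward)
```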

13. Your model will now train in the chosen environment and stop when the stop condition is met.

Training is running
Video of training simulation

Training and Evaluation

Training

After the training process finishes, you can review the result by downloading the logs or simply by looking at the reward graph.

Training result after the training is finished

For an explanation of the reward graph, see here.

Evaluation

Before submitting to the real race, you can run an evaluation to check whether your model is good enough, so make sure to use the same environment in the evaluation.

My best model evaluation

Telkomsel AI Deep Racer League

My First Model

My first model was not really good: it ran really slowly, but it stayed on track and took 15.736 seconds 😂

Model v1 took 15.736 seconds
12th place out of 15 participants

Then, I modified the speeds and steering angles. I also added more than one speed for the same angle so the model had a range of speeds, which made it more flexible on both turns and straights.

Model v14 took 10.051 seconds
Model v14 discrete action space
Also 12th place, this time out of 41 participants 😭 (final leaderboard)
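To illustrate the idea of having more than one speed per steering angle, here is a hypothetical discrete action space of that shape (not my actual v14 configuration):

```python
# Hypothetical discrete action space with several speeds per steering angle
# (NOT the actual v14 configuration).
ANGLES = (-30, -15, 0, 15, 30)   # degrees
SPEEDS = (1.0, 2.0, 3.0)         # m/s; multiple speeds available at every angle

ACTION_SPACE = [
    {"steering_angle": angle, "speed": speed}
    for angle in ANGLES
    for speed in SPEEDS
]

print(len(ACTION_SPACE))  # 15 actions: the agent can go slow in turns and fast on straights
```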

tip: Never give up when you see the others’ results; just evaluate your model by watching the race recordings and learning from the 1st-place model.

1st place model

As you can see, the car is really fast and can take sharp turns smoothly. You can analyze it and apply what you learn to your own model.

My total submissions (38)

Since there was not enough time, I only made 38 submissions across many different model versions. Also note that your model’s evaluation time is not always the same as its result when you submit it to the race, so don’t trust the evaluation result 100%; in the race it can be better or even worse. It’s just like a real racing competition: in F1, for example, a driver can take 1st place in qualifying, but when the actual race begins the result is not always the same as in qualifying, because many other factors come into play.

Muhammad Hanif is a Frontend Developer @Telkomsel, a part-time runner @10.11Runners, a retail investor @TLKM, and a sport & technology enthusiast.