My Experience with DeepRacer: A Headstart and My Journey
DeepRacer is an AWS event that is conducted both virtually and on physical tracks. The main motive is to learn machine learning, specifically reinforcement learning, in a fun way where we get to enjoy the process.
DeepRacer uses reinforcement learning to enable autonomous driving for the AWS DeepRacer vehicle.
Reinforcement Learning (RL) is the science of decision-making. It is about learning the optimal behavior in an environment to obtain maximum reward.
RL doesn't need any labeled data; instead, the agent generates its own experience by interacting with the environment, identifies patterns in that experience, and learns to make better decisions.
Terminology you need to know about RL:
- Agent: Our DeepRacer vehicle is our agent, which performs all of our actions.
- Environment: The place where the agent interacts; here it is our race track.
- State: The situation the agent is currently in, e.g., its position on the track.
- Action: A decision that results in a movement made by our agent.
- Policy: The strategy the agent uses to decide its next move, mapping states to actions. The policy gets updated as training progresses.
- Reward: Points awarded to our agent based on the decisions it makes.
- Value: An estimate of how good the agent's actions are, in terms of the reward it can expect to collect.
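To see how these pieces fit together, here is a minimal sketch of the agent-environment loop in Python. `ToyTrack` and `random_policy` are made-up stand-ins for illustration, not DeepRacer's actual API:

```python
import random

class ToyTrack:
    """A hypothetical 1-D 'track': the agent starts at position 0
    and earns reward for moving toward the goal at position 10."""
    def reset(self):
        self.position = 0
        return self.position                   # initial state

    def step(self, action):                    # action: 1 = accelerate, 0 = coast
        self.position += action
        reward = 1.0 if action == 1 else 0.0   # reward forward progress
        done = self.position >= 10             # episode ends at the goal
        return self.position, reward, done

def random_policy(state):
    """Stand-in policy: picks an action at random (pure exploration)."""
    return random.choice([0, 1])

# One episode: observe state, act, receive reward and next state, repeat.
env = ToyTrack()
state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = random_policy(state)               # agent chooses an action
    state, reward, done = env.step(action)      # environment responds
    total_reward += reward
print("total reward for this episode:", total_reward)
```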
DeepRacer works based on the exploration-exploitation dilemma, a fundamental concept in reinforcement learning. The dilemma is between exploiting the current best-known policy and exploring new behavior to improve performance.
- Exploration is the process of trying new things and learning about the environment. This can be done by taking actions that the model has not taken before, or by exploring different parts of the environment.
- Exploitation is the process of using the knowledge the model has already learned to make decisions that are likely to be successful. This can be done by taking actions that have been successful in the past, or actions that are predicted to be successful.
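A classic, simple way to balance the two is an epsilon-greedy rule: explore with a small probability, exploit otherwise. This is just an illustration of the dilemma; DeepRacer's PPO and SAC algorithms handle exploration differently (e.g., through stochastic policies and entropy bonuses):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action index: explore with probability epsilon,
    otherwise exploit the current best-known action.
    `q_values` is a list of estimated values, one per action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])      # exploit

# With epsilon=0.1 the agent mostly picks action 2 (value 0.9),
# but 10% of the time it tries a random action to keep learning.
print(epsilon_greedy([0.2, 0.5, 0.9], epsilon=0.1))
```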
Now let's dive into training your own DeepRacer model. It is similar to cooking: a creative process that involves trial and error, experimentation, and a deep understanding of the underlying principles.
Let's compare DeepRacer and cooking, and cook our DeepRacer dish together.
The DeepRacer Dish
First, you need a pan to start your dish, which is your DeepRacer console.
The primary ingredients required for our DeepRacer dish are:
- 1 Agent
- 1 Model
- 1 Track
- Hyperparameters, as required
Let's start cooking. Here are a few steps that describe what you need to do; I'll explain what they mean after we're done cooking.
Step 1: Start your lab and enter the DeepRacer console.
Step 2: Design your vehicle with the cameras and sensors that match your physical model.
Note: Your physical vehicle won't work properly with a model trained on the wrong virtual sensor configuration.
Step 3: Create your model.
You need to build a model and train it before racing on the track.
To cook the curry base of your DeepRacer dish, you need to select the track type, race type, algorithm, hyperparameters, action space, agent, reward function, and training period.
Step 4: Let the model cook, i.e., train, and we'll taste it before serving.
Now let's take a look at our curry base ingredients. Once you have entered your console, you can start building your DeepRacer model by choosing the ingredients you need. These ingredients include:
- Track type: The track type is the environment in which the agent will learn to race. The DeepRacer platform provides a wide range of tracks, and you can choose one according to your needs.
- Race type: The race type determines the goal of the training.
• Time trial: The goal of a time trial is to get the fastest lap time possible. This is a good way to learn the basics of racing and to get a feel for how the DeepRacer platform works.
• Object avoidance: The goal of an object avoidance race is to avoid obstacles while driving as fast as possible. This is a more challenging race type, but it can help you improve your agent's ability to make decisions in difficult situations.
• Head-to-head racing: The goal of a head-to-head race is to race against other agents and finish first. This is the most challenging race type, but it can be a lot of fun.
- Training algorithm: The two training algorithms available in DeepRacer are Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). Both work by learning a policy that maps states to actions; however, PPO is an on-policy policy-gradient algorithm, while SAC is an off-policy actor-critic algorithm.
- Hyperparameters: The hyperparameters are the settings that control the training process, such as the learning rate, batch size, and discount factor. You can adjust them to improve the performance of your model.
- Action space: The action space is the set of all possible actions the agent can take. It can be continuous (ranges of steering angle and speed) or discrete (a fixed menu of choices); see the action-space sketch after this list.
- Reward function: The reward function is the metric the agent uses to evaluate its performance, e.g., rewarding a fast lap time or avoiding obstacles. In DeepRacer you write it as a small Python function; see the reward-function sketch after this list.
- Training period: The training period is the amount of time the agent will train. You can adjust this setting to improve the performance of your model.
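Here is the action-space sketch mentioned above. A discrete action space is essentially a fixed menu of (steering angle, speed) pairs the agent picks from; the specific values below are just an illustration, not a recommended setup:

```python
# A hypothetical discrete action space: each entry pairs a steering angle
# (degrees, positive = left) with a speed (m/s). The agent learns which
# of these fixed choices to pick in each state.
action_space = [
    {"steering_angle": -30.0, "speed": 1.0},   # sharp right, slow
    {"steering_angle": -15.0, "speed": 2.0},   # gentle right
    {"steering_angle":   0.0, "speed": 3.0},   # straight, fast
    {"steering_angle":  15.0, "speed": 2.0},   # gentle left
    {"steering_angle":  30.0, "speed": 1.0},   # sharp left, slow
]
```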
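And here is the reward-function sketch: a Python function that receives a `params` dictionary describing the car's current situation and returns a score. This version rewards staying close to the center line, along the lines of the sample AWS provides; `track_width` and `distance_from_center` are documented keys of `params`:

```python
def reward_function(params):
    """Reward the agent for staying close to the center line."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Three bands around the center line, from tight to loose.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0      # hugging the center line
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3     # likely off track / about to crash

    return float(reward)
```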
I guess our model is well-cooked now, so let's taste our freshly cooked dish.
Evaluating the model consists of measuring its performance, which tells us how well the model is cooked; I'll continue with evaluation in the next part.
My Experience with DeepRacer
My journey with DeepRacer was a lot of fun. It started through the DeepRacer League conducted by KGiSL EDU and AWS virtual community races, and more recently a workshop conducted by Thoughtworks, Coimbatore.
To be honest, it wasn't super easy or fun in the beginning, but eventually it got better and turned out well in the end. Here are the mistakes I made and what I learned over time and training.
DeepRacer is among the few things I wouldn't regret trying.
At the start, I just went to the console and trained the model randomly by adjusting the hyperparameters and a few other settings, but I never touched the reward function. I know that's not the right way to practice, but I learned from my mistake and grew.
I learned a lot about reinforcement learning and autonomous driving through this journey. I also had a lot of fun racing against other models. If you're interested in machine learning or autonomous driving, I highly recommend trying out DeepRacer.
Don't be afraid to experiment. There is no one right way to train a DeepRacer model. The best way to learn is by trying different things.
Let's connect socially:
→ https://www.linkedin.com/in/akshayavarshieni/