AWS DeepRacer — The NAB Story
Part two
When it comes to understanding how DeepRacer works and its setup, the only question is how deep you want to get. There are several layers to it, and AWS has made an effort to let everyone understand the basic idea. I'm going to go a bit deeper than the DeepRacer Console; follow the link to become more familiar with it.
For any type of machine learning, or any solution for that matter, it's important that the problem statement is established first. So let's do that for DeepRacer; this is my take on it.
The problem statement is: “What series of actions would lead to a car completing a lap of a known track as fast as possible?”
From this statement you can see that the way DeepRacer is set up is only one way of solving the problem. There are three sub-problems I can see in our problem statement.
1. What data are we going to use and how do we get it?
2. What algorithms do we use to process that data into known actions?
3. How do we know what actions are correct?
I am not going to go into exactly how AWS DeepRacer solves each of those problems, but you should come away from this article with a general idea of how they have been solved. For now, let's get into the actual setup of DeepRacer.
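One quick aside before we do: sub-problem 3 is the part you touch most directly in the console, through the reward function you write in Python. The sketch below is a minimal, untuned example using a few of the standard `params` keys the simulation passes in each step; treat it as an illustration of the idea rather than a reward worth training with.

```python
def reward_function(params):
    """Minimal sketch of a DeepRacer console reward function.
    The simulation calls this every step with a dict of state values;
    only a handful of the documented keys are used here."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    all_wheels_on_track = params["all_wheels_on_track"]

    # Heavily penalise leaving the track.
    if not all_wheels_on_track:
        return 1e-3

    # Reward staying close to the centre line, in widening bands.
    if distance_from_center <= 0.1 * track_width:
        return 1.0
    elif distance_from_center <= 0.25 * track_width:
        return 0.5
    elif distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3
```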
When you click the start button to train a model in the console, two major services are started: Sagemaker and Robomaker. These services then use underlying technology to fulfil different roles, which is covered later. You may ask how these two services communicate; they do so through S3 bucket(s) and a Redis pub/sub cache. The checkpoints are "snapshots" of the model; see the agent class in Intel Coach for what it's actually doing. The "model.pb" files represent the model itself, and other files get stored as well, such as CSV files recording training performance. The following figure is a basic overview of this idea, and the important thing to realise is that DeepRacer is invoking the two parts.
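To make those communication paths concrete, here is a rough sketch (not the actual DeepRacer source) of the two channels in Python: checkpoints going to S3 with boto3, and simulation experience flowing through Redis pub/sub. The bucket, key, host and channel names are made up for illustration.

```python
import boto3
import redis

s3 = boto3.client("s3")

# Trainer side: after a training iteration, upload the latest checkpoint
# so the simulator can pick it up. Bucket and key names are placeholders.
s3.upload_file("checkpoint/model.pb", "my-deepracer-bucket", "model/model.pb")

# Simulator side: publish the experience collected while racing so the
# trainer can fold it into the next policy update.
r = redis.Redis(host="localhost", port=6379)
r.publish("experience", b"serialised episode data goes here")

# Trainer side again: subscribe and read episodes as they arrive.
sub = r.pubsub()
sub.subscribe("experience")
for message in sub.listen():
    if message["type"] == "message":
        episode = message["data"]  # bytes to be deserialised by the trainer
        break
```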
To understand more of what is going on with DeepRacer, it might be helpful to have a basic list of what algorithms and technology are being used.
- Sagemaker
  - Intel Coach
  - Tensorflow
  - Clipped PPO/PPO
  - Redis
- Robomaker Simulation
  - ROS
  - Gazebo
  - Intel Coach
  - Tensorflow
  - Clipped PPO/PPO
The following two figures will explain where the algorithms and framework sit and how Sagemaker and Robomaker use them for DeepRacer.
In Figure 2, Sagemaker and Robomaker start at roughly the same time; Sagemaker starts slightly earlier because it sets up Redis and the network connections. Sagemaker uses Coach to create a randomly initialised neural network and stores it as a checkpoint in S3. Once Robomaker sees there is a checkpoint, it starts racing the car in the simulation to generate data, storing it in Redis for Coach to use. Once enough "episodes" have passed, it stops and Coach processes that data into the neural network using Clipped PPO. After that update, the two networks are synced, and the chosen algorithm ensures the policy has not changed too much in a single update. Keeping each update small is important because it stops the model from wandering far off the correct learning path.
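That "not too much has changed" part is exactly what the clipping in Clipped PPO does: the ratio between the new and old policy is clipped, so a single update can only move the policy so far. Here is a minimal NumPy sketch of that surrogate loss, simplified from what Coach actually computes:

```python
import numpy as np

def clipped_ppo_loss(old_log_probs, new_log_probs, advantages, epsilon=0.2):
    # Ratio between new and old policy probabilities for each action taken.
    ratio = np.exp(new_log_probs - old_log_probs)
    # Unclipped surrogate objective.
    unclipped = ratio * advantages
    # Clipped surrogate objective, limiting how far the policy can move.
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # PPO maximises the minimum of the two, so the loss is its negative mean.
    return -np.mean(np.minimum(unclipped, clipped))
```

A clip range around 0.2 is a common default; the exact value DeepRacer uses is part of the hyperparameters you can tweak in the console.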
If you'd like to read more about the Clipped PPO or PPO agents, I'd suggest having a glance at Intel Coach's documentation on Clipped PPO first, or the source code, and then going deeper into the theory behind it all. You might recognise some of the code on line 280 from the logs if you've tried your hand at creating a model.
The third figure illustrates the cycle: generate data in the simulation, train on that data, then go back to generating with the updated model.
Once you've run enough episodes, generated enough data and processed it all, multiple times over, the model will be ready for evaluation. This is when you hit the evaluation button in the DeepRacer console. All the same technology is used for the evaluation cycle; however, no new training data is generated (there is, for example, no reward function invocation). Instead, the model alone decides an action based on each image passed in from the simulated camera.
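In other words, evaluation is just repeated inference against the frozen graph. The sketch below shows the general shape of that in TensorFlow; the tensor names and image size are placeholders for illustration, since the real ones depend on how the model.pb was exported.

```python
import numpy as np
import tensorflow as tf

def load_graph(pb_path):
    # Load a frozen TensorFlow graph (the model.pb checkpoint) into memory.
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name="")
    return graph

graph = load_graph("model.pb")
with tf.compat.v1.Session(graph=graph) as sess:
    # Hypothetical input/output tensor names, for illustration only.
    image_in = graph.get_tensor_by_name("observation:0")
    action_probs = graph.get_tensor_by_name("policy/output:0")
    # Stand-in camera frame; the real one comes from the simulated camera.
    frame = np.zeros((1, 120, 160, 3), dtype=np.float32)
    probs = sess.run(action_probs, feed_dict={image_in: frame})
    action = int(np.argmax(probs))  # index into the discrete action space
```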
All in all, that is how the DeepRacer Console uses technology underneath to create and evaluate DeepRacer models. There is a lot more we could go into, such as how the generated data is structured, how Intel Coach invokes Tensorflow, and how we can measure the "goodness" of the model while it is training.
If you're interested in learning more, and about working in technology at NAB, click here.
About the author: Chris Rhodes is a NAB intern studying computer science at Deakin University. With a love for programming from a young age, Chris dreams of creating technology that everyone can use.