Training a neural network for driving an autonomous RC car

Published in Acta Schola Automata Polonica, Sep 2, 2019

About machine learning and mobile robotics, and how “they go together like a horse and carriage”.

After several months of building a model of a self-driving car, I was able to get it to drive around a pond that’s next to my house. Here’s a teaser:

The “what”

It’ll be easier for me to refer to the mobile robot by its name, “Karr” (an homage to KARR), simply because it’s shorter than “an autonomous RC car” and less ambiguous.

Karr was strongly inspired by the specification for the F1/10th competition, but it doesn’t have a LIDAR (which is too damn expensive) and instead uses a depth camera as its primary sensor. Its main computing unit is the NVIDIA Jetson TX1 and the chassis is a two-wheel-drive Traxxas Slash. The depth camera is mounted on top of a “safety cage” scaffold (built by my Dad; thanks, Dad!) so that: 1) it has a better vantage point, 2) the hardware as a whole is safer (in case of a rollover or an unfortunate collision).

Most notably, the computing unit has a GPU that supports CUDA, which lets you get a shorter inference time for a neural network than on the CPU of, for example, a Raspberry Pi. With its 256 Maxwell CUDA cores, it allowed the model to yield predictions in ~3.5 ms (pre-processing and everything else included)!

Here’s what the world looks like from Karr’s perspective (full movie):

The closer something is, the darker it is in the depth map. Also, although this will not be relevant in this article, a depth camera lets you get point clouds, which can then be used e.g. for SLAM (Simultaneous Localization and Mapping). This is what it looks like:

This is a mapping of the aforementioned pond that’s next to my house, and the path around it — that’s the “race track” around which I wanted the neural network model to be able to drive.

The “why”

My ultimate goal is to train a neural network on depth maps in the CARLA simulator so that it can drive Karr in the real world. But that’s for another blog post. In this one, Karr was driven by a model trained using the same pipeline I implemented in a previous project (which I tested out in a simulation environment). The only two differences were: 1) the model was simpler, and 2) the data was produced by a richer, more convincing simulator: our reality ;)

Software

The neural network was trained using Keras. However, it was much easier to install Theano than TensorFlow on the Jetson, so I used the former as the backend.

The depth camera has its own dedicated library, librealsense, with various plugins (e.g. for Unreal Engine 4), APIs (in particular, for Python), and an impressive suite of additional tools, tutorials, and example use-cases. I compiled this library with the -DBUILD_WITH_CUDA flag (“looky here”), which gave me 60 FPS, albeit at the lowest resolution (which was still far more than I needed).

I didn’t mention one more hardware component: an Arduino-like microcontroller, the Teensy-LC. The Jetson communicates with this microcontroller, which then sends a PWM signal that controls the power of the motor and the steering angle of the front wheels. For that, the microcontroller needed a small program, which I wrote using the standard Arduino IDE. The source code can be found here.
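To make the PWM part a bit more concrete, here is a minimal sketch of how a normalized command could be turned into a pulse width. It is written in Python for readability and is not the firmware linked above. The 1000/1500/2000 µs values are the pulse widths commonly used by RC servos and speed controllers, and they are an assumption here, as is the name command_to_pulse_us.

```python
# Assumed (typical) RC pulse widths in microseconds; not taken from the actual firmware.
PULSE_MIN_US = 1000
PULSE_NEUTRAL_US = 1500
PULSE_MAX_US = 2000


def command_to_pulse_us(command):
    """Map a normalized command in [-1, 1] to a pulse width in microseconds."""
    # Clamp first, so an out-of-range value can never over-drive the servo or ESC.
    clamped = max(-1.0, min(1.0, command))
    if clamped >= 0:
        span = PULSE_MAX_US - PULSE_NEUTRAL_US
    else:
        span = PULSE_NEUTRAL_US - PULSE_MIN_US
    return int(round(PULSE_NEUTRAL_US + clamped * span))
```

The same mapping works for both steering and throttle: 0 means “straight” or “stopped”, and the sign picks the direction.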

Also, I used a wireless game pad for manual control, and no, I didn’t have to program that, but it’s good to mention this now because I wanted to make a general point regarding software: it was getting complicated. There are all these peripherals, each with its own API, each sending messages asynchronously at its own rate, each prone to its own set of errors... you get the idea. And the aim is to bring all these elements together, collect data for training, deploy the model, use its predictions to control the motor and the steering angle, improve, and repeat. Phew.

To manage all that, I used a framework called the Robot Operating System (ROS). Strictly speaking, it isn’t an operating system, but it’s so rich in functionality (submodules for all sorts of sensors, tools for messaging, for data visualization, and so much more) that by virtue of its utility it deserves the “operating system” title. If you haven’t heard about ROS before, check out the Wikipedia article; it gives a pretty good overview.

But, roughly, ROS abstracts the various sensors and computations as a computational graph in which nodes correspond to processes and edges to topics, through which processes get the data they need in the form of structured messages. I’ve written those terms in italics because those are the terms actually used in ROS (for reference, check out the Concepts ROS wiki page). The messages are passed asynchronously, each at its own rate, with processes being triggered by the arrival of a new portion of data.
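If the graph idea feels abstract, the toy below may help. It is plain Python, not the ROS API: ToyGraph, the topic names /depth and /steering, and the steering rule are all made up for illustration. It is also synchronous, whereas real ROS nodes are separate processes exchanging messages asynchronously.

```python
from collections import defaultdict


class ToyGraph:
    """A toy stand-in for a ROS-like graph: topics connect publishers to subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every callback registered on the topic is triggered by the new message.
        for callback in self._subscribers[topic]:
            callback(message)


graph = ToyGraph()
commands = []  # collects everything published on /steering


def steering_node(depth_msg):
    # A "node" that reacts to a depth message and publishes a steering command.
    # Hypothetical rule: steer towards the side with more free space.
    angle = 1.0 if depth_msg["right"] > depth_msg["left"] else -1.0
    graph.publish("/steering", {"angle": angle})


graph.subscribe("/depth", steering_node)
graph.subscribe("/steering", commands.append)

# A single depth message flows through the graph and ends up as a command.
graph.publish("/depth", {"left": 0.8, "right": 2.5})
```

The point of the design is that the node producing depth messages knows nothing about who consumes them, so a game pad node, a heuristic node, or a neural network node can be swapped in without touching the rest.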

I used ROS’s wrappers for communicating with the depth camera, with the game pad, and with the microcontroller. But ROS is not just a repository of wrappers; it’s also (and in my opinion, first and foremost) a development platform. Thus, I used ROS to implement a so-called package for controlling the car, traxxas_control, and if you’re interested, I’ve open-sourced the code.

The traxxas_control package is organized into nodes for controlling the car using: a neural network, a game pad, and a heuristic based on the depth map. I gathered data for training by steering the car myself, and by using the heuristic. In the next section I discuss how I actually trained the model and on which data.

Training the model

I trained a neural network model to clone some method of steering the car, human-generated or deterministic. As mentioned above, I used a pipeline developed for a previous project. However, I needed to keep the model simple enough for fast inference on the Jetson, yet complex enough to perform well. The architecture uses a single convolutional layer and predicts steering angle and throttle for 10 steps into the future. The architecture is summarized in the following plot (it’s a bit perplexing, I know, but you can check out the code for a detailed definition):

When driving at low speed (see the movie in the Results section), I used the steer__1 prediction, i.e. the steering angle 1 step into the future. The reason is that getting an image from the camera, processing it, producing a prediction, and then communicating that to Karr’s actuators all takes time, so by the time a prediction for an image reaches the actuators it is already overdue. I calculated the latency and it was roughly equal to the interval between consecutive camera frames. But that’s a rough estimate; I’ll need to account for the speed of the car in the future. For medium and high speeds I used steer__4 and steer__8 respectively. They seemed to handle the car better, but I haven’t made a systematic analysis, so this is completely subjective.
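One plausible way to build such multi-step targets from a recorded drive is to pair every frame with the angles recorded over the next 10 frames. The sketch below assumes exactly that. The helper names future_targets and pick_output are hypothetical, and the actual pipeline may construct its labels differently; only the steer__k naming and the 1/4/8 choice come from the text above.

```python
def future_targets(steering, horizon=10):
    """Pair each time step t with the recorded angles at t+1 ... t+horizon."""
    rows = []
    # The last `horizon` frames have no complete future, so they are dropped.
    for t in range(len(steering) - horizon):
        rows.append(
            {"steer__%d" % k: steering[t + k] for k in range(1, horizon + 1)}
        )
    return rows


def pick_output(speed_mode):
    """Choose which output to act on: 1, 4 or 8 steps ahead, as described above."""
    steps = {"low": 1, "medium": 4, "high": 8}[speed_mode]
    return "steer__%d" % steps
```

At drive time, only one of the ten outputs is sent to the steering servo, e.g. prediction[pick_output("medium")].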

I’m ignoring the model’s throttle predictions for now. It’s not that I don’t trust the model (I don’t); I just wanted to keep the setup simple and focus on the steering angle.

Also, going back to inference time, I needed to downsample the depth map images from 424×240 to 42×24. This is what the world looks like after downsampling (I’ve also included model predictions):
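For a rough idea of what going from 424×240 to 42×24 involves, the sketch below keeps every 10th pixel in each direction. This is only one of several ways to do it (averaging blocks of pixels is another), and it is an assumption rather than the method used in traxxas_control. The depth map is a plain list of rows here to keep the example dependency-free.

```python
def decimate(depth_map, out_width=42, out_height=24):
    """Keep every n-th pixel so that a 424x240 map becomes 42x24."""
    in_height = len(depth_map)
    in_width = len(depth_map[0])
    row_step = in_height // out_height  # 240 // 24 == 10
    col_step = in_width // out_width    # 424 // 42 == 10
    return [
        # Striding 424 columns by 10 yields 43 samples, so trim to 42.
        row[::col_step][:out_width]
        for row in depth_map[::row_step][:out_height]
    ]
```

The input shrinks by a factor of roughly 100, which is what keeps a single convolutional layer cheap on the Jetson.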

Not great, not terrible. I wouldn’t want to drive in those conditions, but the lower resolution didn’t deteriorate the quality of the model that much: MSE on the test set was ~0.010 and ~0.009 for the decimated and normal resolution variants, respectively.

On a different note, I anticipated there would be problems in mimicking the way a human controls the car. In a previous blog post I focused on how different sources of data (human-generated vs. deterministic) influence the quality of the model and concluded that it’s relatively difficult to train a model based on expert data (i.e. generated by a human driver).

But that was in a simulator — what’s the case if we’re operating in reality?

First approach: human-generated data

As it turned out, reality aggravates the problem with training a model on expert data. I was able to get the model to drive around the track, but it required multiple corrections and very low speed. I’m not going to show its performance in this blog post; I think I might work on it a bit more. I’m confident I could gather the right type of data (drive away from the track’s barriers and thus “teach” the model how not to get off the track, like I did in a simulation). But I knew it would be easier to train a model based on a more consistent method of steering than that coming from a human, and that’s what I focused on.

The behavioral cloning approach is pretty hard and I’ve already mentioned the reasons “why” in an earlier blog post, but suffice it to say that humans base their decisions on more than an ML model has at its disposal, and they’re rather inconsistent. But I refuse to leave it like that. I’m currently working on this problem and… we’ll see, that’s all I can say for now.

Second approach: heuristic based on depth map

Instead, I used a simple, deterministic procedure that steers in the direction in which there’s the most space to go. It’s a variation of a method called “follow the gap”, with a few differences that account for: 1) the fact that the view is constantly shaking, 2) noise in the depth map, and 3) a pretty narrow field of view of the camera.

Roughly, the idea is to take a set of points lying on an elliptical arc, take depth readings in the neighborhood of each point, calculate the average depth in that neighborhood, and then choose the point with the highest average depth. That point corresponds to a steering angle. I stored a decimated image and its corresponding angle, and those pairs (image, angle) were then fed to the model. On the left is a depth map in which the points lying on the arc were replaced by their average neighborhood distance.
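The procedure described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from traxxas_control: the ellipse is centred at the bottom middle of the image, its size (a_frac, b_frac), the number of points, and the window radius are made-up values, a reading of 0 is treated as “no data”, and the angle is normalized to [-1, 1] with -1 meaning far left.

```python
import math


def arc_points(width, height, n_points=9, a_frac=0.45, b_frac=0.5):
    """Return (col, row) pixels on the upper half of an ellipse, left to right."""
    cx, cy = (width - 1) / 2.0, height - 1   # ellipse centre: bottom middle
    a, b = a_frac * width, b_frac * height   # semi-axes in pixels
    points = []
    for i in range(n_points):
        theta = math.pi * (1 - i / (n_points - 1))  # pi (left) -> 0 (right)
        col = int(round(cx + a * math.cos(theta)))
        row = int(round(cy - b * math.sin(theta)))
        points.append((col, row))
    return points


def mean_depth(depth_map, col, row, radius=2):
    """Average the valid readings in a square window around (col, row)."""
    height, width = len(depth_map), len(depth_map[0])
    values = []
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            if depth_map[r][c] > 0:  # assumption: 0 marks a missing reading
                values.append(depth_map[r][c])
    return sum(values) / len(values) if values else 0.0


def steer_towards_gap(depth_map, n_points=9):
    """Pick the arc point with the most free space; return an angle in [-1, 1]."""
    height, width = len(depth_map), len(depth_map[0])
    points = arc_points(width, height, n_points)
    scores = [mean_depth(depth_map, col, row) for col, row in points]
    best = scores.index(max(scores))
    return 2.0 * best / (n_points - 1) - 1.0  # -1 = far left, +1 = far right
```

Averaging over a neighborhood rather than reading a single pixel is what makes the choice tolerant of noise and missing readings in the depth map.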

Prediction took, on average, ~35 ms (on the Jetson) using this method. That’s actually not bad and it’s definitely “driveable”. However, a neural network model trained on the labels generated with this heuristic yields a prediction (including communication and pre-processing) in ~3.5 ms, even though the GPU has “only” 256 CUDA cores.

This is a neat trick: make a costly computation on a larger machine, use the result as a label for a neural network, train it, and then use its predictions for making decisions on a smaller, embedded system.

Results

I’m going to first introduce you to the pond next to my flat around which Karr was driving, then I’ll show three videos of the car driving at low, medium, and high speed. And then, I’ll include a short video of how the same neural network performed in an entirely different environment.

Driving around the pond

Here’s what the pond looks like on Google Maps:

And here’s a video of the neural network controlling the steering angle and making a single lap around the pond at low speed:

Here’s a lap at medium speed (sorry for the shaky camera, I had to run behind the car):

And here’s one of the turns at high speed:

I didn’t capture the whole lap because I didn’t trust myself to react fast enough to stop the car and avoid any possible collisions, but also: I would look absurd sprinting around the pond behind a “toy car”.

As a bonus, and to give you an idea of how fast a car like this can go, here’s a short video at maximum speed, on a straight, but still controlled by the neural network:

Driving in a cellar

I used the exact same NN model and tested it out in a completely different environment (full movie):

Two hacks and, et voilà, the model can take a U-turn in a completely different environment!

This is a corridor in my cellar and it’s just one turn, I know, but it’s a 180° turn, much tighter than any of the ones Karr experienced driving around the pond. I had to use two hacks to make it work (can you guess what they were?), but honestly, it was way easier to get to work than I anticipated. As a result, a model trained on data gathered around the pond also works indoors.

I think that’s awesome :)

Acknowledgements

I would like to thank:

the company Skriware for 3D-printing the blue body parts you could see on top of the Traxxas chassis
the organizers of the F1/10th competition for providing excellent materials for amateurs like me to follow and build their own autonomous RC cars. The upcoming F1/10th competition will take place in New York, in October (we’ll be there!), and the one after that will be in Berlin, probably in April or May 2020. It’s an awesome event, with a great atmosphere and great people, and you, Dear Reader, should definitely take part in it ;)
and last but not least, my Dad for building the scaffold to which the camera was attached.