Self drives me crazy: from 0 to autonomous car in 150 hours

Project M
Feb 19, 2018 · 9 min read

How hard is it to build a self-driving car with a budget of $60 in more or less 150 hours? Well (spoiler alert), harder than we thought. In this post, we explain how we assembled and successfully trained a robot car using a deep learning framework.


The plan

Image for post
Training the robot

Okay, it didn’t look like a new Tesla at all (can you see the fancy rubber bands that we used to hold everything together?), but it turned out to be a good proof of concept. We are releasing the code on GitHub so that anyone can replicate the results.

Robot details

For our driving purposes, we synchronized the motors to achieve all our high-level commands — “up”, “right” and “left” (since we used the arrow keys to command the robot, by “up” we mean “forward”).

Image for post
Shakey 2 (2018) and the original Shakey (1972)

One major issue when dealing with embedded systems is the limited resources available; tons of great ideas never make it off paper because of memory and processor constraints. Fortunately, at least for driving our robot car, this wasn’t an issue.

The program that drives the NXT car was written in Python using the open source library nxt-python, and ran entirely on a Raspberry Pi. This approach relies on the Lego Communications Protocol (LCP), embedded in the leJOS and Lego firmware, to send and receive data from actuators and sensors without any code actually running on the NXT intelligent brick (the “robot’s brain”).

Imitation as supervised learning

From the machine learning perspective this problem is quite straightforward. We can see the driving task either as a classification problem or as a regression problem.

  • As a classification problem, we have a collection of images and commands; each image is labeled with the command issued when it was taken. We train a parameterized model to map each image to a distribution over the classes. We want this distribution to be close to the one that generated the dataset, so we train the model by maximum likelihood estimation.
  • As a regression problem, we associate each image with a vector of real values (acceleration, steering wheel angle, etc.). Here our parameterized model tries to generate predictions as close as possible to the real ones, so we train the model by minimizing some error measure between the model’s prediction and the ground truth (like the mean squared error).

As a first approach, we decided to frame the driving problem as a classification one. We abstracted the robot’s control to only three categories: “up”, “left” and “right”. So, we search for a function f mapping 45x80x3 road images to a probability distribution over the actions.

Image for post
Parameterized model being applied to an image
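
To make the idea of f concrete, here is a minimal sketch of the simplest possible choice: a linear layer followed by a softmax, together with the negative log-likelihood that maximum likelihood training would minimize. It is an illustration only, not the project's code: the names `predict` and `negative_log_likelihood` are hypothetical, the image is random noise, and the weights are untrained zeros.

```python
import math
import random

ACTIONS = ("up", "left", "right")
NUM_FEATURES = 45 * 80 * 3  # one value per pixel and color channel


def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(pixels, weights, biases):
    """Linear scores followed by softmax: one simple candidate for f."""
    scores = [
        sum(w * x for w, x in zip(row, pixels)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def negative_log_likelihood(probs, label):
    """Per-example loss that maximum likelihood training minimizes."""
    return -math.log(probs[ACTIONS.index(label)])


rng = random.Random(0)
pixels = [rng.random() for _ in range(NUM_FEATURES)]  # stand-in for a flattened image
weights = [[0.0] * NUM_FEATURES for _ in ACTIONS]     # untrained parameters (assumption)
biases = [0.0] * len(ACTIONS)

probs = predict(pixels, weights, biases)
loss = negative_log_likelihood(probs, "up")
print(dict(zip(ACTIONS, probs)), loss)
```

With untrained zero weights the model assigns the same probability to every action; training would adjust the weights so that the probability of the recorded command goes up.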

The function f could come from any family of models; in this experiment we restricted ourselves to two families: Deep Feedforward Networks (DFN) and Convolutional Neural Networks (CNN).

The Data

Image for post
Training track

The data collection phase consisted of laps along the track. We controlled the car ourselves and, most of the time, tried to keep it in the center of the lane. We got almost 4 hours of data (far less than Nvidia’s 72 hours of training data). On this kind of track we found ourselves going straight far more often than turning; as a result, we ended up with an imbalanced dataset.

Image for post
The data histogram

In order to fix this, we created new data points by flipping images with labels “left” and “right”.
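
A minimal sketch of this kind of augmentation is shown below. It assumes that mirroring an image also swaps its label (a mirrored “left” turn becomes a “right” turn), which the text above does not spell out, and it stores images as plain nested lists to stay dependency-free. The function names are hypothetical.

```python
MIRRORED_LABEL = {"left": "right", "right": "left"}


def flip_horizontally(image):
    """Mirror an image stored as rows of pixels (NumPy equivalent: image[:, ::-1])."""
    return [row[::-1] for row in image]


def augment_with_flips(samples):
    """Return the samples plus a mirrored copy of every turning sample."""
    augmented = list(samples)
    for image, label in samples:
        if label in MIRRORED_LABEL:  # "up" images are left alone
            augmented.append((flip_horizontally(image), MIRRORED_LABEL[label]))
    return augmented


# Tiny 2x3 single-channel "images" with made-up pixel values.
samples = [
    ([[1, 2, 3], [4, 5, 6]], "left"),
    ([[7, 8, 9], [1, 1, 1]], "up"),
]
augmented = augment_with_flips(samples)
print(len(samples), "->", len(augmented), "samples")
```

Only the turning classes gain new examples, which is what pushes the class counts closer together.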

One of the bottlenecks of this project is the “forward pass”, i.e. calculating the probability distribution from an image. All the training was done on a desktop computer far more powerful than the Raspberry Pi, but to control the car the forward pass has to run on the Raspberry Pi itself. So we decided to experiment with single-channel images (this reduces the number of features from 10800 to 3600). The transformation that gave the best result was binarization.

Image for post
Binarized images
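
The sketch below illustrates one way to binarize an image and checks the feature counts quoted above. The grayscale conversion (a plain channel average) and the threshold of 128 are assumptions made for illustration; the post does not say which ones were used.

```python
HEIGHT, WIDTH, CHANNELS = 45, 80, 3
rgb_features = HEIGHT * WIDTH * CHANNELS  # features of the original image
binary_features = HEIGHT * WIDTH          # features after keeping one channel


def to_grayscale(image):
    """Average the R, G and B values of each pixel into a single channel."""
    return [[sum(pixel) / 3 for pixel in row] for row in image]


def binarize(gray, threshold=128):
    """Map each gray value to 0 or 1; the threshold is an illustrative guess."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray]


# A made-up 2x2 RGB image: bright pixels become 1, dark pixels become 0.
image = [
    [(255, 255, 255), (10, 20, 30)],
    [(0, 0, 0), (200, 180, 220)],
]
binary = binarize(to_grayscale(image))
print(rgb_features, binary_features, binary)
```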

Hence, we ended up with two types of models: the ones trained on the original dataset, and the ones trained on the binarized dataset. For anyone interested in the data, it’s also available on GitHub.

Deeper is better?

Image for post
Result table for different DFN models

The simplest model (the softmax classifier, [3]) already presents some interesting results. We noticed that it was easier to get good results for the categories “left” and “right” (with accuracy above 80% for each), but we struggled with the accuracy for the label “up”. The car should be able to turn properly, but it should also know how to exploit any straight path (not only the ones in the middle of the lane) in order to move forward. Hence we prefer models with good accuracy on all categories (like the one with architecture [1333, 200, 3]).

To describe the different CNN models we used a similar notation: “[(24, 5), 731, 3]” indicates one convolutional layer with 24 filters (kernel size 5x5), one hidden layer of size 731 and one output layer of size 3. We always add a max pooling layer (kernel size 2x2) after a convolutional layer.

Image for post
Result table for different CNN models

One might think that a deeper architecture with high accuracy, like the DFNs with two hidden layers, would be the best choice. That would be reasonable if we had enough computational power and did not need an almost real-time response from our system. Given our on-board processing requirement, shallow networks with fewer than 1000 units were a better choice.
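
To get a rough feel for how the cost grows with network size, one can count parameters. The sketch below assumes that a list such as [1333, 200, 3] gives the sizes of the fully connected layers applied to a flattened 45x80x3 image, with one bias per unit; `count_parameters` is a hypothetical helper, and parameter count is only a proxy for forward pass time.

```python
INPUT_SIZE = 45 * 80 * 3  # flattened RGB image


def count_parameters(layer_sizes, input_size=INPUT_SIZE):
    """Weights plus biases of a fully connected network."""
    total = 0
    previous = input_size
    for size in layer_sizes:
        total += previous * size + size  # weight matrix plus bias vector
        previous = size
    return total


architectures = {
    "[3]": [3],
    "[276, 3]": [276, 3],
    "[1333, 200, 3]": [1333, 200, 3],
}
parameter_counts = {
    name: count_parameters(sizes) for name, sizes in architectures.items()
}
for name, count in parameter_counts.items():
    print(f"{name}: {count:,} parameters")
```

Almost all of the parameters sit in the first layer, because every hidden unit is connected to all 10800 input values.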

Using the accuracy on the test set we ended up selecting 4 DFN and 2 CNN models. The best model was a CNN with architecture [(36, 5), 3] trained on the original dataset. The confusion matrix for this model is shown below:

Image for post
Confusion matrix for a CNN with architecture [(36, 5), 3] using images without preprocessing
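
For readers who want to build the same kind of table for their own model, here is a minimal sketch of a confusion matrix and the per-class accuracy derived from it. The labels are made up, and the convention that rows are true labels and columns are predictions is an assumption.

```python
ACTIONS = ("up", "left", "right")


def confusion_matrix(true_labels, predicted_labels, classes=ACTIONS):
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def per_class_accuracy(matrix, classes=ACTIONS):
    """Fraction of each true class that was predicted correctly."""
    result = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        result[name] = matrix[i][i] / row_total if row_total else 0.0
    return result


# Made-up test labels and predictions, for illustration only.
true_labels = ["up", "up", "up", "left", "left", "right"]
predicted_labels = ["up", "left", "up", "left", "left", "up"]

matrix = confusion_matrix(true_labels, predicted_labels)
accuracy = per_class_accuracy(matrix)
print(matrix, accuracy)
```

Looking at each class separately matters here because a model can reach a high overall accuracy while still doing poorly on one command.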

Well, did it work or not?

Yes, it did!

Model “simulation”

In the end, we chose two models to run in the real world:

  • a DFN with architecture [276, 3] using the original image as input.
  • a CNN with architecture [(36, 5), 3] also using the original image as input.

Both models had a forward pass time of approximately 1.35 seconds. The models trained on the binarized dataset were faster (with a forward pass time of approximately 0.6 seconds), but their accuracy was no better than that of the models mentioned above. In the end, we decided to be slow and safe. And boy, oh boy, the robot car was slow!

Self-driving on training track — video 2x faster

Since we chose a differential drive design, we had no trouble with the power applied to each wheel while turning “left” or “right”: the car can turn on its own axis. For the “up” command, however, finding the right amount of power was crucial to the success of this project.

We started with 20% of power for both wheels. At first it seemed like a good choice, but we had some problems with the “up” command: the robot predicted the correct turn action, but because of its speed it was already outside the paper track by the time it performed the turn.

To cope with the processing time of the “take image, predict action, execute action” loop, we chose to slow down to 10% of power in forward movements. That made it slow as a turtle, but it got the job done.
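
The loop described above can be sketched as follows. This is not the project's code: the camera, the model and the motors are passed in as plain functions so the example runs without any hardware, the 10% forward power comes from the text, and the turning power and the sign convention for spinning in place are assumptions.

```python
FORWARD_POWER = 10  # percent, the value used for "up" in the text
TURN_POWER = 20     # percent, hypothetical: the post does not state it


def wheel_powers(command):
    """Return (left, right) wheel power for a differential drive robot."""
    if command == "up":
        return FORWARD_POWER, FORWARD_POWER
    if command == "left":
        return -TURN_POWER, TURN_POWER  # wheels in opposite directions: spin in place
    if command == "right":
        return TURN_POWER, -TURN_POWER
    raise ValueError(f"unknown command: {command!r}")


def drive(capture_image, predict_command, set_wheel_powers, steps):
    """Run the take image, predict action, execute action loop."""
    for _ in range(steps):
        image = capture_image()
        command = predict_command(image)
        set_wheel_powers(*wheel_powers(command))


# Stand-ins for the camera, the trained model and the motors.
commands = iter(["up", "left", "up"])
log = []
drive(
    capture_image=lambda: None,
    predict_command=lambda image: next(commands),
    set_wheel_powers=lambda left, right: log.append((left, right)),
    steps=3,
)
print(log)
```

Because each iteration waits for the forward pass, the car keeps executing its last command for over a second, which is why a lower forward power keeps it on the track.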

After successfully driving on the training track, we assembled a test track with a new shape (new types of curves and different lane sizes) to check whether the model could generalize to tracks it had not seen before.

Image for post
Test track

And it did!

Self-driving on test track — video 2x faster


Using the NXT Lego robot doesn’t make this a low-cost project (one NXT kit costs $549.99). We chose this robot simply because it was available to us. If you don’t have one, there are plenty of cheaper options (for example, the DeepPicar project uses a New Bright 1:24 scale RC car that costs only $10). Our code can easily be adapted to any car platform, so we welcome anyone to use it and contribute to it :)

Well, the fun isn’t over. There is still a lot to pursue next. Here are some examples:

  • Sometimes the data that the model sees in training is very different from the data that it sees while driving in the real world: a misclassification (for example, confusing the command “left” with “right”) can put the car in a different position on the track, exposing it to images never encountered before. We think that one way to circumvent this problem is to create a new dataset where we control the car normally but introduce some random disturbances while driving.
  • With better hardware, it is possible to use robust CNN models together with some visualization techniques to gain some understanding of the network’s inner workings.
  • Ackermann steering is the standard model in the robot car community. A natural next step is to combine our framework with this model, allowing the robot car to accelerate and steer at the same time. This would require a new and refined dataset that associates each image with a label composed of steering angle and motor power, as well as a review of the whole machine learning process.
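
As an illustration of the first idea in the list, the sketch below shows one possible way to inject random disturbances while recording data. It is our reading of the idea, not something we have tested: the car occasionally executes a random command instead of the driver's, while the label stored for training stays the driver's command. The disturbance rate and all function names are hypothetical, and the images that would be recorded alongside each command are left out.

```python
import random

ACTIONS = ("up", "left", "right")


def perturb(command, rng, disturbance_rate=0.1):
    """Occasionally swap the driver's command for a random different one."""
    if rng.random() < disturbance_rate:
        return rng.choice([a for a in ACTIONS if a != command])
    return command


def record_lap(driver_commands, rng, disturbance_rate=0.1):
    """Return (executed, label) pairs for one lap.

    The car executes the possibly perturbed command, while the label
    keeps what the driver actually asked for.
    """
    return [(perturb(c, rng, disturbance_rate), c) for c in driver_commands]


rng = random.Random(42)  # fixed seed so the example is reproducible
lap = record_lap(["up"] * 1000, rng)
disturbed = sum(1 for executed, label in lap if executed != label)
print(f"{disturbed} of {len(lap)} commands were disturbed")
```

The disturbances push the car into off-center positions, so the dataset would also contain images of how the driver recovers from them.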

The road to autonomous vehicles is not a short one, but we hope we gave you the sense that taking the first step is far easier than it seems.
