Teaching a Machine to Steer a Car

We challenged our students & they delivered. All code open sourced!

Oliver Cameron
Udacity Inc
9 min read · Dec 19, 2016


The first-place model from Team Komanda. Green is the predicted steering angle, and blue is the actual angle.

The Udacity Self-Driving Car

As detailed in this post, a critical part of our process in launching the Self-Driving Car Nanodegree program is to build our own self-driving vehicle. To achieve this, we formed a core Self-Driving Car Team with Google Self-Driving Car founder and Udacity President Sebastian Thrun. One of the first decisions we made together? Open source code, written by hundreds of students from across the globe!

We are breaking down the problem of making the car autonomous into Udacity Challenges, and today we are excited to announce the results of Challenge #2 and open source all the models and code!

What an experience! 51 teams totaling 312 participants took part in our Udacity open source self-driving car challenge. The challenge? Use Deep Learning to predict steering angles, a key function of any self-driving car.

Just some of the Challenge #2 participants!

Here’s a quick reminder of what the challenge entailed:

You may have seen this incredible video from NVIDIA, one of our Nanodegree partners, which highlights their efforts to teach a car to drive using only cameras and deep learning. The second challenge for the Udacity Self-Driving Car initiative is to replicate these results using a convolutional neural network that you design and build! End-to-end solutions like this, where a single network takes raw input (camera imagery) and produces a direct steering command, are considered the holy grail of current autonomous vehicle technology.

We couldn’t be more thrilled with how this challenge turned out. Along the way we open sourced over 10 hours of driving data (under the MIT license), but more importantly we saw incredible energy, enthusiasm and creativity from our participants. We’ve invited our top three teams to write about their experience, and how they achieved what they did. I’ll let them take it from here!

1st Place: Team Komanda

Team Lead: Ilya Edrenkin
View Model on GitHub

Green = predicted, blue = actual

I considered the problem as a specific instance of a sequence-to-sequence mapping problem: the model had to map a sequence of images to an equal-length sequence of steering decisions. Of course, this mapping had to be causal, i.e. the model was only allowed to look at camera samples from the past to predict steering angles in the future.

The model architecture was composed of a convolutional vision stack followed by a stateful recurrent network. Convolution was performed not only in the image plane, but also in time; it allowed the model to build motion detectors and account for the driving dynamics. The recurrent cell tracked the state of the vehicle, which was supported by an auxiliary cost function — the model tried to predict not only the steering angle, but also the steering wheel torque and the speed of the vehicle. Well-established methods like residual connections, layer normalization and an aggressive regularization via dropout were necessary to obtain the best results.
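To make that concrete, here is a minimal Keras sketch of this kind of architecture (not the actual competition code, which is linked above): spatio-temporal convolutions over a short clip of frames, a recurrent layer carrying vehicle state, and auxiliary torque and speed outputs alongside the steering angle. Frame sizes, layer widths and loss weights are illustrative assumptions, and the residual connections and layer normalization are omitted for brevity.

```python
# Illustrative sketch of a spatio-temporal CNN + stateful RNN with auxiliary outputs.
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN, H, W, C = 10, 120, 320, 3  # assumed clip length and frame size

frames = layers.Input(shape=(SEQ_LEN, H, W, C), name="frames")

# Convolution in both the image plane and time ("motion detectors").
x = layers.Conv3D(32, (3, 5, 5), strides=(1, 2, 2), padding="same", activation="relu")(frames)
x = layers.Conv3D(64, (3, 5, 5), strides=(1, 2, 2), padding="same", activation="relu")(x)
x = layers.TimeDistributed(layers.Flatten())(x)
x = layers.TimeDistributed(layers.Dense(128, activation="relu"))(x)
x = layers.Dropout(0.25)(x)

# Recurrent layer tracking the vehicle state across the sequence.
h = layers.LSTM(128, return_sequences=True)(x)

# Main output plus auxiliary targets that regularize the learned state.
angle = layers.TimeDistributed(layers.Dense(1), name="steering_angle")(h)
torque = layers.TimeDistributed(layers.Dense(1), name="torque")(h)
speed = layers.TimeDistributed(layers.Dense(1), name="speed")(h)

model = Model(frames, [angle, torque, speed])
model.compile(optimizer="adam", loss="mse",
              loss_weights={"steering_angle": 1.0, "torque": 0.1, "speed": 0.1})
```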

Fun fact: the resulting model slightly resembles statistical text-to-speech models, where output vocoder frames are predicted from a sequence of linguistic features. A related trick is the joint optimization of autoregressive and ground-truth-fed models. Hopefully this also makes the driving behaviour of the model more human-like.

I have provided an extensive writeup of my solution. If something is unclear, please drop me a line and I will try to clarify it.

I would like to express my gratitude to:

  1. The Udacity team, which collected and open-sourced large amounts of driving data and successfully organized this challenge despite all the technical difficulties;
  2. Ross Wightman, whose open-source tools for ROSbag data preparation helped me and other participants to get started and became a “standard” for this challenge;
  3. Sergey Zimin, who told me about this challenge and invited me to participate in it!

I am excited by the fact that the data generously open-sourced by Udacity can be used for building and testing driving models that are not constrained by the challenge conditions and are generally more realistic. The code I provide could possibly serve as a starting point for some of these approaches.
Modern machine learning enables us to build fantastic things, and I am really glad that we have such a powerful tool to improve our world.

For professional matters I can be found on LinkedIn.

2nd Place: Team Rambo

Team Lead: Tanel Pärnamaa
View Model on GitHub

When I was writing my bachelor’s thesis in 2012 on Bayesian nonparametrics for novelty detection, I got curious: how could Google have self-driving cars while my machine learning experiments were greatly affected by noise and outliers? My application could tolerate misclassifications, but these are very costly for autonomous cars. Four years later, I was fascinated to hear that Udacity was building an open-source self-driving car and that I would have a chance to play around with real driving data.

A toy example from many ML 101 classes showing how outliers can break linear regression. Most of the teams used the same Euclidean loss when predicting steering angles from images. How do we know it holds up here?

Data Pre-Processing

The first phase of the challenge was based on driving data from El Camino Real (small curves, mostly straight driving), and the second phase was based on driving data from San Mateo to Half Moon Bay (curvy and highway driving). Even though we ended up downloading over 100GB of driving data, the total driving time was only around one hour, which is clearly not enough to build a generalizable end-to-end solution. We tried to pre-process the data in a way that makes modelling motion easier for the network. We had high hopes for optical flow features, but they didn’t seem to help.

We ended up resizing input images to 256 x 192, converting them to grayscale, computing lag-1 differences between frames, and using 2 consecutive differenced images as input. For example, at time t we used [x_t - x_{t-1}, x_{t-1} - x_{t-2}] as input, where x corresponds to the grayscale image. No future frames were used to predict the current steering angle.
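As a rough illustration (assumed helper names, not the team's actual pipeline), the preprocessing amounts to something like:

```python
# Sketch of the preprocessing: grayscale, resize to 256x192, stack two frame differences.
import cv2
import numpy as np

def preprocess(frame_t, frame_t1, frame_t2, size=(256, 192)):
    """Return the 2-channel input [x_t - x_{t-1}, x_{t-1} - x_{t-2}]."""
    def to_gray(img):
        g = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        return cv2.resize(g, size).astype(np.float32) / 255.0

    x_t, x_t1, x_t2 = to_gray(frame_t), to_gray(frame_t1), to_gray(frame_t2)
    return np.stack([x_t - x_t1, x_t1 - x_t2], axis=-1)  # shape: (192, 256, 2)
```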

Model

Our final model consisted of 3 streams that we merged at the final layer. Two of the streams were inspired by the NVIDIA self-driving car paper, and one of the streams was inspired by comma.ai’s steering model.

Final model used in the competition. Model with 2 heads works almost as well as model with 3 heads.
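A hedged Keras sketch of this kind of multi-stream architecture is below; the per-stream layer configurations loosely imitate the NVIDIA and comma.ai models, and the exact filter counts and sizes are illustrative assumptions rather than the team's configuration.

```python
# Illustrative three-stream CNN merged just before the steering-angle output.
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(192, 256, 2))  # two stacked frame differences

def nvidia_like_stream(x):
    # Loosely follows the NVIDIA layout: strided 5x5 convs followed by a 3x3 conv.
    for filters, k, s in [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1)]:
        x = layers.Conv2D(filters, k, strides=s, activation="relu")(x)
    x = layers.Flatten()(x)
    return layers.Dense(64, activation="relu")(x)

def commaai_like_stream(x):
    # Loosely follows the comma.ai steering model: fewer, larger-stride convs.
    x = layers.Conv2D(16, 8, strides=4, activation="elu")(x)
    x = layers.Conv2D(32, 5, strides=2, activation="elu")(x)
    x = layers.Conv2D(64, 5, strides=2, activation="elu")(x)
    x = layers.Flatten()(x)
    return layers.Dense(64, activation="elu")(x)

merged = layers.concatenate([nvidia_like_stream(inp),
                             nvidia_like_stream(inp),
                             commaai_like_stream(inp)])
angle = layers.Dense(1, name="steering_angle")(merged)

model = Model(inp, angle)
model.compile(optimizer="adam", loss="mse")
```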

We were surprised that many tricks that work well with classification networks did not transfer over to this regression problem. For example, we did not use dropout, batch normalization or VGG-style 3x3 filters. It was hard to get the model to predict something other than the average value. Of course, there might have been a problem in the hyperparameter selection.

We also tried discretizing the regression output into bins and using a cross-entropy loss to pretrain the network, then fine-tuning it with a regression loss. Our initial results were not encouraging, and we abandoned this approach due to time constraints. However, we still believe this would be a fruitful approach for steering angle regression.
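For illustration, the pretrain-then-fine-tune idea could be sketched like this (bin edges, network sizes and helper names are illustrative assumptions):

```python
# Stage 1: classify binned angles with cross-entropy; Stage 2: fine-tune a regression head.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

N_BINS = 21
bin_edges = np.linspace(-1.0, 1.0, N_BINS + 1)  # assumed steering-angle range

def to_bins(angles):
    """Map continuous angles to class indices for the pretraining stage."""
    return np.clip(np.digitize(angles, bin_edges) - 1, 0, N_BINS - 1)

def build_backbone(inp):
    x = layers.Conv2D(24, 5, strides=2, activation="relu")(inp)
    x = layers.Conv2D(36, 5, strides=2, activation="relu")(x)
    x = layers.Flatten()(x)
    return layers.Dense(100, activation="relu")(x)

inp = layers.Input(shape=(192, 256, 2))
features = build_backbone(inp)

# Stage 1: classification head over angle bins.
clf = Model(inp, layers.Dense(N_BINS, activation="softmax")(features))
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# clf.fit(images, to_bins(angles), ...)

# Stage 2: reuse the trained backbone, replace the head, fine-tune with MSE.
reg = Model(inp, layers.Dense(1)(features))
reg.compile(optimizer="adam", loss="mse")
# reg.fit(images, angles, ...)
```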

Model Predictions

It’s good to see the RMSE metric decreasing, but visualizing the predictions gives much more insight and helps spot the weaknesses of the model. For example, in the following visualization we can see how the model does on curvy sections. The predicted angles are more or less reasonable; however, we can also spot that the human steering angles go deeper into sharp curves.

Curvy roads. White circle shows the true angle, black circle shows the predicted angle.
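This kind of diagnostic can be produced with a few lines of matplotlib; the sketch below simply overlays predicted and actual angles over time and reports the RMSE (array names are assumed):

```python
# Overlay predicted vs. actual steering angles instead of relying on RMSE alone.
import numpy as np
import matplotlib.pyplot as plt

def plot_predictions(y_true, y_pred):
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    plt.figure(figsize=(12, 3))
    plt.plot(y_true, label="actual")
    plt.plot(y_pred, label="predicted")
    plt.xlabel("frame")
    plt.ylabel("steering angle")
    plt.title(f"RMSE = {rmse:.4f}")
    plt.legend()
    plt.show()
```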

In addition, human steering movements are smoother on straight roads while the model zig-zags. Our final model is only trained on phase 2 data. More data or output smoothing would make the model better.

Straight road. The model zig-zags while human movements are smooth.
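One simple form of the output smoothing mentioned above is an exponential moving average over the predicted angles; the smoothing factor below is an illustrative choice, not a tuned value.

```python
# Exponential moving average to damp frame-to-frame zig-zagging in predictions.
import numpy as np

def smooth_predictions(angles, alpha=0.2):
    smoothed = np.empty_like(angles, dtype=float)
    smoothed[0] = angles[0]
    for i in range(1, len(angles)):
        smoothed[i] = alpha * angles[i] + (1 - alpha) * smoothed[i - 1]
    return smoothed
```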

3rd Place: Team Chauffeur

Team Lead: Matt Forbes
View Model on GitHub

What a fun project! I went from mildly interested to deeply obsessed within the first week of this challenge. My first goal was to build out some tooling so that my teammate and I would have a unified format for working with the data: applying transformations, filtering, advanced subsampling, reproducible/downloadable models, etc. While I kicked off this work, Nishanth set out to get an MVP model out, using the NVIDIA-published topology as a starting point. After some tweaking and tuning, we ended up with a model that spit out believable outputs, and we were on our way!

Having never worked in the self-driving car space, my next approach was to simplify the problem. Re-framing it from “give me an exact steering angle at every frame” to “should we turn left, right, or stay straight?” made the problem easy enough to get some quick intuition on what works and what doesn’t. We expanded from left/right/center classification to arbitrary partitions of the output steering angle space. Eventually we found we could start predicting exact angles directly.
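A tiny sketch of that re-framing: bucket each ground-truth angle into left / straight / right (the threshold here is an arbitrary illustrative value) and train a classifier on those labels before attempting exact regression. The same idea generalizes to finer partitions of the angle space.

```python
# Coarse labels for quick intuition: 0 = left, 1 = straight, 2 = right.
def to_three_way(angle, straight_threshold=0.05):
    if angle < -straight_threshold:
        return 0
    if angle > straight_threshold:
        return 2
    return 1
```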

Our submission in the first round was a type of mixture model that first predicted whether the car was currently in a sharp turn and then used a specialized model if so. We decided to drop this approach in the final round as we knew the only way to compete was to use RNNs. Using an RNN to drive a car is just begging for trouble. Any error in predictions will compound, causing the model to get into a bad feedback loop. This is not an issue in the training set because your predictions don’t actually influence future actions since it’s all pre-recorded. Nonetheless, the competition was not testing for road-worthiness so we pushed on.

Our RNN was trained using transfer learning. We started with a big expensive stacked CNN which essentially performs automatic feature extraction on the input images. I think we reduced the dimension to about 3000 features in the final feature maps of this model. We then applied the CNN feature extraction to every image in the training set, caching it entirely in memory. At this point, we had a ton of flexibility on the RNN training in terms of number of timesteps, width/depth of the model, etc. without paying the huge computational price of evaluating/training the convolution layers. In the end, our best model used windows of 50 images (2.5 seconds of video) in a pretty simple LSTM network.
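A rough sketch of that two-stage setup (assumed names and sizes, not the team's exact code): run the trained CNN once over every frame, cache the per-frame feature vectors, and train a small LSTM over sliding windows of cached features.

```python
# Stage 1: extract and cache CNN features per frame; Stage 2: train an LSTM on feature windows.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

WINDOW = 50          # 50 frames ~ 2.5 seconds of 20 Hz video
FEATURE_DIM = 3000   # approximate size of the cached CNN feature vector

# Stage 1: run the already-trained CNN once over every frame and cache the results.
# cnn_features = cnn_feature_extractor.predict(all_frames)  # shape: (n_frames, FEATURE_DIM)

def make_windows(features, angles, window=WINDOW):
    """Build sliding windows of features and the angle at the last frame of each window."""
    X = np.stack([features[i:i + window] for i in range(len(features) - window + 1)])
    y = angles[window - 1:]
    return X, y

# Stage 2: a lightweight LSTM over the cached features, cheap to iterate on.
rnn_input = layers.Input(shape=(WINDOW, FEATURE_DIM))
h = layers.LSTM(128)(rnn_input)
angle = layers.Dense(1)(h)
rnn = Model(rnn_input, angle)
rnn.compile(optimizer="adam", loss="mse")
# X, y = make_windows(cnn_features, steering_angles)
# rnn.fit(X, y, batch_size=32, epochs=10)
```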

Initially all our training was performed on AWS using their GPU instances. This quickly became way too expensive, considering the deadlines were extended multiple times and we just kept iterating. I decided it wasn’t worth throwing money at Amazon, so I built a computer, threw a couple of NVIDIA GTX 1080s in it, and we used that instead. I’d recommend this route to pretty much anybody willing to get their hands dirty, because it quickly pays off compared to using a cloud service, and you get to keep the machine for gaming!

Building an open-source self-driving car is a great mission and it has been fun to contribute in this way! I am excited and optimistic about the future of this project.

Congratulations to our winners, and a huge thank you to everyone who participated in this challenge!

We see the results of this challenge as just the beginning of our open source self-driving car project. The power of open source is constant iteration and improvement, and we can’t wait to see where the models we’ve open sourced today will be in 6 months. As these models are trained on more and more data (including augmented datasets), we’ll see continued performance improvements in real-world environments. We hope you’ll take some time to check out the code, and send pull requests.

Want to learn more about self-driving cars? Apply for admission to our Self-Driving Car Nanodegree, partnered with great companies like Mercedes-Benz, BMW, NVIDIA, Otto, McLaren and more!


Oliver Cameron
Udacity Inc

Obsessed with AI. Built self-driving cars at Cruise and Voyage. Board member at Skyways. Y Combinator alum. Angel investor in 50+ AI startups.