Coding a Deep Neural Network to Steer a Car: Step By Step
Using only camera frames as an input
We recently announced the winners of Challenge #2, where Udacity students were tasked with building an efficient deep neural network that could steer a car using only camera frames. We saw over 200 submissions from 50 teams all around the world, and we decided to open-source the best-performing models! Browse the models (and code!) here.
Here’s a quick reminder of what Challenge #2 entailed:
You may have seen this incredible video from NVIDIA, one of our Nanodegree partners, which highlights their efforts to teach a car how to drive using only cameras and deep learning. The second challenge for the Udacity Self-Driving Car initiative is to replicate these results using a convolutional neural network that you design and build! End-to-end solutions like this, where a single network takes raw input (camera imagery) and produces a direct steering command, are considered the holy grail of current autonomous vehicle technology.
The top-scoring network for Challenge #2 (measured by how close its predicted steering angles were to a human driver's) was built by the amazing Ilya Edrenkin, a Senior Researcher at Yandex. He generously wrote an IPython notebook explaining how his neural network was constructed, and I thought it needed to be shared with the world. Enjoy!
Some further notes from Ilya:
I have considered the problem as a specific instance of a sequence-to-sequence mapping problem: the model had to map the sequence of images to an equal-length sequence of the steering decisions. Of course this mapping had to be causal, i.e. the model was only allowed to look at the camera samples from the past to predict the steering angles in the future.
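To make the causality constraint concrete, here is a minimal sketch in plain Python. It is an illustration only, not code from Ilya's notebook: the helper name `causal_windows`, the `window` parameter, and the choice to pad the start by repeating the first frame are all assumptions. For every time step it collects the current frame plus the ones before it, so the output has the same length as the input and never includes a frame from the future.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def causal_windows(frames: Sequence[T], window: int) -> List[List[T]]:
    """Return, for each step t, the frames at steps t-window+1 .. t.

    Hypothetical helper: early steps are left-padded by repeating the
    first frame (an assumption), so no window ever contains a future frame
    and the result has one window per input frame.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(frames) == 0:
        return []
    # Prepend window-1 copies of the first frame so step 0 has a full window.
    padded = [frames[0]] * (window - 1) + list(frames)
    # In the padded list, the slice [t, t+window) ends at original index t.
    return [padded[t:t + window] for t in range(len(frames))]
```

Each window would then be one input to the model, paired with the steering angle recorded at the window's last frame.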
The model architecture was composed of a convolutional vision stack followed by a stateful recurrent network. Convolution was performed not only in the image plane, but also in time; it allowed the model to build motion detectors and account for the driving dynamics. The recurrent cell tracked the state of the vehicle, which was supported by an auxiliary cost function: the model tried to predict not only the steering angle, but also the steering wheel torque and the speed of the vehicle. Well-established methods like residual connections, layer normalization, and aggressive regularization via dropout were necessary to obtain the best results.
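The auxiliary cost idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the loss from Ilya's notebook: the use of mean squared error, the function names, and the example weights are all hypothetical. The sketch only shows how errors on the two auxiliary targets (torque and speed) could be added to the steering-angle error to form a single training objective.

```python
from typing import Dict, Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between two equal-length sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    pred: Dict[str, Sequence[float]],
    target: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-output errors, one term per key in `weights`."""
    return sum(
        w * mean_squared_error(pred[name], target[name])
        for name, w in weights.items()
    )


# Hypothetical weights (an assumption, not values from the source):
# the steering angle is the main target, torque and speed are auxiliary.
EXAMPLE_WEIGHTS = {"angle": 1.0, "torque": 0.1, "speed": 0.1}
```

Only the steering angle matters at test time; the extra terms are there to push the recurrent state toward tracking what the vehicle is doing.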
And here’s a video of the model in action: