Behavioral Cloning. NVidia Neural Network in Action.

Machine Learning & Data Science A-Z Guide.

Dmytro Nasyrov
Pharos Production
Aug 21, 2017


NVidia Convolutional Neural Network

Send us a message if you’re interested in Blockchain and FinTech software development, or just say hi at Pharos Production Inc.

Or follow us on Youtube to learn more about Software Architecture, Distributed Systems, Blockchain, High-load Systems, Microservices, and Enterprise Design Patterns.

Pharos Production Youtube channel

This is a writeup of Project 3 from the Udacity Self-Driving Car Engineer course. This time we will talk about Behavioral Cloning. Our goal is to use manually collected image data to teach the car to steer left and right based on the conditions around it. We have a simulator created with Unity in which we can drive a car on two different tracks, much like in Need for Speed from 1999. The car has three cameras on board: left, right, and center. The cameras take snapshots of the road, and we will use these images to train our neural network.

center | left | right cameras on track 1
center | left | right cameras on track 2

To collect more data from a single track, we drive the car in both directions of the track. This should help the model generalize. Later we can also add image augmentation to simulate shadows and bright highlights, i.e. different environments.
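A cheap augmentation in the same spirit as driving the track in both directions is to mirror each frame and negate its steering angle. This is a minimal sketch (the function name and shapes are my own, not from the original project):

```python
import numpy as np

def flip_augment(image, steering):
    """Horizontally flip an image and negate the steering angle.

    Doubles the dataset and balances left and right turns, which
    complements recording laps in both directions of the track.
    """
    # image is H x W x C; reverse the width axis to mirror it
    return image[:, ::-1, :], -steering
```

For example, a center-camera frame recorded with a steering angle of 0.2 yields a mirrored frame labeled -0.2.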

We have three options for the network. We can build one from scratch and pray it works, we can use the NVidia neural network (see image above), or we can use the Comma.ai neural network. Our first approach was to build a neural network ourselves. That approach fell apart after two weeks of attempts, so we chose NVidia’s solution.

You can find much more about this DNN architecture here:

You can find everything about comma.ai here:

https://github.com/commaai/research

The input is a 3-channel image, 200 pixels wide and 66 pixels high. Images from the cameras come in a different resolution, so we need to prepare them first. We crop each image to the road area to avoid learning from the sky and trees, blur it slightly to smooth out the pixelated road lane, and convert it from RGB to YUV. We also need to analyze and balance the data to avoid a biased result, because most of the samples are straight driving. We will use data from both tracks of the simulator.

Original image
Scaled to 200x66
Blurred
In YUV colorspace

For the framework we choose Keras with a TensorFlow backend to simplify our life. The first layer normalizes pixel values from 0–255 to -0.5–0.5. The network scheme is shown above; for the activation we use ELU to make predictions smoother. Before the flatten layer we add dropout, then a flatten layer and 3 fully connected layers. To save RAM we use a batch generator. That’s all! Now we run training for tens of epochs and check the result.
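A sketch of this setup in Keras: the conv layer sizes follow NVidia’s published architecture, while the dropout rate, optimizer, loss, and batch size are assumptions of mine rather than the exact values used in the project.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Lambda, Conv2D, Dropout, Flatten, Dense

def build_model():
    """NVidia-style network: normalization, 5 conv layers with ELU,
    dropout, flatten, 3 fully connected layers, one steering output."""
    model = Sequential([
        # Normalize 0-255 pixels to the -0.5..0.5 range
        Lambda(lambda x: x / 255.0 - 0.5, input_shape=(66, 200, 3)),
        Conv2D(24, (5, 5), strides=(2, 2), activation="elu"),
        Conv2D(36, (5, 5), strides=(2, 2), activation="elu"),
        Conv2D(48, (5, 5), strides=(2, 2), activation="elu"),
        Conv2D(64, (3, 3), activation="elu"),
        Conv2D(64, (3, 3), activation="elu"),
        Dropout(0.5),           # assumed rate
        Flatten(),
        Dense(100, activation="elu"),
        Dense(50, activation="elu"),
        Dense(10, activation="elu"),
        Dense(1),               # steering angle
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def batch_generator(images, steerings, batch_size=64):
    """Yield shuffled batches indefinitely so the whole dataset
    never has to sit in RAM at once."""
    n = len(images)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield images[batch], steerings[batch]
```

Training then becomes a call to `model.fit` with the generator and a chosen number of steps per epoch.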

What can we improve here? It’s probably a good idea to play with different color-space combinations and to use a convolutional blur instead of a plain Gaussian one. We also need to collect more data from track 2 so the model is less tied to the first track’s environment. It would also be interesting to try comma.ai’s network structure instead of NVidia’s and compare the two.

You can find the full source code here.
