Autonomous Driving using Deep Learning and Behavioural Cloning.

An autonomous car is a vehicle that can guide itself without human input. It is also known as a driverless car, robot car, self-driving car or autonomous vehicle. Autonomous cars use various technologies: they may use GPS for navigation and various sensors to avoid collisions. Many companies are active in this space, including Google, Nvidia, Uber and Waymo.

Deep Learning is one of the ways to make autonomous driving possible. This tutorial will use the network from Nvidia’s “End to End Learning for Self-Driving Cars” paper. We are going to use Udacity’s Self-Driving Car Simulator for this. Udacity built this simulator for their Self-Driving Car Nanodegree and has since open-sourced it. It was built with Unity, a cross-platform game engine developed by Unity Technologies that is primarily used to develop three-dimensional and two-dimensional video games and simulations for computers, consoles and mobile devices.

There are 3 parts to the whole process:

  1. Data Generation for the Autonomous System
  2. Training the Autonomous System
  3. Testing the Autonomous System

Data Generation for the Autonomous System

This part is inspired by Nvidia’s data collection process for their self-driving car. They attached 3 cameras to a car: one in the center and one on each side. They also recorded the steering wheel’s data, capturing the steering angle. So, in our process, we are going to capture:

  1. Images from Center, Left and Right Cameras.
  2. Steering Angle.
  3. Speed of the car.
  4. Throttle.
  5. Brake.

These recorded values are stored in a CSV file. Here’s a glimpse of what the CSV file looks like:
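
If you want to inspect the log programmatically, here is a minimal sketch using pandas. It assumes the default driving_log.csv written by the simulator, which has no header row and records the columns in the order listed below.

```python
import pandas as pd

# The simulator writes driving_log.csv without a header row;
# these are the columns in the order it records them.
columns = ['center', 'left', 'right', 'steering', 'throttle', 'brake', 'speed']
log = pd.read_csv('driving_log.csv', names=columns)

print(log[['steering', 'throttle', 'brake', 'speed']].describe())
print(log['center'].iloc[0])  # path to the first center-camera frame
```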

This is what Nvidia’s hardware design looked like:

High-level view of Nvidia’s data collection system

The steering wheel is connected to the system via the Controller Area Network (CAN) bus, which feeds in the steering, throttle and brake values. The cameras are also connected to the system, feeding in a continuous stream of video data. The system is Nvidia Drive PX. Nvidia Drive is an AI platform that lets users build and deploy self-driving cars, trucks and shuttles; it combines deep learning, sensor fusion and surround vision to change the driving experience. A solid-state drive is used to store all the data that is collected. Udacity’s Self-Driving Car simulator mimics this setup.

You can find the simulator here. It ships as a binary, so you don’t have to worry about compiling it; just double-click it and the simulator starts. So cool!

The simulator has 2 modes- Training mode and Autonomous mode:

The Simulator’s two modes.

The training mode looks like this:

Training mode

In the training mode, the simulator captures the images from the 3 cameras along with the speed, throttle, brake and steering angle values. You have to click on the record icon at the top right of the screen to start the recording.

The autonomous mode looks like this:

Autonomous mode

In this mode the simulator acts as a server and the Python script acts as the client. You will see what I mean as you read on.

Training the Autonomous System

The data collected during the Data Generation part (all the camera images, steering angles and other values) was recorded while a human driver was driving the car. We are going to train a model that clones how the human drove the car; essentially, it clones the driver’s behaviour across different road scenarios. This is called Behavioural Cloning. To define it formally: Behavioural cloning is a method by which human sub-cognitive skills can be captured and reproduced in a computer program.

Below is a graphical representation of how the training works:

Training the Neural Network

The images captured from the 3 cameras are randomly shifted and rotated, and then fed into the Neural Network. Based on these inputs, the Neural Network outputs a single value: the steering angle. Essentially, based on the input images, the Neural Network decides by what angle the car must be steered. This output is compared with the steering data collected from the human drive to compute the error in the Neural Network’s decision. With this error, the Backpropagation algorithm optimizes the parameters (weights) of the model to reduce the error.
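
As a concrete sketch of the shifting step, here is one way to translate an image horizontally with OpenCV and nudge the steering label along with it. The shift range and the per-pixel angle adjustment are tunable assumptions, not values from the Nvidia paper.

```python
import cv2
import numpy as np

def random_shift(image, angle, shift_range=50, angle_per_pixel=0.004):
    # Translate the image horizontally by a random number of pixels and
    # adjust the steering angle proportionally, so the label stays
    # consistent with the shifted viewpoint.
    tx = np.random.uniform(-shift_range, shift_range)
    h, w = image.shape[:2]
    m = np.float32([[1, 0, tx], [0, 1, 0]])      # affine matrix: x-shift only
    shifted = cv2.warpAffine(image, m, (w, h))
    return shifted, angle + tx * angle_per_pixel
```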

We will use Nvidia’s Convolutional Neural Network (CNN) architecture. Here’s what the network looks like:

Nvidia’s CNN Architecture

The network consists of 9 layers: a normalization layer, 5 convolutional layers and 3 fully connected layers. The input image is converted to YUV. The first layer normalises the image; the normalisation values are hard-coded, so this layer is not trainable. The convolutional layers perform feature extraction: the first 3 use strided convolutions with a 5×5 kernel, and the last 2 use non-strided convolutions with a 3×3 kernel. They are followed by 3 fully connected layers, which are supposed to act as a controller for the autonomous system. But, given the end-to-end training of the whole network, it is hard to say whether the fully connected layers alone are responsible for the control.
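
Here is a sketch of this architecture in Keras, assuming the paper’s 66×200×3 YUV input. The filter counts (24, 36, 48, 64, 64) and the 100/50/10 fully connected sizes follow the paper; the ReLU activations, Adam optimizer and MSE loss are my own choices.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Lambda, Conv2D, Flatten, Dense

def build_model(input_shape=(66, 200, 3)):
    model = Sequential([
        # Hard-coded normalization to [-1, 1]; this layer has no trainable weights.
        Lambda(lambda x: x / 127.5 - 1.0, input_shape=input_shape),
        # Strided 5x5 convolutions for feature extraction.
        Conv2D(24, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(36, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(48, (5, 5), strides=(2, 2), activation='relu'),
        # Non-strided 3x3 convolutions.
        Conv2D(64, (3, 3), activation='relu'),
        Conv2D(64, (3, 3), activation='relu'),
        Flatten(),
        # Fully connected layers acting as the "controller".
        Dense(100, activation='relu'),
        Dense(50, activation='relu'),
        Dense(10, activation='relu'),
        Dense(1),  # the single output: steering angle
    ])
    model.compile(loss='mse', optimizer='adam')
    return model
```

The mean-squared-error loss here is exactly the steering-angle error that backpropagation minimises during training.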

Testing the Autonomous System

The testing happens using only the center camera images. The center camera input is fed to the Neural Network, the Neural Network outputs the steering angle value, and this value is fed back to the autonomous car. Udacity’s Self-Driving Car simulator follows a server-client architecture:

Simulator (Server)- Neural Network (Client)

The server is the simulator and the client is the Neural Network, or rather the Python program. The whole process is a cyclic feedback loop: the simulator outputs the images, the Python program analyses them and outputs the steering angle and throttle, and the simulator receives these values and turns the car accordingly. This goes on cyclically.
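
Here is a minimal sketch of that client, modelled on Udacity’s reference drive.py: the telemetry event, the steer command and port 4567 are what that script uses, while the model file name, the fixed throttle and the crop values are assumptions.

```python
import base64
from io import BytesIO

import cv2
import eventlet
import numpy as np
import socketio
from flask import Flask
from PIL import Image
from tensorflow.keras.models import load_model

sio = socketio.Server()
app = Flask(__name__)
model = load_model('model.h5')   # the trained behavioural-cloning model

def preprocess(image):
    # Must match the training-time pre-processing (see the Data Pre-processing
    # section): crop, resize to 66x200, convert RGB -> YUV.
    image = image[60:135, :, :]
    image = cv2.resize(image, (200, 66))
    return cv2.cvtColor(image, cv2.COLOR_RGB2YUV)

@sio.on('telemetry')
def telemetry(sid, data):
    # The simulator streams the current center-camera frame as a base64 string.
    image = np.asarray(Image.open(BytesIO(base64.b64decode(data['image']))))
    steering = float(model.predict(np.array([preprocess(image)]))[0, 0])
    throttle = 0.2               # keep a fixed throttle for simplicity
    sio.emit('steer', data={'steering_angle': str(steering),
                            'throttle': str(throttle)})

if __name__ == '__main__':
    # Wrap the Flask app with the socketio server and listen on port 4567,
    # the port the simulator connects to in autonomous mode.
    app = socketio.WSGIApp(sio, app)
    eventlet.wsgi.server(eventlet.listen(('', 4567)), app)
```

Running a script like this and then starting the simulator in autonomous mode closes the feedback loop described above.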

The Data

When you really think about it, you realise that the data must be imbalanced: you drive straight much more than you take turns, so the data is heavily tilted towards straight-driving images rather than turning images. Here’s a bar-graph representation of the data that I generated while driving in the manual (training) mode:

Distribution of Training data without any augmentation.

As you can see, the training data is dominated by 0-steering-angle images. There are some left-steering images, but even fewer right-steering images. If you train your model on this data without accounting for the imbalance, there is a high probability that it will be heavily biased towards the 0-steering-angle images. When I trained the model directly on this data, the car hardly steered and kept driving straight.

There are three ways to handle this:

  1. Cut down on the straight drive images.
  2. Include the left and right camera images.
  3. Augment the turn images.

Cut down on the straight drive images

Cutting down on the straight-drive images is easy: you just filter out images whose steering angle is 0. Or, you can use a small threshold instead of exactly 0; I chose 0.15.
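
A sketch of this filtering, applied to the `log` DataFrame loaded in the Data Generation section. Keeping a small random fraction of the straight samples, rather than dropping them all, is my own tweak.

```python
import numpy as np

STRAIGHT_THRESHOLD = 0.15   # steering magnitude below this counts as "straight"
KEEP_STRAIGHT_PROB = 0.2    # fraction of straight samples to keep (tunable assumption)

is_turn = log['steering'].abs() >= STRAIGHT_THRESHOLD
keep_some_straight = np.random.rand(len(log)) < KEEP_STRAIGHT_PROB
balanced_log = log[is_turn | keep_some_straight].reset_index(drop=True)
```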

Include the left and right camera images

As you know, there are three cameras attached to the car, and so far we have only used the center camera. We can use the images from the left and right cameras as well, but you can’t use them directly: you have to account for the shift in camera position. To do this, we add or subtract a steering correction value; I chose 0.25. Below is a bar-graph representation of the training data after adding the left and right camera images:

Distribution after adding the left and right camera images.
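
Here is a sketch of folding in the side cameras, continuing from the balanced DataFrame above. The 0.25 correction is the value used in this article; the sign convention (add for the left camera, subtract for the right) follows the usual behavioural-cloning setup.

```python
CORRECTION = 0.25   # steering correction for the side cameras

samples = []        # list of (image_path, steering_angle) pairs
for _, row in balanced_log.iterrows():
    samples.append((row['center'].strip(), row['steering']))
    # The left camera sees the car as if it were further to the left of the
    # lane, so the label for that frame steers a bit more back towards the
    # centre; the right camera gets the opposite adjustment.
    samples.append((row['left'].strip(), row['steering'] + CORRECTION))
    samples.append((row['right'].strip(), row['steering'] - CORRECTION))
```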

Augment the turn images

We can augment the turn images by flipping them horizontally, zooming in a bit and brightening them. There’s only so much you can do to augment the turn images, though; even then, the 0-steering-angle images might dominate the rest. You can consider eliminating the near-0-steering-angle images altogether, as they don’t really carry much information: in this case, we are only concerned with steering the car left or right when needed. Below is a bar-graph representation of the training data after eliminating the near-0-steering-angle images:

Distribution after augmentation and filtering out images with steering angle magnitude less than 0.15
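
Two of those augmentations sketched with OpenCV (the brightness range is an assumption). Note that a horizontal flip must also negate the steering angle, otherwise the label no longer matches the image.

```python
import cv2
import numpy as np

def random_flip(image, angle):
    # Mirror the image half the time; the steering angle changes sign with it.
    if np.random.rand() < 0.5:
        return cv2.flip(image, 1), -angle
    return image, angle

def random_brightness(image, low=0.6, high=1.3):
    # Scale the V channel in HSV space to simulate different lighting.
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * np.random.uniform(low, high), 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
```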

Data Pre-processing

Pre-processing the data into the right format is very important. When you run the simulator in training mode, it saves the images in RGB format, but our Neural Network accepts images in YUV encoding, so we have to convert the RGB images to YUV. Here’s what the RGB and YUV images look like:

RGB Image
YUV Image

When you look at the images above, you will realise that not all the information in them is required, and some of it might even hinder the learning of our Neural Network. The only thing the Neural Network needs to concentrate on is the road, ignoring everything else (the sky, the car’s bonnet and the sides of the road). We will crop the image to retain only the road information. The images before and after cropping look like this:

Before cropping
After Cropping

As you can see, cropping the image narrows the information fed to the Neural Network.

We can also blur the image to smooth it. Here’s what the blurred image looks like:
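
Putting the pre-processing steps together: crop, blur, resize and RGB-to-YUV conversion. The crop rows assume the simulator’s 160×320 frames and may need tuning; the 66×200 target size is what the Nvidia architecture expects.

```python
import cv2

def preprocess(image):
    image = image[60:135, :, :]                  # crop away the sky and the bonnet
    image = cv2.GaussianBlur(image, (3, 3), 0)   # light smoothing
    image = cv2.resize(image, (200, 66))         # cv2 takes (width, height)
    return cv2.cvtColor(image, cv2.COLOR_RGB2YUV)
```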

Training Information

I collected data for 5 laps, and then turned on the recorder only after driving the car to the edge of the road, so that it captured just the turning (recovery) part. This way, we have more data for the turning scenario. I trained the model for 5 epochs due to resource constraints.
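
Tying it all together, here is a sketch of the training run for those 5 epochs. It assumes the `samples` list, `preprocess`, `random_flip`, `random_brightness` and `build_model` sketches from the earlier sections; the batch size and validation split are my own choices.

```python
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

BATCH_SIZE = 64

def generator(samples, batch_size=BATCH_SIZE):
    # Yield endless batches of augmented, preprocessed images and angles.
    while True:
        np.random.shuffle(samples)
        for offset in range(0, len(samples), batch_size):
            images, angles = [], []
            for path, angle in samples[offset:offset + batch_size]:
                image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
                image, angle = random_flip(image, angle)
                image = random_brightness(image)
                images.append(preprocess(image))
                angles.append(angle)
            yield np.array(images), np.array(angles)

train_samples, valid_samples = train_test_split(samples, test_size=0.2)

model = build_model()
model.fit(generator(train_samples),
          steps_per_epoch=len(train_samples) // BATCH_SIZE,
          validation_data=generator(valid_samples),
          validation_steps=len(valid_samples) // BATCH_SIZE,
          epochs=5)
model.save('model.h5')
```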

Results

Here’s a video of how the trained Neural Network drives the car:

The Neural Network driving the car!

You can find all the code in this repository!

Credits

The credits go to Udacity for open-sourcing the simulator and to Nvidia for sharing their architecture with the world!