Building an Autonomous Vehicle Part 1: Using Behavioral Cloning to make a Self-Driving Car drive like yourself

Akhil Suri
Apr 1, 2018

This is my first blog post. In it, we'll take a deep dive into the deep learning that enables a car to drive autonomously, and introduce the concept of behavioral cloning with Convolutional Neural Networks (CNNs). CNNs have emerged as the state-of-the-art modelling technique for computer vision tasks, including questions your car may encounter, such as, "is there a straight road or a turn in front of me?"

Introduction

The objective of this project was to apply deep learning principles to teach a car to drive autonomously in a simulator. The simulator includes both training and autonomous (test 🚔) modes. In training mode you use your driving (gaming 😜) skills to drive the car around the track and collect the data required to train your network (CNN). The network learns from your driving behavior, i.e., when to turn and when to keep moving forward. After training is complete, the model is saved to a file which the simulator uses to test it in autonomous mode (on a track it has never seen).

Approach

My solution to this project was influenced by the LeNet, NVIDIA, and comma.ai deep learning architectures for behavioral cloning. I used the Keras deep learning library, with TensorFlow as the backend, to create my model.

The overall strategy for deriving the model architecture was to keep the car driving in the center of the road.

My first step was to use a convolutional neural network similar to the LeNet architecture. I thought this model might be appropriate because it was one of the first successful, and still powerful, image recognition models.

In order to gauge how well the model was working, I split my image and steering-angle data into a training and a validation set. I found that my first model had a high mean squared error (MSE) on the training set. To reduce it, I normalized the input images, and the MSE started decreasing. After training the network for 5 epochs, the MSE was low but the results were still not up to the mark 😔.
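Normalization can live inside the model itself, so it is applied identically at training and inference time. Here is a minimal sketch using a Keras Lambda layer; the exact scaling is my assumption, and the input shape shown is the simulator's raw 160x320 RGB frame:

```python
from keras.layers import Lambda

# Scale pixel values from [0, 255] to [-0.5, 0.5] inside the model.
# The exact scaling is an assumption; the post only says the input
# images were normalized.
normalize = Lambda(lambda x: x / 255.0 - 0.5, input_shape=(160, 320, 3))
```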

Then I tried the NVIDIA architecture. For that architecture I cropped and down-scaled the input images so that the model could learn faster. But with this architecture I was getting a low MSE on the training set and a high MSE on the validation set, which implied that the model was overfitting 😭.

So to combat the overfitting, I modified the model by removing some convolution layers and fully connected layers 😝.

After changing the model architecture and training it for 3 epochs, I finally got a training loss (MSE) of 0.0146 and a validation loss of 0.0117 😋.

Creation of the Training Set & Training Process

To capture good driving behavior, I first recorded two laps on track one using center lane driving. Here is an example image of center lane driving:

Images taken from the center camera (Track 1)

Then I tried the model in the simulator to see how well the car drove around track one. There were a few spots where the vehicle fell off the track. To improve the driving behavior in these cases, I augmented the data by flipping the images taken by the center camera 🎥 and by adding images from the left and right cameras to the training set with a steering-angle correction 🛠 of ±0.20, as sketched below.
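Here is a minimal sketch of that augmentation step, assuming each sample carries images from all three cameras plus the recorded steering angle (the function and variable names are illustrative, not from my actual code):

```python
import cv2

CORRECTION = 0.20  # steering correction applied to the side-camera images

def augment(center_img, left_img, right_img, steering):
    images, angles = [], []
    # Center camera image plus its horizontal flip (angle negated),
    # which balances out left/right turn bias in the data.
    images += [center_img, cv2.flip(center_img, 1)]
    angles += [steering, -steering]
    # Side cameras, nudging the angle back toward the road center.
    images += [left_img, right_img]
    angles += [steering + CORRECTION, steering - CORRECTION]
    return images, angles
```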

I also recorded some images of the vehicle recovering from the left and right sides of the road back to the center, so that the vehicle would learn to keep itself in the middle of the road on steep turns 🙈. These images show what a recovery looks like.

Recovery Images

After the collection process, I had 25,712 data points. I then preprocessed this data by converting the color scheme from BGR to RGB (cv2 reads images in the BGR color scheme, while the simulator sends images in RGB). Then I cropped and rescaled each image to a final shape of (32, 80, 3), which helped my model train faster 🚴🏻.
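A sketch of that preprocessing, assuming the simulator's 160x320 source frames; the exact crop rows below are my assumption (the post only states the final 32x80x3 shape):

```python
import cv2

def preprocess(image_path):
    bgr = cv2.imread(image_path)                  # cv2 loads images as BGR
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)    # match the simulator's RGB format
    cropped = rgb[60:140, :, :]                   # drop sky and hood (assumed rows)
    return cv2.resize(cropped, (80, 32))          # cv2 takes (width, height) -> (32, 80, 3)
```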

Finally, I randomly shuffled the data set and put 20% of the data into a validation set and the rest into the training set.

I then used this training data to train the model, while the validation set helped determine whether the model was overfitting or underfitting. The ideal number of epochs was 3, as after the 3rd epoch the model started oscillating around a training loss of 0.014. I also used the Adam optimizer so that manually tuning the learning rate wasn't necessary.
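In Keras this training setup boils down to a couple of calls; a sketch assuming a built model and in-memory arrays (the array names here are illustrative):

```python
# Adam with its default learning rate, MSE loss, 3 epochs.
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train,
          validation_data=(X_valid, y_valid),
          epochs=3, shuffle=True)
```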

Final Model Architecture

My model consists of two convolutional layers with 5x5 filter sizes and depths of 24 and 32 respectively, each followed by MaxPooling.

The model uses the ReLU activation function to introduce non-linearity, and the data is normalized inside the model using a Keras Lambda layer.

The model flattens the output of the last convolution layer and then passes the flattened output through 3 Dense layers of sizes 32, 16, and 1 respectively.

Final Model Architecture
  • Convolution Layer 1: 5x5 with 24 filters, followed by MaxPooling
  • Convolution Layer 2: 5x5 with 32 filters, followed by MaxPooling
  • Flatten
  • Fully Connected Layer 1: 32 units
  • Fully Connected Layer 2: 16 units
  • Output Layer: 1 unit (the steering angle)
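To make this concrete, here is a minimal Keras sketch of the architecture. The layer and filter sizes follow the list above; the dropout rates and the exact placement of the two dropout layers (discussed in the next section) are my assumptions:

```python
from keras.models import Sequential
from keras.layers import Lambda, Conv2D, MaxPooling2D, Dropout, Flatten, Dense

def build_model(input_shape=(32, 80, 3)):
    model = Sequential([
        # Normalize pixel values inside the model (scaling is assumed)
        Lambda(lambda x: x / 255.0 - 0.5, input_shape=input_shape),
        Conv2D(24, (5, 5), activation='relu'),
        MaxPooling2D(),
        Conv2D(32, (5, 5), activation='relu'),
        MaxPooling2D(),
        Dropout(0.25),               # rate and placement assumed
        Flatten(),
        Dense(32, activation='relu'),
        Dropout(0.25),               # rate and placement assumed
        Dense(16, activation='relu'),
        Dense(1),                    # steering angle output
    ])
    model.compile(optimizer='adam', loss='mse')
    return model
```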

Attempts to reduce overfitting in the model

The model contains 2 dropout layers in order to reduce overfitting.

The model was trained and validated on different data sets to ensure that it was not overfitting. I used the sklearn library to split the data set into two parts, one for training and one for validation, with a test_size of 20% (see the sketch below). The model was then tested by running it in the simulator and checking that the vehicle stayed on the track.
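A sketch of that split; `samples` is assumed to be a list of (image path, steering angle) pairs collected during training mode:

```python
from sklearn.model_selection import train_test_split

# 80/20 split with shuffling, as described above.
train_samples, validation_samples = train_test_split(
    samples, test_size=0.2, shuffle=True)
```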

Model parameter tuning

The model uses the Adam optimizer, so the learning rate was not tuned manually.

Results

The results came as a pleasant surprise after several nights without any progress. Shown below are the results of the model on the test track.
I was surprised by how well the car drove on the test track. It recovered successfully from a few critical situations 😮, even though none of those maneuvers had been performed during training 🤔.

Here, for every frame, the simulator sends the center-camera image to the model in the back-end, which determines the steering angle in real time by processing the image and passing it through the network.
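Conceptually, the per-frame step looks like the sketch below. Here `crop_and_rescale` is a hypothetical helper standing in for the same 32x80x3 preprocessing used during training; no BGR-to-RGB conversion is needed, since the simulator already sends RGB:

```python
import numpy as np

def steering_for_frame(model, image_rgb):
    # crop_and_rescale: same 32x80x3 pipeline as training (hypothetical helper)
    x = crop_and_rescale(image_rgb).astype(np.float32)
    # Add a batch dimension and read back the single predicted angle.
    return float(model.predict(x[None, ...], batch_size=1)[0, 0])
```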

Conclusions
Summarizing, this was a really interesting project. It would be interesting to see whether recovery events can also be simulated from real-world data, although for that we would need a bigger model architecture and a much larger variety of data to train the network. The project cost me countless hours of sleep over a week, including plenty of cursing, but the result was well worth it. Deep learning is an exciting field and I feel lucky to live in these times of discovery 🙌

You can find this project and other self-driving car related projects on my GitHub here. Though this project is completed, I would like to improve it, or maybe train a car to drive on the off-road track. Please feel free to leave any suggestions, corrections, or additions in the comments :).
