Convolutional neural networks and their application in self-driving cars

Isma-Ilou Sadou
5 min read · Jun 4, 2018

What are convolutional neural networks anyway? Can they do more than just identify cats?

What are convolutional neural networks?

Computer vision is one of the fields that has been advancing rapidly thanks to deep learning, which is driving most of its recent innovations. You may have noticed the improved accuracy of Snapchat and Instagram face filters. Deep-learning-based computer vision is also helping autonomous vehicles and robots make sense of their surroundings: what they see (object classification) and whether and where they see an object (object detection). At the core of all of these are neural networks.

Figure 1: Neuron in the human nervous system

Neural networks are inspired by the workings of the human nervous system. The basic unit of the nervous system is the neuron. Each neuron is composed of dendrites, which take inputs from other neurons; the cell body, which processes those inputs; and the axon, which transmits the output.

Similarly, a neuron (node) in a neural network takes several inputs, multiplies each by a weight, sums them up, and passes the net input to an activation function, which in turn produces an output. The main benefit of this design is that the weights can be adjusted during training to produce the desired output. In other words, the network can “learn.” When many of these neurons are interconnected, they form a neural network. Neural networks are structured in layers, with every neuron in one layer connected to every neuron in the next layer.
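To make this concrete, here is a minimal sketch of a single artificial neuron in Python. The sigmoid activation and the specific numbers are purely illustrative:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of the inputs,
    passed through an activation function (here, a sigmoid)."""
    net_input = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-net_input))  # sigmoid activation

x = np.array([0.5, -1.0, 2.0])  # inputs from other neurons
w = np.array([0.8, 0.2, -0.5])  # weights, adjusted during training
b = 0.1                         # bias term

print(neuron(x, w, b))  # this neuron's output
```

During training, the weights `w` and bias `b` are the values the network tweaks until the output matches what we want.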

Figure 2: An artificial neuron (left) and neural network (right)

Adding convolutional layers to a neural network is what turns it into a convolutional neural network. A convolutional layer detects useful features: layers closer to the input detect low-level features like edges, while layers closer to the output combine these into more complex features like faces.

A convolutional layer is usually followed by a pooling layer. As mentioned above, every node in one layer is connected to every node in the next. If, for instance, the first layer has 1,000 nodes and the second has 500, there will be 1,000 × 500 = 500,000 connections, which means the network has to keep track of a very large number of features. A pooling layer reduces the representation size by breaking the input into regions and keeping only the strongest feature in each. This speeds up computation and makes the detected features more robust. At the end of a convolutional neural network there is usually a fully connected layer, which uses the detected features to classify the input into the classes defined by the dataset.
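Here is a minimal sketch of 2×2 max pooling in NumPy, just to show the idea; real frameworks provide optimized versions of this:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling: split the input into 2x2 regions and keep
    the strongest activation in each, halving both dimensions."""
    h, w = feature_map.shape
    pooled = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            pooled[i // 2, j // 2] = feature_map[i:i+2, j:j+2].max()
    return pooled

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [1, 2, 8, 7],
               [3, 4, 5, 6]], dtype=float)
print(max_pool_2x2(fm))
# [[6. 4.]
#  [4. 8.]]
```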

Figure 3: Example of a convolutional neural network

Before proceeding, we need to know that computers do not see images the way we do. Each image is represented as 3 matrices of pixel values (1 for each RGB channel). The convolution process makes use of “feature detectors” called filters or kernels. The logic behind this is that if a filter can detect a certain feature in one part of an image, it can probably detect similar features in other parts of the same image, or in other images.

Figure 4: Convolving a 5x5 image using a 3x3 filter. Source
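The sliding-window operation in Figure 4 can be written in a few lines of NumPy. This sketch uses a hand-picked vertical-edge filter; in a real CNN, the filter values are learned during training:

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2D convolution: slide the filter over the image and
    take the elementwise product-sum at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

image = np.random.rand(5, 5)          # a 5x5 single-channel "image"
edge_filter = np.array([[-1, 0, 1],   # a 3x3 vertical-edge detector
                        [-1, 0, 1],
                        [-1, 0, 1]])
print(convolve2d(image, edge_filter).shape)  # (3, 3) feature map
```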

Convolutional neural networks in self-driving cars

As part of a team that recently won the national mini autonomous racecar competition OpenZeka MARC, I was able to test these networks on a real mini vehicle. The mini car I used is equipped with an NVIDIA Jetson TX2 Developer Kit, a ZED stereo camera, a LIDAR, and an inertial measurement unit, and it runs the Robot Operating System, ROS. For this post, I ran an experiment using only the ZED camera and convolutional neural networks. Below is an overview of my approach.

Figure 5: A Racecar/J mini autonomous racecar. Source

The first step was collecting data for training. In ROS, each sensor publishes data on a topic. My data-collection script subscribes to the ZED camera and VESC motor controller topics to get image frames, the steering angle, and the speed. It then saves each image frame in a dataset folder and writes its path, the corresponding steering angle, and the speed to a CSV file.
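A sketch of what such a data-collection node might look like. The topic names, message types, and file paths here are assumptions based on a typical RACECAR/J setup, not the exact code I ran:

```python
#!/usr/bin/env python
# Hypothetical data-collection node: logs camera frames alongside
# the current steering angle and speed from the VESC.
import csv
import os
import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from ackermann_msgs.msg import AckermannDriveStamped

bridge = CvBridge()
latest_cmd = {'steering': 0.0, 'speed': 0.0}
if not os.path.exists('dataset'):
    os.makedirs('dataset')
csv_file = open('dataset/log.csv', 'w')
writer = csv.writer(csv_file)
frame_id = [0]

def cmd_callback(msg):
    # Remember the most recent steering angle and speed
    latest_cmd['steering'] = msg.drive.steering_angle
    latest_cmd['speed'] = msg.drive.speed

def image_callback(msg):
    # Save the frame and log its path with the current command values
    frame = bridge.imgmsg_to_cv2(msg, 'bgr8')
    path = 'dataset/frame_%06d.jpg' % frame_id[0]
    cv2.imwrite(path, frame)
    writer.writerow([path, latest_cmd['steering'], latest_cmd['speed']])
    frame_id[0] += 1

rospy.init_node('data_collector')
rospy.Subscriber('/zed/rgb/image_rect_color', Image, image_callback)
rospy.Subscriber('/vesc/ackermann_cmd_mux/output',
                 AckermannDriveStamped, cmd_callback)
rospy.spin()
```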

The second step was using the collected data to train two models: one for the steering angle and one for the speed. The same convolutional neural network architecture worked for both models.
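A sketch of what such an architecture could look like in Keras, loosely based on NVIDIA's end-to-end driving network (PilotNet). The layer sizes and input shape here are illustrative, not the exact network I trained:

```python
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, Lambda

def build_model(input_shape=(66, 200, 3)):
    model = Sequential([
        # Normalize pixel values from [0, 255] to [-0.5, 0.5]
        Lambda(lambda x: x / 255.0 - 0.5, input_shape=input_shape),
        Conv2D(24, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(36, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(48, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(64, (3, 3), activation='relu'),
        Flatten(),
        Dense(100, activation='relu'),
        Dense(50, activation='relu'),
        Dense(1)  # one continuous output: steering angle OR speed
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

steering_model = build_model()
speed_model = build_model()  # same architecture, trained on speed labels
```

Because steering angle and speed are both single continuous values, one architecture with a single regression output can serve both tasks; only the training labels differ.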

And lastly, I loaded both models and used them to predict the steering angle and speed using only images from the ZED camera as input. The results were astounding: the car was able to drive, follow traffic signs, and park by itself with a fairly good level of accuracy.
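A minimal sketch of the inference loop; again, the topic names and model file names are assumptions for illustration:

```python
# Hypothetical inference node: predict steering and speed from each
# camera frame and publish the drive command.
import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from ackermann_msgs.msg import AckermannDriveStamped
from keras.models import load_model

bridge = CvBridge()
steering_model = load_model('steering_model.h5')
speed_model = load_model('speed_model.h5')

def image_callback(msg):
    frame = bridge.imgmsg_to_cv2(msg, 'bgr8')
    x = cv2.resize(frame, (200, 66))[None, ...]  # batch of one
    cmd = AckermannDriveStamped()
    cmd.drive.steering_angle = float(steering_model.predict(x)[0][0])
    cmd.drive.speed = float(speed_model.predict(x)[0][0])
    pub.publish(cmd)

rospy.init_node('autonomous_driver')
pub = rospy.Publisher('/vesc/ackermann_cmd_mux/input/navigation',
                      AckermannDriveStamped, queue_size=1)
rospy.Subscriber('/zed/rgb/image_rect_color', Image, image_callback)
rospy.spin()
```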

This accuracy could have been increased by using the other sensors, providing a better training dataset, and pre-processing the images, for instance with OpenCV to reduce light reflections and highlight the lanes. The experiment nevertheless achieved its objective of showing the strength and applications of convolutional neural networks.
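As one example of such pre-processing, here is a rough sketch that masks out bright glare before a frame reaches the network. The brightness threshold is an illustrative guess, and this is just one of many possible approaches:

```python
import cv2
import numpy as np

def reduce_reflections(frame, v_threshold=230):
    """Replace very bright (likely glare) pixels with a local median,
    so reflections distract the network less."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    glare = hsv[:, :, 2] > v_threshold  # high-brightness mask
    cleaned = frame.copy()
    cleaned[glare] = cv2.medianBlur(frame, 21)[glare]
    return cleaned
```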

In a real-life situation, an autonomous car needs many more sensors, and a way to store and transport the massive amount of data they generate. Collecting enough training data for safe driving is also an issue. Whether or not today’s technology is enough to find affordable solutions to these problems while increasing safety is a topic for debate. The only way to find out, though, is by trying to solve them.

This is my first post on Medium. If you liked it, don’t forget to show 💛 through 👏 👏 👏 and follow me on Medium or LinkedIn. Your remarks and suggestions would be greatly appreciated.
