Literature Review on the Application of Deep Learning in Building Self Driving Cars

An overview of various design approaches

XQ
The Research Nest
12 min read · May 28, 2019


Image Credits: V2Gov

Self-driving cars are expected to have a revolutionary impact on multiple industries, fast-tracking the next wave of technological advancement. Research in autonomous navigation dates back to the early 1900s, with the first concept of an automated vehicle exhibited by General Motors in 1939. However, most techniques used by early researchers proved to be ineffective or too costly. In recent times, with cutting-edge developments in artificial intelligence, sensor technologies, and cognitive science, researchers have come a step closer to realizing a practical implementation of a self-driving agent. New design approaches involving neural networks, multiple sensors like cameras and Light Detection And Ranging (LiDAR), computer vision, and other techniques are being extensively researched and tested by companies like Google, Uber, and Lyft as well as top universities like MIT and the University of Toronto.

Although these techniques create an efficient system, the end product can turn out to be expensive. If a system using only regular, inexpensive cameras managed to yield superhuman performance, the cost of commercial autonomous driving systems, as well as the cost of further research, could be reduced. A self-driving car as a whole consists of numerous subsystems that work together to achieve seamless autonomous navigation. An essential part of driving a vehicle is steering in the right direction. Computers have long been used to estimate the steering angle, but early techniques relied on multiple hand-engineered steps such as lane line analysis.

A commercially successful self-driving car is expected to pave the way to higher speed limits, smoother rides, fewer traffic collisions and associated costs, and increased roadway capacity.

Related Work

One of the earliest reported uses of a neural network for autonomous navigation comes from the research conducted by Pomerleau in 1989, who built the Autonomous Land Vehicle in a Neural Network (ALVINN) system. It was a straightforward architecture consisting of fully connected layers. The network predicted actions from pixel inputs and was applied to simple driving scenarios with few obstacles. It succeeded only in such easy situations, and that was it. However, this research showcased the untapped potential of neural networks for autonomous navigation.

In 2016, NVIDIA released a paper on a similar idea that built on ALVINN. In the paper, the authors used a CNN architecture to extract features from the driving frames. The network was trained using augmented data, which was found to improve the model’s performance: shifted and rotated images were generated from the training set with correspondingly modified steering angles. This approach was found to work well in simple real-world scenarios, such as highway lane-following and driving in flat, obstacle-free courses. Several research efforts have since been undertaken to build more complex perception-action models that can tackle the myriad environments and unpredictable situations usually encountered in urban driving.
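For intuition, here is a minimal sketch of that shift-and-rotate augmentation idea, assuming OpenCV and NumPy. The per-pixel steering correction factor is an illustrative assumption, not the value used in the NVIDIA paper, and for simplicity only the horizontal shift adjusts the label here.

```python
import numpy as np
import cv2

ANGLE_PER_PIXEL = 0.004  # assumed steering correction per pixel of lateral shift


def augment_frame(image, steering_angle, max_shift=40, max_rotation=5.0):
    """Randomly shift and rotate a frame, adjusting the steering label accordingly."""
    h, w = image.shape[:2]

    # Horizontal shift: the steering label is corrected in proportion to the shift.
    shift = np.random.uniform(-max_shift, max_shift)
    m_shift = np.float32([[1, 0, shift], [0, 1, 0]])
    shifted = cv2.warpAffine(image, m_shift, (w, h))

    # Small random rotation about the image centre (label left unchanged in this sketch).
    rotation = np.random.uniform(-max_rotation, max_rotation)
    m_rot = cv2.getRotationMatrix2D((w / 2, h / 2), rotation, 1.0)
    augmented = cv2.warpAffine(shifted, m_rot, (w, h))

    new_angle = steering_angle + shift * ANGLE_PER_PIXEL
    return augmented, new_angle
```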

One proposed approach was to train a model on very large-scale driving video data and perform transfer learning. This model worked decently but was limited to certain functionalities and was prone to failure when exposed to new scenarios.

Another line of work considers autonomous navigation as equivalent to predicting the next frame of a video. Comma.ai has proposed to learn a driving simulator with an approach that combines a Variational Auto-encoder (VAE) and a Generative Adversarial Network (GAN). Their model was able to keep predicting realistic-looking video for several frames based on previous frames, even though the transition model was optimized without a cost function in pixel space. This method is a special case of the more general task of video prediction, and there are other examples of video prediction models being applied to driving scenarios. However, in many scenarios, video prediction is ill-constrained because the preceding actions are not given as input; some models address this by conditioning the prediction on the model’s previous actions.

In many older approaches, machine learning algorithms were applied to data acquired by various sensors, framing the task as a classification problem, and were generally trained and tested on datasets of limited size.

Some examples stated in research literature include:

  • Using neural networks on event data to recognize the cards of a deck (4 classes), faces (7 classes), or characters (36 classes).
  • Training a network to identify three types of gestures (rock, paper, scissors) in dynamic scenes.
  • Estimation problems in which the unknown variable is continuous were generally tackled by discretization, i.e., the solution space was divided into a finite number of classes, converting the task into a classification problem. For example, in predator-prey robots, a network was trained on the combined input of events and grayscale frames from a Dynamic and Active-pixel Vision Sensor (DAVIS) to produce one of four outputs: the prey is on the left, center, or right of the predator’s field of view (FOV), or it is not visible in the FOV.
  • Another example is that of the optical flow estimation method where the network produced motion vectors from a set with eight different directions and eight different speeds (i.e., 64 classes).

Steering a car is a continuous process with immediate consequences. Treating it as a classification task is neither practical nor accurate. Instead, we consider it a regression problem: ideally, a steering controller must generate the steering angle the car should have at every instant of time, and these angles can take any value in the steering range, not just discrete ones.
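To make the contrast concrete, here is a tiny sketch in Keras, purely illustrative: a discretized formulation would end in a softmax over a fixed set of angle bins, while the regression formulation ends in a single linear unit trained with a continuous loss. The number of bins is an assumption.

```python
import tensorflow as tf

# Classification-style head: the car could only ever output one of N
# predefined steering angles (here, an assumed 64 bins).
classification_head = tf.keras.layers.Dense(64, activation="softmax")

# Regression head: a single linear unit can output any angle in the
# steering range, trained with a continuous loss such as mean squared error.
regression_head = tf.keras.layers.Dense(1)
mse_loss = tf.keras.losses.MeanSquaredError()
```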

An interesting technique to consider in this scenario is imitation learning, which has previously been used in a variety of situations including articulated motion, autonomous flight, modeling navigational behavior, off-road driving, and road following. The underlying idea behind all these applications is the same; the difference lies in the input parameters, the control signals predicted, the neural architecture, and the learned representations. In the end, it all boils down to feeding a neural network data that demonstrates how to perform a particular action and expecting the system to pick up a representation of its own and imitate the training data to accomplish the same response (hence the name imitation learning). In the case of self-driving cars, this means steering the vehicle given the visual input at every instant of time, learning from an actual car being driven in reality.

Another approach of interest is deep reinforcement learning (RL). RL has come into the limelight for autonomous navigation only recently, after Google DeepMind demonstrated its success in learning games such as Atari and Go.

In this method, models are trained using a predefined reward function so that they can learn and correct themselves from past experience. In these works, the main aim is to learn purely from experience and discover hierarchical structure automatically. This is hard, time-intensive, and in general an open problem, particularly for sensorimotor skills as complex as those needed in self-driving cars.
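As a rough illustration of what such a predefined reward function might look like for simple lane keeping (the terms and weights below are assumptions, not taken from any of the cited papers):

```python
def driving_reward(speed, lane_offset, collided):
    """Return an illustrative scalar reward for one simulation step."""
    if collided:
        return -100.0                    # large penalty for a crash
    progress = speed                     # encourage moving forward
    deviation = 2.0 * abs(lane_offset)   # penalise drifting from the lane centre
    return progress - deviation
```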

In contrast, in imitation learning, we provide additional information on the expert’s intentions during the demonstration. This formulation makes the learning problem more tractable and produces a human-controllable policy.

In another, much broader classification, design approaches differ in their level of modularity as follows:

1. Highly Tuned Systems: These are generally based on computer vision algorithms, hard-coded rules, and similar components, combined into a model that can be used for planning and control.

2. End to End System: Here, models are trained to map the inputs from the sensors to control commands. In this process, the controller is provided with commands that specify the driver’s intent, along with the sensory information during the training phase. This technically comes under imitation learning.
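A hedged sketch of this second category, showing how a driver-intent command can be fed to the network alongside the camera image (a simplified variant of command-conditioned imitation learning; the input shapes and layer sizes are assumptions):

```python
import tensorflow as tf

image_in = tf.keras.Input(shape=(88, 200, 3), name="camera")
command_in = tf.keras.Input(shape=(3,), name="command")  # one-hot intent: left / straight / right

x = tf.keras.layers.Conv2D(32, 5, strides=2, activation="relu")(image_in)
x = tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu")(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Concatenate()([x, command_in])       # inject the driver's intent
x = tf.keras.layers.Dense(128, activation="relu")(x)
controls = tf.keras.layers.Dense(2, name="steer_throttle")(x)  # control commands

model = tf.keras.Model([image_in, command_in], controls)
model.compile(optimizer="adam", loss="mse")
```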

Concurrent methods in the industry have used neural network predictions for a variety of tasks, such as object detection and lane segmentation, as inputs to a rule-based control system.

Having covered an overview of various design approaches, here are some insights into what a self-driving car actually consists of.

Architecture Of A Self Driving Car

As mentioned, a self-driving car comprises several integrated sub-modules. Here, we explore the basic system design to understand how a modern self-driving car functions.

Key Physical Components (Generalised)

- Cameras — Provide real-time obstacle detection, facilitate lane departure warnings, and track roadway information (like road signs).

- Radar — Radio waves detect short & long-range depth.

- LIDAR — Measures distance by illuminating the target with pulsed laser light and measuring reflected pulses with sensors to create a 3-D map of the area.

- GPS — Triangulates the position of the car using satellites. Current GPS technology is accurate only to within a few meters.

- Ultrasonic Sensors — Use high-frequency sound waves and their echoes to calculate distance. Best at close range.

- Central Computer — The “brain” of the vehicle. Receives information from the various components and directs the vehicle overall.

- Receiver/Antenna — Communications device permitting the vehicle to communicate with other vehicles (V2V) using DSRC, a wireless communication standard that enables reliable data transmission in active safety applications.

Apart from these physical components, the system can be divided into a perception system and a decision-making system. In general, the perception system helps the autonomous agent accurately collect data from its surrounding environment to understand and map it, while the decision-making system takes care of actually driving the car based on all the information made available by the perception system.

Requirements And Engineering Challenges

An autonomous car is expected to have at least a human level of environmental awareness, if not more. This is generally achieved using multiple sensors like cameras and LiDAR. Apart from that, it should have robust mechanisms for navigation, path planning, and maneuver control. Current engineering design challenges include the need for more reliable algorithms, better object detection, sensor efficiency, safety and reliability, computational resources, and security.

Autonomy in self-driving cars is generally measured in standardized levels first framed by SAE International (Society of Automotive Engineers) in 2014. These are as follows:

- Level 0: No automation. The human driver performs all driving tasks.
- Level 1: Driver assistance. The vehicle assists with either steering or speed, but the driver handles everything else.
- Level 2: Partial automation. The vehicle can control both steering and speed, while the driver monitors the environment at all times.
- Level 3: Conditional automation. The vehicle drives itself under certain conditions, with the driver ready to take over when requested.
- Level 4: High automation. The vehicle drives itself within a defined operational domain without requiring driver intervention.
- Level 5: Full automation. The vehicle can drive itself under all conditions a human driver could manage.

Modern Design Approaches

We have previously seen different design ideas based on the deep learning algorithms used and on how the driving agent is trained. Here, we discuss another way of classifying designs, in terms of how the perception and decision-making systems are built.

In one method, called mediated perception, the current environment is treated as unknown, and the perception-related components are used to recognize critical driving-related features such as lanes, roads, crossings, pedestrians, and so forth. After detecting and tracking the various objects in the environment, driving decisions are made.

In another category, called behavior reflex mode, a neural network is trained on human behavior, which is carefully observed and learned, to make decisions for autonomous driving. This falls under the imitation learning discussed before, with the autonomous agent trying to imitate a human driving a car.

The direct perception method uses CNNs to learn key perception indicators: the system learns a mapping from an acquired image to several affordances related to driving actions, such as the current steering angle, alignment with the lane, and staying within the lane. Alongside this, the most common technique for sensor management, called sensor fusion, intelligently combines data collected from multiple sensors to aid the decision support system.
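To illustrate how direct perception differs from pure end-to-end control, here is a minimal sketch: a perception network (not shown) predicts a few affordances, and a simple hand-written controller converts them into a steering command. The affordance names and controller gains are illustrative assumptions.

```python
def affordances_to_steering(angle_to_lane, offset_from_centre):
    """Simple proportional controller on top of predicted affordances."""
    K_ANGLE, K_OFFSET = 0.8, 0.3   # assumed controller gains
    # Steer against both the heading error and the lateral offset.
    return -(K_ANGLE * angle_to_lane + K_OFFSET * offset_from_centre)
```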

Individual submodules can use neural networks to achieve their functionality as well. There is a proposed deep learning mechanism to detect obstacles on the road based on deep autoencoders and stereovision. The results obtained in this work show that the hybrid autoencoder-based solution yields a high accuracy on average on different datasets, and clearly outperforms the Deep Belief Network (DBN) and Stacked Auto-encoders (SDA).

Road detection is one of the vital requirements of an autonomous car and is usually realized through different sensors such as onboard cameras and LIDAR. However, in the behavioral cloning approach, the neural network is expected to extract the required features by itself, without being explicitly trained to detect roads.

End To End Learning

Here is a special account of end-to-end learning, which I’ll demonstrate in a small project tutorial in a future article.

To start with, it is a type of deep learning process in which all of the parameters are trained jointly, rather than step by step. Here, the machine uses previously gathered human input to execute its task, and hence it is a supervised learning method. This approach is particularly prevalent in the autonomous car industry, as its benefits fit well with convolutional neural networks (CNNs).

A convolutional layer is one of the basic building blocks of an end-to-end learning architecture. It is a layer that performs a convolution operation on its input (the image pixel values) and passes the output on to the next layer.

The end-to-end learning process can be separated into two major phases. The first is the training phase, where the machine learns its parameters from the actions executed by the human operator (through convolutional neural networks). Next is the inference phase, where the machine acts upon the experience gained during the training phase.
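Putting the two phases together, here is a minimal sketch of what this looks like in code, assuming a dataset of recorded camera frames and the corresponding human steering angles. Here `frames`, `angles`, and `new_frame` are placeholder arrays loaded elsewhere, and the layer sizes only loosely follow an NVIDIA-style architecture.

```python
import numpy as np
import tensorflow as tf

# End-to-end model: convolutional layers extract features from raw pixels,
# fully connected layers map them to a single steering command.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(24, 5, strides=2, activation="relu",
                           input_shape=(66, 200, 3)),
    tf.keras.layers.Conv2D(36, 5, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(48, 5, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1),   # predicted steering angle
])
model.compile(optimizer="adam", loss="mse")

# Training phase: learn jointly from the human driver's demonstrations.
# model.fit(frames, angles, epochs=10, validation_split=0.1)

# Inference phase: act on a new frame using what was learned.
# predicted_angle = model.predict(new_frame[np.newaxis])[0, 0]
```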

As stated in the computer science wiki, “The only difference between the end-to-end learning process and the deep learning process is that the end-to-end learning process must collect all of the parameters jointly (at the same time), while the deep learning process can collect the parameters either jointly or step by step. Therefore, every end-to-end learning process is a deep learning process, but not every deep learning process is an end-to-end learning process.”

Software Requirements

There are several software tools and frameworks available to assist with the design, development, research, and testing of autonomous vehicles. Training models in simulation first, and then fine-tuning them with real-world data, substantially reduces the amount of data needed for a fully trained model, and it cuts cost and training time. Error correction and debugging are also smoother in simulation, and real-world testing is not feasible in all scenarios anyway.

According to the RAND Corporation, for self-driving cars to be even 20 percent better than humans, it would require 11 billion miles of validation. Doing the basic math, this implies that it would take 500 years of non-stop driving by a fleet of 100 cars to cover this distance!
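Working that figure out explicitly (the 11-billion-mile number is RAND’s; the average speed of 25 mph is an assumption used here just to reproduce the quoted estimate):

```python
total_miles = 11e9       # RAND's validation mileage estimate
fleet_size = 100         # cars driving non-stop
avg_speed_mph = 25       # assumed average speed

hours_per_car = total_miles / fleet_size / avg_speed_mph
years = hours_per_car / (24 * 365)
print(round(years))      # ~502, i.e. roughly 500 years
```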

As stated in the FloydHub blog, “Simulators are a great solution to this problem. They are a safe way for developers to reliably test and validate the performance of self-driving hardware and software. In September 2018, NVIDIA opened up their DRIVE Constellation simulation platform for partners to integrate their world models, vehicle models and traffic scenarios.”

Some of the most popular simulation tools for self-driving cars include TORCS, Udacity self-driving car simulator, and CARLA Simulator.

Deep learning frameworks like TensorFlow and Keras can be used for the development, compilation, and training of your neural networks. Standard Python libraries like OpenCV and Pillow can be used to implement the various image processing techniques needed for data augmentation and reshaping of training images.
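For example, a typical preprocessing step with OpenCV might look like the following sketch; the crop region and target size are assumptions for illustration.

```python
import cv2
import numpy as np


def preprocess(image):
    """Crop away sky/hood, resize to the network's input size, normalise."""
    cropped = image[60:140, :, :]              # assumed region of interest
    resized = cv2.resize(cropped, (200, 66))   # width x height expected by the model
    return resized.astype(np.float32) / 255.0  # scale pixel values to [0, 1]


def flip_example(image, steering_angle):
    """Horizontal flip doubles the data; the steering label changes sign."""
    return cv2.flip(image, 1), -steering_angle
```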

Stay tuned for a future project tutorial/article on the design and testing of a deep learning based steering angle controller using one of the approaches discussed here.

References

(To learn more about the latest trends and research in this domain, you can refer to the following papers consulted for this literature survey.)

1. Hussain, R., & Zeadally, S. (2018). Autonomous cars: Research results, issues and future challenges. IEEE Communications Surveys & Tutorials.

2. Badue, C., Guidolini, R., Carneiro, R. V., Azevedo, P., Cardoso, V. B., Forechi, A., … & Oliveira-Santos, T. (2019). Self-Driving Cars: A Survey. arXiv preprint arXiv:1901.04407.

3. Heylen, J., Iven, S., De Brabandere, B., Oramas, J., Van Gool, L., & Tuytelaars, T. (2018, March). From Pixels to Actions: Learning to Drive a Car with Deep Neural Networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 606–615). IEEE.

4. Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., … & Zhang, X. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.

5. Stafylopatis, A., & Blekas, K. (1998). Autonomous vehicle navigation using evolutionary reinforcement learning. European Journal of Operational Research, 108(2), 306–318.

6. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017). CARLA: An open urban driving simulator. arXiv preprint arXiv:1711.03938.

7. Sallab, A. E., Abdou, M., Perot, E., & Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19), 70–76.

8. Johnson-Roberson, M., Barto, C., Mehta, R., Sridhar, S. N., Rosaen, K., & Vasudevan, R. (2016). Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?. arXiv preprint arXiv:1610.01983.

9. Sharifzadeh, S., Chiotellis, I., Triebel, R., & Cremers, D. (2016). Learning to drive using inverse reinforcement learning and deep q-networks. arXiv preprint arXiv:1612.03653.

10. Faisal, A., Yigitcanlar, T., Kamruzzaman, M., & Currie, G. (2019). Understanding autonomous vehicles: A systematic literature review on capability, impact, planning and policy. Journal of Transport and Land Use, 12(1), 45–72.

11. Zhang, X., Gao, H., Guo, M., Li, G., Liu, Y., & Li, D. (2016). A study on key technologies of unmanned driving. CAAI Transactions on Intelligence Technology, 1(1), 4–13.

12. Du, S., Guo, H., & Simpson, A. (2017). Self-driving car steering angle prediction based on image recognition. Department of Computer Science, Stanford University, Tech. Rep. CS231–626.

13. Maqueda, A. I., Loquercio, A., Gallego, G., García, N., & Scaramuzza, D. (2018). Event-based vision meets deep learning on steering prediction for self-driving cars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5419–5427).

14. Santana, E., & Hotz, G. (2016). Learning a driving simulator. arXiv preprint arXiv:1608.01230.

15. Rosenzweig, J., & Bartl, M. (2015). A review and analysis of literature on autonomous driving. E-Journal Making-of Innovation.

Note: This article was first published on LinkedIn here
