End-to-End Autonomous Driving using Deep Learning

Pranavs Chib
5 min read · Oct 9, 2023
Comparison between End-to-End and modular pipelines. End-to-End is a single pipeline that generates the control signal directly from perception input, whereas a modular pipeline consists of various sub-modules, each with task-specific functionalities.

Autonomous driving refers to the capability of a vehicle to drive partly or entirely without human intervention. The modular architecture is the most widely used approach in autonomous driving systems: it divides the driving pipeline into discrete sub-tasks and relies on individual sensors and algorithms to process data and generate control outputs, with interconnected modules for perception, planning, and control. However, the modular architecture has drawbacks that impede further advancements in autonomous driving (AD). One significant limitation is its susceptibility to error propagation: errors in the perception module, such as a misclassification, can propagate to the downstream planning and control modules and lead to unsafe behavior. The complexity of managing interconnected modules and the computational inefficiency of processing data at each stage pose further challenges. To address these shortcomings, an alternative approach, End-to-End driving, has emerged.

An END-TO-END SYSTEM generates ego-motion directly from sensory input. It optimizes the driving pipeline as a whole (Fig), bypassing explicit perception and planning sub-tasks and allowing the system to continuously learn to sense and act, much like a human driver.

ARCHITECTURE: End-to-End driving generates ego-motion from sensory input, which can come in various modalities; the prominent ones are camera images, Light Detection and Ranging (LiDAR) point clouds, navigation commands, and vehicle dynamics such as speed. This sensory information is fed into a backbone model, which is responsible for generating the control signals. Ego-motion can involve different types of actions, such as acceleration, steering, and braking. Many models also output additional information, such as a cost map for safe maneuvers, interpretable intermediate representations, and other auxiliary outputs. There are two main approaches to End-to-End driving: the driving model is either explored and improved via Reinforcement Learning (RL) or trained in a supervised manner using Imitation Learning (IL) to resemble human driving behavior. The supervised paradigm learns the driving style from expert demonstrations, which serve as training examples for the model. However, scaling an autonomous driving system based on IL is challenging, since it is impossible to cover every situation during the learning phase. RL, on the other hand, maximizes cumulative reward over time through interaction with the environment: the network makes driving decisions and receives rewards or penalties based on its actions. While RL training occurs online and allows exploration of the environment, it uses data less efficiently than imitation learning.
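To make the data flow concrete, here is a minimal sketch of such a backbone in PyTorch. The layer sizes, input resolution, and three-output control head are illustrative assumptions, not a reference design: a small CNN encodes the camera image, the speed measurement and navigation command are embedded separately, and a fused head regresses the control signal.

```python
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """Minimal end-to-end driving policy: camera + speed + nav command -> control."""

    def __init__(self, num_commands: int = 4):
        super().__init__()
        # Image backbone: a small CNN stands in for a real perception encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Embed the scalar speed together with the one-hot navigation command.
        self.measurement = nn.Sequential(nn.Linear(1 + num_commands, 64), nn.ReLU())
        # Control head regresses steering, throttle, brake.
        self.head = nn.Sequential(
            nn.Linear(128 + 64, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )

    def forward(self, image, speed, command):
        img_feat = self.encoder(image)                                     # (B, 128)
        meas_feat = self.measurement(torch.cat([speed, command], dim=1))   # (B, 64)
        return self.head(torch.cat([img_feat, meas_feat], dim=1))          # (B, 3)

# Example: one forward pass on a dummy batch of two samples.
model = EndToEndDriver()
controls = model(torch.randn(2, 3, 128, 128), torch.rand(2, 1), torch.eye(4)[:2])
print(controls.shape)  # torch.Size([2, 3]) -> steering, throttle, brake
```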

LEARNING APPROACHES:

A) Imitation learning: Imitation learning (IL) is based on the principle of learning from expert demonstrations. These demonstrations train the system to mimic the expert’s behavior in various driving scenarios. Large-scale expert driving datasets are readily available, which can be leveraged by imitation learning to train models that perform at human-like standards.

Given an expert dataset of state–action pairs (𝑠, 𝑎), the main objective is to train a policy 𝜋𝜃(𝑠) that maps each state to an action as close as possible to that of the expert policy 𝜋*:

argmin𝜃 𝔼𝑠∼𝑃(𝑠∣𝜃) [ ℒ(𝜋𝜃(𝑠), 𝜋*(𝑠)) ]

where 𝑃(𝑠 ∣ 𝜃) represents the state distribution induced by the trained policy 𝜋𝜃 and ℒ measures the distance between the policy's action and the expert's action.
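A minimal behavior-cloning training step corresponding to this objective might look like the sketch below, assuming states are pre-extracted feature vectors and expert actions are (steering, throttle, brake) triples; the network size and the choice of an MSE loss are illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical setup: states are 64-d feature vectors, expert actions are
# (steering, throttle, brake) triples recorded from human demonstrations.
policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 3))  # pi_theta
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # L: distance between policy action and expert action

def bc_step(states, expert_actions):
    """One behavior-cloning step: minimize E[ L(pi_theta(s), pi*(s)) ]."""
    pred_actions = policy(states)
    loss = loss_fn(pred_actions, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch standing in for a real driving dataset.
states = torch.randn(32, 64)
expert_actions = torch.rand(32, 3)
print(bc_step(states, expert_actions))
```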

B) Reinforcement learning: Reinforcement Learning (RL) aims to maximize cumulative reward over time by interacting with the environment: the network makes driving decisions and receives rewards or penalties based on its actions. Whereas IL cannot handle novel situations that differ significantly from the training dataset, RL is more robust to this issue because it explores scenarios within the given environment. Recently, Human-In-The-Loop (HITL) approaches have gained attention. They are based on the premise that expert demonstrations provide valuable guidance toward high-reward policies, and several studies have incorporated human expertise into the training process of traditional RL or IL paradigms. One such example is EGPO, an expert-guided policy optimization technique in which an expert policy supervises the learning agent.
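For contrast, the sketch below shows the reward-driven interaction loop that RL relies on, using a toy stand-in environment and made-up reward terms; a real setup would wrap a simulator such as CARLA and use a learned policy rather than random actions.

```python
import random

class ToyDrivingEnv:
    """Toy stand-in for a driving simulator exposing reset/step."""

    def reset(self):
        self.t = 0
        return {"speed": 0.0, "lane_offset": 0.0}

    def step(self, action):
        self.t += 1
        # Toy dynamics: the action nudges the lane offset; noise drifts it.
        lane_offset = random.uniform(-1, 1) * 0.1 + action
        collided = abs(lane_offset) > 0.9
        # Reward shaping typical of driving RL: progress bonus, penalties for
        # lane deviation and collisions; the agent learns from these signals.
        reward = 1.0 - abs(lane_offset) - (10.0 if collided else 0.0)
        done = collided or self.t >= 100
        return {"speed": 1.0, "lane_offset": lane_offset}, reward, done

env = ToyDrivingEnv()
obs, total = env.reset(), 0.0
done = False
while not done:
    action = random.uniform(-0.2, 0.2)   # placeholder for the learned policy
    obs, reward, done = env.step(action)
    total += reward                      # RL maximizes this cumulative reward
print(total)
```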

SAFETY: Ensuring safety in End-to-End autonomous driving systems is a complex challenge. While these systems offer high performance potential, several considerations and approaches are essential for maintaining safety throughout the pipeline. The first is training the system on diverse, high-quality data that covers a wide range of scenarios, including rare and critical situations. Training on critical scenarios helps the system learn robust and safe behaviors and prepares it for difficult environmental conditions and potential hazards. Such scenarios include unprotected turns at intersections, pedestrians emerging from occluded regions, aggressive lane changes, and other safety-critical events.
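One simple, illustrative way to emphasize these rare cases during training is to oversample them, for example with PyTorch's WeightedRandomSampler; the scenario tags and weights below are assumptions for illustration.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical per-sample scenario tags from a driving dataset.
scenarios = ["nominal", "nominal", "unprotected_turn", "occluded_pedestrian",
             "nominal", "aggressive_lane_change", "nominal", "nominal"]

# Give rare, critical scenarios a larger sampling weight so the model sees
# them more often than their natural frequency in the logs.
weights = torch.tensor([1.0 if s == "nominal" else 5.0 for s in scenarios])
sampler = WeightedRandomSampler(weights, num_samples=len(scenarios), replacement=True)

print(list(sampler))  # indices, with critical scenarios drawn more frequently
```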

Integrating safety constraints and rules into the End-to-End system is another vital aspect. The system can prioritize safe behavior by incorporating safety considerations during learning or by post-processing the system's outputs. Safety constraints include a safety cost function, the avoidance of unsafe maneuvers, and collision-avoidance strategies.
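As an illustration of post-processing the system's outputs, the sketch below layers a hypothetical rule-based filter on top of the network's raw control command: it clamps the outputs to valid ranges and overrides the throttle with braking when a simple time-to-collision check fails. The interface and the two-second threshold are assumptions for the example.

```python
def safety_filter(steer, throttle, brake, distance_to_lead_m, ego_speed_mps):
    """Hypothetical rule-based check layered on top of the learned controller."""
    # Clamp raw network outputs to physically valid ranges.
    steer = max(-1.0, min(1.0, steer))
    throttle = max(0.0, min(1.0, throttle))
    brake = max(0.0, min(1.0, brake))

    # Simple time-to-collision heuristic: if the lead vehicle is too close
    # for the current speed, override the action with hard braking.
    ttc = distance_to_lead_m / max(ego_speed_mps, 0.1)
    if ttc < 2.0:  # assumed safety threshold in seconds
        throttle, brake = 0.0, 1.0
    return steer, throttle, brake

# An unsafe raw command gets overridden: close lead vehicle at high speed.
print(safety_filter(steer=0.05, throttle=0.8, brake=0.0,
                    distance_to_lead_m=10.0, ego_speed_mps=15.0))
# -> (0.05, 0.0, 1.0)
```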

EXPLAINABILITY: Explainability refers to the ability to understand the logic of an agent and is focused on how a user interprets the relationships between the input and output of a model. In the context of explainability for End-to-End autonomous driving systems, explanation approaches fall into two main types (Fig): local explanations and global explanations. A local explanation describes the rationale behind an individual prediction of the model, whereas a global explanation aims for a comprehensive understanding of the model's behavior by describing the knowledge it has learned.
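A common way to produce a local explanation for a vision-based driving model is a gradient-based saliency map: back-propagating a single predicted steering value to the input image highlights the pixels that most influenced that prediction. The sketch below uses a tiny stand-in network; a real system would apply the same idea to its full backbone.

```python
import torch
import torch.nn as nn

# Tiny stand-in model: image -> steering angle.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1),
)

image = torch.randn(1, 3, 128, 128, requires_grad=True)
steering = model(image)

# Local explanation: gradient of this one prediction w.r.t. input pixels.
steering.backward()
saliency = image.grad.abs().max(dim=1).values  # (1, H, W) pixel-importance map
print(saliency.shape)  # highlights the regions that drove this prediction
```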

Over the past few years, there has been significant interest in End-to-End autonomous driving due to the simplicity of its design compared to the conventional modular pipeline. Despite the impressive performance of End-to-End approaches, continued work on safety and interpretability is needed for the technology to gain broader acceptance.


Pranavs Chib

Research scholar working in the domain of computer vision and machine learning.