Leveraging Large Language Models for Intelligent Driving Scenarios

Shi Tianyu · Published in AI4SM · Sep 17, 2023

Mixed autonomy, in which human drivers and artificial intelligence collaborate in decision-making, has emerged as a vital area of study in the field of autonomous driving. In these scenarios, the interaction between human drivers and AI systems raises critical challenges concerning trust, comprehensibility, and teamwork.

An example of a non-signalized intersection is depicted in Figure 1. On the left, the gray car is unaware of the social aspect of driving and fails to recognize that the green car is yielding the right of way. As a result, both cars remain at a standstill longer, reducing the efficiency of traffic flow. Conversely, on the right, the interaction unfolds seamlessly: the gray car detects the green car’s cooperative social cue and accelerates in response to the green car’s willingness to give the right of way, resulting in a smoother and more efficient driving experience for both vehicles.

Figure 1: Example of a non-signalized intersection scenario

In the second example, see Figure 2, we observe a highway exit scenario. On the left, a gray car traveling at 20 m/s intends to exit immediately, while vehicles in the adjacent lane are moving at a higher speed of 30 m/s. If the gray car is unaware of the social convention of matching the speed of surrounding traffic to ensure a smooth merge, it may enter the adjacent lane at a significantly slower speed, disrupting traffic flow and potentially colliding with the green car. On the right, in the socially aware case, the gray car recognizes this convention and accelerates, allowing it to merge seamlessly into the adjacent lane and exit the highway smoothly. By matching speeds during the merge, the gray car avoids a potential collision and ensures a safe and efficient driving experience.

Figure 2: Example of highway exit scenario

LLMs, exemplified by models such as GPT-4, LLaMA, and PaLM-E, have demonstrated remarkable reasoning and common-sense capabilities across various domains. These models leverage Reinforcement Learning from Human Feedback (RLHF) to fine-tune their behavior based on user intent, showing significant advances in aligning AI behavior with human expectations.

In this project, we aim to explore the potential of LLMs to enhance the optimization and performance of mixed autonomy systems. We hypothesize that LLMs can assist in various aspects of mixed autonomy, such as reward design, explainability, and ethical decision-making. By leveraging the language understanding and reasoning capabilities of LLMs, we envision improving human-AI collaboration in intelligent driving scenarios, leading to more trustworthy and socially acceptable AI behavior.

The main research objectives of this project are as follows:

(1) To investigate how LLMs can be used to generate and shape reward functions for RL agents, facilitating a more efficient learning process and guiding the agents’ behavior towards desired outcomes (a minimal sketch of this idea follows the list of objectives).

(2) To examine what behaviors an LLM-powered driving agent learns and how its performance compares with well-established driving models.

(3) To enhance the interpretability of RL agents’ decisions by leveraging LLMs to provide human-readable explanations for their actions. This interpretability is crucial for building trust and facilitating cooperation between human drivers and AI-controlled vehicles in traffic scenarios.

(4) To explore the potential applications of LLM agents in a multi-agent mixed autonomy setting, where numerous autonomous driving vehicles collaborate to alleviate traffic congestion.
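
To make objective (1) more concrete, here is a minimal sketch of how an LLM could propose weights for a hand-crafted set of reward terms in the highway-exit scenario. The helper `query_llm`, the reward terms, and the returned weights are illustrative assumptions rather than part of the proposal; a real implementation would call an actual LLM API and parse its reply.

```python
# Sketch: asking an LLM to propose reward weights for an RL driving agent.
# `query_llm` is a hypothetical stand-in for any chat-completion client; the
# weights it returns are hard-coded so the example runs end to end.

def query_llm(prompt: str) -> dict:
    # A real implementation would send `prompt` to an LLM API and parse the
    # JSON reply; here we return fixed, illustrative weights.
    return {"speed_match": 1.0, "collision": -10.0, "exit_reached": 5.0}


def shaped_reward(obs: dict, weights: dict) -> float:
    # Dense reward assembled from LLM-proposed weights over simple terms.
    speed_term = -abs(obs["ego_speed"] - obs["lane_speed"])  # penalize speed mismatch
    collision_term = 1.0 if obs["collided"] else 0.0
    exit_term = 1.0 if obs["reached_exit"] else 0.0
    return (weights["speed_match"] * speed_term
            + weights["collision"] * collision_term
            + weights["exit_reached"] * exit_term)


prompt = ("Propose reward weights for a highway-exit agent that should match "
          "surrounding traffic speed, avoid collisions, and reach the exit.")
weights = query_llm(prompt)

obs = {"ego_speed": 20.0, "lane_speed": 30.0, "collided": False, "reached_exit": False}
print(shaped_reward(obs, weights))  # -10.0: the speed mismatch dominates until the car accelerates
```

The RL agent would then be trained against `shaped_reward`, and the LLM could be re-queried to adjust the weights whenever the learned behavior drifts from the desired social conventions.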

The proposed method is outlined below; see Figure 3.

Figure 3: Potential framework

The proposed method aims to enhance the performance of autonomous vehicles in driving scenarios by leveraging Large Language Models (LLMs). The process begins by translating the traffic environment’s input observation into a human-understandable format, producing an internal state (e.g., “Your speed is 25 m/s. You can accelerate or conduct a lane change”) and an external state (e.g., “You see a faster vehicle on your right with speed 30 m/s. You are 100 meters from the exit lane”). These states are then used to guide the generation of the vehicle’s goals via a defined LLM prompt, which includes instructions such as “Drive safely and efficiently, following all traffic rules and regulations, and be friendly to other vehicles. What should you do next?” The LLM generates corresponding goals, such as “Accelerate” or “Find enough space to merge to the exit.”

The reflection module, in turn, is an additional component that enables the LLM to provide human-readable explanations for its decision-making process. When the LLM is queried about why it made a particular decision given its observations, the reflection module leverages internal states and reasoning mechanisms to generate coherent and interpretable responses. This module aims to enhance the transparency and trustworthiness of the LLM’s decisions, especially in complex tasks like autonomous driving. By offering explanations for critical decisions, such as lane changes, merging, or speed adjustments, the reflection module helps users, developers, and regulators gain insight into the factors influencing the LLM’s choices.
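
As a rough illustration of this pipeline, the sketch below shows how an observation could be translated into the internal and external states, how the goal prompt could be assembled, and how the reflection module could be queried for an explanation. The function `llm_complete`, the state fields, and the exact prompt wording are assumptions made for illustration; the framework does not prescribe a specific API.

```python
# Sketch of the observation-to-language pipeline described above. `llm_complete`
# is a hypothetical stand-in for a chat-completion call; it returns a canned
# answer so the example runs without an API key.

def llm_complete(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return "Accelerate to match traffic, then find enough space to merge to the exit."


def internal_state(obs: dict) -> str:
    # Ego-centric facts: the agent's own speed and available actions.
    return (f"Your speed is {obs['ego_speed']} m/s. "
            "You can accelerate or conduct a lane change.")


def external_state(obs: dict) -> str:
    # What the agent observes about surrounding traffic and the road.
    return (f"You see a faster vehicle on your right with speed "
            f"{obs['right_vehicle_speed']} m/s. "
            f"You are {obs['distance_to_exit']} meters from the exit lane.")


def propose_goal(obs: dict) -> str:
    # Goal generation: the two states plus the driving instruction form the prompt.
    prompt = (f"{internal_state(obs)} {external_state(obs)} "
              "Drive safely and efficiently, following all traffic rules and "
              "regulations, and be friendly to other vehicles. What should you do next?")
    return llm_complete(prompt)


def reflect(obs: dict, decision: str) -> str:
    # Reflection module: ask the LLM to justify the decision it just produced.
    prompt = (f"{internal_state(obs)} {external_state(obs)} "
              f"You decided to: {decision} Explain why this decision is safe, "
              "efficient, and friendly to other vehicles.")
    return llm_complete(prompt)


obs = {"ego_speed": 25, "right_vehicle_speed": 30, "distance_to_exit": 100}
goal = propose_goal(obs)
print("Goal:", goal)
print("Explanation:", reflect(obs, goal))
```

Because the goal prompt and the reflection prompt are built from the same state strings, any explanation can be traced back to exactly the observations the decision was based on.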

Related work:

[1] Du, Y., Watkins, O., Wang, Z., Colas, C., Darrell, T., Abbeel, P., … & Andreas, J. (2023). Guiding pretraining in reinforcement learning with large language models. arXiv preprint arXiv:2302.06692.

[2] Hu, H., & Sadigh, D. (2023). Language-instructed reinforcement learning for human-ai coordination. arXiv preprint arXiv:2304.07297.

[3] Huang, W., Wang, C., Zhang, R., Li, Y., Wu, J., & Fei-Fei, L. (2023). VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models. arXiv preprint arXiv:2307.05973.

[4] Liu, Z., Bahety, A., & Song, S. (2023). REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction. arXiv preprint arXiv:2306.15724.

[5] Bousmalis, K., Vezzani, G., Rao, D., Devin, C., Lee, A. X., Bauza, M., … & Heess, N. (2023). RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation. arXiv preprint arXiv:2306.11706.

[6] Fu, D., Li, X., Wen, L., Dou, M., Cai, P., Shi, B., & Qiao, Y. (2023). Drive Like a Human: Rethinking Autonomous Driving with Large Language Models. arXiv preprint arXiv:2307.07162.
