AUTOPILOT: An Advanced Perception, Localization and Path Planning Techniques for Autonomous Vehicles using YOLOv7 and MiDaS

Prajjwal Vishwakarma
5 min read · Mar 2, 2024


Introduction:

Self-driving cars are reshaping the future of transportation, and our research project explores a game-changing integration of YOLOv7 (You Only Look Once) for object detection and MiDaS for monocular depth estimation. This blog dives into the key components of object detection, depth sensing, and path planning, showcasing a promising approach that aims to propel self-driving car technology to new heights.

Demystifying the Essentials: A Look Under the Hood

Just like a human driver relies on their senses to navigate, self-driving cars depend on a complex interplay of components to function safely and efficiently. Let’s explore three crucial aspects:

1. Perception:

Object Detection: Seeing the Road Ahead

Object detection in self-driving cars replicates the human process of scanning the environment for objects, using sophisticated algorithms like YOLO. YOLO analyzes data from cameras mounted on the car, acting like a digital eye that recognizes and classifies objects in its surroundings. This is crucial for ensuring the car avoids collisions and navigates safely.

Fig. 1. YOLO Object Detection
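To make this concrete, here is a minimal sketch of single-frame YOLOv7 inference via torch.hub. It assumes the WongKinYiu/yolov7 repository's hub entry point, a local yolov7.pt checkpoint, and a sample image named road_scene.jpg; it is an illustration, not the exact pipeline from our paper.

```python
# Minimal YOLOv7 inference sketch (assumes the WongKinYiu/yolov7 repo's
# torch.hub entry point and a local yolov7.pt checkpoint -- illustrative only).
import cv2
import torch

# Load a pretrained YOLOv7 model through torch.hub.
model = torch.hub.load("WongKinYiu/yolov7", "custom", "yolov7.pt", trust_repo=True)

frame = cv2.imread("road_scene.jpg")   # BGR image from the car's camera
results = model(frame[..., ::-1])      # the autoshaped model expects RGB

# Each detection row: x1, y1, x2, y2, confidence, class index.
for *box, conf, cls in results.xyxy[0].tolist():
    print(f"{model.names[int(cls)]}: conf={conf:.2f}, box={box}")
```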

Depth Sensing: Understanding the Distance

Judging the distance of objects on the road is essential for safe driving. This is where MiDaS comes in: it estimates a dense depth map of the scene from a single monocular camera, which is easier said than done. The resulting depth estimates give the pipeline a more robust picture of the scene, which is then consumed by the Localization module of the network.

Fig. 2. MiDaS Depth Perception
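For reference, the snippet below shows how a MiDaS depth map can be produced with the official intel-isl/MiDaS torch.hub entry points; the small model variant and the sample image path are my assumptions for illustration.

```python
# Minimal MiDaS depth estimation sketch using the official intel-isl/MiDaS
# torch.hub entry points -- the small model is chosen for speed, purely
# as an illustration.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()

# The repo also ships matching input transforms.
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

img = cv2.cvtColor(cv2.imread("road_scene.jpg"), cv2.COLOR_BGR2RGB)
batch = transform(img)

with torch.no_grad():
    prediction = midas(batch)
    # Resize the inverse-depth map back to the original image resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().cpu().numpy()

# Note: MiDaS outputs relative inverse depth, not metric distance.
print(depth.shape, depth.min(), depth.max())
```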

2. Localization:

The perception architecture described above gives us predictions in image space, but the catch is that we cannot drive directly in image space. We need a mechanism to transform these image-space predictions into so-called vector space, i.e., a bird's-eye-view perception of the surroundings, built using a manually calibrated perspective-transform algorithm. This allows the car to understand the spatial relationships between itself and other objects and its position relative to the environment, enabling safe and informed maneuvers that are then optimized by the path planning algorithms.
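For intuition, here is a minimal OpenCV sketch of such a calibrated perspective transform; the source and destination point pairs below are hypothetical placeholders that would normally come from a one-time manual calibration of the mounted camera.

```python
# Bird's-eye-view warp via a manually calibrated perspective transform.
# The point coordinates below are hypothetical placeholders; in practice
# they come from a one-time manual calibration of the mounted camera.
import cv2
import numpy as np

# Four points on the road plane in the camera image (pixels)...
src = np.float32([[560, 460], [720, 460], [1100, 700], [200, 700]])
# ...and where they should land in the top-down "vector space" view.
dst = np.float32([[300, 0], [980, 0], [980, 720], [300, 720]])

M = cv2.getPerspectiveTransform(src, dst)

frame = cv2.imread("road_scene.jpg")
birds_eye = cv2.warpPerspective(frame, M, (1280, 720))

# Individual detections (e.g., YOLO box centers) can be mapped the same way.
pts = np.float32([[[640, 600]]])          # one image-space point
print(cv2.perspectiveTransform(pts, M))   # its bird's-eye coordinates
```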

3. Path Planning: Charting the Course

The vector-space information flows directly into the path planning module. Once the car "sees" its surroundings and understands the distance of objects, it needs to determine the best way to reach its destination. This is where path planning comes into play. Algorithms analyze data from the available sources, including the bird's-eye-view vector space and map information (if present; in our case it is purely local path planning with no memory dependency), to calculate an optimal route that accounts for factors like traffic flow, speed limits, and potential obstacles.

Fig. 3. Path Planning
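As a rough illustration of local planning in vector space, the toy sketch below picks the lateral offset whose forward corridor keeps the largest clearance from mapped obstacles; the cost function and every number are simplifications of mine, not the planner from our paper.

```python
# Toy local planner in bird's-eye "vector space" (illustrative only).
# The car picks the candidate lateral offset whose straight-ahead
# corridor stays farthest from any mapped obstacle.
import numpy as np

def plan_step(obstacles_xy, candidate_offsets_m, corridor_half_width=1.2):
    """Pick the lateral offset (meters) with the best obstacle clearance.

    obstacles_xy: (N, 2) array of obstacle positions, ego at origin,
                  x = lateral, y = longitudinal (ahead of the car).
    """
    best_offset, best_clearance = 0.0, -np.inf
    for dx in candidate_offsets_m:
        # Obstacles that would fall inside this corridor ahead of the car.
        in_corridor = (np.abs(obstacles_xy[:, 0] - dx) < corridor_half_width) \
                      & (obstacles_xy[:, 1] > 0)
        # Clearance = distance to the nearest obstacle in the corridor.
        clearance = obstacles_xy[in_corridor, 1].min() if in_corridor.any() else np.inf
        if clearance > best_clearance:
            best_offset, best_clearance = dx, clearance
    return best_offset

obstacles = np.array([[0.3, 8.0], [-2.5, 15.0]])   # e.g., mapped YOLO detections
print(plan_step(obstacles, candidate_offsets_m=[-2.0, 0.0, 2.0]))  # -> 2.0
```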

Performance Highlights and Considerations

Our research utilizing YOLOv7 and MiDaS demonstrates promising results. The object detection system achieves an impressive average precision of 89%, meaning that, averaged across recall levels, roughly 89% of its detections are correct. It also boasts a recall of 92%, indicating it rarely misses crucial objects. Real-time depth sensing runs at a commendable 74 to 80 FPS on an Nvidia RTX 3060, showcasing its efficiency in handling dynamic driving scenarios.

However, acknowledging challenges and considerations for further development is crucial:

  • Data Dependence: Currently, object detection models like YOLO rely heavily on labeled training data. This means they need to be shown numerous examples of objects to learn effective recognition. This can pose limitations in scenarios where the car encounters entirely novel objects not present in the training data.
  • Monocular Depth Estimation: MiDaS uses only a single front-facing camera for depth estimation, which can introduce limitations compared to systems using multiple cameras or LiDAR (Light Detection and Ranging) sensors. While it performs well in many scenarios, complex environments or occlusions might affect its accuracy.

But the most important thing of all is to make the best use of what we currently have: optimize for the use case and build robust systems within the given constraints.
In my personal opinion, the training pipeline could be modified to mine and retrain on exactly the data where the model failed or was not confident (using a manually set confidence threshold), and this pipeline should be largely autonomous, with little or no dependency on annotators or researchers.
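A minimal sketch of that idea, with a hypothetical detector interface and a manually chosen threshold (both my assumptions), might look like this:

```python
# Sketch of the proposed self-improving training loop: flag frames where the
# detector is unsure. The detector interface and threshold are hypothetical,
# not taken from the paper.
CONF_THRESHOLD = 0.5   # manually set confidence threshold

def mine_hard_frames(frames, detector, threshold=CONF_THRESHOLD):
    """Return frames the detector is least confident about, for retraining."""
    hard = []
    for frame in frames:
        detections = detector(frame)          # [(label, confidence, box), ...]
        confidences = [conf for _, conf, _ in detections]
        # No detections at all, or a weakest detection below the threshold,
        # both suggest the model may be failing on this frame.
        if not confidences or min(confidences) < threshold:
            hard.append(frame)
    return hard

# hard_frames = mine_hard_frames(camera_stream, yolo_detector)
# These frames would then be auto-labeled or lightly reviewed and fed back
# into training, keeping human involvement minimal.
```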

Charting the Course: Future Research Opportunities for a Brighter Tomorrow

The journey towards fully autonomous vehicles requires continuous exploration and improvement. Here are some key areas for future research:

1. Temporal Decision Making: The current system makes predictions and alters its path using just a single frame at a time, but we quickly realise that's not how humans drive. We have to consider the temporal aspect of the world; it carries a significant amount of information that the system currently does not use. Initial research in this direction would target handling temporary occlusions, path planning over a time window, object tracking, and even predicting an object's future trajectory (a toy sketch follows after this list).

2. Global Localization and Path Planning: Precise and reliable localization, even in challenging situations like GPS signal loss, is essential. Exploring techniques like Simultaneous Localization and Mapping (SLAM) could offer more robust positioning solutions.

3. Faster Processing: Real-time processing of vast amounts of sensor data demands efficient system architectures. Research efforts should focus on developing faster and more scalable processing systems to ensure timely decision-making and safe operation.

4. Adverse Weather Conditions: Rain, fog, and snow can hinder sensor performance and impact object detection and depth estimation accuracy. Research should explore weather-resistant sensors and algorithms or real-time weather data integration for improved performance in diverse conditions.

5. Infrastructure Integration: As self-driving cars become more prevalent, seamless integration with existing transportation infrastructure is critical. This involves establishing communication protocols and technologies that enable self-driving cars to interact safely and efficiently with traditional vehicles, pedestrians, and cyclists.

6. Ethical Considerations: As self-driving cars navigate complex situations on the road, ethical considerations arise. Defining clear guidelines for decision-making in unavoidable dilemmas becomes crucial.
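As a starting point for the temporal direction in item 1, here is a deliberately simple constant-velocity trajectory predictor in bird's-eye coordinates; real systems would use proper multi-object tracking (e.g., Kalman-filter trackers), and every number here is hypothetical.

```python
# Toy constant-velocity trajectory prediction for a tracked object in
# bird's-eye space -- a deliberately simple stand-in for real multi-object
# tracking. All values below are hypothetical.
import numpy as np

def predict_trajectory(positions, dt, horizon_steps):
    """Extrapolate future positions from the last two observations.

    positions: (T, 2) array of past (x, y) positions, one per frame.
    dt: seconds between frames. horizon_steps: how far ahead to predict.
    """
    velocity = (positions[-1] - positions[-2]) / dt    # constant-velocity model
    steps = np.arange(1, horizon_steps + 1)[:, None]   # 1..H column vector
    return positions[-1] + steps * velocity * dt

# Two observations of a crossing pedestrian, 0.1 s apart:
track = np.array([[4.0, 12.0], [3.8, 12.0]])
print(predict_trajectory(track, dt=0.1, horizon_steps=5))
# -> positions drifting left, letting the planner anticipate the crossing.
```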

Conclusion: A Vision for the Future

Our research marks a significant step towards a future where self-driving cars seamlessly navigate our roads. The fusion of YOLOv7 and MiDaS, coupled with innovative approaches, sets a baseline benchmark for autonomous vehicle development. As technology continues to advance, addressing the challenges and exploring the avenues outlined above will shape the next era of self-driving cars. By prioritizing safety, ethics, and responsible development, we can ensure a future where self-driving technology improves transportation efficiency, accessibility, and, most importantly, safety for all. Incorporating transformers could further enhance the system's ability to understand complex patterns, make informed decisions, and adapt to changing conditions, contributing to safer and more efficient autonomous driving, but that remains a topic for future blog posts…

GitHub Repository: https://github.com/Harsh19012003/Autopilot

Research Paper: Devmurari Harshkumar, Gautham Kuckian, and Prajjwal Vishwakarma. "AUTOPILOT: An Advanced Perception, Localization and Path Planning Techniques for Autonomous Vehicles using YOLOv7 and MiDaS." ICACTA, IEEE, 2023.

DOI: 10.1109/ICACTA58201.2023.10393218

Authors:

Harshkumar Devmurari

Gautham Kuckian

Prajjwal Vishwakarma
