Fatigue detection and driver distraction detection
This blog describes the work done during Google Summer of Code (GSoC) 2024 with HumanAI.
PROJECT OVERVIEW
The code for this project can be found here.
Drowsy & Distracted Driving: A Looming Threat
Distracted and drowsy driving are major threats on the road, contributing significantly to global road accidents. In the United States alone, distracted driving was a factor in 8% of fatal crashes, 14% of injury crashes, and 13% of all reported crashes in 2021. Drowsy driving is even more concerning: the National Highway Traffic Safety Administration (NHTSA) estimates that it causes roughly 100,000 accidents annually. While phone use is a concern (at 5%), fatigue poses a much greater risk (17%). [1,2,3]
Beyond Phones: The Three Faces of Distraction
While cell phone use is a common culprit, distracted driving encompasses a broader spectrum. The Centers for Disease Control and Prevention (CDC) categorises it into three main types [4]:
- Visual distraction: Taking your eyes off the road (GPS, billboards, even passengers).
- Manual distraction: Removing hands from the wheel (eating, adjusting controls, reaching for objects).
- Cognitive distraction: Taking your mind off driving (singing, conversations).
The Underestimated Threat
The true impact of distracted driving might be significantly underestimated. Verifying distraction after a crash is challenging, and drivers may not admit to it. Crash reports attribute only 5% of U.S. fatal crashes in 2021 to distracted driving, but studies point to a much higher figure. Research using real-world driving data shows that nearly 90% of crashes causing injury or property damage involve driver error, impairment, fatigue, or distraction. Studies have also linked distracted driving to increased crash severity.
This project specifically addresses drowsy driving, targeting professions such as surgical residents and truck drivers who often face gruelling schedules. A camera is used to detect signs of drowsiness in the driver and to estimate their mental workload. By analysing these factors, the system can warn drivers who are too tired or distracted to continue driving safely.
Choice of Dataset for fatigue & distraction detection
For this project, the Driver Monitoring Dataset (DMD) was chosen because it is a comprehensive resource for studying driver behaviour and distractions. It features synchronised footage from multiple cameras (body, face, hands) and various streams (RGB, Depth, IR) captured in both real-world driving scenarios and driving simulators. The dataset includes videos showcasing a range of distraction-related activities, such as:
- Applying Hair and Makeup
- Talking on the Phone (Left and Right)
- Texting (Left and Right)
- Drowsy Behaviour
An example of the distractions mentioned above is shown in the following image.
Fatigue Detection
Mediapipe for Facial Landmark Detection
MediaPipe is an open-source framework developed by Google for building multimodal (video, audio, and sensor data) machine learning applications. It provides a collection of customizable and efficient tools for various tasks in computer vision and audio processing. It offers a variety of pre-trained models for tasks like face detection, object detection, pose estimation, and hand tracking, making it easier to implement complex applications without starting from scratch. MediaPipe is optimised for real-time performance, making it suitable for applications such as augmented reality, video conferencing or edge deployment.
The MediaPipe Face Landmarker task enables the detection of facial landmarks and expressions in both images and videos. Utilising advanced machine learning models, it can process single images or continuous video streams. The output includes 3D face landmarks, blendshape scores (coefficients that represent various facial expressions), and transformation matrices, which facilitate the rendering of visual effects in real time. An example is shown below.
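As a rough illustration, the snippet below runs the Face Landmarker task on a single image using the MediaPipe Tasks Python API. The model asset file name, the image path, and the option values are assumptions for this sketch rather than the project's exact configuration.

```python
# Sketch: MediaPipe Face Landmarker on a single image (Python Tasks API).
# "face_landmarker.task" must be downloaded from the MediaPipe model page;
# the file and image names here are placeholders.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.FaceLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,                 # expression coefficients
    output_facial_transformation_matrixes=True,   # head pose matrices
    num_faces=1,
)
landmarker = vision.FaceLandmarker.create_from_options(options)

image = mp.Image.create_from_file("driver_frame.jpg")
result = landmarker.detect(image)

print(len(result.face_landmarks[0]))     # 478 3D landmarks for the first face
print(result.face_blendshapes[0][:5])    # first few blendshape scores
```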
Eye Aspect Ratio (EAR)
The Eye Aspect Ratio (EAR) is a metric used to assess whether a person’s eyes are open or closed. It is defined as the ratio of the vertical distance between the eye’s upper and lower eyelids to the horizontal distance between the eye’s inner and outer corners.
The formula used to calculate the Eye Aspect Ratio is given as follows.
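Using the standard six-point eye contour (p1 and p4 at the horizontal eye corners, p2 and p3 on the upper eyelid, p5 and p6 on the lower eyelid), the ratio is:

```latex
\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}
```

An open eye typically yields an EAR of roughly 0.25–0.3, and the value drops towards zero as the eye closes.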
Drowsiness Detection
The proposed drowsiness detection system follows a two-step approach. First, it employs the MediaPipe facial landmark detection model to capture accurate facial landmark coordinates. Next, it extracts the eye landmarks to compute the Eye Aspect Ratio (EAR). By analysing EAR values over time, the system classifies the driver’s state as either drowsy or alert: if the EAR stays below a set threshold over a run of consecutive frames (i.e. the eyes remain closed), the individual is classified as drowsy; otherwise, they are deemed alert.
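A minimal sketch of this pipeline is shown below, using the MediaPipe Face Mesh solution. The eye landmark indices, the 0.25 EAR threshold, the 15-frame window, and the video path are illustrative assumptions, not the exact values used in the project.

```python
# Sketch: EAR-based drowsiness check over a video stream.
import cv2
import mediapipe as mp
import numpy as np

# Commonly used MediaPipe Face Mesh indices for the left eye, ordered as
# p1 (outer corner), p2, p3 (upper lid), p4 (inner corner), p5, p6 (lower lid).
LEFT_EYE = [33, 160, 158, 133, 153, 144]
EAR_THRESHOLD = 0.25   # assumed "eyes closed" threshold
CONSEC_FRAMES = 15     # assumed number of consecutive low-EAR frames

def eye_aspect_ratio(pts):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|)."""
    p1, p2, p3, p4, p5, p6 = pts
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = 2.0 * np.linalg.norm(p1 - p4)
    return vertical / horizontal

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)
cap = cv2.VideoCapture("driver_video.mp4")  # placeholder video path
closed_frames = 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        pts = np.array([(lm[i].x * w, lm[i].y * h) for i in LEFT_EYE])
        ear = eye_aspect_ratio(pts)
        closed_frames = closed_frames + 1 if ear < EAR_THRESHOLD else 0
        state = "DROWSY" if closed_frames >= CONSEC_FRAMES else "ALERT"
        print(f"EAR: {ear:.2f}  state: {state}")

cap.release()
```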
Distracted Behaviour Detection
Dataset creation for the distraction detection
The selected dataset does not contain bounding-box annotations for instances of distracted behaviour. To address this, we employ an automated annotation approach to generate these labels and enhance the dataset.
Utilising the CLIP Model for Distraction Classification
The CLIP (Contrastive Language–Image Pretraining) model, developed by OpenAI, is a powerful neural network designed for understanding and connecting visual and textual information. CLIP simultaneously processes images and text, enabling it to learn associations between visual elements and their corresponding textual descriptions. This multimodal capability allows for versatile applications. The model uses a contrastive learning approach, training on a diverse dataset of images paired with captions. By maximizing the similarity between corresponding image-text pairs and minimizing it for non-matching pairs, CLIP learns rich representations.
One of CLIP’s most notable features is its ability to perform zero-shot classification, allowing it to categorize images into classes it has not encountered during training. This capability relies entirely on the textual descriptions provided during inference. An example of applying the CLIP model to the dataset is illustrated below.
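A minimal sketch of this zero-shot labelling step is given below, using the Hugging Face transformers implementation of CLIP. The checkpoint name and the label prompts are illustrative assumptions; the project may use different prompts and frames.

```python
# Sketch: zero-shot frame labelling with CLIP.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical text prompts describing the distraction classes of interest.
prompts = [
    "a photo of a driver texting on a phone",
    "a photo of a driver talking on a phone",
    "a photo of a driver combing hair or applying makeup",
    "a photo of a driver with both hands on the wheel",
]

image = Image.open("dmd_frame.jpg")  # placeholder extracted video frame
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image   # image-text similarity scores
probs = logits.softmax(dim=-1).squeeze()

for prompt, p in zip(prompts, probs):
    print(f"{p:.3f}  {prompt}")
```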
The frames labelled with a distraction class are saved separately, and a YOLO model (YOLOv10 in this case) is then used to generate bounding boxes for them.
YOLOv10 for Distraction detection
YOLOv10 is a state-of-the-art object detection model that builds upon the YOLO architecture, known for its speed and accuracy in real-time object detection tasks. YOLOv10 is designed for high-speed processing, making it suitable for applications requiring real-time object detection, such as video surveillance and autonomous driving. With enhancements in architecture and training techniques, YOLOv10 achieves better accuracy in detecting objects across various sizes and types compared to its predecessors.
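A minimal sketch of fine-tuning and running YOLOv10 with the Ultralytics API is shown below. The dataset YAML, weights file, image path, and hyperparameters are assumptions for illustration, not the project's exact settings.

```python
# Sketch: fine-tune YOLOv10 on the auto-annotated distraction data, then infer.
from ultralytics import YOLO

# Start from pretrained YOLOv10 nano weights.
model = YOLO("yolov10n.pt")

# Fine-tune on the distraction dataset (hypothetical dataset YAML).
model.train(data="dmd_distraction.yaml", epochs=100, imgsz=640, batch=16)

# Run inference on a new frame and read back boxes and class labels.
results = model("driver_frame.jpg")
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(cls_name, box.conf.item(), box.xyxy[0].tolist())
```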
Deployment on Jetson Orin Nano
The NVIDIA Jetson Orin Nano is a powerful AI computing platform designed for edge devices, particularly suited for robotics, drones, smart cameras, and other embedded applications.
The NVIDIA Jetson Orin Nano 8GB Developer Kit is utilized for deploying the MediaPipe and YOLOv10 models. The device delivers a frame rate of approximately 11–15 fps when processing video at a resolution of 1280 x 780. The results are presented below.
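One common way to push the frame rate higher on the Jetson is to export the trained detector to a TensorRT engine. The sketch below uses the Ultralytics export API; the weights path and the half-precision setting are assumptions, and the export should be run on the Jetson itself so the engine matches its GPU.

```python
# Sketch: export YOLOv10 weights to TensorRT and run the engine on a frame.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")    # hypothetical weights path
model.export(format="engine", half=True, imgsz=640)  # writes best.engine

trt_model = YOLO("runs/detect/train/weights/best.engine")
trt_model("driver_frame.jpg")  # placeholder frame
```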
Future Scope
Yawning is another common indicator that could be incorporated into the drowsiness detection pipeline. The current model relies on front-facing camera data; to enhance robustness, images captured from various angles and under different lighting conditions should be added to the training dataset. Additionally, the current model is limited to classifying a predefined set of dangerous driving behaviours, which can hinder its performance when it encounters unfamiliar driving situations. A contrastive learning approach could be a promising way to address this limitation.
Acknowledgement
This project is supported by the Google Summer of Code program and HumanAI. I would like to express my gratitude to my mentor, Piyush Pawar, for his invaluable guidance throughout the project. I also want to thank Dr. Sergei Gleyzer and Andrea Underhill for their insightful suggestions during the project review.
References
[1] National Center for Statistics and Analysis, “Distracted Driving in 2021,” National Highway Traffic Safety Administration, Research Note DOT HS 813 443, May 2023.
[2] “Global status report on road safety 2023,” World Health Organization, Geneva, Safety and Mobility (SAM), Social Determinants of Health (SDH) 81, 2023, licence: CC BY-NC-SA 3.0 IGO.
[3] National Highway Traffic Safety Administration, “Overview of the National Highway Traffic Safety Administration’s Driver Distraction Program,” U.S. Department of Transportation, Washington, DC, 2010. Accessed 8 February 2022.