Distracted Driver Detection using CNN

Narendra Kumawat
3 min read · Feb 14, 2019


This project aims to recognise unsafe driving behaviour and give the driver real-time feedback through a short sound alert, a vibration in the seat or steering wheel, etc. The system can classify a driver's behaviour from a live video feed or from a pre-recorded video.
The system uses a Convolutional Neural Network (CNN) to extract features during the training phase of the warning system. The training dataset is divided into 10 classes: 9 distracted-driving classes and 1 safe-driving class. Each frame from the video feed is classified into one of these classes, and based on the results over a set of previous frames, the driver is alerted with a sound.

You can check the whole code here: https://github.com/nkkumawat/Driver-Distraction-Detection

Dataset

In April 2016, State Farm’s distracted driver detection competition on Kaggle defined ten postures to be detected (safe driving + 9 distracted behaviours). This was the first publicly available dataset to cover such a wide variety of distractions. State Farm released 2D dashboard-camera images for the Kaggle challenge: 22,400 training images and 79,727 testing images, each at a resolution of 640 × 480 pixels. Every training image had a label attached, belonging to one of the ten classes listed below:
- c0: normal driving
- c1: texting — right
- c2: talking on the phone — right
- c3: texting — left
- c4: talking on the phone — left
- c5: operating the radio
- c6: drinking
- c7: reaching behind
- c8: hair and makeup
- c9: talking to passenger

Classes of Images

Data Processing

The given images are 640 × 480 pixels, which makes processing too slow, so each image is resized to 256 × 256 pixels and the result is saved into a NumPy array.
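A minimal sketch of this preprocessing step. The original project likely used OpenCV or PIL for resizing; here a self-contained nearest-neighbour resize in NumPy stands in for it, and the dummy frames stand in for the real dataset images.

```python
import numpy as np

def resize_nearest(img, size=(256, 256)):
    """Nearest-neighbour resize of an H x W x C image to `size` (sketch;
    the real project would typically use cv2.resize or PIL)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each output row
    cols = np.arange(size[1]) * w // size[1]   # source column for each output column
    return img[rows][:, cols]

# Dummy 640x480 colour frames in place of the real dashboard images
frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(4)]

# Resize every frame and stack the results into one NumPy array
dataset = np.stack([resize_nearest(f) for f in frames])
```

After this step, `dataset` has shape `(num_images, 256, 256, 3)` and can be fed directly to the network.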

CNN Model

Input Layer: 256 × 256 pixel colour images
Convolutional Layers: 5 layers with different filter sizes, with “relu” activation
Pooling: max pooling
Fully Connected Layers: 2 layers (1024 and 512 neurons) with “relu” activation
Output Layer: fully connected layer with softmax activation
Dropout: 60% dropout is used
Optimizer: Adam optimizer
Loss Function: log loss, −( y log(p) + (1 − y) log(1 − p) ) per class, which for the ten softmax classes generalises to the cross-entropy −Σc yc log(pc)
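The layer list above can be sketched in Keras as follows. The post specifies only the layer types, counts, activations, and dropout rate, so the concrete filter counts and kernel sizes here are assumptions, not the author's exact configuration.

```python
# Sketch of the described architecture in Keras. Filter counts and kernel
# sizes are assumptions; the post only fixes the layer types and widths
# of the dense layers (1024 and 512), the 60% dropout, and the 10 outputs.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),               # 256x256 colour frames
    # five convolutional layers with different filter sizes, each with max pooling
    layers.Conv2D(32, (7, 7), activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, (5, 5), activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(256, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),
    layers.Dropout(0.6),                             # 60% dropout
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.6),
    layers.Dense(10, activation="softmax"),          # one unit per class c0..c9
])

# Adam optimizer with cross-entropy (log) loss, as described above
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```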

CNN Model Filter Layers
CNN model

Output

The network outputs the predicted class of the corresponding image.

The Convolutional Neural Network identifies individual images with high accuracy; however, human actions take more than a single frame to happen. So in this implementation the decision is made over a span of 10 frames: if more than 4 of them (i.e. at least 50% of the window) show a positive distraction, a warning alert is generated.
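The frame-voting rule above can be sketched with a small sliding window over per-frame predictions. The function name and the use of a `deque` are illustrative choices, not taken from the project's code.

```python
from collections import deque

WINDOW = 10      # number of recent frames considered
THRESHOLD = 5    # "more than 4" distracted frames trigger an alert

recent = deque(maxlen=WINDOW)  # automatically drops the oldest frame

def update(is_distracted):
    """Record the latest frame's prediction; return True when an alert should fire."""
    recent.append(bool(is_distracted))
    return len(recent) == WINDOW and sum(recent) >= THRESHOLD

# Example stream of per-frame predictions (1 = distracted, 0 = safe)
preds = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1]
alerts = [update(p) for p in preds]
```

Here no alert fires until the window fills; from the 10th frame on, 5 or more of the last 10 predictions are distracted, so the alert stays on.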

Results

Training Accuracy: 98.37%
Training Loss: 6.01%

Validation Accuracy: 97.8%
Validation Loss: 9.96%

Class-wise accuracy
Confusion Matrix

You can find the whole project code here: https://github.com/nkkumawat/Driver-Distraction-Detection
