Blink Detection using Python

Neha Chaudhari · AlgoAsylum · Jul 12, 2020

Written By Neha Chaudhari, Kanchan Sarolkar, Kimaya Badhe, Samruddhi Kanhed, Shrirang Karandikar.


Introduction to Eye-tracking

Our eyes are one of the primary sensing tools we use to learn and react to the environment. Eye movements can tell us a lot about human behavior. The field of measuring human eye movements is called eye-tracking. It has various applications in gaming, medical diagnostics, market research, and psychology.

In this article, we explore blinking, which is one of the easiest eye movements to detect and has numerous applications in the fields of medical diagnostics and human-computer interaction. Blinking can be involuntary or voluntary.

Involuntary blinking is used as a criterion to diagnose medical conditions. For example, if a person blinks excessively it may indicate the onset of Tourette syndrome, strokes, or disorders of the nervous system. A reduced rate of blinking is associated with Parkinson’s disease.

Voluntary blinking can be used as a means of communication. The interface to a computing device has traditionally been a keyboard or a mouse, and more recently, a touchscreen. If blinking can be detected, it can serve as an additional interface, with appropriate responses for a specified action.

There have been many approaches to detecting blinks. In this article, we show you how to detect blinks using a commodity webcam, Python, and a simplified algorithm. Ready to get started? Let’s look at the prerequisites.

Prerequisites

Install Python

Python version 3.5 is recommended for compatibility with dlib and OpenCV.

Install OpenCV

OpenCV is a library of programming functions mainly aimed at real-time computer vision. We recommend version 3.3 for better compatibility with dlib. Here, OpenCV is used to capture frames by accessing the webcam in real time.

Install dlib

dlib is an open-source library used for face detection. We recommend installing dlib version 19.4 for better compatibility with OpenCV. Given a face, dlib can extract features from the face like eyes, nose, lips, and jaw using facial landmarks. The required facial landmark file can be downloaded from the following link. This file should be placed in the folder which has the rest of the code.

Implementation

Let’s focus on the implementation now. Our approach in this presentation is to build up the implementation incrementally. We present a series of steps that are individually easy to understand, describing and adding functionality as we proceed.

Steps for the blink detection algorithm.

Step 1: Using OpenCV to load or capture video
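A minimal sketch of this step might look as follows (the window title is arbitrary, and the loop structure is our choice):

```python
import cv2

# Create a VideoCapture object; device index 0 selects the default webcam.
# A video file path can be passed instead to process a recorded video.
cap = cv2.VideoCapture(0)

while True:
    # read() returns a boolean success flag and the captured frame
    ret, frame = cap.read()
    if not ret:
        # Frame not captured: camera error or end of the video file
        break

    cv2.imshow("Blink Detection", frame)

    # 27 is the key code for the escape key
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()
```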

In the above code, we use the VideoCapture module in OpenCV to capture live video. We create a VideoCapture object using the constructor provided by the module. The argument to the constructor can be either a device index or the path of a video file. The device index is a number used to identify the webcam and, in most cases, its value is 0. The VideoCapture object enables us to capture video data frame by frame. cap.read() returns the frame as well as a boolean value indicating whether the frame was captured successfully. We exit the application if the frame is not captured; in the case of video-file input, this indicates that the video is over.

We stop capturing frames once the escape key is pressed (27 is the key code for the escape key). We use cv2.waitKey() to detect the key press. Once the key is detected, we break out of the while loop to stop capturing new frames and then release the VideoCapture object.

Step 2: Converting the frames to grayscale

The frame we have captured is a 3-channel BGR color image, and we need to detect the face and eyes in it. dlib’s face detection works perfectly fine on grayscale images as well as color images. Since a grayscale image has only a single channel, we convert the frame to grayscale to reduce the processing time required by the later steps of the algorithm.

We use cv2.cvtColor(frame, flag) for color conversion. Here we use the flag cv2.COLOR_BGR2GRAY to convert the colored image to grayscale.
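A one-line sketch of the conversion, assuming frame holds the image captured in Step 1:

```python
# Convert the 3-channel BGR frame to a single-channel grayscale image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
```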

Step 3: Face Detection using dlib

We are now going to use dlib’s default face detector to detect all the faces in the image. This face detection model is based on HOG features and a linear SVM. We first load the detector using get_frontal_face_detector() and then pass the image to the detector as the first argument. The second argument is the number of times we want to upscale the image, and the third is a detection threshold; in our application, we use the default values for these arguments. The detector returns a list of dlib.rectangle objects, which are the bounding boxes of the faces detected in the frame.
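A sketch of this step, assuming gray is the grayscale frame from Step 2:

```python
import dlib

# Load dlib's default HOG + SVM based frontal face detector
detector = dlib.get_frontal_face_detector()

# Detect faces; the optional second argument is the number of times
# to upscale the image (0 keeps the original size, the default)
faces = detector(gray, 0)
```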

Step 4: Getting facial landmarks using dlib

Facial landmarks in dlib

Well, to get started with blink detection we need to detect the eye first. We do this by using a pre-trained model which gives us 68 facial landmarks. We then map these landmarks on the face we detected in the previous step. We implement this in three parts. First, we load the contents of the pre-trained model in an object. Then we use this object to map the landmarks to the face detected in the frame and finally, we extract the Cartesian coordinates of these landmarks.

Part 1: Load the shape predictor
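A sketch of this part, assuming the .dat file sits in the same folder as the script, as described in the prerequisites:

```python
# Load the pre-trained 68-point facial landmark model
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Indexes of the landmarks that outline each eye (see the figure above)
left_eye_landmarks = [36, 37, 38, 39, 40, 41]
right_eye_landmarks = [42, 43, 44, 45, 46, 47]
```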

Here dlib.shape_predictor is a constructor that loads the contents of shape_predictor_68_face_landmarks.dat into the variable predictor. left_eye_landmarks and right_eye_landmarks are the lists of indexes of the landmarks required to detect the eyes. They are highlighted in the figure above.

Part 2: Map the facial landmarks

Now we call the shape predictor object, passing the frame and the face bounding box as arguments. The predictor gives the locations, i.e. the x and y coordinates, of the facial landmarks in the frame.
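A sketch, assuming faces is the list of bounding boxes from Step 3:

```python
for face in faces:
    # Map the 68 landmarks onto the detected face;
    # returns a dlib.full_object_detection object
    landmarks = predictor(frame, face)
```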

Part 3: Extract Cartesian coordinates
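A sketch, using landmark index 36 (the outer corner of the left eye) as an illustrative example:

```python
# part(i) returns a dlib.point for landmark index i;
# we collect its x and y coordinates into a tuple
point = (landmarks.part(36).x, landmarks.part(36).y)
```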

Here we extract the x and y coordinates from the dlib.full_object_detection object. The part function returns a dlib.point object for the specified landmark index. point is a tuple of the x and y coordinates of that landmark in the frame.

Step 5: Getting to know the Blink Ratio

Now that we have detected eyes, let’s focus on detecting blinks. We are going to use the horizontal length to vertical length ratio of the imaginary bounding box of the eye. This is the blink ratio.

Blinking is the movement of the eyelids. In our application, we quantify this movement using the vertical length, which shrinks when the subject closes their eyes. Blinking, by contrast, does not affect the horizontal length, so we assume it stays constant throughout the run time. When we calculate the ratio of the horizontal length to the vertical length, the ratio grows as the vertical length shrinks. So we conclude that if the blink ratio crosses a certain threshold, the subject must have blinked. This is how we detect blinking.

We get the horizontal length by calculating the Euclidean distance between landmarks 36 and 39 for the left eye, and between landmarks 42 and 45 for the right eye. For the vertical length of the left eye, we first calculate the midpoints of landmark pairs 37,38 (top) and 40,41 (bottom), and then the Euclidean distance between these midpoints. Similarly, for the right eye, we use the midpoints of landmark pairs 43,44 and 46,47. We then calculate the ratio of the horizontal length to the vertical length.

An interesting question you may think of is: ‘Why don’t we use just the vertical length to detect blinking? Why do we even need a ratio?’ If the person is farther from the screen, the eye appears smaller than when they are closer. This means the vertical length depends on the position of the head in the frame, not solely on the movement of the eye. If we used only the vertical length instead of the blink ratio, the threshold that indicates a blink would change whenever the person moved closer to or farther from the camera, so blinks might go undetected when the subject’s head position changes. Calculating the ratio eliminates this problem, because the horizontal length always serves as a constant reference.

The aim of this section is to calculate the blink ratio, with the help of a few auxiliary functions. The first step is to implement a function that returns the midpoint of two dlib.points, and then a function that calculates the Euclidean distance between two points. In the second step, we write a function that returns the blink ratio for a single eye. Finally, we call that function to get the blink ratio for both eyes.

Part 1: Prerequisite functions
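A sketch of the two helpers; midpoint operates on dlib.points, while euclidean_distance operates on the (x, y) tuples we build from them:

```python
def midpoint(point1, point2):
    # Midpoint of two dlib.points, returned as an (x, y) tuple
    return ((point1.x + point2.x) / 2, (point1.y + point2.y) / 2)


def euclidean_distance(point1, point2):
    # Euclidean distance between two (x, y) tuples
    return ((point1[0] - point2[0]) ** 2 +
            (point1[1] - point2[1]) ** 2) ** 0.5
```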

Part 2: The Blink Ratio Function
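A sketch of the function, assuming the helpers above and the landmark index lists from Step 4, Part 1:

```python
def get_blink_ratio(eye_points, facial_landmarks):
    # The eye corners define the horizontal line
    corner_left = (facial_landmarks.part(eye_points[0]).x,
                   facial_landmarks.part(eye_points[0]).y)
    corner_right = (facial_landmarks.part(eye_points[3]).x,
                    facial_landmarks.part(eye_points[3]).y)

    # Midpoints of the top and bottom eyelid landmarks define the vertical line
    center_top = midpoint(facial_landmarks.part(eye_points[1]),
                          facial_landmarks.part(eye_points[2]))
    center_bottom = midpoint(facial_landmarks.part(eye_points[4]),
                             facial_landmarks.part(eye_points[5]))

    horizontal_length = euclidean_distance(corner_left, corner_right)
    vertical_length = euclidean_distance(center_top, center_bottom)

    return horizontal_length / vertical_length
```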

The function get_blink_ratio returns the blink ratio of one eye. It takes two arguments: eye_points, the list of indexes of the eye landmarks (Step 4, Part 1), and the dlib.full_object_detection object discussed in Step 4, Part 2.

We now extract the points required to calculate the horizontal and vertical lengths. corner_left and corner_right are the points used to calculate the horizontal length; we extract them directly from the facial landmarks, as explained in Step 4, Part 3. Unlike corner_left and corner_right, center_top and center_bottom have to be computed from the available facial landmarks: they are the midpoints of the two eye landmarks on the top and the bottom respectively, and they are used to calculate the vertical length.

After extracting all the points, we use the euclidean_distance function to get horizontal_length and vertical_length. We then calculate and return the blink ratio of that eye.

Part 3: The function calls

We call get_blink_ratio for every frame, passing the required landmarks, to get the blink ratios of both eyes. For our application, we use the average of the blink ratios of the left and the right eye. If only one eye is closed, only that eye’s ratio rises, so the average stays lower; this differentiates blinking from winking.
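A sketch of the calls, assuming they run inside the per-face loop from Step 4 so that landmarks is available:

```python
left_eye_ratio = get_blink_ratio(left_eye_landmarks, landmarks)
right_eye_ratio = get_blink_ratio(right_eye_landmarks, landmarks)

# Average both eyes so a one-eyed wink does not register as a blink
blink_ratio = (left_eye_ratio + right_eye_ratio) / 2
```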

Step 6: Detect Blinks!

Now that we have calculated the blink ratio, we need to decide on a threshold. For every frame, we check whether the blink ratio is above the threshold; if it is, we display a message on the OpenCV window.
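A sketch of the check, using the threshold value given in the next paragraph (the constant name, text position, font, and color are our choices):

```python
BLINK_RATIO_THRESHOLD = 5.7  # decided empirically

if blink_ratio > BLINK_RATIO_THRESHOLD:
    # Overlay a message on the frame before it is displayed
    cv2.putText(frame, "BLINKING", (10, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 2)
```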

Here the threshold to detect a blink is 5.7. This value was decided empirically. The cv2.putText function displays the text “BLINKING” on the OpenCV window once a blink is detected. Depending on the goal of your application, you may have to do something different.

The code will detect all the blinks until you press the ‘escape’ key. You can experiment with the threshold to adjust the sensitivity. This is how we do blink detection using just a webcam!

Conclusion

Detecting blinks using Python is easy and fun. In this article, we demonstrate that we can do blink detection using just a commodity webcam — we don’t have to use fancy and expensive equipment.

Blink detection can be used to implement new interfaces for gaming or as a diagnostic tool. To start, you can look at a game we developed here. The game is a simple slot machine. The traditional slot machine is a game of pure chance: the user pulls a lever to spin all the slots, and if all the slots match up they win. We modified this game to be a game of skill rather than a game of chance by challenging the player’s reaction time. In our version, once the drums start to rotate, the player has to blink to stop each drum. The player must blink at the right time so that all the slots match up to win the game.

Our game is only a basic introduction to the plethora of applications you can develop using blink detection. The best part about this implementation is that it makes experimenting easy. You can use it to develop your own games or for data collection for diagnostic purposes. You can improve on the implementation and make a low-cost diagnostic tool that would help millions. It’s up to you. You can change the world for the better.

All code is available at this repository.

HAPPY CODING!

