Tracking your eyes with Python

Eye tracking in action

This article is an in-depth tutorial for detecting and tracking your pupils’ movements with Python using the OpenCV library. It’s a step-by-step guide with detailed explanations, so even newbies can follow along.

It might sound complex and difficult at first, but if we divide the whole process into subcategories, it becomes quite simple. To start, we need to install packages that we will be using:

pip install opencv-python==3.4.5.20

Even though it’s only one line, OpenCV is a large library with its own dependencies, so pip will also install packages such as NumPy. We pin version 3.4 because otherwise pip installs a 4.x release, and at the time of writing those were either buggy or lacked functionality this tutorial relies on.

In general, “detection” processes are machine-learning based classifications that classify between object or non-object images. For example, whether a picture has a face on it or not, and where the face is if it does.

To classify, you need a classifier. Face and eye classifiers (Haar cascades) ship with the OpenCV library, and you can download them from the official GitHub repository: Eye Classifier, Face Classifier

To download them, right click “Raw” => “Save link as”. Make sure they are in your working directory.


With the introduction out of the way, let’s start coding. Create a file track.py in your working directory and write the following lines there. They’ll import and initiate everything we’ll need.

import cv2
import numpy as np  # the later snippets refer to NumPy as np

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

Although we’ll eventually track eyes in a video, we’ll start with an image since it’s much faster to iterate on, and code that works on a picture will work on a video, because any video is just N pictures (frames) per second. So download a portrait or use your own photo. I’ll be using a stock picture.

The stock image I’m using

Once it’s in your working directory, add the following line to your code:

img = cv2.imread("your_image_name.jpg")

In object detection, there’s a simple rule: from big to small. Meaning you don’t start with detecting eyes on a picture, you start with detecting faces. Then you proceed to eyes, pupils and so on. It saves a lot of computational power and makes the process much faster. Also it saves us from potential false detections.

To detect faces on a picture, we first need to make it gray. Then we’ll detect faces.

gray_picture = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # make picture gray
faces = face_cascade.detectMultiScale(gray_picture, 1.3, 5)

The faces object is just an array of small sub-arrays, each consisting of four numbers: the X and Y of the top-left corner, then the width and height of the detected face. For example, it might be something like this:

[[356  87 212 212]
[ 50 88 207 207]]

It would mean that there are two faces on the image. 212x212 and 207x207 are their sizes and (356,87) and (50, 88) are their coordinates.
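To make the layout concrete, here is a small sketch that turns each (x, y, width, height) row into the two corner points a rectangle needs. It uses a plain Python list with the example values above rather than a real detection result, and to_corners is a helper name made up for this illustration.

```python
# Rows in the same (x, y, width, height) layout as the example above.
faces = [(356, 87, 212, 212), (50, 88, 207, 207)]


def to_corners(box):
    """Convert (x, y, w, h) into (top-left, bottom-right) corner points."""
    x, y, w, h = box
    return (x, y), (x + w, y + h)


corners = [to_corners(box) for box in faces]
for top_left, bottom_right in corners:
    print(top_left, bottom_right)
```

These are the same two points that get passed to cv2.rectangle.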

To see if it works for us, we’ll draw a rectangle with its top-left corner at (X, Y) and the detected width and height:

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)

Those lines draw rectangles on our image in the color (255, 255, 0), given in BGR order (OpenCV’s default channel order, so this is cyan), with a contour thickness of 2 pixels.
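OpenCV stores color images in blue-green-red channel order, so a color picked in the more familiar RGB order has to be reversed before it is passed to a drawing function. A tiny sketch of that conversion (rgb_to_bgr is a hypothetical helper, not an OpenCV function):

```python
def rgb_to_bgr(color):
    """Reverse an (R, G, B) tuple into OpenCV's (B, G, R) channel order."""
    r, g, b = color
    return (b, g, r)


cyan_rgb = (0, 255, 255)          # cyan as you would write it in RGB
cyan_bgr = rgb_to_bgr(cyan_rgb)   # the tuple OpenCV expects for cyan
print(cyan_bgr)
```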

Now we can display the result by adding the following lines at the very end of our file:

cv2.imshow('my image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Now that we’ve confirmed everything works, we can continue. We’ll detect eyes the same way. But on the face frame now, not the whole picture. Under the cv2.rectangle(img,(x,y),(x+w,y+h),(255,255,0),2) line add:

    gray_face = gray_picture[y:y + h, x:x + w]  # cut the gray face frame out
    face = img[y:y + h, x:x + w]  # cut the face frame out
    eyes = eye_cascade.detectMultiScale(gray_face)

The eyes object is just like the faces object: it contains the X, Y, width and height of the eyes’ frames. You can display it in a similar fashion:

    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(face, (ex, ey), (ex + ew, ey + eh), (0, 225, 255), 2)

Notice that although we detect everything on grayscale images, we draw the lines on the colored ones. The sizes match, so it’s not an issue. We’ll use this principle of detecting objects on one picture, but drawing them on another later.
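One detail worth spelling out: because the eyes are detected on the cropped face frame, their coordinates are relative to the face’s top-left corner, not to the whole picture. Drawing on the face slice still shows up on the full image because NumPy slicing returns a view of the same pixels. If you ever need the eye position in full-image coordinates, a sketch like the following would do it (the helper name and the eye box values are made up for illustration):

```python
def eye_to_image_coords(face_box, eye_box):
    """Shift an eye box from face-frame coordinates to full-image coordinates."""
    fx, fy, _, _ = face_box
    ex, ey, ew, eh = eye_box
    return (fx + ex, fy + ey, ew, eh)


face_box = (356, 87, 212, 212)   # first face from the earlier example
eye_box = (40, 55, 50, 50)       # hypothetical eye box, relative to the face
absolute = eye_to_image_coords(face_box, eye_box)
print(absolute)
```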

Looks like we’ve run into trouble for the first time:

Is this chin really an eye?

Our detector thinks the chin is an eye too, for some reason. What can be done here? Of course, you could gather some faces around the internet and train the model to be more proficient. But there’s another, the Computer Vision way.

If you think about it, eyes are always in the top half of your face frame. I don’t think anyone has ever seen a person with their eyes at the bottom of their face. So, when going over our detected objects, we can simply filter out those that can’t exist according to the nature of our object. Like with eyes, we know they can’t be in the bottom half of the face, so we just filter out any eye whose Y coordinate is more than half the face frame’s Y height.

Eyes can’t really be in the lower half of your frame
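Here is that filtering rule on its own, as a minimal sketch that works on plain (x, y, w, h) tuples so it can be tried without a camera or an image. The function names and the sample boxes are invented for the example.

```python
def in_upper_half(eye_box, face_height):
    """True when the box's top edge lies in the upper half of the face frame."""
    _, y, _, _ = eye_box
    return y <= face_height / 2


def filter_eyes(eye_boxes, face_height):
    """Keep only the detections that could plausibly be eyes."""
    return [box for box in eye_boxes if in_upper_half(box, face_height)]


# Two plausible eyes and one "chin" detection in a 212-pixel-tall face frame.
detections = [(40, 55, 50, 50), (120, 58, 48, 48), (80, 170, 45, 45)]
kept = filter_eyes(detections, face_height=212)
print(kept)
```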

We’ll put everything in a separate function called detect_eyes:

def detect_eyes(img, img_gray, classifier):
    coords = classifier.detectMultiScale(img_gray, 1.3, 5)  # detect eyes
    height = np.size(img, 0)  # get face frame height
    for (x, y, w, h) in coords:
        if y > height / 2:  # the eye is in the bottom half
            pass  # placeholder: nothing is done with the detections yet

We’ll leave it like that for now, because for future purposes we’ll also have to return left and right eye separately. OpenCV can put them in any order when detecting them, so it’s better to determine what side an eye belongs to using our coordinate analysis. If the eye’s center is in the left part of the image, it’s the left eye and vice-versa.

We’ll cut the image in two by introducing the width variable:

def detect_eyes(img, classifier):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eyes = classifier.detectMultiScale(gray_frame, 1.3, 5)  # detect eyes
    width = np.size(img, 1)  # get face frame width
    height = np.size(img, 0)  # get face frame height
    for (x, y, w, h) in eyes:
        if y > height / 2:
            continue  # skip detections in the bottom half of the face
        eyecenter = x + w / 2  # get the eye center
        if eyecenter < width * 0.5:
            left_eye = img[y:y + h, x:x + w]
        else:
            right_eye = img[y:y + h, x:x + w]
    return left_eye, right_eye

But what if no eyes are detected? Then the program will crash, because the function is trying to return left_eye and right_eye variables which haven’t been defined. So to avoid that, we’ll add two lines that pre-define our left and right eyes variables:

def detect_eyes(img, classifier):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eyes = classifier.detectMultiScale(gray_frame, 1.3, 5)  # detect eyes
    width = np.size(img, 1)  # get face frame width
    height = np.size(img, 0)  # get face frame height
    left_eye = None
    right_eye = None
    for (x, y, w, h) in eyes:
        ....

Now, if an eye isn’t detected for some reason, it’ll return None for that eye.
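The left/right decision and the None defaults can be checked in isolation. The sketch below mirrors that logic on plain (x, y, w, h) tuples instead of image crops; split_eyes is a hypothetical name, and “left” means the left part of the image, as in the text above.

```python
def split_eyes(eye_boxes, face_width):
    """Assign each eye box to the left or right half of the face frame."""
    left_eye = None
    right_eye = None
    for (x, y, w, h) in eye_boxes:
        eye_center = x + w / 2
        if eye_center < face_width / 2:
            left_eye = (x, y, w, h)
        else:
            right_eye = (x, y, w, h)
    return left_eye, right_eye


# The detector may report the eyes in any order.
both = split_eyes([(120, 58, 48, 48), (40, 55, 50, 50)], face_width=212)
only_left = split_eyes([(40, 55, 50, 50)], face_width=212)
nothing_found = split_eyes([], face_width=212)
print(both, only_left, nothing_found)
```

Callers then only have to check each returned value against None before using it.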

Before we jump to the next section, pupil tracking, let’s quickly put our face detection algorithm into a function too. It’s nothing difficult compared to our eye procedure. I’ll just note that false detections happen for faces too, and the best filter in that case is the size. Usually some small objects in the background tend to be considered faces by the algorithm, so to filter them out we’ll return only the biggest detected face frame:

def detect_faces(img, classifier):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    coords = classifier.detectMultiScale(gray_frame, 1.3, 5)
    if len(coords) > 1:
        biggest = (0, 0, 0, 0)
        for i in coords:
            if i[3] > biggest[3]:  # compare frame heights
                biggest = i
        biggest = np.array([biggest], np.int32)  # wrap the winner, not the last item
    elif len(coords) == 1:
        biggest = coords
    else:
        return None
    for (x, y, w, h) in biggest:
        frame = img[y:y + h, x:x + w]
    return frame

Also notice how we once again detect everything on a gray picture, but work with the colored one.
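The “keep only the biggest frame” step can also be written more compactly. The sketch below is a variant rather than the function above: it compares areas (width times height) instead of heights and works on plain tuples, and biggest_box is a name chosen just for this example.

```python
def biggest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    if len(boxes) == 0:
        return None
    return max(boxes, key=lambda box: box[2] * box[3])


# Two real faces and one small false detection in the background.
candidates = [(50, 88, 207, 207), (356, 87, 212, 212), (10, 10, 30, 30)]
winner = biggest_box(candidates)
print(winner)
```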


Okay, now we have a separate function to grab our face and a separate function to grab eyes from that face. Onto the eye tracking. We use the blob detection algorithm, so we need to initialize the detector first. It doesn’t require any files like with faces and eyes, because blobs are universal and more general:

detector_params = cv2.SimpleBlobDetector_Params()
detector_params.filterByArea = True
detector_params.maxArea = 1500
detector = cv2.SimpleBlobDetector_create(detector_params)

It needs to be initialized only once, so better put those lines at the very beginning, among other initialization lines. Also, we need area filtering for better results. Trust me, no pupil will be more than 1500 pixels. But many false detections are.
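To get a feeling for what an area limit of 1500 pixels means, you can convert it into the radius of a circle with that area. This is only back-of-the-envelope arithmetic and assumes the pupil blob is roughly circular.

```python
import math

MAX_AREA = 1500  # same value as detector_params.maxArea above

# Area of a circle is pi * r^2, so r = sqrt(area / pi).
max_radius = math.sqrt(MAX_AREA / math.pi)
max_diameter = 2 * max_radius
print(round(max_radius, 1), round(max_diameter, 1))
```

So the filter lets through blobs up to roughly 22 pixels in radius, which is generous for a pupil inside a webcam-sized eye frame.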

Now, to tracking eyes. Our eye frame looks something like this:

We need to effectively spot the pupil like that:

The blob detector detects what its name suggests: blobs. The good thing about it is that it works with binary images (only two colors). To get a binary image, we need a grayscale image first. Luckily, we have those.

Binary thresholding works like this: each pixel of a grayscale image has a value from 0 to 255 that stands for its brightness. You pass a threshold value to the function, and it sets every pixel at or below the threshold to 0 and every pixel above it to the value you pass next. We pass 255, so those pixels become white.
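The rule is simple enough to write out by hand. The sketch below applies it to a tiny 3x3 “image” stored as nested lists. It is a pure-Python illustration of the same rule that cv2.THRESH_BINARY applies, not how OpenCV implements it, and threshold_binary is a made-up name.

```python
def threshold_binary(gray, thresh, max_value=255):
    """Pixels strictly above thresh become max_value, all others become 0."""
    return [[max_value if pixel > thresh else 0 for pixel in row] for row in gray]


gray = [
    [200, 180, 90],
    [60, 30, 127],
    [128, 40, 220],
]

at_127 = threshold_binary(gray, 127)
at_42 = threshold_binary(gray, 42)
for row in at_127:
    print(row)
```

Note that the pixel with the value 127 turns black at a threshold of 127, and that lowering the threshold to 42 leaves only the darkest pixels black.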

_, img = cv2.threshold(img, threshold, 255, cv2.THRESH_BINARY)

_ stands for an unneeded variable, retval in our case. We don’t need it, so we name it _ and forget about it. The resulting image with threshold=127 will look something like this:

Every pixel that was below 127 is now black, every pixel above 127 is now white

Looks terrible, so let’s lower our threshold. With threshold=86 it’s like this:

The same procedure with 86 threshold

Better already, but still not good enough. We need to make the pupil distinguishable, so let’s experiment for now. Maybe on your photo the lighting is different, and a different threshold works best. Anyway, the result should be like this:

Now it’s 42

The pupil is a large black spot here, while its surroundings are just some narrow lines. At this stage we’ll also use another trick based on analyzing the image: the eyebrows always take up roughly the top 25% of the eye frame, so we’ll write a cut_eyebrows function that crops them out, because our blob detector sometimes picks them up instead of the pupil.

def cut_eyebrows(img):
    height, width = img.shape[:2]
    eyebrow_h = int(height / 4)
    img = img[eyebrow_h:height, 0:width]  # cut the top quarter (eyebrows) out
    return img

Now, with what we have done, the eye frames look like this:

Let’s try detecting and drawing blobs on those frames:

def blob_process(img, detector):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, img = cv2.threshold(gray_frame, 42, 255, cv2.THRESH_BINARY)
    keypoints = detector.detect(img)
    return keypoints

keypoints = blob_process(eye, detector)
cv2.drawKeypoints(eye, keypoints, eye, (0, 0, 255), cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

The problem is that our picture isn’t processed enough and the result looks like this:

But we are almost there! Just some image processing magic and the eye frame we had turns into a pure pupil blob:

Believe it or not, it’s the same eye frame

Just add the following lines to your blob processing function:

def blob_process(img, detector):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, img = cv2.threshold(gray_frame, 42, 255, cv2.THRESH_BINARY)
    img = cv2.erode(img, None, iterations=2)  # 1
    img = cv2.dilate(img, None, iterations=4)  # 2
    img = cv2.medianBlur(img, 5)  # 3
    keypoints = detector.detect(img)
    return keypoints

We applied a series of erosions and dilations to reduce the “noise” we had. That trick is common in many CV scenarios, and it works well in our situation. After that, we blurred the image so it’s smoother.
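If erosion and dilation are new to you, the toy sketch below may help. With the default kernel (None), OpenCV uses a 3x3 square: erosion replaces each pixel with the minimum of its neighbourhood and dilation with the maximum. The code re-implements that idea in pure Python on a small grid, ignoring neighbours outside the image. It uses a single dilation followed by a single erosion rather than the iteration counts above, so treat it as an illustration of the mechanism, not of the exact pipeline.

```python
def morph(grid, pick):
    """Apply a 3x3 min or max filter; pick is the built-in min or max."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            out_row.append(pick(neighbours))
        out.append(out_row)
    return out


def erode(grid):
    return morph(grid, min)   # dark regions grow, white regions shrink


def dilate(grid):
    return morph(grid, max)   # white regions grow, dark regions shrink


def count_dark(grid):
    return sum(pixel == 0 for row in grid for pixel in row)


# 9x9 white image with a 5x5 dark "pupil" and a one-pixel-thick dark line.
grid = [[255] * 9 for _ in range(9)]
for r in range(2, 7):
    for c in range(2, 7):
        grid[r][c] = 0
for c in range(9):
    grid[8][c] = 0

cleaned = dilate(grid)      # the thin line vanishes, the pupil shrinks
restored = erode(cleaned)   # the pupil grows back, the line stays gone
print(count_dark(grid), count_dark(cleaned), count_dark(restored))
```

Thin structures do not survive the dilation, while a solid blob only loses its outer rim and can be grown back afterwards.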

Now, if we try to detect blobs on that image, it’ll give us this:

And since that was originally our eye image, we can draw the same circle on our eye image:

Believe it or not, that’s basically all. All that’s left is to set up camera capture and pass every frame to our functions. Let’s define a main() function that starts the video capture and processes every frame using our functions. Notice the is not None checks: they cover the cases where nothing was detected. Without them, the program would crash whenever you blinked.

def main():
    cap = cv2.VideoCapture(0)
    while True:
        _, frame = cap.read()
        face_frame = detect_faces(frame, face_cascade)
        if face_frame is not None:
            eyes = detect_eyes(face_frame, eye_cascade)
            for eye in eyes:
                if eye is not None:
                    eye = cut_eyebrows(eye)
                    keypoints = blob_process(eye, detector)
                    eye = cv2.drawKeypoints(eye, keypoints, eye, (0, 0, 255), cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
            cv2.imshow('my image', face_frame)  # only show when a face was found
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()

Everything would work well here if your lighting were exactly like in my stock picture, where a threshold of 42 is needed. But your lighting conditions are most likely different, so no blobs will be detected. You need a different threshold.

It would be best if we could dynamically set our threshold. For that, we’ll set up a threshold slider. It needs a named window and a range of values:

def main():
    cap = cv2.VideoCapture(0)
    cv2.namedWindow('image')
    cv2.createTrackbar('threshold', 'image', 0, 255, nothing)
    while True:
        _, frame = cap.read()
        face_frame = detect_faces(frame, face_cascade)
        if face_frame is not None:
            eyes = detect_eyes(face_frame, eye_cascade)
            for eye in eyes:
                if eye is not None:
                    threshold = cv2.getTrackbarPos('threshold', 'image')
                    eye = cut_eyebrows(eye)
                    keypoints = blob_process(eye, threshold, detector)
                    eye = cv2.drawKeypoints(eye, keypoints, eye, (0, 0, 255), cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
        cv2.imshow('image', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()

Now on every iteration it grabs the value of the threshold and passes it to your blob_process function which we’ll change now so it accepts a threshold value too:

def blob_process(img, threshold, detector):
    gray_frame = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, img = cv2.threshold(gray_frame, threshold, 255, cv2.THRESH_BINARY)
    img = cv2.erode(img, None, iterations=2)
    img = cv2.dilate(img, None, iterations=4)
    img = cv2.medianBlur(img, 5)
    keypoints = detector.detect(img)
    print(keypoints)
    return keypoints

Now it’s not a hard-coded 42 threshold, but the threshold you set yourself.

The catch with OpenCV track bars is that they require a callback function that is called on every track bar movement. We don’t need any action there, only the track bar’s value, so we create a nothing() function:

def nothing(x):
    pass

So now, if you launch your program, you’ll see yourself and there will be a slider above you that you should drag until your pupils are properly tracked.

Trying it out with different movements
Works when it’s dark too. It’s all about the threshold

There are many more tricks available for better tracking, like keeping your previous iteration’s blob value and so on. But what we did so far should be enough for a basic level.
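As a rough idea of what “keeping your previous iteration’s blob value” could look like, here is a minimal sketch. It is not taken from the repository mentioned below: the LastKnownBlob class is hypothetical, and plain (x, y) tuples stand in for the keypoints that detector.detect would return.

```python
class LastKnownBlob:
    """Remember the most recent detection and reuse it when a frame has none."""

    def __init__(self):
        self.last = None

    def update(self, keypoints):
        # An empty result (for example during a blink) keeps the old value.
        if keypoints:
            self.last = keypoints[0]
        return self.last


tracker = LastKnownBlob()
frames = [[], [(31.0, 18.5)], [], [(29.0, 19.0)]]  # stand-in detections per frame
history = [tracker.update(keypoints) for keypoints in frames]
print(history)
```

In the main loop you would keep one such object per eye and draw whatever update() returns, so the marker does not flicker when a single frame fails to produce a blob.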

My GitHub is http://github.com/stepacool/, where you can find eye tracking code that uses some more advanced methods for better accuracy.

You can find what we wrote today in the “No GUI” branch: https://github.com/stepacool/Eye-Tracker/tree/No_GUI

YouTube demonstration is available here:

https://www.youtube.com/watch?v=zDN-wwd5cfo

Feel free to contact me at stepanfilonov@gmail.com