How to build a Face Detection application using PyTorch and OpenCV

Rosa Gradilla
2 min read · Jul 28, 2020


In this post I will show you how to build a face detection application capable of detecting faces and their landmarks through a live webcam feed. In the following post I will also show you how to integrate a classifier to recognize your face (or someone else’s) and blur it out.

Face Detector in action

For this project I leveraged facenet-pytorch’s MTCNN module (see the GitHub repo). This implementation is based on the paper “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks” by Kaipeng Zhang et al., IEEE Signal Processing Letters 23.10 (2016): 1499–1503. Multi-task Cascaded Convolutional Networks (MTCNN) adopt a cascaded structure that predicts face and landmark locations in a coarse-to-fine manner.

If you want to learn more about Multi-task Cascaded Convolutional Neural Networks you should check out my previous post, in which I explain the network’s architecture step by step.

For this project, your project folder structure should look like this:
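The original folder-structure screenshot is not reproduced here. Based on what the rest of the post implies (a face_detector folder holding the detection script), the layout might look like the following; the script name face_detector.py is an assumption:

```
project/
└── face_detector/
    └── face_detector.py   # FaceDetector class and main loop
```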

The first thing you will need to do is install facenet-pytorch, you can do this with a simple pip command:

> pip install facenet-pytorch

0. Use MTCNN and OpenCV to Detect Faces with your webcam

First, inside the face_detector folder we will create a script to declare the FaceDetector class and its methods. Take a moment to look at the code:
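The original code embed is not reproduced here, so below is a minimal sketch of what the script might look like. The FaceDetector class and its _draw() method are named in the post; the run() method name, the drawing colors, and the window title are assumptions. It relies on facenet-pytorch’s MTCNN.detect(frame, landmarks=True), which returns bounding boxes, probabilities, and five facial landmarks per face:

```python
import cv2
from facenet_pytorch import MTCNN


class FaceDetector:
    """Wraps an MTCNN model and draws its detections on webcam frames."""

    def __init__(self, mtcnn):
        self.mtcnn = mtcnn

    def _draw(self, frame, boxes, probs, landmarks):
        """Draw bounding boxes, face probabilities, and landmarks on a frame."""
        for box, prob, ld in zip(boxes, probs, landmarks):
            x1, y1, x2, y2 = [int(v) for v in box]
            # Bounding box around the detected face
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
            # Probability that the box contains a face
            cv2.putText(frame, f"{prob:.2f}", (x2, y2),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
            # Five landmarks: eyes, nose, and mouth corners
            for point in ld:
                cv2.circle(frame, (int(point[0]), int(point[1])), 5, (0, 0, 255), -1)
        return frame

    def run(self):
        """Read frames from the webcam, detect faces, and display the result."""
        cap = cv2.VideoCapture(0)  # 0 selects the default webcam
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            try:
                boxes, probs, landmarks = self.mtcnn.detect(frame, landmarks=True)
                if boxes is not None:
                    self._draw(frame, boxes, probs, landmarks)
            except Exception:
                pass  # skip frames the detector cannot process
            cv2.imshow("Face Detection", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
                break
        cap.release()
        cv2.destroyAllWindows()


if __name__ == "__main__":
    mtcnn = MTCNN()
    fcd = FaceDetector(mtcnn)
    fcd.run()
```

Note that MTCNN() defaults to CPU; you can pass device='cuda' if a GPU is available.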

If you prefer a video explanation, I walk through the code in the video below.

The _draw() method draws the bounding boxes for the detected faces, along with the probability of each being a face and the facial landmarks: eyes, nose, and mouth.

Finally, we run the FaceDetector. The script initiates the connection with your laptop’s webcam through OpenCV’s VideoCapture() method, then runs a while loop that reads frames from the camera and uses the draw method to overlay bounding boxes, landmarks, and probabilities.

The loop stops the video when the letter ‘q’ is pressed on the keyboard. Lastly, the script creates an instance of MTCNN, passes it to the FaceDetector, and runs it.

To incorporate a classifier to recognize and blur out your face, check out my next post.
