SVM-based multi-view face recognition using the HOG (Histogram of Oriented Gradients) technique and the dlib library

Sreenand K · Published in Analytics Vidhya · 3 min read · May 11, 2020


Facial recognition is a popular method used in various fields to detect and identify faces. Even though its accuracy is lower than that of other biometric technologies like iris scanning and fingerprint recognition, it is considered a safe method because it is contactless and non-invasive. It is predicted that more people will begin to implement and prefer facial recognition over other biometric processes in the coming years due to the COVID-19 situation. Many facial recognition libraries and frameworks have been developed over the last few years. In this project I have used the Histogram of Oriented Gradients (HOG) technique, developed in 2005, when a study by Navneet Dalal and Bill Triggs showed that HOG detected humans noticeably better than existing methods like wavelets. In the HOG approach, an SVM (Support Vector Machine) is used to classify the extracted features.

What is HOG and how does it work?

HOG is a feature descriptor that extracts features from an image pixel by pixel with the help of gradients. It is primarily used for face detection, face recognition and object detection. HOG works on greyscale images: every image has a particular distribution of gradient orientations, which helps HOG extract unique features from it.
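To make the idea concrete, here is a minimal sketch, assuming nothing beyond NumPy, of the core HOG computation: gradients of a greyscale cell are binned into an orientation histogram weighted by gradient magnitude. A real descriptor (like dlib's) additionally tiles the image into cells, normalizes over blocks, and slides a detection window; the function name `hog_cell_histogram` is mine, not dlib's.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Orientation histogram of gradients for one greyscale cell (the HOG core idea)."""
    gx = np.zeros_like(cell, dtype=float)
    gy = np.zeros_like(cell, dtype=float)
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]    # horizontal gradient (central difference)
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]    # vertical gradient
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation, 0..180 degrees
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    return hist

# A horizontal brightness ramp: its gradient points purely along 0 degrees,
# so all of the histogram energy lands in the first orientation bin.
cell = np.tile(np.arange(8, dtype=float), (8, 1))
hist = hog_cell_histogram(cell)
```

Because the histogram is built from gradient directions rather than raw intensities, it stays stable under uniform lighting changes, which is part of why HOG worked so well for detection.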

Requirements -

In this project I have used various video clips from the famous web series Money Heist; the clips were collected from YouTube, where they had been uploaded by various users. First, the algorithm was trained with a single image of every character appearing in the video, each labeled separately. The model was able to identify the characters to an extent, but not from multiple angles. To build a better dataset, the code was tweaked so that each time a face is detected, the model crops the detected face and saves it to the appropriate path with an appropriate file name (character_name+n.bmp). Later on, the collected faces were merged into a single image per character and used as the training dataset, which made it possible to detect the faces from multiple angles. In certain cases 3,000+ faces per character were merged together to create a robust model.
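The cropping-and-saving tweak described above can be sketched as follows. This is a hedged illustration, not the project's actual code: `crop_face` assumes the (top, right, bottom, left) box order used by dlib-based libraries such as face_recognition, and `save_path` is a hypothetical helper that builds the character_name+n.bmp file name.

```python
import os
import numpy as np

def crop_face(frame, box):
    """Cut a detected face out of a video frame.
    box = (top, right, bottom, left), the order returned by dlib-based detectors."""
    top, right, bottom, left = box
    return frame[top:bottom, left:right]

def save_path(root, character, n):
    """Build the <character_name><n>.bmp path for the n-th crop of a character."""
    return os.path.join(root, character, f"{character}{n}.bmp")

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for one video frame
face = crop_face(frame, (100, 300, 250, 200))    # a 150x100 face box
# In the real loop, each crop would then be written out, e.g.:
# cv2.imwrite(save_path("dataset", "tokyo", 1), face)
```

Keeping a per-character counter for `n` guarantees unique file names, so thousands of crops per character can accumulate without overwriting each other.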

I faced issues with cv2.imwrite when trying to export the images in PNG format, so the images were saved in .bmp format instead.

The merged image, combining all the collected faces of a character, looks like the one below.
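Merging the crops into one sheet can be done by tiling them into a grid. Here is a minimal NumPy sketch under the assumption that all crops were resized to the same dimensions first; `merge_faces` is my name for this hypothetical helper, and blank grid slots in the last row are simply left black.

```python
import numpy as np

def merge_faces(faces, cols):
    """Tile equally-sized face crops into one grid image (the merged training sheet)."""
    h, w, c = faces[0].shape
    rows = -(-len(faces) // cols)  # ceiling division: number of grid rows needed
    sheet = np.zeros((rows * h, cols * w, c), dtype=faces[0].dtype)
    for i, face in enumerate(faces):
        r, col = divmod(i, cols)
        sheet[r * h:(r + 1) * h, col * w:(col + 1) * w] = face
    return sheet

# Five dummy 64x64 "faces" tiled 3 across -> a 2-row, 3-column sheet.
faces = [np.full((64, 64, 3), i, dtype=np.uint8) for i in range(5)]
sheet = merge_faces(faces, cols=3)
```

The same function scales to thousands of crops; only the grid width needs choosing, and the sheet can then be written out with cv2.imwrite like any other image.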

The final output -

Link to the GitHub repository of the code. Feel free to fork it :)