Face detection and recognition in fine art portraits

Wilbert Tabone
Published in CyberCoffee · 9 min read · Jun 13, 2023

The following article was originally published on my blog CyberCoffee back in February 2017. In an effort to move my writing to Medium, I am reproducing it as originally written below.

Disclaimer: the research below was conducted in 2012, before the emergence of deep learning methods. It is therefore somewhat technologically outdated. This post is reproduced for archival purposes.

Face detection and recognition are two popular research areas under the umbrella of computer vision. Indeed, such is their popularity that they have broken out of the academic arena and into the mainstream media and knowledge-base. We see the application of such techniques in our daily lives: from logging into our personal computers (security) to tagging our friends on social networking sites.

Among the many domains in which these systems have been applied, a technical gap was identified in the area of fine arts: the identification of visages with similar facial characteristics across a collection of portraits. Previous research attempted to identify where a face has been featured across different mediums (e.g. sculpture, painting, textiles). The system we proposed in Tabone and Seychell (2015) [1], however, exclusively accepts a photograph or scan of painted media as input, which is then examined in order to identify facial features for recognition. These are matched against other paintings to produce a list that ranks those paintings by similarity and indicates the similar faces together with their similarity score. This information would aid in establishing new links between the human models and the artists, and pave the way to new and interesting concepts.

For the purpose of this research, paintings from the Baroque period, specifically works by Francesco Zahra, were chosen as a case study. Several of Zahra’s works feature the same face, or one with very similar characteristics, whether the same or a different character is being portrayed [2]. This trait made it possible to evaluate a domain that is normally difficult to evaluate. Furthermore, the results obtained were compared directly against the human visual system in order to assess the accuracy and success of the rankings.

Rotation Invariant Face Detection (RIFD)

As a pre-processing step, the image was converted from colour (three channels) to grayscale (a single channel), followed by histogram equalisation. This was done to improve the contrast of the image and to stretch its intensity range by mapping one distribution (the given histogram of intensity values) to a wider, ideally uniform, distribution of intensity values. In other words, histogram equalisation redistributes the intensity values so that the pixel counts (the y-axis of the histogram) are spread as evenly as possible across the available range. An example of the effect can be observed in Figure 1.

Figure 1: Preprocessing process. (A) Original image, (B) Grayscale image, (C) Histogram equalisation. Source: [1].
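The two pre-processing steps can be illustrated with a minimal, standard-library-only sketch. It is not the code used in the research: the function names `to_grayscale` and `equalise_histogram` are hypothetical, the BT.601 luma weights are an assumed choice (the article does not state which weights were used), and images are represented as plain nested lists for readability.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of 0-255 intensities."""
    # ITU-R BT.601 luma weights: an assumption, not taken from the article.
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


def equalise_histogram(gray, levels=256):
    """Spread the intensities of a grayscale image over the full range."""
    # Histogram: number of pixels at each intensity level.
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1

    # Cumulative distribution function: running total of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Flat image (one intensity only): there is nothing to spread.
        return [row[:] for row in gray]

    # Look-up table that makes the output CDF as close to linear as possible.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[value] for value in row] for row in gray]
```

For a low-contrast patch such as `[[100, 100], [110, 120]]`, the darkest value is mapped to 0 and the brightest to 255, which is the stretching effect described above. In practice a library routine would normally be used instead of hand-written loops.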

Of the different detector classes that exist, appearance-based methods were the most suitable for the system, as these probabilistic methods classify a random variable x as belonging to either a face class or a non-face class [3]. Hence, the classification problem becomes binary. The detector chosen from this category was the Viola-Jones Haar classifier [4], which operates on an integral image I. This is an array in which the entry at location (x, y) contains the sum of the intensities of the pixel at (x, y) and of all pixels above and to the left of it [5].
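As an illustration of the integral image, the sketch below builds it in a single pass over a grayscale image stored as nested lists. The function name `integral_image` is hypothetical and the code is only meant to make the definition concrete.

```python
def integral_image(gray):
    """Return I where I[y][x] is the sum of gray over rows 0..y and columns 0..x."""
    height, width = len(gray), len(gray[0])
    integral = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(width):
            row_sum += gray[y][x]
            above = integral[y - 1][x] if y > 0 else 0
            integral[y][x] = row_sum + above
    return integral
```

For the 2x2 image `[[1, 2], [3, 4]]` this yields `[[1, 3], [4, 10]]`: the bottom-right entry is the sum of every pixel in the image.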

In order to detect the facial features, Haar wavelets, which are rectangular groups of pixels (Figure 2) formed on the basis of intensity values, are applied to the smoothed image so that regional face detection can occur.

Figure 2: A subset of Haar wavelets. Source: [6].

The Haar-like features, which form the core of Haar classifier object detection, use the change in contrast values between two or three adjacent rectangular groups of pixels, rather than the raw intensity values, to aid classification. These variations help to distinguish between relatively light and dark areas.
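A two-rectangle Haar-like feature can be sketched as the difference between the pixel sums of two adjacent rectangles, each obtained from the integral image in four look-ups. The helper names below (`padded_integral`, `rect_sum`, `two_rect_feature`) are hypothetical, and the zero-padded integral image is simply a convenience that avoids boundary checks; it is not claimed to match the original implementation.

```python
def padded_integral(gray):
    """Integral image with an extra zero row and column to simplify look-ups."""
    height, width = len(gray), len(gray[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            ii[y + 1][x + 1] = (gray[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w-by-h window (w should be even)."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right
```

A window that is dark on the left and bright on the right gives a strongly negative value, while a uniform window gives zero, which is how such a feature separates relatively light areas from dark ones.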

Rotation Invariance

In order to improve the rotation invariance of the detector, a user-operated system (driven by a pointing device) was created. Using basic coordinate geometry, a line is constructed between the two eye regions of the portrait, whilst a second, horizontal line is constructed from the topmost eye to the position where the bottom-most eye would be if the face had the correct orientation. A geometrical representation is presented in Figure 3.

Figure 3: Mathematical principle behind the head pose rectification solution. Source: [1].

The gradient m1 of the line connecting P1 (top eye) to P2 (lower eye) is calculated, followed by the gradient m2 of the line connecting the top eye to the desired position P3 of the bottom eye. The assumption made here, in the context of image processing, is that y1 < y2; if not, P1 and P2 are swapped so that P1 is always the topmost eye. Moreover, it must be ensured that m1·m2 ≠ −1 (the case in which the eyes are lined up vertically), as this would result in a mathematical error when calculating the angle θ between the two lines from the obtained gradients.

The calculation of angle θ is done by utilising Equation 1 and then taking its arctan.

Rotating the image by the result of Equation 1 will align the eyes and face, significantly improving the chances of a positive detection. It is to be noted that the rules of association of angles were respected in the implementation for correct rotation.
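The head pose rectification can be sketched as follows. Equation 1 is not reproduced in this text, so the code assumes it is the standard expression for the angle between two lines, tan θ = (m1 − m2) / (1 + m1·m2); this is an assumption rather than a confirmed detail of the published system. The function name `rotation_angle` is hypothetical, and the vertically aligned case that the text rules out is handled here by returning 90 degrees directly.

```python
import math


def rotation_angle(eye_a, eye_b):
    """Angle in degrees between the eye line and the horizontal through the top eye."""
    # Image coordinates: y grows downwards, so the smaller y is the topmost eye.
    p1, p2 = sorted([eye_a, eye_b], key=lambda p: p[1])
    (x1, y1), (x2, y2) = p1, p2

    if x1 == x2:
        # Eyes stacked vertically: the gradient m1 is undefined.
        return 90.0

    m1 = (y2 - y1) / (x2 - x1)  # gradient of the line P1 -> P2
    m2 = 0.0                    # gradient of the horizontal line P1 -> P3

    # Assumed form of Equation 1, followed by the arctan step.
    tan_theta = (m1 - m2) / (1 + m1 * m2)
    return math.degrees(math.atan(tan_theta))
```

With eyes clicked at (0, 0) and (10, 10) the function returns an angle of 45 degrees; the sign of the result indicates the direction in which the image would need to be rotated, and the click order does not matter because the points are sorted first.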

Face Recognition

The Wilkie, Stonham and Aleksander’s Recognition Device (WiSARD), which operates on bi-level (binary) images, was chosen as the recognition algorithm [7]. As paintings do not offer the luxury of several images of the same face at different angles, it was not possible to train any classifiers; image similarity measurements were employed instead. Since the system needed to be automated, a crosscheck was also required: the list of similar faces output by the system had to be filtered so that only the most accurate results are displayed.

The Structural Similarity Index Measure (SSIM) [8] was chosen for discrimination as it is based on the human visual system, and it gave the most positive results when applied to both paintings and photographs. Further experiments showed that as the brightness of the input images increases, a better WiSARD recognition percentage is achieved, owing to the higher-quality binary image produced. On the other hand, the more tightly a detected face image is cropped, the lower the WiSARD and SSIM values become. Based on this observation, it was decided not to apply any further treatment to the images produced by the detector.

Figure 4: An example of a Structural Similarity Index Measure (SSIM) system. Source: [8].
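To make the SSIM measurement concrete, here is a simplified sketch that evaluates the SSIM formula of [8] once over the whole image, using the commonly quoted constants K1 = 0.01 and K2 = 0.03. The full method in [8] averages the index over local windows, so this single-window version is a deliberate simplification; the function name `ssim_global` is hypothetical and the code is not the implementation used in the research.

```python
def ssim_global(img_a, img_b, dynamic_range=255):
    """Single-window SSIM between two equally sized grayscale images."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    n = len(a)

    # Luminance (means), contrast (variances) and structure (covariance) terms.
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((v - mean_a) ** 2 for v in a) / n
    var_b = sum((v - mean_b) ** 2 for v in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

    # Stabilising constants, with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Comparing an image with itself gives a value of 1, and the value drops as the structure of the two images diverges, which is the property the ranking relies on.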

The WiSARD is a collection of RAM-discriminators, which are structures based on Random Access Memories that hold binary information. In general, the WiSARD classifier has a collection of RAM units (neurons), each of which is trained on a particular pattern. When it receives an input pattern, each neuron outputs a 1 if there is a match in the pattern area assigned to it, or a 0 if there is not. The WiSARD sums these outputs to produce a total value, which may be expressed as a percentage of recognition [1]. The architecture of such a system is presented in Figure 5.

Figure 5: Representation of a 10-RAM discriminator WiSARD. Source: [7].
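The behaviour of a single discriminator can be sketched in a few lines. This is a generic illustration of the weightless approach described in [7], not the system's actual code: the class name `Discriminator`, the random assignment of pixels to RAM nodes, and the use of Python sets to stand in for RAM contents are all assumptions made for the example.

```python
import random


class Discriminator:
    """One WiSARD discriminator: a set of RAM nodes over random pixel tuples."""

    def __init__(self, input_size, tuple_size, seed=0):
        indices = list(range(input_size))
        random.Random(seed).shuffle(indices)
        # Each RAM node watches a fixed tuple of pixel positions.
        self.tuples = [indices[i:i + tuple_size]
                       for i in range(0, input_size, tuple_size)]
        # A RAM is modelled as the set of addresses written during training.
        self.rams = [set() for _ in self.tuples]

    def _addresses(self, bits):
        return [tuple(bits[i] for i in positions) for positions in self.tuples]

    def train(self, bits):
        """Write a 1 at the address each RAM node sees for this pattern."""
        for ram, address in zip(self.rams, self._addresses(bits)):
            ram.add(address)

    def response(self, bits):
        """Percentage of RAM nodes that recognise their part of the pattern."""
        hits = sum(address in ram
                   for ram, address in zip(self.rams, self._addresses(bits)))
        return 100.0 * hits / len(self.rams)
```

Training on a flattened binary image and then presenting the same image yields a response of 100%. Flipping one pixel changes the address seen by exactly one RAM node, so with four nodes the response drops to 75%.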

Ranking System

In this application, a set of pre-selected images is presented as part of the test set selection module and the query window. First, the user constructs a test set of six images to be used by the WiSARD and SSIM; the system then allows the user to select one of twelve query images. Each query image may be rotated, if necessary, before being processed for ranking.

Once the detection is completed, each detection area is processed through the WiSARD classifier, which determines, through a threshold of 20.0, which detected faces will go on for ranking. Subsequently, the SSIM is calculated between the detected face and each previously selected test image. Following this process, the similar faces are scored based on similarity using both the WiSARD and SSIM values, as per Equation 2.

A ranking window is displayed, containing the WiSARD value together with the three faces most similar to the detected face (those with the highest SSIM values). Next to each face, a link to the source painting is presented together with the calculated similarity score as a percentage. The window is depicted below in Figure 6.

Figure 6: The ranking interface. Source: [1].
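The filtering and ranking flow can be summarised with a short sketch. Since Equation 2 is not reproduced in this text, the combined similarity score is left out and candidates are ordered by SSIM alone; whether the threshold comparison is inclusive is also an assumption. The function name `rank_matches` and the dictionary of per-painting SSIM values are hypothetical.

```python
def rank_matches(wisard_value, ssim_by_painting, threshold=20.0, top_n=3):
    """Return the top_n (painting, ssim) pairs, or [] if the face is rejected."""
    # Detected faces below the WiSARD threshold do not go on for ranking.
    if wisard_value < threshold:
        return []
    # Order the test images by descending SSIM and keep the best few.
    ranked = sorted(ssim_by_painting.items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

For example, `rank_matches(35.0, {"a": 0.4, "b": 0.9, "c": 0.1, "d": 0.6})` keeps paintings b, d and a in that order, while a detected face with a WiSARD value of 10.0 is discarded before any ranking takes place.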

Application and Results

An evaluation based on a direct comparison between the human visual system (HVS) and the similarity score was performed. Survey respondents were asked to rate, on a Likert scale from 0 to 10 (low to high), the similarity of a detected face to a face selected from another painting, which may or may not have been similar. Each vote was subsequently multiplied by 10 to obtain a percentage that could be compared directly with the similarity score produced by the application. An overall success rate of 83% was achieved [1].

Experts in the field of curation and the arts were presented with the collected data and the results. They commended the application’s potential in the field of research and its future use in creative applications. The author has already published a follow-up application of this research in the area of augmented vision.

Conclusive Remarks

The system was successful in detecting facial features in the input paintings. It also correctly used its primary recognition system to assess whether any similar faces exist in the test set, and then used the SSIM measurement to pinpoint and rank the actual similar faces by their structural similarity value.

It is strongly believed that such a system would help art researchers in amassing collections of links in similarity between works of the same or different artists. This will aid in building upon or commencing new research in their respective areas.

This research has been published in Volume 9256 of the series Lecture Notes in Computer Science by Springer and can be acquired at the following location.

References

[1] Tabone, W., Seychell, D.: Recognising Familiar Facial Features in Paintings Belonging to Separate Domains. In: International Conference on Computer Analysis of Images and Patterns, pp. 125–136. Springer International Publishing (2015).

[2] Sciberras, K.: Francesco Zahra: His life and art in mid-18th century Malta 1710–1773. Midsea Books (2010).

[3] Yang, M.H., Ahuja, N.: Face detection and gesture recognition for human-computer interaction, vol. 1. Springer (2001).

[4] Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, vol. 1, pp. I-511. IEEE (2001).

[5] Medioni, G., Kang, S.B.: Emerging topics in computer vision. Prentice Hall PTR (2004).

[6] Nilsson, M.: Face detection. Presentation by the Mathematical Imaging Group, Centre for Mathematical Sciences, Lund University (2014).

[7] Aleksander, I., De Gregorio, M., França, F.M.G., Lima, P.M.V., Morton, H.: A brief introduction to weightless neural systems. In: ESANN. Citeseer (2009).

[8] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004).

Wilbert Tabone
CyberCoffee

Human-Robot Interaction PhD candidate with a background in AI and a passion for culture and art. Working on AR for automated vehicles. #VR #AR #AI #UX #HCI