Image Scoring: Allocating a Percentage Score to Images for Their Quality

This article illustrates a way to quantify image quality based on certain quality attributes.

Prateek Chhikara
Engineering @ Housing/Proptiger/Makaan
7 min read · Apr 19, 2022


Figure 1. Pipeline of the used approach.

Describing image quality and aesthetics has been widely researched in digital image processing and computer vision. Technical quality assessment can be done at the pixel level, where we estimate degradations in terms of noise, blur, compression artifacts, etc. Generally, image quality assessment problems are solved with either full-reference or no-reference approaches. When a reference image is available, we use image quality metrics such as Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index (SSIM) to compare the ground truth with the enhanced image. When a reference image is unavailable, we instead rely on statistical models to predict image quality. The main goal of both approaches is to predict a quality score that correlates well with human perception.
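
For the full-reference case, both metrics are available in scikit-image. A minimal sketch, using synthetic arrays as stand-ins for a ground-truth image and a degraded version of it:

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Toy stand-ins for a ground-truth image and a degraded version of it.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
noise = rng.integers(-10, 11, size=reference.shape)
degraded = np.clip(reference.astype(int) + noise, 0, 255).astype(np.uint8)

psnr = peak_signal_noise_ratio(reference, degraded)  # in dB; higher is better
ssim = structural_similarity(reference, degraded, channel_axis=-1)  # 1.0 = identical
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")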

This article covers a no-reference image quality assessment approach that aggregates four methods: blur, luminosity, contrast, and Image Quality Assessment (IQA). Image quality is assessed on these four parameters, and a cumulative score is then generated by aggregating the scores of the four methods.

We will now discuss the approach used to allocate a quality score to an image. We took a random sample of 10,000 real-estate images from the Housing.com platform for analysis and experimentation. For each of the four approaches, we calculated a score on a scale of 0 to 1, where 1 is the best score and 0 the worst.

After getting values for each of the four methods, we picked 1,000 image samples from these 10,000 and manually annotated their quality on a scale of 1 to 5, where {1: worst, 2: bad, 3: average, 4: good, 5: best}. A classifier was then trained on these 1,000 labeled images and used to find the weights assigned to the four features, from which the overall metric for an image is computed.

1. BLUR

Blur makes an image less sharp, less clear, and less distinct, and reduces the detail in the image. Blurred images might also induce negative affective responses, such as uncanniness. We used the Laplacian method to calculate the blur in an image [2]. Figure 2 shows the box plot of blur scores calculated through the Laplacian method for the 10,000 images. The higher the value, the more blur in the image; a value close to zero signifies almost no blur.

Figure 2. Box-plot of Laplacian output values of 10,000 image samples.

The extreme values in the box plot have been removed, and the remaining values have been normalized on a scale of 0 to 1 as follows:

def blur_score(laplacian_score):
    # Heavy blur (values above 1,000) gets the minimum score.
    if laplacian_score > 1000:
        return 0.0
    # No measurable blur (negative values) gets the maximum score.
    if laplacian_score < 0:
        return 1.0
    # Map [0, 1000] linearly onto [1, 0]: the less blur, the higher the score.
    return 1.0 - laplacian_score / 1000.0

Thus, any image whose blur value exceeds 1,000 is immediately penalized with a 0 (the minimum score). Values between 0 and 1,000 are normalized into the range [0, 1], where a lower blur value translates into a higher normalized score.
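
For reference, a sketch of how the raw Laplacian value might be computed with OpenCV. The article does not specify the exact statistic, so the variance-of-Laplacian formulation from [2] is shown here as an assumption; note that in that formulation sharper images yield larger values, so whichever statistic is used must be oriented to match the thresholds above.

import cv2

def laplacian_value(image_path):
    # Read the image as greyscale; the Laplacian operator responds to edges.
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Variance of the Laplacian response, a common blur statistic [2].
    return cv2.Laplacian(gray, cv2.CV_64F).var()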

2. LUMINOSITY

Luminosity is the perceived brightness of a color. The value of a pixel in an 8-bit digital image lies between 0 and 255, where 0 denotes black and 255 denotes white. Generally, values at either extreme degrade the overall quality and aesthetics of the image, so the image should have balanced pixel values. We first converted the RGB image to greyscale and then calculated the average luminosity (average pixel value) of the image, which lies between 0 and 255. A box plot is then used to analyze the luminosity of the 10,000 images, as shown in Figure 3. We can see that most values lie between 100 and 150, while images below 50 or above 200 can be considered bad quality based on the luminosity factor.

Figure 3. Box-plot of Luminosity output values of 10,000 image samples.
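
A minimal sketch of the average-luminosity computation, assuming Pillow and NumPy (the image path is hypothetical):

from PIL import Image
import numpy as np

def average_luminosity(image_path):
    # Convert to 8-bit greyscale ("L" mode) and average the pixel values.
    gray = np.asarray(Image.open(image_path).convert("L"))
    return float(gray.mean())  # lies in [0, 255]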

The following rules were applied when calculating the luminosity score:

def luminosity_score(avg_luminosity):
    # Extremes (too dark or too bright) get the minimum score.
    if avg_luminosity < 50 or avg_luminosity > 228:
        return 0.0
    # The balanced band gets the maximum score.
    if 100 <= avg_luminosity <= 150:
        return 1.0
    # Ramp from 0 at 50 up to 1 at 100, and from 1 at 150 down to 0 at 228.
    if avg_luminosity < 100:
        return (avg_luminosity - 50) / 50.0
    return (228 - avg_luminosity) / 78.0

3. CONTRAST

Contrast is the difference in color that makes an object distinguishable from other objects within the same field of view; here, it is estimated from the randomness of pixel intensities in an image. A real-life example explains it better: consider a sunny day versus a foggy day. On a sunny day, everything looks clear to us and has high contrast, whereas on a foggy day everything looks almost the same intensity (a dull, washed-out grey look). To measure contrast, we used Shannon entropy, which quantifies the information available in an image. The box plot of Shannon entropy for the 10,000 images is shown in Figure 4.

Figure 4. Box-plot of shannon entropy output values of 10,000 image samples.
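
A minimal sketch of the entropy measurement using scikit-image; converting to an 8-bit greyscale image first keeps the entropy bounded by 8 bits (the image path is hypothetical):

from skimage.color import rgb2gray
from skimage.io import imread
from skimage.measure import shannon_entropy
from skimage.util import img_as_ubyte

# An 8-bit image has at most 256 grey levels, so its entropy is at most 8 bits.
gray = img_as_ubyte(rgb2gray(imread("sample_listing.jpg")))
entropy = shannon_entropy(gray)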

The following rules were used to calculate the contrast score:

def contrast_score(entropy):
    # Very low entropy (a near-uniform image) gets the minimum score.
    if entropy < 1:
        return 0.0
    # Entropy near the 8-bit maximum gets the full score.
    if entropy > 8:
        return 1.0
    # Map [1, 8] linearly onto [0, 1].
    return (entropy - 1) / 7.0

4. IQA

IQA is an open-source no-reference image quality assessment tool that returns a score for an image based on its quality [1]. The authors implemented an aesthetic and technical image quality classifier based on Google’s research paper “NIMA: Neural Image Assessment.” NIMA consists of two Convolutional Neural Networks (CNNs) that predict an image’s aesthetic and technical quality, respectively [3].
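
Each NIMA network outputs a probability distribution over the ten score buckets 1 to 10, and the scalar quality score is the mean of that distribution. A minimal sketch of this reduction (the probs array is a hypothetical model output):

import numpy as np

# Hypothetical 10-bin NIMA output: P(score = 1), ..., P(score = 10).
probs = np.array([0.01, 0.02, 0.05, 0.10, 0.20, 0.25, 0.20, 0.10, 0.05, 0.02])
mean_score = float(np.dot(probs, np.arange(1, 11)))  # expected score in [1, 10]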

Figure 5 shows the box plot of IQA values for the 10,000 images. The following rules were used to calculate the IQA score:

def iqa_score(iqa):
    if iqa < 3:
        return 0.0
    if iqa > 7:
        return 1.0
    # Map [3, 7] linearly onto [0, 1].
    return (iqa - 3) / 4.0

Figure 5. Box-plot of IQA output values of 10,000 image samples.

Final Scoring

The final score, which represents the quality of an image, combines all four features mentioned before with varying degrees of contribution, i.e., weights. To determine the weight of each feature, we took a sample of images (a mixture of good and bad ones) and manually labeled each as one of the following: worst (1), bad (2), average (3), good (4), best (5). We then trained a Random Forest classifier on these images’ scores and read off each feature’s importance (rf.feature_importances_).

The IQA feature received the highest proportion of the weights. Thus, to find the final score of an image, we used the equation shown in Figure 6: a weighted average of the four normalized scores.

Figure 6. Final scoring mechanism.
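
A minimal sketch of this step with scikit-learn. The arrays below are toy stand-ins; in practice X would hold the four normalized scores of the 1,000 labeled images and y their manual labels:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins: rows are images, columns are [blur, luminosity, contrast, IQA].
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = rng.integers(1, 6, size=1000)  # manual labels in {1, ..., 5}

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
weights = rf.feature_importances_  # non-negative and sums to 1

# Final score of one image as a weighted average, expressed as a percentage.
scores = np.array([0.974, 1.000, 0.781, 0.754])  # the Fig 7(a) values
final_score = 100.0 * float(np.dot(weights, scores))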

Result Samples

Figure 7 shows two images that were used to evaluate the final image quality score, and the following table shows the individual scores and the final score.

Image   |  Blur  |  Luminosity |  Contrast  |  IQA  | Final Score
------------------------------------------------------------------
Fig 7(a)| 0.974 | 1.000 | 0.781 | 0.754 | 86.99%
------------------------------------------------------------------
Fig 7(b)| 0.848 | 0.860 | 0.796 | 0.570 | 75.57%

Figure 7. Original images: (a) on the left, and (b) on the right.

Conclusion

This article elaborated a no-reference image quality assessment mechanism that quantifies image quality using four metrics. For each metric, we discussed the criteria for scoring it and how the score is normalized to the range 0 to 1. Lastly, we presented the final scoring mechanism: a weighted average of the outputs of the four metrics.
