A Tool to Automatically Detect Photo Quality
In this era of information explosion, our cloud accounts are filled with hundreds of photos. It’s a pain to scroll through many pages to decide on good ones to upload to our social networks. Photo Grader comes as a time saver. One can simply upload a gallery and the app ranks the photos by aesthetic quality.
This web app was developed for my consulting project for Insight Data Science. I worked with an online photographer booking platform to develop an API to check the quality of the galleries uploaded by photographers before they are delivered to the customers. This tool gives suggestions on quality aspects that the photographers might have overlooked or could easily modify to increase the overall consumer satisfaction with the photos. While the platform already has high customer satisfaction, this tool is meant to boost it even further.
The photography platform collects feedback from customers once they receive the photos after the shoot. Customers rate the product, the service, and the platform each on a 1–5 star scale, and give the platform a 1–10 NPS score. Assuming that customers separate photo quality from the experience as a whole, the product rating should factor in photo aesthetics to a large extent. My initial approach was therefore to use the product rating on an album as a proxy for photo rating. Given that over 77% of albums on the platform receive 5-star product ratings, my task was really to understand why some albums got non-5-star ratings. Accordingly, I labeled an album 1 if its product rating was below 5 stars, and 0 otherwise.
One pitfall of using an aggregate measure as a proxy for disaggregated observations lies in how the training and testing sets are organized. I was careful not to put photos from the same album into both the training and testing sets: if that happened, the algorithm would effectively learn to recognize which photos belong to the same album, rather than what makes a photo good. Therefore, I applied the train/validation/test split (7:1:2) at the album level. After the split, I had 360K photos in my training set, representing 3K albums.
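The album-level split can be sketched with scikit-learn's `GroupShuffleSplit`, which guarantees that all photos sharing a group (here, an album) land in the same set. The helper below is illustrative; the post does not show the project's actual split code, and the random seed is an assumption.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def split_by_album(photo_ids, album_ids, seed=42):
    """Split photos into train/val/test (7:1:2) so that no album
    spans two sets. Hypothetical helper mirroring the album-level
    split described above."""
    photo_ids = np.asarray(photo_ids)
    album_ids = np.asarray(album_ids)
    # First carve out the 20% test set at the album level.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
    rest_idx, test_idx = next(gss.split(photo_ids, groups=album_ids))
    # Then split the remaining 80% of albums into train (70% overall)
    # and validation (10% overall), i.e. val is 1/8 of the remainder.
    gss2 = GroupShuffleSplit(n_splits=1, test_size=0.125, random_state=seed)
    train_idx, val_idx = next(gss2.split(photo_ids[rest_idx],
                                         groups=album_ids[rest_idx]))
    return (photo_ids[rest_idx][train_idx],
            photo_ids[rest_idx][val_idx],
            photo_ids[test_idx])
```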
I applied several pre-trained convolutional neural network models to my training set, dropping the top layer of each, adding a 124-dimensional fully connected dense layer, and putting a binary classification head for bad (label 1) vs. good album ratings on top. I started with 10 epochs and used the validation set for early stopping: if the validation loss keeps rising for two consecutive epochs, training stops.
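In Keras, the setup described above can be sketched as follows. The MobileNet base, the 124-unit dense layer, the sigmoid head, and the patience-2 early stopping come from the text; the optimizer, activation choices, and input size are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_transfer_model(input_shape=(224, 224, 3), weights="imagenet"):
    """Transfer-learning sketch: a pre-trained MobileNet base with its
    top removed, a 124-unit dense layer, and a sigmoid head for the
    binary good/bad album-rating label."""
    base = tf.keras.applications.MobileNet(
        include_top=False, weights=weights,
        input_shape=input_shape, pooling="avg")
    base.trainable = False  # keep the pre-trained features frozen
    model = models.Sequential([
        base,
        layers.Dense(124, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])
    return model

# Early stopping: halt if validation loss rises for two epochs in a row.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=2,
                                     restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=10,
#           callbacks=[early_stop])
```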
Figure 1 shows the ROC curves for the training, cross-validation, and test sets using MobileNet. There is significant over-fitting: the AUC score on the training set is 0.6, while those on the CV and test sets are 0.52 and 0.55 respectively. The model does learn something from the training set, but the over-fitting suggests it may only be learning whether two photos come from the same album. It can therefore do a decent job on albums it has seen (and whose ratings it effectively knows) from training, but performs poorly at predicting ratings for albums it has never seen.
Apply Transfer Learning on an Aesthetic Database
Since the AUC score is low even on the training set when I use solely the platform data, I decided to turn to an academic database with aesthetic quality labels to validate my transfer learning approach.
The data I ended up using is the Aesthetics and Attributes Database (AADB) developed by researchers from UCI and Adobe. They downloaded 10K photos from Flickr and hired examiners to hand-label the overall aesthetic quality, together with 11 separate aesthetic attributes such as brightness, lighting, shallow depth of field, vivid color, rule of thirds, etc. The aesthetic scores are (almost) normally distributed between 0 and 1, and I generated my binary label as 1 if the aesthetic score is below 0.4, and 0 otherwise. I did not choose the natural cutoff of 0.5 because I am interested in identifying the photos of lowest quality. I then applied the same transfer learning method as I did with the platform data.
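The labeling rule is a one-liner; the sketch below shows it explicitly, with the 0.4 cutoff from the text as the default.

```python
import numpy as np

def make_labels(scores, cutoff=0.4):
    """Label a photo 1 ('bad') if its AADB aesthetic score falls below
    the cutoff, 0 otherwise. The 0.4 cutoff (rather than 0.5) targets
    the lowest-quality photos specifically."""
    scores = np.asarray(scores, dtype=float)
    return (scores < cutoff).astype(int)

make_labels([0.1, 0.39, 0.4, 0.8])  # → array([1, 1, 0, 0])
```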
Figure 2 shows the ROC curve for transfer learning using MobileNet. The AUC score reaches 0.87 on the test set, indicating that the algorithm can correctly discern bad photos from good ones using the AADB aesthetic criteria. Surprisingly, transfer learning with a more complex network, such as ResNet50, does not yield better results. This suggests that learning photo aesthetics is more about composition than about content.
Using Flask and Gunicorn, I deployed my trained model as the web app Photo Grader. After a user uploads an album of photos, the app calculates an overall aesthetic score for each photo and shows the top and bottom 3 photos in the album by that score. It also lists how each photo performs on several specific attributes.
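A minimal Flask sketch of such a scoring endpoint is below. The route name and the `score_album` helper are illustrative placeholders, not the app's actual code; in the real app, `score_album` would run the trained model.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def score_album(files):
    """Placeholder for model inference: returns a list of
    (filename, aesthetic_score) pairs. A real implementation would
    run each image through the trained network."""
    return [(f.filename, 0.5) for f in files]

@app.route("/grade", methods=["POST"])
def grade():
    photos = request.files.getlist("photos")
    # Sort descending by aesthetic score.
    scored = sorted(score_album(photos), key=lambda p: p[1], reverse=True)
    return jsonify({
        "top_3": scored[:3],      # best photos in the album
        "bottom_3": scored[-3:],  # worst photos in the album
    })

# In production, serve with Gunicorn, e.g.:
#   gunicorn -w 4 app:app
```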
Photo Grader is live online, and you can visit and play with it at photograder.wangruoying.com.
What Affects Album Ratings?
My ultimate goal is to understand what contributes to very low and very high album ratings. Equipped with Photo Grader, I can generate album-level aesthetic quality features. I chose the aesthetic scores of the median, top-10%, and bottom-10% photos. The median score represents the overall quality of the album: the higher the score, the higher the quality. The top and bottom scores try to capture the potential boosting effect of a few outstanding photos and the dragging effect of the worst few.
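These three album-level features can be computed directly from the per-photo scores; a minimal sketch:

```python
import numpy as np

def album_features(photo_scores):
    """Aggregate per-photo aesthetic scores into the three album-level
    features described above: median, top-10%, and bottom-10% scores."""
    s = np.asarray(photo_scores, dtype=float)
    return {
        "median_score": float(np.median(s)),
        "top10_score": float(np.percentile(s, 90)),
        "bottom10_score": float(np.percentile(s, 10)),
    }
```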
In addition to photo quality, I can also directly obtain or engineer other album features, including the shoot type (wedding, event, profile, etc.), the country, state, and city where the photo shoot took place, the time of the shoot, the time the photos were uploaded, the total number of photos in the album, and the average photo size.
Still using 5-star vs. non-5-star as the label, I applied random forest classification to these features; the resulting feature importance weights are shown in Figure 4.
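The importance ranking behind Figure 4 can be sketched with scikit-learn; the hyperparameters here are assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_features(X, y, feature_names, seed=0):
    """Fit a random forest on the album features and return the
    features sorted by impurity-based importance (highest first)."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [(feature_names[i], float(rf.feature_importances_[i]))
            for i in order]
```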
My variables of interest fall into two groups: the album-level aesthetic quality, proxied by the median, top-10%, and bottom-10% scores, and the non-aesthetic features. Extracting the direction of each feature's effect on the album rating is not trivial, since the relationship can be highly non-linear. I therefore took a simple approach: regress the album rating on each important feature with an Ordinary Least Squares (OLS) regression, and use the sign of the coefficient as a proxy for the direction of the effect.
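This sign-extraction step amounts to a one-variable OLS fit per feature; a minimal sketch using NumPy's least-squares polynomial fit:

```python
import numpy as np

def feature_effect_sign(feature, rating):
    """Regress the album rating on a single feature with OLS and
    return the sign of the slope, used only as a rough
    direction-of-effect indicator (not a causal estimate)."""
    x = np.asarray(feature, dtype=float)
    y = np.asarray(rating, dtype=float)
    slope = np.polyfit(x, y, 1)[0]  # degree-1 fit returns [slope, intercept]
    return int(np.sign(slope))
```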
According to my results, all three quality scores are positively associated with 5-star album ratings. The album rating is also higher when more photos are taken per hour, the average size of the uploaded photos is larger, and the photos are uploaded within 2 days of the shoot.
An interesting observation is that when the customer has left a note with the booking about additional requirements, the album rating tends to be better on average. This indicates that good communication before the event is important for customer satisfaction.
In this project, I developed Photo Grader, a tool for the platform to automatically check photo quality, packaged as a Docker image for deployment. Currently, any red-flagged photos go through a manual quality check, which can generate additional validated labels for further training and refining the model. In the future, Photo Grader could help automate photo selection, which not only helps guarantee photo quality but also improves the selection and uploading experience for the photographers.
For the platform, my analysis also recommends encouraging communication between customers and photographers. In addition, I would suggest setting guidelines for photographers on the number of photos to take, based on the booking time and shoot length. All in all, the right expectations and good communication are king!