Convolutional Neural Network trained to predict photo quality score

Dec 14, 2018

In this post we continue the series on automated photo quality curation, which aims to simplify the handling of large image collections. This time we use a Convolutional Neural Network (CNN) to predict the mean opinion score (MOS) of photos.

Training and validation dataset

We use the publicly available KonIQ-10k Image Database. 90% of the images are used for training, and the remaining 10% are held out to validate the trained model automatically after each training epoch.

KonIQ-10k consists of 10,073 images, on which its authors performed very large-scale crowdsourcing experiments to obtain reliable quality ratings from 1,467 crowd workers (1.2 million ratings).
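fastai performs the 90/10 split for us through the valid_pct argument of ImageDataBunch.from_csv, but the idea is simple enough to sketch in plain Python. This is a minimal sketch assuming a uniform random shuffle with a fixed seed; the helper name split_train_valid is hypothetical, and fastai's own rounding and seeding may differ.

```python
import random

def split_train_valid(items, valid_pct=0.10, seed=42):
    """Randomly split items into (train, valid) lists.

    Hypothetical helper illustrating a random hold-out split; it is not
    fastai's implementation.
    """
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    n_valid = int(len(items) * valid_pct)
    return items[n_valid:], items[:n_valid]

# Split the indices of the 10,073 KonIQ-10k images
train, valid = split_train_valid(range(10073))
print(len(train), len(valid))
```

A fixed seed matters here: comparing runs with different image sizes or learning rates is only meaningful if they are all validated on the same held-out images.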

Neural Network

For the CNN model we use ResNet34, a deep residual convolutional neural network whose skip connections make deep models faster and easier to train. To avoid needing millions of labelled samples and to save significant training time, we follow a transfer learning approach and start from model weights pre-trained on ImageNet.

We train it with fastai, a great wrapper library built on top of PyTorch.

Environment

Let’s head over to a Google Colab notebook and go through the basic data preparation, training and validation steps. You can save a copy of the notebook and run it interactively. Make sure to select a Python 3 runtime with a GPU hardware accelerator.

First we install the required packages. Prefixing a line with “!” runs it as a Linux shell command directly from the Colab environment.

!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
!pip install fastai dataclasses

Then we mount Google Drive to get persistent storage that remains even after the Colab runtime is stopped or disconnected.

In the next step you will be asked to authorise the Colab runtime to access your Google Drive; the google.colab package makes this integration seamless.
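The post does not list the mounting code, so here is a minimal sketch using google.colab's drive.mount. The mount point /content/gdrive and the koniq10k folder name are assumptions; adjust them to your own Drive layout. Outside Colab the import fails and the sketch falls back to the current directory, so the same cell also runs locally.

```python
from pathlib import Path

try:
    # google.colab only exists inside a Colab runtime
    from google.colab import drive
    # Asks you to authorise access the first time it runs
    drive.mount('/content/gdrive')
    # The folder name under the mount point has varied between Colab
    # versions ('My Drive' vs 'MyDrive'); check yours with !ls /content/gdrive
    ROOT = Path('/content/gdrive/My Drive')
except ImportError:
    # Not running in Colab: fall back to the current working directory
    ROOT = Path('.')

# Hypothetical dataset folder used as PATH in the data loading step
PATH = ROOT / 'koniq10k'
print(PATH)
```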

On the first run of the notebook we need to download the image dataset and the corresponding labels. We then pre-process the labels, rescaling the mean opinion score to a 0–10 point scale.
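The exact rescaling is not shown in the post, so this is a sketch under stated assumptions: the source CSV has an image_name column and a MOS column on a 1–5 scale, and scores are mapped linearly onto 0–10. The column names, the 1–5 range and both helper names are assumptions, not taken from the original notebook. The output keeps the file name in column 0 and the score in column 1, matching fn_col=0 and label_col=1 in the data loading step.

```python
import csv

def rescale(score, lo=1.0, hi=5.0, new_lo=0.0, new_hi=10.0):
    """Linearly map a score from [lo, hi] onto [new_lo, new_hi]."""
    return new_lo + (score - lo) * (new_hi - new_lo) / (hi - lo)

def write_10scale_csv(src, dst, name_col='image_name', mos_col='MOS'):
    """Write a two-column CSV: file name, score on a 0-10 scale.

    ASSUMPTION: src has columns named name_col and mos_col,
    with the MOS on a 1-5 scale.
    """
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        reader = csv.DictReader(fin)
        writer = csv.writer(fout)
        writer.writerow(['image_name', 'score'])  # column 0: file, column 1: label
        for row in reader:
            writer.writerow([row[name_col], round(rescale(float(row[mos_col])), 4)])
```

For example, write_10scale_csv(PATH/'koniq10k_scores_and_distributions.csv', PATH/'koniq10k_scores_and_distributions.10scale.csv'), assuming the downloaded score file carries that name.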

Histogram of sample mean opinion scores after rescaling

Training

We use the fastai library, which is packed with best practices for efficiently training neural network models and makes experimenting simple.

import fastai
from fastai import *           # common functionality
from fastai.vision import *    # computer vision functionality
from fastai.callbacks import * # training cycle callbacks

Built in are transfer learning, differential learning rates, the 1cycle learning policy, adaptive sample sizes and more.

“10 techniques learned from fast.ai” is a great write-up on some of the techniques used to improve and speed up the training cycle.

First we set up data loaders for the training and validation datasets.

# Split the 10,073 samples into training and validation sets.
# PATH is the dataset root (e.g. a folder on the mounted Google Drive).
data = ImageDataBunch.from_csv(
    path=PATH,         # path to the dataset
    folder='1024x768', # folder with the images
    csv_labels='koniq10k_scores_and_distributions.10scale.csv',
    fn_col=0,          # index of the file name column
    label_col=1,       # index of the label (score) column
    valid_pct=0.10,    # fraction of samples held out for validation
    size=299,          # all images are resized to this size
    bs=128,            # images per training batch; reduce to fit into GPU memory
    # light augmentation: horizontal flips and small rotations only
    # (max_zoom=1.0 means no zoom; 0 disables warping and lighting changes)
    ds_tfms=get_transforms(do_flip=True, flip_vert=False, max_rotate=5.0,
                           max_zoom=1.0, max_warp=0., max_lighting=0.)
).normalize(imagenet_stats)
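.normalize(imagenet_stats) standardises each colour channel with the mean and standard deviation of the ImageNet training images, so our inputs match the distribution the pre-trained weights were learned on. The plain-Python sketch below shows what happens to a single RGB pixel, using the commonly published ImageNet channel statistics (to our knowledge the same values fastai stores in imagenet_stats); the helper names are hypothetical.

```python
# Per-channel ImageNet statistics for RGB values scaled to [0, 1]
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Standardise one RGB pixel (channels in [0, 1]) channel by channel."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))

def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_pixel, e.g. to display a normalised image."""
    return tuple(c * s + m for c, m, s in zip(rgb, mean, std))

# The ImageNet mean colour maps to zero in every channel
print(normalize_pixel((0.485, 0.456, 0.406)))
```

Skipping this step would still train, but the early pre-trained layers would see inputs on a different scale from the one they were trained on, which weakens the benefit of transfer learning.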

It’s easy to look at the samples from the training dataset with:

data.show_batch(rows=3)

Next we set up the mean absolute percentage error (MAPE) metric to track the quality of the predictions, in addition to the loss function that the model minimises:

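def mape(preds, targs):
    # Mean absolute percentage error (%); flatten so (bs, 1) predictions can't broadcast against (bs,) targets
    preds, targs = preds.view(-1), targs.view(-1)
    return (torch.abs((targs - preds) / targs)).mean() * 100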

Following the transfer learning approach, we then create a model based on the ResNet34 architecture pre-trained on ImageNet. fastai automatically cuts off the original final fully connected layers and replaces them with new ones suited to our targets; given how we set up the data, this is a regression with a single output.

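# create a CNN model with weights pre-initialized from ImageNet training;
# ps sets the dropout of the two head layers, metrics adds MAPE to the training log
learn = create_cnn(data, models.resnet34, ps=[0.4, 0.6], metrics=mape)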

To reduce overfitting (when the training loss keeps falling while the validation loss rises), we raise the dropout rates in the last layers, which end up looking like this:

Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Lambda()
  (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.4)
  (4): Linear(in_features=1024, out_features=512, bias=True)
  (5): ReLU(inplace)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.6)
  (8): Linear(in_features=512, out_features=1, bias=True)
)

Next we set a relatively high initial learning rate and run a couple of epochs to fit the network to our training data. At this stage most of the network weights are frozen, and only the last two fully connected layers are allowed to change.

The speed you get at this step depends on how fast the data pipeline is at the moment. It can vary significantly in the Colab environment, especially since the data is read from remotely mounted cloud storage.

Total time: 37:16

epoch  train_loss  valid_loss  mape
1      3.836365    0.520080    18.071291
2      1.382419    0.447560    20.552778

This is a good trend: the loss on both the training and validation sets decreases with each epoch, even though the validation MAPE has not improved yet.

training and validation data set loss vs mini-batch count

After the initial training we reduce the learning rate and train for longer (10 epochs) using the 1cycle policy.

Total time: 1:27:37

epoch  train_loss  valid_loss  mape
...
10     0.496612    0.286066    19.329603

Discriminative layer training cycle

So far we have trained only the last 2 layers of the network, which was originally trained on ImageNet to recognise objects. To improve the accuracy of predicting perceived image quality, we need to let the model update the weights of the earlier layers as well. To do this without losing the benefit of transfer learning, we use discriminative layer training: the learning rate is set lower for the earlier layers and higher for the later ones.

NB! The forward and backward passes now have to keep track of gradients for many more parameters than before, so to fit into GPU memory we must reduce the batch size.

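data.batch_size = 64  # example value: smaller than the bs=128 used above, adjust to fit your GPU
learn.unfreeze()      # make all layer groups trainable
learn.fit_one_cycle(10,
  max_lr=(1e-5, 1e-4, 1e-3),  # learning rates: early layers, middle layers, head
  wd=(1e-5, 1e-4, 1e-3)       # weight decay per layer group
)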
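The three values in max_lr correspond to the layer groups fastai creates for a ResNet (two groups for the pre-trained body and one for the new head), and they are spaced geometrically: each group gets 10x the rate of the one before. When given slice(start, stop) instead of a tuple, fastai spreads the rates geometrically in the same way. A minimal plain-Python sketch of that spacing; the helper name is hypothetical.

```python
def discriminative_lrs(lowest, highest, n_groups):
    """Geometrically spaced learning rates, from the earliest to the latest layer group."""
    if n_groups == 1:
        return [highest]
    # constant ratio between consecutive groups
    step = (highest / lowest) ** (1 / (n_groups - 1))
    return [lowest * step ** i for i in range(n_groups)]

# approximately 1e-5, 1e-4 and 1e-3, the max_lr values used in the training code
print(discriminative_lrs(1e-5, 1e-3, 3))
```

Geometric rather than linear spacing keeps the earliest, most generic layers almost unchanged while still letting the middle layers adapt noticeably.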

Next in our set of training optimisation tools is a less obvious technique, often called progressive resizing: a network trained on smaller input images serves as the pre-trained model for training on larger images, which improves its performance.

We started with input images resized to 299x299 pixels, then trained on 512x512 pixel images, and finally on 1024x1024 pixel images.
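Larger inputs use more GPU memory, so the batch size usually has to shrink at each stage. The post does not say which batch sizes were used; a rough rule of thumb, and only an assumption here, is that activation memory grows with the number of pixels, so the batch size can be scaled by the inverse pixel ratio. A hypothetical helper that plans the 299, 512 and 1024 pixel stages from a base batch size of 128:

```python
def plan_progressive_resizing(sizes, base_size, base_bs, min_bs=1):
    """Return (size, batch_size) pairs, scaling the batch size by the inverse pixel ratio.

    ASSUMPTION: memory use is roughly proportional to batch_size * size**2,
    so keeping that product constant keeps memory use roughly constant.
    """
    plan = []
    for size in sizes:
        bs = int(base_bs * (base_size / size) ** 2)
        plan.append((size, max(min_bs, bs)))
    return plan

for size, bs in plan_progressive_resizing([299, 512, 1024], base_size=299, base_bs=128):
    print(f'size={size}px batch_size={bs}')
```

In fastai v1 a common pattern is to build a new ImageDataBunch with the planned size and bs for each stage, assign it to learn.data, and call fit_one_cycle again, so the weights learned at the smaller size carry over.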

Mean Absolute Percentage Error vs Image Size

Validation and Results

We have trained the network to predict the mean opinion score of the sample images on a 0–10 point scale with a mean absolute percentage error of 19.7%. Let’s look at some results to better understand what this means.
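To get a feel for the number: MAPE averages each prediction's absolute error relative to the true score. Below is a plain-Python version of the metric, applied to made-up scores purely for illustration (these are not results from the model); the function name is hypothetical.

```python
def mape_reference(preds, targs):
    """Mean absolute percentage error, in percent, for plain Python sequences."""
    if len(preds) != len(targs):
        raise ValueError('preds and targs must have the same length')
    return sum(abs((t - p) / t) for p, t in zip(preds, targs)) / len(targs) * 100

# Hypothetical true scores (0-10 scale) and predictions
targs = [5.0, 4.0, 8.0]
preds = [6.0, 4.0, 6.0]
# relative errors of 20%, 0% and 25%, averaged
print(mape_reference(preds, targs))
```

For example, a 19.7% error on a photo whose true score is 5 amounts to roughly one point. Because the error is relative, the same absolute miss weighs more on low-scoring photos than on high-scoring ones.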

The samples shown suggest that the model predicts image quality with decent accuracy, close to how the original crowd of raters scored the images.

The final part of the Colab notebook for this post shows how to put the trained model into inference mode and try it out on your own images.

I hope you have enjoyed the journey. In the next post we will learn to classify photo vs non-photo images in a photo stream, to help filter out screenshots, receipts and other clutter from your collection of beautiful photos.