Filtering out non-photo images

Mike Arbuzov
4 min read · Dec 28, 2018


We train a convolutional neural network to classify photo vs. non-photo images in under 10 minutes on a free Google Colab GPU environment.

Photo by rawpixel on Unsplash

Creating a training dataset

We want to automatically classify images into two categories: photos and non-photos (e.g. documents, screenshots, whiteboard photos, etc.). This is a binary classification problem, and like many other challenges that can be solved with neural network models, it requires well-balanced, representative training and validation datasets.

In our previous challenge of predicting the quality score for an image, the training dataset was available off the shelf. But what do you do when there is no ready-made dataset for your task? You create one!

We need two sets of example images, one for each class. One option is to use Google image search to gather a large enough set of images and then manually refine each class to keep only the relevant ones. The alternative we use here is even simpler: we connect to Google Photos, use the Library API to list photos on one side and documents, screenshots, etc. on the other, and then download and store them for training and validation.

# https://developers.google.com/photos/library/reference/rest/v1/mediaItems/search#Filters
results = service.mediaItems().search(body={
    'pageSize': 20,
    'filters': {
        'contentFilter': {
            'includedContentCategories': [
                "UTILITY"
            ]
        }
    }
}).execute()
items = results.get('mediaItems', [])

Similarly, we use an exclude filter to get a list of photo images. This time, though, we gather a larger sample set and randomly select a subset matching in size our modest set of ~1K non-photo samples.
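The exclude-filter request mirrors the include-filter one above; a minimal sketch, where the exact list of excluded categories and the `balanced_subset` helper are my assumptions (category names per the Library API reference):

```python
import random

# Same search as above, but with 'excludedContentCategories' so that
# utility images, screenshots, documents, etc. are filtered out and
# only regular photos come back.
photo_search_body = {
    'pageSize': 100,
    'filters': {
        'contentFilter': {
            'excludedContentCategories': [
                "UTILITY", "SCREENSHOTS", "DOCUMENTS", "WHITEBOARDS"
            ]
        }
    }
}
# photo_items = service.mediaItems().search(body=photo_search_body) \
#     .execute().get('mediaItems', [])

def balanced_subset(items, target_size, seed=0):
    """Randomly pick target_size items so both classes end up the same size."""
    rng = random.Random(seed)
    return rng.sample(items, min(target_size, len(items)))
```

With ~1K non-photo samples, `balanced_subset(photo_items, 1000)` would trim the larger photo set down to a matching size.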

# https://developers.google.com/photos/library/guides/access-media-items#base-urls
import urllib.request
import cv2
import numpy as np

def download_image(base_url, size=None, crop=False):
    url = base_url
    if size:
        (width, height) = size
        url += f"=w{width}-h{height}"
        if crop:
            url += "-c"
    else:
        url += "=d"  # download the original bytes, with EXIF
    resp = urllib.request.urlopen(url)
    data = resp.read()
    image = np.asarray(bytearray(data), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    return image

We download the images resized to 299x299 pixels into two folders, which lets us create a CNN learner with the Fast.AI library using a convenient wrapper.

!du -h --max-depth=1 "{PATH}"
18M /content/drive/My Drive/Datasets/Photo-vs-NonPhoto/non-photo
23M /content/drive/My Drive/Datasets/Photo-vs-NonPhoto/photo
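The download loop itself can be sketched as a small helper; `save_class_images`, its `downloader`/`writer` callbacks, and the `items_by_class` mapping are my assumptions, not code from the original notebook:

```python
import os

def save_class_images(items_by_class, root, downloader, writer):
    """Store each media item's image in a folder named after its class
    ('photo' / 'non-photo'), so ImageDataBunch.from_folder can later
    read the labels from the folder names.

    downloader(base_url) -> image data; writer(path, data) saves it.
    """
    paths = []
    for label, items in items_by_class.items():
        folder = os.path.join(root, label)
        os.makedirs(folder, exist_ok=True)
        for item in items:
            path = os.path.join(folder, f"{item['id']}.jpg")
            writer(path, downloader(item['baseUrl']))
            paths.append(path)
    return paths
```

With the `download_image` helper above, the callbacks could be `lambda url: download_image(url, size=(299, 299), crop=True)` and `cv2.imwrite`.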

Training a binary classifier CNN

Fast.AI has a convenience method, ImageDataBunch.from_folder, that builds the data for a classification learner from samples organised into folders matching the classes. One thing to keep in mind is that we do not want to apply data augmentation transforms to the non-photos, as doing so might present an inconsistent signal that does not exist in real life.

# We load samples from folders by class name (photo, non-photo) and split the ~1K samples into training and validation sets
data = ImageDataBunch.from_folder(
    path=PATH,       # path to the dataset
    valid_pct=0.10,  # percentage of samples to use for validation
    size=299,        # all images are resized to this size
    bs=128,          # number of images in a training batch, change to fit into GPU memory
    ds_tfms=get_transforms(do_flip=False, flip_vert=False, max_rotate=0.,
                           max_zoom=0., max_warp=0., max_lighting=0.)
).normalize(imagenet_stats)

We create a ResNet34 CNN learner and add an accuracy metric to it to understand the performance during training and validation.

# create a CNN model with weights pre-initialized from ImageNet training
learn = create_cnn(data, models.resnet34)
# add validation metrics to track progress
learn.metrics = [accuracy]

Then we use transfer learning and train only the last fully connected layers of our ResNet34 network, which was pre-trained on ~1.5M ImageNet images.

As you can see, after just 2 training epochs and 49 seconds we have reached 82% accuracy. Let’s take it further with one-cycle training.
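One-cycle training ramps the learning rate up to a peak and then anneals it back down over the run. A minimal sketch of the schedule in plain Python (the parameter values are assumptions, and this is an illustration of the policy, not fastai's exact implementation):

```python
import math

def one_cycle_lr(step, total, max_lr=1e-3, pct_start=0.3, div=25.0):
    """Learning rate at `step` under a one-cycle policy: cosine ramp from
    max_lr/div up to max_lr over the first pct_start of training, then
    cosine anneal back down toward (near) zero."""
    start_lr, end_lr = max_lr / div, max_lr / 1e4
    warm = pct_start * total
    if step <= warm:
        t, lo, hi = step / warm, start_lr, max_lr       # warm-up phase
    else:
        t, lo, hi = (step - warm) / (total - warm), max_lr, end_lr  # anneal
    # cosine interpolation from lo to hi as t goes 0 -> 1
    return lo + (hi - lo) * (1 - math.cos(math.pi * t)) / 2
```

In fastai this whole schedule is handled for you by `learn.fit_one_cycle(...)`.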

Finally we run a validation cycle to get the total accuracy for our validation dataset.

learn.validate(metrics=[accuracy])
[0.029595109, tensor(0.9938)]
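For reference, the accuracy metric reported here is just the fraction of samples whose argmax class score matches the label; a minimal plain-Python sketch (the example scores and labels are made up for illustration):

```python
def accuracy(scores, labels):
    """Fraction of rows whose argmax class score matches the label."""
    correct = sum(1 for row, y in zip(scores, labels)
                  if max(range(len(row)), key=row.__getitem__) == y)
    return correct / len(labels)

# e.g. class scores ordered as [non-photo, photo]:
scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 1, 1, 1]   # third sample is misclassified -> accuracy 0.75
```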

Wow: 99.38% accuracy in just 8 minutes of training! And not only that: the dataset creation, the data pipeline, the training environment and the GPU instance were all provided by Google at zero cost.
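To actually filter a library with the trained model, you only need to threshold the predicted probability for the photo class; a hypothetical helper (the name, class ordering and threshold are my assumptions):

```python
def is_photo(probs, classes=('non-photo', 'photo'), threshold=0.5):
    """Treat an image as a photo when the model's predicted probability
    for the 'photo' class clears the threshold."""
    return probs[classes.index('photo')] >= threshold
```

With fastai, `probs` would come from the learner's prediction for a single image.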

Enjoy the unleashed power of deep learning and experiment with your own data using the shared Colab notebook for this post: Photo vs Non-Photo Classifier.

In the next post we will turn our attention to clustering similar images and picking the best one from the set.


Mike Arbuzov

Bringing the magic of disruptive technologies into everyday products. Software engineer at Skype, co-founder and CTO of Gelato, Head of Gelato Labs.