How to build an image recognition system using Keras and TensorFlow for 1000 everyday object categories (ImageNet ILSVRC)

Published in Deep Learning Sandbox, Mar 25, 2017

Image recognition with the top 5 predicted labels and their probabilities (red row denotes the correct answer). Source: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Go straight to the code on GitHub.

In this series of posts, I will show you how to build your own recognition or detection/bounding box prediction web service in just a few lines of code using Keras, TensorFlow, and the Python requests library. The post series is as follows:

Build an image recognition system for 1000 everyday object categories (ImageNet ILSVRC) using Keras and TensorFlow (this post)
Build an image recognition system for any customizable object categories using transfer learning and fine-tuning in Keras and TensorFlow
Build a real-time bounding-box object detection system for hundreds of everyday object categories (PASCAL VOC, COCO)
Build a web service for any image recognition or object detection system

What is it you want to recognize?

There are 3 popular academic competitions in the field of computer vision that have been tremendously impactful: ImageNet ILSVRC, PASCAL VOC, and COCO. These competitions have propelled inventions in computer vision research, and many of the resulting models are available for free and unrestricted use. For this post, I will focus on image recognition using ImageNet ILSVRC.

Take a look at the ILSVRC object list. If the particular objects you’re interested in recognizing are among the 1000 objects in that list, you’re in luck! Here is an excerpt of the list of object categories:

ImageNet ILSVRC labels excerpt

What if your object of interest is not on that list, or comes from a significantly different setting like medical image analysis? I will cover an extremely valuable approach called transfer learning and fine-tuning in the second post.

Image Recognition

What is image (or object) recognition? It answers the question: “what objects are depicted in this image?” This is useful if you would like to tag images based on content, identify what food is on your plate, classify medical images as cancerous or non-cancerous, and in many other applications.

Keras and TensorFlow

Keras is a high-level neural network library that serves as an easy-to-use abstraction layer on top of the numerical computation library TensorFlow. It even provides access via its keras.applications module to ILSVRC competition-winning convolutional network models like ResNet50 (developed by Microsoft Research) and InceptionV3 (developed by Google Research) for free and unrestricted use. To install, follow the instructions at:

Keras installation: https://keras.io/#installation
TensorFlow installation: https://www.tensorflow.org/install/

Implementation

To go straight to the full program, check out the GitHub repository.

Our end goal is to write a small Python program that accepts either (1) a path to a local file or (2) a URL to an image. Here is the example usage with a photo of an African elephant.

1. python classify.py --image African_Bush_Elephant.jpg
2. python classify.py --image_url http://i.imgur.com/wpxMwsR.jpg

https://upload.wikimedia.org/wikipedia/commons/3/37/African_Bush_Elephant.jpg

The output will look like:

Top 3 predicted categories and their probabilities

Prediction function

To start, let’s load the keras.preprocessing and the keras.applications.resnet50 modules (ResNet50 paper: Deep Residual Learning for Image Recognition), and load the ResNet50 model using weights that have been pretrained on the ImageNet ILSVRC competition data:

import numpy as np
from keras.preprocessing import image
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

model = ResNet50(weights='imagenet')
target_size = (224, 224)  # fixed input size expected by ResNet50

Then we can define a predict function:

def predict(model, img, target_size, top_n=3):
  """Run model prediction on image
  Args:
    model: keras model
    img: PIL format image
    target_size: (width, height) tuple
    top_n: # of top predictions to return
  Returns:
    list of predicted labels and their probabilities
  """
  if img.size != target_size:
    img = img.resize(target_size)
  x = image.img_to_array(img)
  x = np.expand_dims(x, axis=0)
  x = preprocess_input(x)
  preds = model.predict(x)
  return decode_predictions(preds, top=top_n)[0]

Note that to use the ResNet50 architecture, target_size must equal (224, 224). Many CNN architectures have a fixed input size and ResNet50 is one such architecture, where the inventors used a fixed size input of (224, 224).

image.img_to_array: converts a PIL format image to a numpy array

np.expand_dims: converts our (224, 224, 3) size image to (1, 224, 224, 3), assuming the channels-last layout that is the default with the TensorFlow backend. The reason for this is that the model.predict function requires a 4-dimensional array as input, where the added first dimension corresponds to the batch size. That means, if we wanted to, we could classify multiple images at once.

preprocess_input: zero-centers our image data using the mean channel values from the training dataset (for ResNet50 it also reorders the channels from RGB to BGR). This is an extremely important step: if it is skipped, the network sees inputs distributed differently from its training data and the predicted probabilities become unreliable. This mean centering is a form of data normalization, a fundamental concept in machine learning.

model.predict: runs inference on our data batch and returns predictions

decode_predictions: takes the class probabilities returned by model.predict and returns, for each image in the batch, the top predictions as (class ID, human-readable label, probability) tuples from the ImageNet ILSVRC set.
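To make the array manipulations in these steps concrete, here is a minimal NumPy-only sketch that mimics them on random stand-in data. It is not the Keras implementation. It assumes the channels-last layout (height, width, channels) and the commonly used ImageNet per-channel means for Caffe-style preprocessing; zero_center, top_n_labels, fake_labels and fake_probs are hypothetical names introduced purely for illustration.

```python
import numpy as np

# Assumed ImageNet channel means in BGR order (Caffe-style preprocessing).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def zero_center(batch_rgb):
    """Hypothetical stand-in for preprocess_input: RGB -> BGR, subtract means."""
    batch_bgr = batch_rgb[..., ::-1].astype(np.float32)
    return batch_bgr - IMAGENET_BGR_MEANS


def top_n_labels(probs, labels, top_n=3):
    """Hypothetical stand-in for decode_predictions on one probability vector."""
    best = np.argsort(probs)[::-1][:top_n]  # indices of the highest scores first
    return [(labels[i], float(probs[i])) for i in best]


rng = np.random.default_rng(0)

# One fake 224x224 RGB image in channels-last layout (height, width, channels).
single = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)

# Add the leading batch axis, as np.expand_dims does inside predict().
batch_of_one = np.expand_dims(single, axis=0)
print(batch_of_one.shape)  # (1, 224, 224, 3)

# Several images can be stacked along the same leading batch axis.
batch_of_four = np.stack([single] * 4, axis=0)
print(batch_of_four.shape)  # (4, 224, 224, 3)

# Zero-centering keeps the shape and only shifts the pixel values.
centered = zero_center(batch_of_four)
print(centered.shape)  # (4, 224, 224, 3)

# Fake probability vector over five made-up classes.
fake_labels = ["elephant", "tusker", "zebra", "ox", "hippo"]
fake_probs = np.array([0.55, 0.30, 0.05, 0.04, 0.06])
print(top_n_labels(fake_probs, fake_labels))
# -> [('elephant', 0.55), ('tusker', 0.3), ('hippo', 0.06)]
```

In the real program the Keras functions do this work for you; the sketch is only meant to show what happens to the array shapes and values along the way.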

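The keras.applications module provides 5 off-the-shelf architectures: ResNet50, InceptionV3, VGG16, VGG19, and Xception. We arbitrarily chose ResNet50, but you are free to swap that out with any of the other off-the-shelf architectures. If you do, remember to import the matching preprocess_input and decode_predictions functions from that architecture's module and to adjust target_size (InceptionV3 and Xception expect (299, 299) inputs). Check out https://keras.io/applications/ for additional information and references.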
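One way to organize such a swap is sketched below. This is an illustration rather than part of the original program: ARCHITECTURES and load_architecture are hypothetical names, the sizes listed are the default ImageNet input sizes of each network, and the helper assumes each keras.applications submodule exposes its model class together with preprocess_input and decode_predictions. Keras is imported lazily, so defining the table does not require it; calling the helper does, and it downloads the pretrained weights on first use.

```python
import importlib

# (module name, class name, default input size) for each architecture.
ARCHITECTURES = {
    "resnet50": ("resnet50", "ResNet50", (224, 224)),
    "vgg16": ("vgg16", "VGG16", (224, 224)),
    "vgg19": ("vgg19", "VGG19", (224, 224)),
    "inception_v3": ("inception_v3", "InceptionV3", (299, 299)),
    "xception": ("xception", "Xception", (299, 299)),
}


def load_architecture(name):
    """Return (model, preprocess_input, decode_predictions, target_size).

    Hypothetical helper: imports the Keras module only when called and
    builds the model with ImageNet weights.
    """
    module_name, class_name, target_size = ARCHITECTURES[name]
    module = importlib.import_module("keras.applications." + module_name)
    model = getattr(module, class_name)(weights="imagenet")
    return model, module.preprocess_input, module.decode_predictions, target_size
```

With this in place, switching networks would be a one-line change such as model, preprocess_input, decode_predictions, target_size = load_architecture("inception_v3"), and the predict function shown earlier could be used unchanged.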

Plotting

We can use matplotlib to plot the output as a horizontal bar graph like so:

import matplotlib.pyplot as plt

def plot_preds(image, preds):
  """Displays image and the top-n predicted probabilities
     in a bar graph
  Args:
    image: PIL image
    preds: list of (class id, label, probability) tuples
  """
  # image
  plt.imshow(image)
  plt.axis('off')

  # bar graph in a separate figure, most probable class on top
  plt.figure()
  order = list(reversed(range(len(preds))))
  bar_preds = [pr[2] for pr in preds]
  labels = [pr[1] for pr in preds]  # a list, since yticks expects a sequence
  plt.barh(order, bar_preds, alpha=0.5)
  plt.yticks(order, labels)
  plt.xlabel('Probability')
  plt.xlim(0, 1.01)
  plt.tight_layout()
  plt.show()

Main

In order to have this command line usage:

1. python classify.py --image African_Bush_Elephant.jpg
2. python classify.py --image_url http://i.imgur.com/wpxMwsR.jpg

We’ll define a main function as follows:

import sys
import argparse
from io import BytesIO
import requests
from PIL import Image

if __name__=="__main__":
  a = argparse.ArgumentParser()
  a.add_argument("--image", help="path to image")
  a.add_argument("--image_url", help="url to image")
  args = a.parse_args()

  if args.image is None and args.image_url is None:
    a.print_help()
    sys.exit(1)

  if args.image is not None:
    img = Image.open(args.image)
    plot_preds(img, predict(model, img, target_size))

  if args.image_url is not None:
    response = requests.get(args.image_url)
    img = Image.open(BytesIO(response.content))
    plot_preds(img, predict(model, img, target_size))

The image_url option uses the Python Requests library to easily download an image from any URL!
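As an aside on the argument handling in the main block, the manual check for missing arguments could also be delegated to argparse itself. The sketch below is a hypothetical variant, not the original program: build_parser is a name introduced here, and the behavior differs slightly, since a mutually exclusive group rejects the case where both --image and --image_url are given, whereas the main block above would classify both.

```python
import argparse


def build_parser():
    """Hypothetical variant of the argument parser used in the main block."""
    parser = argparse.ArgumentParser(description="Classify an image")
    # Exactly one of --image / --image_url must be supplied;
    # argparse prints a usage message and exits otherwise.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--image", help="path to image")
    source.add_argument("--image_url", help="url to image")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--image", "African_Bush_Elephant.jpg"])
print(args.image)      # African_Bush_Elephant.jpg
print(args.image_url)  # None
```

Whether this is preferable depends on whether you want to allow classifying a local file and a URL in the same invocation.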

We’re done!

Once you put all the above code together, you have the beginnings of an image recognition system! See the complete program and example images on GitHub.

The next post in our series will cover the situation where your object of interest is not one of the ImageNet ILSVRC categories:

Build an image recognition system for 1000 everyday object categories (ImageNet ILSVRC) using Keras and TensorFlow (this post)
Build an image recognition system for any customizable object categories using transfer learning and fine-tuning in Keras and TensorFlow
Build a real-time bounding-box object detection system for hundreds of everyday object categories (PASCAL VOC, COCO)
Build a web service for any image recognition or object detection system