Greg Chu
Mar 25, 2017 · 6 min read
Image recognition with the top 5 predicted labels and their probabilities (red row denotes the correct answer) http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Go straight to the code on GitHub.

In this series of posts, I will show you how to build your own image recognition or object detection (bounding box prediction) web service in just a few lines of code using Keras, TensorFlow, and the Python Requests library. The post series is as follows:

  1. Build an image recognition system for 1000 everyday object categories (ImageNet ILSVRC) using Keras and TensorFlow (this post)
  2. Build an image recognition system for any customizable object categories using transfer learning and fine-tuning in Keras and TensorFlow
  3. Build a real-time bounding-box object detection system for hundreds of everyday object categories (PASCAL VOC, COCO)
  4. Build a web service for any image recognition or object detection system

What is it you want to recognize?

There are 3 popular academic competitions in the field of computer vision that have been tremendously impactful: ImageNet ILSVRC, PASCAL VOC, and COCO. These competitions have propelled inventions in computer vision research, and many of the resulting models are available for free and unrestricted use. For this post, I will focus on image recognition using ImageNet ILSVRC.

Take a look at the ILSVRC object list. If the particular objects you’re interested in recognizing are among the 1000 objects in that list, you’re in luck! Here is an excerpt of the list of object categories:

ImageNet ILSVRC labels excerpt

What if your object of interest is not on that list, or comes from a significantly different setting like medical image analysis? I will cover an extremely valuable approach called transfer learning and fine-tuning in the second post.

Image Recognition

What is image (or object) recognition? It answers the question: “what objects are depicted in this image?” This is useful if you would like to tag images based on content, identify what food is on your plate, classify medical images as cancerous or non-cancerous, and much more.

Keras and TensorFlow

Keras is a high-level neural network library that serves as an easy-to-use abstraction layer on top of the numerical computation library TensorFlow. It even provides access via its keras.applications module to ILSVRC competition-winning convolutional network models like ResNet50 (developed by Microsoft Research) and InceptionV3 (developed by Google Research) for free and unrestricted use. To install, follow the instructions at:

Implementation

To go straight to the full program, check out the code on GitHub.

Our end goal is to write a small Python program that accepts either 1. a path to a local file or 2. a URL to an image. Here is example usage with a photo of an African elephant.

1. python classify.py --image African_Bush_Elephant.jpg
2. python classify.py --image_url http://i.imgur.com/wpxMwsR.jpg
https://upload.wikimedia.org/wikipedia/commons/3/37/African_Bush_Elephant.jpg

The output will look like:

Top 3 predicted categories and their probabilities

Prediction function

To start, let’s load the keras.preprocessing and the keras.applications.resnet50 modules (resnet50 paper: Deep Residual Learning for Image Recognition), and load the ResNet50 model using weights that have been trained on the ImageNet ILSVRC competition:

import numpy as np
from keras.preprocessing import image
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
model = ResNet50(weights='imagenet')

Then we can define a predict function:

def predict(model, img, target_size, top_n=3):
    """Run model prediction on image
    Args:
        model: keras model
        img: PIL format image
        target_size: (width, height) tuple
        top_n: # of top predictions to return
    Returns:
        list of predicted labels and their probabilities
    """
    if img.size != target_size:
        img = img.resize(target_size)
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    preds = model.predict(x)
    return decode_predictions(preds, top=top_n)[0]

Note that to use the ResNet50 architecture with these pretrained weights, target_size must equal (224, 224). Like many CNN architectures, ResNet50 was trained with a fixed input size, in this case 224×224 pixels.

image.img_to_array: converts a PIL format image to a numpy array

np.expand_dims: converts our (224, 224, 3) image array to (1, 224, 224, 3), assuming TensorFlow’s default channels-last ordering (with channels-first ordering it would be (3, 224, 224) to (1, 3, 224, 224)). The reason for this is that the model.predict function requires a 4-dimensional array as input, where the added first dimension corresponds to the batch size. That means, if we wanted to, we could classify multiple images at once.

preprocess_input: zero-centers our image data using the mean channel values from the training dataset. This is an extremely important step that, if skipped, will cause all the predicted probabilities to be incorrect. This mean centering is what’s called data normalization, a fundamental concept in machine learning.

model.predict: runs inference on our data batch and returns predictions

decode_predictions: takes the class probabilities returned by model.predict and returns the top human-readable labels from the ImageNet ILSVRC set. Each prediction is a (class ID, label, probability) tuple.
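To make the steps above more concrete, here is a minimal NumPy-only sketch of what happens to the data between loading the image and reading off the labels. It is not the Keras implementation: zero_center_sketch and top_n_sketch are hypothetical helpers, the channel means are assumed to match the Caffe-style values that ResNet50's preprocess_input subtracts, the array layout is assumed to be channels-last, and the 5-class probability vector is made up to stand in for a real 1000-class model output.

```python
import numpy as np

# Per-channel ImageNet means in BGR order. Assumption: these match the
# Caffe-style values that ResNet50's preprocess_input subtracts.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype="float32")


def zero_center_sketch(batch_rgb):
    """Flip RGB to BGR, then subtract the per-channel means."""
    batch_bgr = batch_rgb[..., ::-1]
    return batch_bgr - IMAGENET_MEAN_BGR  # broadcasts over the last axis


def top_n_sketch(probs, labels, top_n=3):
    """Return (label, probability) pairs for the top_n highest scores."""
    order = np.argsort(probs)[::-1][:top_n]
    return [(labels[i], float(probs[i])) for i in order]


# Stand-in for image.img_to_array(img): one 224x224 RGB image whose
# every pixel equals the mean colour (channels-last layout).
x = np.broadcast_to(IMAGENET_MEAN_BGR[::-1], (224, 224, 3)).astype("float32")

# Step 1: add the leading batch axis.
x = np.expand_dims(x, axis=0)
print(x.shape)  # (1, 224, 224, 3)

# Step 2: zero-center; a mean-coloured image becomes all zeros.
x = zero_center_sketch(x)
print(float(np.abs(x).max()))  # 0.0

# Step 3: stand-in for model.predict output from a hypothetical 5-class
# model (a real ImageNet model returns 1000 probabilities per image).
labels = ["tusker", "African_elephant", "Indian_elephant",
          "warthog", "hippopotamus"]
preds = np.array([[0.25, 0.5, 0.125, 0.0625, 0.0625]])

# Step 4: pick the top 3 for the first (and only) image in the batch.
top3 = top_n_sketch(preds[0], labels)
print(top3)
# [('African_elephant', 0.5), ('tusker', 0.25), ('Indian_elephant', 0.125)]
```

In the real program you should keep calling preprocess_input and decode_predictions; the sketch only shows why the batch axis is needed and why skipping the centering step would feed the network inputs it was never trained on.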

The keras.applications module provides 5 off-the-shelf architectures: ResNet50, InceptionV3, VGG16, VGG19, and Xception. We arbitrarily chose ResNet50, but you are free to swap in any of the others. Keep in mind that each architecture module exposes its own preprocess_input function and expects its own input size (InceptionV3 and Xception default to 299×299 rather than 224×224). Check out https://keras.io/applications/ for additional information and references.
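If you do swap architectures, the target_size passed to predict has to change with it. One way to keep that in a single place is a small lookup table, sketched below. The helper target_size_for is hypothetical and not part of Keras; the sizes are the default input sizes of the pretrained ImageNet models listed above.

```python
# Default input sizes (width, height) for the pretrained ImageNet models
# in keras.applications. The lookup helper is illustrative, not a Keras API.
DEFAULT_TARGET_SIZE = {
    "ResNet50": (224, 224),
    "VGG16": (224, 224),
    "VGG19": (224, 224),
    "InceptionV3": (299, 299),
    "Xception": (299, 299),
}


def target_size_for(architecture):
    """Return the default input size for a named architecture."""
    try:
        return DEFAULT_TARGET_SIZE[architecture]
    except KeyError:
        known = ", ".join(sorted(DEFAULT_TARGET_SIZE))
        raise ValueError(
            "Unknown architecture %r; expected one of: %s" % (architecture, known)
        )


print(target_size_for("ResNet50"))     # (224, 224)
print(target_size_for("InceptionV3"))  # (299, 299)
```

Remember to import preprocess_input and decode_predictions from the same module as the model you pick, since the preprocessing is not the same for every architecture.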

Plotting

We can use matplotlib to plot the output as a horizontal bar graph like so:

import matplotlib.pyplot as plt

def plot_preds(image, preds):
    """Displays image and the top-n predicted probabilities
    in a bar graph
    Args:
        image: PIL image
        preds: list of predicted labels and their probabilities
    """
    # image
    plt.imshow(image)
    plt.axis('off')

    # bar graph
    plt.figure()
    # reverse the positions so the top prediction is drawn at the top
    order = list(reversed(range(len(preds))))
    bar_preds = [pr[2] for pr in preds]  # probabilities
    labels = [pr[1] for pr in preds]     # labels (a list, not a generator)
    plt.barh(order, bar_preds, alpha=0.5)
    plt.yticks(order, labels)
    plt.xlabel('Probability')
    plt.xlim(0, 1.01)
    plt.tight_layout()
    plt.show()

Main

In order to have this command line usage:

1. python classify.py --image African_Bush_Elephant.jpg
2. python classify.py --image_url http://i.imgur.com/wpxMwsR.jpg

We’ll define a main function as follows:

import sys
import argparse
from io import BytesIO

import requests
from PIL import Image

target_size = (224, 224)  # fixed input size required by ResNet50

if __name__ == "__main__":
    a = argparse.ArgumentParser()
    a.add_argument("--image", help="path to image")
    a.add_argument("--image_url", help="url to image")
    args = a.parse_args()
    if args.image is None and args.image_url is None:
        a.print_help()
        sys.exit(1)
    if args.image is not None:
        img = Image.open(args.image)
        preds = predict(model, img, target_size)
        plot_preds(img, preds)  # plot_preds needs both the image and the predictions
    if args.image_url is not None:
        response = requests.get(args.image_url)
        img = Image.open(BytesIO(response.content))
        preds = predict(model, img, target_size)
        plot_preds(img, preds)

The image_url option uses the Python Requests library to easily download an image from any URL!
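As a possible variation on the argument handling, argparse can enforce "exactly one of --image or --image_url" by itself through a mutually exclusive group, which removes the manual None check. This is only a sketch of that alternative, not the code from the repository; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build a parser that requires exactly one image source."""
    parser = argparse.ArgumentParser(description="Classify an image")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--image", help="path to image")
    source.add_argument("--image_url", help="url to image")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--image", "African_Bush_Elephant.jpg"])
print(args.image, args.image_url)  # African_Bush_Elephant.jpg None
```

With this setup, running the script with no arguments prints a usage error and exits on its own. The trade-off is that passing both flags at once is rejected, whereas the main block above would classify both images.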

We’re done!

Once you put all the above code together, you have the beginnings of an image recognition system! See the complete program and example images on GitHub.

The next post in our series will cover the situation where your object of interest is not one of the ImageNet ILSVRC categories:

  1. Build an image recognition system for 1000 everyday object categories (ImageNet ILSVRC) using Keras and TensorFlow (this post)
  2. Build an image recognition system for any customizable object categories using transfer learning and fine-tuning in Keras and TensorFlow
  3. Build a real-time bounding-box object detection system for hundreds of everyday object categories (PASCAL VOC, COCO)
  4. Build a web service for any image recognition or object detection system

Additional examples

Let’s try a few more examples!

1. python classify.py --image_url http://i.imgur.com/cg37Ojo.jpg

Image and top 3 predicted labels along with their probabilities

2. python classify.py --image_url http://i.imgur.com/4FIOwAN.jpg

Image and top 3 predicted labels along with their probabilities

3. python classify.py --image_url http://goo.gl/t3Gh5P

Image and top 3 predicted labels along with their probabilities

If you have any questions, contact me at greg.ht.chu@gmail.com or message me on LinkedIn!

