How to build an image recognition system using Keras and TensorFlow for 1000 everyday object categories (ImageNet ILSVRC)
Go straight to the code on GitHub.
In this series of posts, I will show you how to build your own recognition or detection/bounding box prediction web service in just a few lines of code using Keras, TensorFlow, and the Python Requests library. The post series is as follows:
- Build an image recognition system for 1000 everyday object categories (ImageNet ILSVRC) using Keras and TensorFlow (this post)
- Build an image recognition system for any customizable object categories using transfer learning and fine-tuning in Keras and TensorFlow
- Build a real-time bounding-box object detection system for hundreds of everyday object categories (PASCAL VOC, COCO)
- Build a web service for any image recognition or object detection system
What is it you want to recognize?
Three popular academic competitions in computer vision have been tremendously impactful: ImageNet ILSVRC, PASCAL VOC, and COCO. These competitions have propelled computer vision research, and many of the resulting models are available for free and unrestricted use. For this post, I will focus on image recognition using ImageNet ILSVRC.
Take a look at the ILSVRC object list. If the particular objects you’re interested in recognizing are among the 1000 objects in that list, you’re in luck! Here is an excerpt of the list of object categories:
What if your object of interest is not on that list, or comes from a significantly different setting like medical image analysis? I will cover an extremely valuable approach called transfer learning and fine-tuning in the second post.
Image Recognition
What is image (or object) recognition? It answers the question: “what objects are depicted in this image?” This is useful if you would like to tag images based on content, identify what food is on your plate, distinguish cancerous from non-cancerous medical images, and in many more applications.
Keras and TensorFlow
Keras is a high-level neural network library that serves as an easy-to-use abstraction layer on top of the numerical computation library TensorFlow. It even provides access via its keras.applications module to ILSVRC competition-winning convolutional network models like ResNet50 (developed by Microsoft Research) and InceptionV3 (developed by Google Research) for free and unrestricted use. To install, follow the instructions at:
- Keras installation: https://keras.io/#installation
- TensorFlow installation: https://www.tensorflow.org/install/
Implementation
To go straight to the full program, check out the GitHub repository.
Our end goal is to write a small Python program that accepts either (1) a path to a local file or (2) a URL to an image. Here is example usage with a photo of an African elephant:
1. python classify.py --image African_Bush_Elephant.jpg
2. python classify.py --image_url http://i.imgur.com/wpxMwsR.jpg
The output will look like:
Prediction function
To start, let’s load the keras.preprocessing and the keras.applications.resnet50 modules (ResNet50 paper: Deep Residual Learning for Image Recognition), and load the ResNet50 model using weights that have been trained on the ImageNet ILSVRC competition:
import numpy as np
from keras.preprocessing import image
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

# Load ResNet50 with weights pre-trained on ImageNet
model = ResNet50(weights='imagenet')
Then we can define a predict function:
def predict(model, img, target_size, top_n=3):
    """Run model prediction on image
    Args:
        model: keras model
        img: PIL format image
        target_size: (width, height) tuple
        top_n: # of top predictions to return
    Returns:
        list of predicted labels and their probabilities
    """
    if img.size != target_size:
        img = img.resize(target_size)

    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)  # add the batch dimension
    x = preprocess_input(x)
    preds = model.predict(x)
    return decode_predictions(preds, top=top_n)[0]
Note that to use the ResNet50 architecture, target_size must equal (224, 224). Many CNN architectures have a fixed input size, and ResNet50 is one such architecture, where the inventors used a fixed size input of (224, 224).
- image.img_to_array: converts a PIL format image to a NumPy array.
- np.expand_dims: converts our (224, 224, 3) size image to (1, 224, 224, 3), assuming TensorFlow’s default channels-last ordering. The reason for this is that the model.predict function requires a 4-dimensional array as input, where the first dimension corresponds to the batch size. That means, if we wanted to, we could classify multiple images at once.
- preprocess_input: zero-centers our image data using the mean channel values from the training dataset. This is an extremely important step that, if skipped, will cause all the predicted probabilities to be incorrect. This mean centering is a form of data normalization, a fundamental concept in machine learning.
- model.predict: runs inference on our data batch and returns predictions.
- decode_predictions: takes the coded labels returned by model.predict and returns human-readable labels from the ImageNet ILSVRC set.
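The array handling in these steps can be sketched with NumPy alone. This is a minimal illustration rather than Keras’s actual implementation: it assumes TensorFlow’s channels-last layout, uses a random array as a stand-in for a real image, and the helper name zero_center_bgr is hypothetical. The mean values are the per-channel ImageNet means (in BGR order) that, as far as I know, Keras uses for ResNet50 preprocessing.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match the
# "caffe"-style preprocessing Keras applies for ResNet50).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def zero_center_bgr(batch):
    """Hypothetical stand-in for preprocess_input: RGB -> BGR, subtract means."""
    batch = batch.astype(np.float32)
    batch = batch[..., ::-1]          # flip channel order RGB -> BGR
    return batch - IMAGENET_BGR_MEAN  # broadcasts over the last (channel) axis

# A fake 224x224 RGB "image" in channels-last layout.
rng = np.random.default_rng(0)
single = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)

# Add the batch axis at position 0, as predict() does with np.expand_dims.
batch_of_one = np.expand_dims(single, axis=0)
print(batch_of_one.shape)  # (1, 224, 224, 3)

# Several images can be stacked along the same axis and classified at once.
batch_of_four = np.stack([single] * 4, axis=0)
print(batch_of_four.shape)  # (4, 224, 224, 3)

centered = zero_center_bgr(batch_of_four)
```

In the real program, preprocess_input does this work for you; the sketch is only meant to show what happens to the array shapes and channel values along the way.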
The keras.applications module provides several off-the-shelf architectures, including ResNet50, InceptionV3, VGG16, VGG19, and Xception. We arbitrarily chose ResNet50, but you are free to swap that out with any of the other off-the-shelf architectures. Check out https://keras.io/applications/ for additional information and references.
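If you do swap architectures, keep in mind that the expected input size changes with the model. The lookup below is a small sketch of one way to keep track of that. DEFAULT_INPUT_SIZES and target_size_for are hypothetical names of my own, and the sizes are the default ImageNet input sizes as I understand them from the Keras documentation.

```python
# Default ImageNet input sizes (width, height) per architecture.
# Assumed from the Keras documentation; verify for your Keras version.
DEFAULT_INPUT_SIZES = {
    "ResNet50": (224, 224),
    "VGG16": (224, 224),
    "VGG19": (224, 224),
    "InceptionV3": (299, 299),
    "Xception": (299, 299),
}

def target_size_for(architecture):
    """Return the (width, height) to resize images to for `architecture`."""
    try:
        return DEFAULT_INPUT_SIZES[architecture]
    except KeyError:
        raise ValueError("Unknown architecture: %r" % architecture)

print(target_size_for("ResNet50"))     # (224, 224)
print(target_size_for("InceptionV3"))  # (299, 299)
```

Each architecture also has its own preprocess_input function in its keras.applications submodule, so that import should be swapped along with the model.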
Plotting
We can use matplotlib to display the output in a horizontal bar graph like so:
import matplotlib.pyplot as plt

def plot_preds(image, preds):
    """Displays image and the top-n predicted probabilities
    in a bar graph
    Args:
        image: PIL image
        preds: list of predicted labels and their probabilities
    """
    # image
    plt.imshow(image)
    plt.axis('off')

    # bar graph
    plt.figure()
    order = list(reversed(range(len(preds))))
    bar_preds = [pr[2] for pr in preds]  # probabilities
    labels = [pr[1] for pr in preds]     # human-readable labels
    plt.barh(order, bar_preds, alpha=0.5)
    plt.yticks(order, labels)
    plt.xlabel('Probability')
    plt.xlim(0, 1.01)
    plt.tight_layout()
    plt.show()
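To make the indexing in plot_preds concrete: each entry returned by decode_predictions is a (class ID, label, probability) tuple. The sketch below uses made-up placeholder values purely for illustration (they are not real model output) to show how the labels and probabilities are pulled apart, and why the positions are reversed for a horizontal bar chart.

```python
# Illustrative placeholder values only, not real model output.
preds = [
    ("n00000001", "label_a", 0.70),
    ("n00000002", "label_b", 0.20),
    ("n00000003", "label_c", 0.05),
]

labels = [label for (_, label, _) in preds]
probabilities = [prob for (_, _, prob) in preds]

# barh draws position 0 at the bottom, so the top prediction is given the
# highest position and therefore appears first when reading top to bottom.
order = list(reversed(range(len(preds))))

for position, label, prob in zip(order, labels, probabilities):
    print(position, label, prob)
```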
Main
In order to have this command line usage:
1. python classify.py --image African_Bush_Elephant.jpg
2. python classify.py --image_url http://i.imgur.com/wpxMwsR.jpg
We’ll define a main function as follows:
import argparse
import sys
from io import BytesIO

import requests
from PIL import Image

target_size = (224, 224)  # fixed input size for ResNet50

if __name__ == "__main__":
    a = argparse.ArgumentParser()
    a.add_argument("--image", help="path to image")
    a.add_argument("--image_url", help="url to image")
    args = a.parse_args()
    if args.image is None and args.image_url is None:
        a.print_help()
        sys.exit(1)
    if args.image is not None:
        img = Image.open(args.image)
        plot_preds(img, predict(model, img, target_size))
    if args.image_url is not None:
        response = requests.get(args.image_url)
        img = Image.open(BytesIO(response.content))
        plot_preds(img, predict(model, img, target_size))
The image_url option uses the Python Requests library to easily download an image from any URL!
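One possible refinement, sketched here under my own assumptions rather than taken from the program above, is to let argparse itself enforce that exactly one of the two options is supplied. build_parser is a hypothetical helper name; add_mutually_exclusive_group(required=True) is part of the standard argparse module.

```python
import argparse

def build_parser():
    """Build a parser that requires exactly one of --image / --image_url."""
    parser = argparse.ArgumentParser(description="Classify an image")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--image", help="path to image")
    group.add_argument("--image_url", help="url to image")
    return parser

# parse_args accepts an explicit list, which is handy for trying it out.
args = build_parser().parse_args(["--image", "African_Bush_Elephant.jpg"])
print(args.image)      # African_Bush_Elephant.jpg
print(args.image_url)  # None
```

With this variant, passing both options or neither makes argparse print a usage message and exit, so the manual check for missing arguments is no longer needed.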
We’re done!
Once you put all the above code together, you have the beginnings of an image recognition system! See the complete program and example images on GitHub.
The next post in our series will cover the situation where your object of interest is not one of the ImageNet ILSVRC categories:
- Build an image recognition system for 1000 everyday object categories (ImageNet ILSVRC) using Keras and TensorFlow (this post)
- Build an image recognition system for any customizable object categories using transfer learning and fine-tuning in Keras and TensorFlow
- Build a real-time bounding-box object detection system for hundreds of everyday object categories (PASCAL VOC, COCO)
- Build a web service for any image recognition or object detection system
Additional examples
Let’s try a few more examples!
1. python classify.py --image_url http://i.imgur.com/cg37Ojo.jpg
2. python classify.py --image_url http://i.imgur.com/4FIOwAN.jpg
3. python classify.py --image_url http://goo.gl/t3Gh5P
If you have any questions, contact me at greg.ht.chu@gmail.com or message me on LinkedIn!