Getting started with Image Recognition and Convolutional Neural Networks in 5 minutes

Jorge Rodríguez Araújo
Abraia
5 min read · May 31, 2018

--

AlexNet Image Classification example from [Krizhevsky 2012]

Starting to develop solutions involving Deep Learning is becoming easier and easier. In a recent move, Google launched Colaboratory, a free cloud service where you can develop deep learning applications on a GPU at no cost.

Colaboratory is a pre-configured Jupyter notebook environment that runs entirely in the cloud, without having to download, install, or run anything on your own computer. Moreover, notebooks are stored in Google Drive and can be shared just as you would with Google Docs.

The environment supports Python for code execution and comes with TensorFlow pre-installed, the open source framework for Deep Learning released by Google at the end of 2015. If you are not familiar with Python, now is a good time to learn it, since it is the most popular programming language for AI.

Colaboratory notebook running a CNN for image recognition

You can go directly to the Colaboratory notebook we have made for this article and test your own images by running the code with Runtime > Run all and uploading a file in the fifth step.

Image recognition and CNNs

Image recognition is the problem of identifying and classifying the objects depicted in a picture. Possibly the most straightforward application is automatic image tagging for web content management. Automatically recognising people, garments, food, pets, and anything else relevant can be very handy when it comes to managing and curating large sets of images, from ecommerce to blogging.

The field of machine learning has made huge progress on these difficult tasks, in particular thanks to a kind of model called the Convolutional Neural Network (CNN). CNNs are a class of deep neural networks inspired by biological processes in the visual cortex: individual neurons respond to stimuli only in a restricted region of the visual field that partially overlaps the regions of neighbouring neurons, so that together they cover the entire visual field.

Macroarchitecture of VGG16

As a result, CNNs learn to respond to different features in the image (edges, shapes, and so on) as filters that, in traditional algorithms, had to be hand-engineered. The fact that they remove the need for prior knowledge and human effort in feature design is a major advantage of CNNs.

Feature Visualization of Convnet trained on ImageNet from [Zeiler & Fergus 2013]

There are lots of CNN architectures available for free and unrestricted use that can achieve reasonable performance on hard visual recognition tasks. For instance, Keras — a high-level neural network library that serves as an easy-to-use abstraction layer on top of TensorFlow — provides access to some competition-winning (ImageNet ILSVRC) CNNs like ResNet50 (developed by Microsoft Research) or InceptionV3 (developed by Google Research), ready to recognize 1000 common objects (ILSVRC object list).

Getting started with Colaboratory

To start playing with image classification we just need to access Colaboratory, create a new notebook following New Notebook > New Python 3 Notebook, and then install Keras.

Apart from writing code, Colaboratory allows writing shell commands, preceded by a ‘!’, within the notebook. This gives us control over the Virtual Machine (VM) so we can install non-default packages.
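For instance, installing Keras takes a single cell (a minimal sketch; in current Colab images Keras may already be pre-installed):

```shell
# Shell commands run in the Colab VM when prefixed with '!'
!pip install -q keras
```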

Loading and running the model

Any image recognition model available in Keras can be loaded with just two lines of code. This automatically downloads the weights of a model pre-trained on the ImageNet dataset, ready to classify 1000 common objects.

The keras.applications module provides some common off-the-shelf architectures like VGG16, ResNet50, InceptionV3 or MobileNet. We have chosen the Inception-v3 architecture from Google because it is one of the best, with a reported top-5 error rate of 3.46%. The top-5 error rate measures how often the model fails to include the correct answer among its top 5 guesses.
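The two lines in question can be sketched as follows (the first call downloads the pre-trained ImageNet weights, roughly 90 MB, into the VM):

```python
# Load Inception-v3 with weights pre-trained on the ImageNet dataset
from keras.applications.inception_v3 import InceptionV3

model = InceptionV3(weights='imagenet')
```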

Prediction function

To start making predictions we just need to prepare the input image and decode the predicted output. For this, we are going to write the predict helper function.

First of all, the image has to be resized to Inception-v3's fixed input size, in this case target_size=(299, 299); other networks like VGG16 or ResNet50 expect (224, 224). The image.load_img function from keras.preprocessing loads the image from img_path and resizes it to the specified target_size.

The next step is to convert the image img to a numpy array with image.img_to_array and use np.expand_dims to change the shape of the array from (299, 299, 3) to (1, 299, 299, 3), that is, one image of size 299 by 299 with 3 RGB channels. This is because the model.predict function requires a 4-dimensional array as input, which means that it can classify multiple images at once.

The last step before prediction is preprocess_input, which normalizes the pixel values to the range the network saw during training (for Inception-v3, scaling them to [-1, 1]; for VGG-style models, zero-centering with the mean channel values of the training dataset). This is an extremely important step that, if skipped, will cause all the predicted probabilities to be incorrect.

Finally, we run inference with model.predict and decode the output with decode_predictions, which returns the human-readable ImageNet ILSVRC labels together with the predicted probabilities.

Loading the image and prediction

We have defined our image recognition system and only need to load our own image to start playing with this toy. For this, we directly use a code snippet that can be found by clicking the little black button on the top left, under the menu.
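The snippet in question uses the Colab-only google.colab helper (it will not work outside Colaboratory):

```python
# Open a file picker in the browser and upload a file into the Colab VM
from google.colab import files

uploaded = files.upload()
img_path = next(iter(uploaded))  # name of the uploaded file
```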

Once the image is loaded, we make the prediction and show the results.

We use matplotlib to show the input image and the predicted output in a horizontal bar graph.
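The display step can be sketched as follows (show_results is our own helper name; it assumes results holds the (id, label, probability) tuples returned by decode_predictions):

```python
import matplotlib.pyplot as plt

def show_results(img_path, results):
    # Left: the input image; right: horizontal bar chart of top predictions
    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    plt.imshow(plt.imread(img_path))
    plt.axis('off')
    plt.subplot(1, 2, 2)
    labels = [label for _, label, _ in results]
    probs = [prob for _, _, prob in results]
    plt.barh(labels, probs)
    plt.xlabel('probability')
    plt.gca().invert_yaxis()  # most likely label on top
    plt.tight_layout()
    plt.show()
```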

Conclusions

This is a glimpse of how easy it can now be to start recognizing objects with image recognition and Convolutional Neural Networks. But this is just the beginning, because we can customize one of these networks to predict our own object classes with high accuracy using transfer learning.

In the next article, we introduce transfer learning for custom image classification.
