Using TensorFlow for Object Recognition

Krishna Teja · Published in spawn-ai · Nov 7, 2016 · 4 min read

Our brains comprehend things so well that they make vision seem easy. It takes no time for a human to spot an anomaly, tell a bus from a car, or detect and recognize a face, but it is incredibly hard for a computer to learn to detect and recognize an object as easily as a human brain does.

In the last couple of years researchers have made tremendous progress on this problem. Their solution uses deep convolutional neural networks: models that can perform hard visual recognition tasks at a level close to, and sometimes even better than, human performance.

A convolutional neural network acts as a black box that constructs features we would otherwise have to handcraft ourselves, which is why creating one takes very high computing power and a lot of time. The abstract features it learns are highly generalized, which accounts for variance in the input; producing such features by hand would take a lot of effort and invite human error.

This is where TensorFlow comes into play, an open-source Google project released last year. Inception is a convolutional neural network model shipped with TensorFlow, pre-trained on the ImageNet dataset (over a million images) across about 1,000 categories. When we feed an image to the model, the data passes through a series of layers, each performing different operations, until the model outputs a label and a confidence score. Each layer works at a different level of abstraction: the first layers learn edge and line detection, the middle layers learn shape detection, and the abstraction increases as you go deeper. The last few layers are the highest-level detectors for the image fed to the model.

So what if the object to be recognized is not among the categories Inception was pre-trained on? Well, there is always a hack. We can use a process called transfer learning, where the previous learning is carried over into a new training session. This means we only have to retrain the last layer of the Inception model on features of the object to be recognized. Simple, isn't it?
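In this scheme the convolutional layers act as a fixed feature extractor, and only a small softmax classifier on top is trained. A minimal NumPy sketch of that last-layer training (random vectors stand in for the real "bottleneck" features the frozen network would produce; all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend bottleneck features from the frozen network: 2048-dim vectors
# (the size of Inception's penultimate layer), for a toy 3-class problem.
num_classes, feat_dim = 3, 2048
features = rng.normal(size=(90, feat_dim))
labels = np.repeat(np.arange(num_classes), 30)

# Train only the final softmax layer with plain gradient descent;
# the "convolutional" part of the network is never touched.
W = np.zeros((feat_dim, num_classes))
b = np.zeros(num_classes)
one_hot = np.eye(num_classes)[labels]

for _ in range(200):
    logits = features @ W + b
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    grad = (probs - one_hot) / len(features)           # softmax cross-entropy gradient
    W -= 0.1 * (features.T @ grad)
    b -= 0.1 * grad.sum(axis=0)

accuracy = (probs.argmax(axis=1) == labels).mean()
print('training accuracy: %.2f' % accuracy)
```

This is exactly the kind of cheap final-layer optimization that makes retraining take minutes instead of weeks: only `feat_dim × num_classes` weights are learned.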

So all that has been discussed above in 3 simple steps:
1. Install Tensorflow
2. Re-train the image classifier
3. Run the classifier and check out the result

I. Install Tensorflow
Installing TensorFlow is neatly documented here. Follow the blog for installation and setup. I have installed TensorFlow in a virtualenv, and the rest of the article assumes that setup.

II. Retraining the image classifier
Being a Trekkie, I really wanted to see whether my application could identify Spock in the images fed to it.
So I trained Inception on a set of Spock images along with a few other characters from Star Trek, Star Wars and Dragon Ball Z, by running {directory into which the tensorflow repo was cloned}/tensorflow/tensorflow/examples/image_retraining/retrain.py with the following parameters.

--image_dir=~/data/tf_files/spock/
--bottleneck_dir=~/data/tf_files/
--how_many_training_steps=1000
--model_dir=~/data/tf_files/inception
--output_graph=~/data/tf_files/retained_graph.pb
--output_labels=~/data/tf_files/retained_labels.txt
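One detail worth knowing before running retrain.py: it derives one class label from each sub-directory name under --image_dir and treats the files inside as that class's training images. A small sketch of the expected layout (using a temporary directory as a stand-in for ~/data/tf_files/spock/):

```python
import os
import tempfile

# Stand-in for ~/data/tf_files/spock/ -- retrain.py reads one
# sub-directory per label, each holding that character's images.
image_dir = os.path.join(tempfile.mkdtemp(), "spock")
for label in ["spock", "kirk", "darth vader", "goku"]:
    os.makedirs(os.path.join(image_dir, label))

print(sorted(os.listdir(image_dir)))
# Each directory would then be filled with .jpg training images.
```

The label names you see in the classifier's output later are exactly these folder names.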

A successful run means a trained model is ready for you to experiment with.

III. Run the classifier and check out the result
For the classifier we can quickly write some code to test how well Inception performs with the transfer learning we just gave it.

__author__ = 'krishnateja'

import os

import tensorflow as tf

# Path to the image we want to classify
image_path = os.path.expanduser('~/data/tf_files/images/spock/spock_recognize.jpg')

# Read in the image data
image_data = tf.gfile.FastGFile(image_path, 'rb').read()

# Load labels and strip off carriage returns
label_lines = [line.rstrip() for line in
               tf.gfile.GFile(os.path.expanduser('~/data/tf_files/retained_labels.txt'))]

# Unpersist graph from file
with tf.gfile.FastGFile(os.path.expanduser('~/data/tf_files/retained_graph.pb'), 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    # Feed the image_data as input to the graph and get the predictions
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})

    # Sort label indices by confidence, highest first
    top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]

    for node_id in top_k:
        human_string = label_lines[node_id]
        score = predictions[0][node_id]
        print('%s (score = %.5f)' % (human_string, score))

Run this piece of code against the retrained model and you get:

spock (score = 0.98361)
kirk (score = 0.01283)
darth vader (score = 0.00195)
goku (score = 0.00161)

NOTE: You might not end up getting exactly the same numbers. I tried a little to confuse the system by adding a few images where Kirk and Spock appear together. It is worth trying to confuse the system yourself, and I would encourage you to play around with that. Let me know if you have any questions. You can reach me on LinkedIn.

Thanks to Google for open sourcing this amazing tool.

There are some really amazing tutorials and videos online. The one that I followed and found really useful for learning TensorFlow was this.

Make better use of this open source tool by creating some amazing applications.

Keep practicing, keep reading, and most of all, keep iterating.

As Spock says: “LIVE LONG AND PROSPER”
