Image recognition for custom categories with TensorFlow
One key area of machine learning is image recognition and object detection. In this post I want to show how to use a pre-trained TensorFlow model (Inception v3) for image recognition, with a retrained final layer for custom categories. The approach is based on the TensorFlow for Poets tutorial.
Use Case / Idea
Given an unordered heap of Lego bricks, it would be nice to have a machine that can automatically categorize and count those bricks to answer questions like “Which Lego projects can be built?” or “Are there enough of the right bricks to build project X?”. So the first essential step towards someday bringing this idea to life is image recognition.
Environment and Source Code
- macOS 10.12.6 (not optimal for machine learning, anyway)
- TensorFlow 1.2.1
- ffmpeg
Source code: https://github.com/enpit/tensorflow-for-lego
Outline
Next, the following typical steps have to be performed:
- Create and prepare training data
- Choose and train a machine learning model
- Evaluate trained model with new input data
Create Training Data
To prepare the training data, I am going to record a few videos of each brick from different perspectives and use ffmpeg to extract training images from them. By the way, installing ffmpeg with Homebrew is easy:
$ brew install ffmpeg
...
$ ffmpeg -version
ffmpeg version 3.3.3 Copyright (c) 2000-2017 the FFmpeg developers
built with Apple LLVM version 8.1.0 (clang-802.0.42)
...
The overall process is summarized in the following figure. Taking the movies as input for ffmpeg, I generated about 50 images per brick category for this use case:
$ cd training-data/brick2x2
$ ffmpeg -i brick2x2.mov -vf fps=3 img%03d.jpg
..
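Repeating this command by hand for every brick category gets tedious. A minimal Python sketch that builds the corresponding ffmpeg command per category; the category names and the `training-data/<category>/<category>.mov` layout are my assumptions based on the commands above:

```python
import subprocess
from pathlib import Path

def ffmpeg_cmd(movie, out_dir, fps=3):
    """Build the ffmpeg command that extracts `fps` frames per second as JPEGs."""
    return ["ffmpeg", "-i", str(movie), "-vf", f"fps={fps}",
            str(Path(out_dir) / "img%03d.jpg")]

# Hypothetical layout: one movie per category directory
for category in ["brick2x2", "brick2x4"]:
    movie = Path("training-data") / category / f"{category}.mov"
    cmd = ffmpeg_cmd(movie, movie.parent)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually extract the frames
```

The `subprocess.run` call is commented out so the sketch is safe to dry-run; it only prints the commands it would execute.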
Choosing and training the model
A suitable machine learning model is Inception, a model pre-trained on the ImageNet data set. TensorFlow offers a retrain script that extends this pre-trained model with a final layer representing our own image data set. That makes the training process pretty easy:
$ python retrain.py \
--bottleneck_dir=bottlenecks \
--how_many_training_steps=250 \
--model_dir=inception \
--summaries_dir=training_summaries/basic \
--output_graph=retrained_graph.pb \
--output_labels=retrained_labels.txt \
--image_dir=training-data
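Conceptually, the retrain script caches a 2048-dimensional “bottleneck” feature vector per image (the output of Inception v3's penultimate layer) and only trains a new fully connected softmax layer on top of it. A minimal numpy sketch of that final layer; the weights here are random placeholders, not the parameters retrain.py actually learns:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_FEATURES = 2048   # size of Inception v3's bottleneck vector
NUM_CLASSES = 4       # e.g. brick2x2, brick2x4, ... (assumed count)

# Placeholder parameters; retrain.py learns these via gradient descent.
W = rng.normal(scale=0.01, size=(NUM_FEATURES, NUM_CLASSES))
b = np.zeros(NUM_CLASSES)

def softmax(z):
    z = z - z.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(bottleneck):
    """Map one bottleneck vector to a probability per category."""
    return softmax(bottleneck @ W + b)

probs = classify(rng.normal(size=NUM_FEATURES))
print(probs)  # one probability per class, summing to 1
```

Because only this small layer is trained while the Inception weights stay frozen, retraining takes minutes instead of weeks.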
After about 90 iterations the training accuracy reaches 1.0.
Evaluate trained model
To evaluate the trained model on new data, TensorFlow provides the convenient script “label_image.py”. Using it on new sample images, I got the following results. The probabilities are not very high (due to the small training set, I guess), but at least the images are classified correctly:
Using sample images of brick types that were not part of the training set, the model assigns no category a probability above 50%.
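That behavior can be exploited programmatically: by applying a confidence threshold, unknown bricks are rejected instead of being force-fitted into a category. A small sketch; the labels and score values below are invented for illustration:

```python
def top_label(scores, threshold=0.5):
    """Return the most probable label, or None if no score clears the threshold."""
    label, prob = max(scores.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else None

# Scores as label_image.py might report them (values made up here)
known = {"brick2x2": 0.81, "brick2x4": 0.12, "brick1x2": 0.07}
unknown = {"brick2x2": 0.41, "brick2x4": 0.35, "brick1x2": 0.24}

print(top_label(known))    # brick2x2
print(top_label(unknown))  # None -> brick type was probably not trained
```

The threshold of 0.5 is a starting point; it would need tuning once the training set grows and the probabilities become sharper.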
Summary
I am quite impressed. There is a lot of complexity and research hidden in the available models like Inception, ResNet, etc. Being able to attach a final layer makes it easy to apply them to custom data sets.
Sample Code: https://github.com/enpit/tensorflow-for-lego
Slides from DOAG BigData Days
For more background information and theory on machine learning, check out the slides (in German) on Slideshare: https://www.slideshare.net/enpit/mit-legosteinen-maschinelles-lernen-lernen