Image recognition for custom categories with TensorFlow

Andreas Koop
enpit-developer-blog
4 min readOct 25, 2017

One key area of machine learning is image recognition / object detection. In this post I want to show how to use a pre-trained TensorFlow model (Inception v3) for image recognition, with a retrained final layer for custom categories. The approach is based on the TensorFlow for Poets tutorial.

Use Case / Idea

Given an unordered heap of Lego bricks, it would be nice to have a machine that can automatically categorize and count those bricks to answer questions like “Which Lego projects can be built?” or “Are there enough / the right bricks to build project X?”. So the first essential step towards bringing this idea to life someday is image recognition.

Environment and Source Code

  • macOS 10.12.6 (not optimal for machine learning, but workable)
  • TensorFlow 1.2.1
  • ffmpeg

Source code: https://github.com/enpit/tensorflow-for-lego

Outline

Next, the following typical steps have to be performed:

  • Create and prepare training data
  • Choose and train a machine learning model
  • Evaluate trained model with new input data

Create Training Data

To prepare the training data, I record a few videos of each brick from different perspectives and use ffmpeg to extract training images from them. By the way: installing ffmpeg with brew is easy.

Sample video to obtain training data
$ brew install ffmpeg
...
$ ffmpeg -version
ffmpeg version 3.3.3 Copyright (c) 2000-2017 the FFmpeg developers
built with Apple LLVM version 8.1.0 (clang-802.0.42)
...

The overall process is summarized in the following figure. Taking the videos as input for ffmpeg, I generated about 50 images per brick category for this use case.

Preparing training data
$ cd training-data/brick2x2
$ ffmpeg -i brick2x2.mov -vf fps=3 img%03d.jpg
..
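The extraction step can be scripted per category. Here is a minimal Python sketch; the helper names and the directory layout are my own assumptions, and it expects ffmpeg to be on the PATH:

```python
import math
import subprocess

def ffmpeg_extract_cmd(video, out_dir, fps=3):
    """Build the ffmpeg command that samples `fps` frames per second
    from `video` into numbered JPEGs under `out_dir`."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", f"{out_dir}/img%03d.jpg"]

def expected_frames(duration_s, fps=3):
    """Rough number of images a clip of `duration_s` seconds yields."""
    return math.ceil(duration_s * fps)

if __name__ == "__main__":
    # One sub-directory per brick category, e.g. training-data/brick2x2
    for category in ["brick2x2", "brick2x4"]:
        cmd = ffmpeg_extract_cmd(f"training-data/{category}/{category}.mov",
                                 f"training-data/{category}")
        subprocess.run(cmd, check=True)
```

At fps=3, a clip of roughly 17 seconds yields the ~50 images per category used here.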

Choosing and training the model

One suitable machine learning model is Inception, which comes pre-trained on the ImageNet data set. TensorFlow offers a retrain script that extends this pre-trained model with a final layer representing our own image data set. That makes the training process pretty easy:

$ python retrain.py \
--bottleneck_dir=bottlenecks \
--how_many_training_steps=250 \
--model_dir=inception \
--summaries_dir=training_summaries/basic \
--output_graph=retrained_graph.pb \
--output_labels=retrained_labels.txt \
--image_dir=training-data

After about 90 iterations the training accuracy reaches 1.0.

Evaluate trained model

To evaluate the trained model on new data, TensorFlow provides a convenient script, label_image.py. Using it on new sample images I got the following results. The probabilities are not very high (due to the small training set, I guess), but at least the images are classified correctly:

Evaluation on new images of brick categories that have been trained

Using sample images of bricks that have not been trained, the model assigns no category a probability above 50 %.

Evaluation on new images of brick categories that have NOT been trained
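The decision rule behind this reading of the scores can be sketched in plain Python. The function name and the 50 % threshold are my own choices; the scores themselves come from label_image.py:

```python
def pick_label(probs, labels, threshold=0.5):
    """Return the best-scoring label, or 'unknown' if no category
    reaches the threshold (as for the untrained bricks above)."""
    best = max(range(len(labels)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "unknown"
    return labels[best]
```

For example, `pick_label([0.62, 0.21, 0.17], ["brick2x2", "brick2x4", "brick1x2"])` returns "brick2x2", while scores that all stay below 0.5 yield "unknown".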

Summary

I am quite impressed. There is a lot of complexity and research hidden in the available models like Inception, ResNet etc. Being able to attach a final layer makes them easy to apply to custom data sets.

Sample Code: https://github.com/enpit/tensorflow-for-lego

Slides from DOAG BigData Days

For more background information and theory on machine learning, check out the slides (in German) on Slideshare: https://www.slideshare.net/enpit/mit-legosteinen-maschinelles-lernen-lernen
