Image recognition for custom categories with TensorFlow
One key area of machine learning is image recognition and object detection. In this post I want to show how to use a pre-trained TensorFlow model (Inception v3) for image recognition, with a retrained final layer for custom categories. The approach is based on the TensorFlow for Poets tutorial.
Use Case / Idea
Given an unordered heap of Lego bricks, it would be nice to have a machine that can automatically categorize and count those bricks to answer questions like “Which Lego projects can be built?” or “Are there enough of the right bricks to build project X?”. So the first essential step towards someday bringing this idea to life is image recognition.
Environment and Source Code
- macOS 10.12.6 (not optimal for machine learning, anyway)
- TensorFlow 1.2.1
- ffmpeg
Source code: https://github.com/enpit/tensorflow-for-lego
Outline
Next, the following typical steps have to be performed:
- Create and prepare training data
- Choose and train a machine learning model
- Evaluate trained model with new input data
Create Training Data
To prepare the training data, I am going to record a few videos of each brick from different perspectives and use ffmpeg to extract training images from them. By the way, installing ffmpeg with Homebrew is easy:
$ brew install ffmpeg
...
$ ffmpeg -version
ffmpeg version 3.3.3 Copyright (c) 2000-2017 the FFmpeg developers
built with Apple LLVM version 8.1.0 (clang-802.0.42)
...
The overall process is summarized in the following figure. Taking the movies as input for ffmpeg, I generated about 50 images per brick category for this use case:
$ cd training-data/brick2x2
$ ffmpeg -i brick2x2.mov -vf fps=3 img%03d.jpg
..
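Repeating this command by hand for every brick category gets tedious. A minimal Python sketch that builds the corresponding ffmpeg command per category; the category names and the `training-data/<category>/<category>.mov` layout are my assumptions based on the commands above:

```python
import subprocess
from pathlib import Path

def ffmpeg_cmd(movie, out_dir, fps=3):
    """Build the ffmpeg command that extracts `fps` frames per second as JPEGs."""
    return ["ffmpeg", "-i", str(movie), "-vf", f"fps={fps}",
            str(Path(out_dir) / "img%03d.jpg")]

# Hypothetical layout: one movie per category directory
for category in ["brick2x2", "brick2x4"]:
    movie = Path("training-data") / category / f"{category}.mov"
    cmd = ffmpeg_cmd(movie, movie.parent)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually extract the frames
```

The `subprocess.run` call is commented out so the sketch is safe to dry-run; it only prints the commands it would execute.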
Choosing and training the model
A suitable machine learning model is Inception, a model pre-trained on the ImageNet data set. TensorFlow offers a retrain script that extends this pre-trained model with a final layer representing our own image data set. That makes the training process pretty easy:
$ python retrain.py \
--bottleneck_dir=bottlenecks \
--how_many_training_steps=250 \
--model_dir=inception \
--summaries_dir=training_summaries/basic \
--output_graph=retrained_graph.pb \
--output_labels=retrained_labels.txt \
--image_dir=training-data
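Conceptually, the retrain script caches a 2048-dimensional “bottleneck” feature vector per image (the output of Inception v3's penultimate layer) and only trains a new fully connected softmax layer on top of it. A minimal numpy sketch of that final layer; the weights here are random placeholders, not the parameters retrain.py actually learns:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_FEATURES = 2048   # size of Inception v3's bottleneck vector
NUM_CLASSES = 4       # e.g. brick2x2, brick2x4, ... (assumed count)

# Placeholder parameters; retrain.py learns these via gradient descent.
W = rng.normal(scale=0.01, size=(NUM_FEATURES, NUM_CLASSES))
b = np.zeros(NUM_CLASSES)

def softmax(z):
    z = z - z.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(bottleneck):
    """Map one bottleneck vector to a probability per category."""
    return softmax(bottleneck @ W + b)

probs = classify(rng.normal(size=NUM_FEATURES))
print(probs)  # one probability per class, summing to 1
```

Because only this small layer is trained while the Inception weights stay frozen, retraining takes minutes instead of weeks.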
After about 90 iterations the training accuracy reaches 1.0.
Evaluate trained model
To evaluate the trained model on new data, TensorFlow provides the convenient script “label_image.py”. Using it on new sample images, I got the following results. The probabilities are not very high (due to the small training set, I guess), but at least the images are classified correctly:
Using sample images of brick types that were not part of the training set, the model assigns no category a probability above 50%.
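That behavior can be exploited programmatically: by applying a confidence threshold, unknown bricks are rejected instead of being force-fitted into a category. A small sketch; the labels and score values below are invented for illustration:

```python
def top_label(scores, threshold=0.5):
    """Return the most probable label, or None if no score clears the threshold."""
    label, prob = max(scores.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else None

# Scores as label_image.py might report them (values made up here)
known = {"brick2x2": 0.81, "brick2x4": 0.12, "brick1x2": 0.07}
unknown = {"brick2x2": 0.41, "brick2x4": 0.35, "brick1x2": 0.24}

print(top_label(known))    # brick2x2
print(top_label(unknown))  # None -> brick type was probably not trained
```

The threshold of 0.5 is a starting point; it would need tuning once the training set grows and the probabilities become sharper.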
Summary
I am quite impressed. There is a lot of complexity and research hidden in the available models like Inception, ResNet, etc. Being able to attach a final layer makes it easy to apply them to custom data sets.
Sample Code: https://github.com/enpit/tensorflow-for-lego
Slides from DOAG BigData Days
For more background information and theory on machine learning, check out the slides (in German) on Slideshare: https://www.slideshare.net/enpit/mit-legosteinen-maschinelles-lernen-lernen