Aviva Vaknin
Published in cisco-fpie
Mar 17, 2021

Differentiating Entertainment Hardware with MLKit on Android

Introduction

While participating in a hackathon devoted to Augmented Reality (AR) experiences for technicians, I spent several days hacking together an Android-based AR app that could differentiate between the various pieces of hardware a technician might handle. Given the non-lab conditions due to COVID-19, I improvised with DVD players and cable set-top boxes available at home, which share identifying features such as RGB and video pins, power slots, cable inputs, etc., as shown in Figure 1 below.

Figure 1: cable set-top box (left) and DVD player (right)

Images of each piece of equipment were used to train and create a model usable on mobile. The model was integrated into an Android-based AR app that successfully differentiated between these pieces of hardware in a variety of settings. Most impressively, when shown a second DVD player that was not in the training set, the app correctly identified it as a DVD player, demonstrating that the model generalized beyond the original input set.

Model Training

Model training was performed in a Google Colab Python notebook, which comes set up with the necessary training environment and is easily adapted and run by following the provided instructions.

The data is uploaded as a zip archive; images need not be labeled or annotated individually, as the tagging is implicit in the directory structure. The user provides a set of subdirectories, each named with its identification label and containing a set of images of that item. Photos need not be the same size. I uploaded 30-50 images of each piece of hardware, taken from different angles and with varying backgrounds and lighting conditions. The same variety of backgrounds and lighting conditions was used in both image sets to ensure the model did not learn environmental information. The subdirectories were named with the labels “setup_box” and “dvd_player.”
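As a rough sketch of that layout and how the notebook can pick it up, assuming it follows the TensorFlow Lite Model Maker flow (the folder name and split ratio below are illustrative):

```python
# Sketch, assuming the TensorFlow Lite Model Maker flow. Labels are inferred
# from the subdirectory names:
#
#   hardware_images/
#   |-- setup_box/     30-50 photos of the cable set-top box
#   `-- dvd_player/    30-50 photos of the DVD player
from tflite_model_maker.image_classifier import DataLoader

data = DataLoader.from_folder('hardware_images/')
train_data, test_data = data.split(0.9)  # hold out 10% of images for evaluation
```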

Training in the notebook starts from an underlying pre-trained model covering a wide range of generic objects and applies transfer learning to the new data images so that it can classify images and objects with one of the trained labels. It is possible to choose the underlying model from TensorFlow Hub, and I experimented with several. An excellent overview of transfer learning can be found here.
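If the notebook follows the Model Maker flow, swapping the underlying model is a one-line change; a minimal sketch (the spec names are valid Model Maker identifiers, shown here only as examples, not a record of which ones I tried):

```python
# Sketch: transfer learning on top of a pre-trained backbone (Model Maker API
# assumed). Other valid spec names include 'mobilenet_v2' and 'resnet_50'.
from tflite_model_maker import image_classifier, model_spec

model = image_classifier.create(
    train_data,
    model_spec=model_spec.get('efficientnet_lite0'),
)
loss, accuracy = model.evaluate(test_data)
```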

Training takes on the order of minutes for this small number of images and produces a TensorFlow Lite model—a lightweight inference model tailored for mobile and IoT on-device use.
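Under the same assumed Model Maker flow, the export is a single call (the output directory is illustrative):

```python
# Sketch: export the trained classifier as a TensorFlow Lite model, with the
# label map embedded as metadata (Model Maker API assumed).
from tflite_model_maker.config import ExportFormat

model.export(export_dir='export/', export_format=ExportFormat.TFLITE)
# Produces export/model.tflite, ready to download and bundle into an app.
```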

Putting it all together

I downloaded the generated TensorFlow Lite model and injected it into Google’s MLKit-AutoML vision demo app along with an option to select it.
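Before wiring the model into the app, a quick way to sanity-check it is to run one photo through the TensorFlow Lite interpreter in Python; a minimal sketch (the file names, label order, and preprocessing here are assumptions):

```python
# Hypothetical smoke test of the downloaded model.tflite on a single photo.
import numpy as np
import tensorflow as tf
from PIL import Image

interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Resize a test photo to the model's expected input, e.g. (1, 224, 224, 3).
height, width = inp['shape'][1], inp['shape'][2]
image = Image.open('test_photo.jpg').convert('RGB').resize((width, height))
pixels = np.asarray(image, dtype=inp['dtype'])[np.newaxis, ...]
# Note: a float32 model would additionally need pixel normalization here.

interpreter.set_tensor(inp['index'], pixels)
interpreter.invoke()
scores = interpreter.get_tensor(out['index'])[0]
print(dict(zip(['dvd_player', 'setup_box'], scores)))  # labels assumed alphabetical
```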

Running the AR app on a Pixel 4 and pointing the camera at the various hardware, I found that the app was able to isolate and correctly identify the original equipment in a variety of settings, as seen in Figure 2.

Lastly, I pointed the camera at a third piece of equipment that had not been included in the training set (a second DVD player), stacked among blocks with obscuring cords against a new background. This, too, was correctly identified as a DVD player, verifying that the objects themselves were “learned,” as shown in Figure 3.

Figure 3: New DVD player correctly identified

The accessible workflow and impressive results demonstrate eye-opening advances in available AR and ML training tools.
