Create Jian-Yang’s Hot Dog App on Android with CameraX, TensorFlow Lite, and MobileNet
What would you say if I told you there is an app on the market that can recognize hot dogs and not hot dogs in pictures, in real time?
We all stand on the shoulders of giants. I am no different.
If you’re reading this article, chances are you are already familiar with the genius of Silicon Valley’s Jian-Yang. If you are not, however, or in case you have forgotten about arguably his greatest creation, an app able to classify images as hot dogs and not hot dogs, here’s a little treat to refresh your memory:
It took humanity a mere 3.5 years to reach a stage at which this once outlandish idea, something that seemed taken straight out of a sci-fi movie at the time, could be realized. But alas! The day has come at long last and the technology is here! Today you, dear reader, are here to witness the future happen right before your very eyes.
Yes, I do realize it was probably done before a thousand times, ask me if I care though.
We are not going to limit ourselves to simply copying the SEEFOOD app; taking pictures to classify them is so démodé. We are going to take it to the next level by pulling off something that even a visionary like Jian-Yang could not have dreamt of. We are going to classify camera frames as hot dogs and not hot dogs in real time.
To this end, we are going to use a combination of CameraX for a live feed from the device’s camera, a pre-trained MobileNetV1 model as the image classifier (this is where all the magic happens), and TensorFlow Lite to run the model directly on the Android device.
I will sprinkle this article lavishly with code snippets, but if that’s not enough for you, you can also find the full source code on GitHub, linked at the bottom of the article. Forgive me for not implementing this as a multi-module, clean-architecture, MVI, RxJava-heavy, Dagger-haunted app; I will leave all that as an exercise for the reader.
Let’s get the camera out of the way first. I am not going to dive into the details of implementing a camera preview using the CameraX library, since I have already covered it here (premium article warning). The code needed for the hot dog app will be almost identical, so you can use the previous article for a step-by-step setup, or simply refer to the source code for this article. We are going to run the inference (hot dog recognition) from a class implementing the ImageAnalysis.Analyzer interface; this will be the only CameraX code we will refer to. An instance of this class will receive camera preview frames and process them one at a time.
Here comes the good stuff (unless you came here just to watch the Silicon Valley YouTube clip, can’t blame you if you did): the deep neural network that does the actual hot dog (and not hot dog, naturally) recognition in the images.
For this app, we are going to use a quantized MobileNetV1 model, a pre-trained, general-purpose image classifier optimized to run efficiently on mobile devices. Quantization, in the context of deep learning, is a post-training model optimization technique. It reduces the model’s size and increases its efficiency at the price of a slight drop in precision. In other words, it is exactly what we want when dealing with mobile hardware.
If you happen to have myriads of hot dog images at hand (not judging), you can of course also train your own neural network instead of relying on the pre-trained version of MobileNet.
MobileNetV1 can be plugged in straight out of the box, as it has already been trained for us by our friends at Google (hey, Google). It is also very easy to use and interpret its output, even with little or no experience with deep learning. When given an image, the model will generate a probability distribution over all supported image categories, thanks to softmax being its activation function. Luckily for us, one of the categories the model has been trained on is, you guessed it, hotdog.
You can download said model here. It already contains the labels needed to match the model’s output (numbers) to human-readable categories, such as hotdog or (lukewarm) cat. The model weighs in at just over 4 MB; compare that to the non-quantized model, which is roughly 4 times bigger at 16 MB.
Time to finally get our hands dirty with some Gradle code. Let’s start by creating a script that will download the model and place it inside the assets directory. Create a download.gradle file in your main module with the following code:
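The gist originally embedded here does not render, so below is a minimal sketch of what such a script can look like, modeled on the official TensorFlow Lite sample apps. The model URL, file name, and the ASSET_DIR property are placeholders; point them at whichever model build you actually downloaded:

```groovy
// download.gradle — a sketch based on the TensorFlow Lite sample apps.
// The URL and destination file name below are placeholders.
task downloadModelFile(type: Download) {
    src 'https://example.com/path/to/mobilenet_v1_quantized.tflite'
    dest project.ext.ASSET_DIR + '/mobilenet_v1.tflite'
    overwrite false
}
```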
For this snippet to work, we also need to include a download plugin, tell Gradle not to compress the model, and finally trigger the download at compile time. Add the following snippets to your main module’s build.gradle file:
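The original snippets are embedded gists; here is a rough reconstruction based on the official TensorFlow Lite sample apps. The plugin version and the assets path are assumptions:

```groovy
// build.gradle (main module) — a sketch based on the TensorFlow Lite samples.
plugins {
    // gradle-download-task plugin, used by download.gradle (version is an assumption)
    id 'de.undercouch.download' version '4.1.2'
}

android {
    // Keep the .tflite model uncompressed so it can be memory-mapped at runtime.
    aaptOptions {
        noCompress 'tflite'
    }
}

// Trigger the model download before compilation.
project.ext.ASSET_DIR = projectDir.toString() + '/src/main/assets'
apply from: 'download.gradle'
preBuild.dependsOn downloadModelFile
```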
Before you consider me a Gradle wizard, I need to confess: as any half-decent developer would, I proudly borrowed this part of the code from the official TensorFlow sample app.
TensorFlow Lite is a Google library for running machine learning models directly on mobile devices. Let’s dive right into the thrilling world of Gradle dependencies and include it in our project:
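The dependency gist is missing here; the artifact we need is the Task Library’s vision package, which ships the ImageClassifier used below. The version number is simply the one I would pin at the time of writing, so check for newer releases:

```groovy
dependencies {
    // TensorFlow Lite Task Library (vision) — provides ImageClassifier.
    implementation 'org.tensorflow:tensorflow-lite-task-vision:0.3.1'
}
```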
For our use case, we will mostly be interested in the TensorFlow Lite Task Library, the ImageClassifier more specifically. It brings a sweet and easy-to-use API for classifying hot dog images (other images as well, I suppose) using a model of our choice.
To prove my point, let’s create a class that will do (almost) all the hard work in just over 40 lines of code (or roughly 554, if we were to use Java):
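The gist with the class does not render here, so below is a minimal sketch of what it can look like using the Task Library API. The names HotDogClassifier, getHotDogScore, toHotDogScore, and MODEL_PATH follow the article’s walkthrough; the exact structure is an illustration, not the repository’s code verbatim:

```kotlin
import android.content.Context
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.classifier.Classifications
import org.tensorflow.lite.task.vision.classifier.ImageClassifier

class HotDogClassifier(context: Context) {

    // Ignore any category the model is less than 5% confident about.
    private val classifierOptions = ImageClassifier.ImageClassifierOptions.builder()
        .setScoreThreshold(SCORE_THRESHOLD)
        .build()

    private val classifier = ImageClassifier.createFromFileAndOptions(
        context, MODEL_PATH, classifierOptions
    )

    // Returns the model's confidence that the image contains a hot dog (0f if none found).
    fun getHotDogScore(image: TensorImage): Float =
        classifier.classify(image).toHotDogScore()

    private fun List<Classifications>.toHotDogScore(): Float =
        firstOrNull()
            ?.categories
            ?.firstOrNull { it.label == HOT_DOG_LABEL }
            ?.score
            ?: 0f

    companion object {
        private const val MODEL_PATH = "mobilenet_v1.tflite" // must match download.gradle
        private const val HOT_DOG_LABEL = "hotdog"
        private const val SCORE_THRESHOLD = 0.05f
    }
}
```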
The code is pretty self-explanatory, said every developer ever, but let me show you around just in case.
Let’s start with the class’ only public function, getHotDogScore. It takes an instance of the TensorImage class (we will come back to that in a bit) and does the actual classification using the model we have downloaded; MODEL_PATH matches the name we defined earlier in the download.gradle file. We define the score threshold in classifierOptions to be 0.05, or 5% confidence. In other words, if the model recognizes a hot dog in the image but is only 4.99% confident about it, we won’t be notified.
Each category identified by the model, processed inside the Classifications.toHotDogScore() function, has two properties: label and score. In our universe, only two types of entities exist, and so everything can only be either a hot dog or a not hot dog. If we are lucky enough to encounter a hot dog, we return its score; otherwise, 0.
Fun with image formats
Sadly, not everything is rainbows and butterflies and unicorns and kittens in the world of hot dog image classification. CameraX gives us access only to android.media.Image objects, which are not directly compatible with TensorFlow Lite. Somehow, someway, we need to convert each frame to a Bitmap and then to a TensorImage. After some serious developer work (combing through Stack Overflow) and having little luck with the first few solutions, I finally came across one that worked like a charm; thank the internet for smart people. You can find this conversion code in the project’s source.
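The conversion code itself is only in the repository; for illustration, here is a naive sketch of the Image → Bitmap → TensorImage chain. It assumes the camera delivers YUV_420_888 frames with no row padding and takes a slow detour through JPEG, so treat it as a readable stand-in rather than the faster solution used in the app:

```kotlin
import android.graphics.BitmapFactory
import android.graphics.ImageFormat
import android.graphics.Rect
import android.graphics.YuvImage
import android.media.Image
import java.io.ByteArrayOutputStream
import org.tensorflow.lite.support.image.TensorImage

// Naive YUV_420_888 -> NV21 -> JPEG -> Bitmap -> TensorImage conversion sketch.
fun Image.toTensorImage(): TensorImage {
    val yBuffer = planes[0].buffer
    val vuBuffer = planes[2].buffer // V and U are interleaved in NV21 order
    val ySize = yBuffer.remaining()
    val vuSize = vuBuffer.remaining()

    val nv21 = ByteArray(ySize + vuSize)
    yBuffer.get(nv21, 0, ySize)
    vuBuffer.get(nv21, ySize, vuSize)

    // Compress the raw frame to JPEG and decode it back into a Bitmap.
    val yuvImage = YuvImage(nv21, ImageFormat.NV21, width, height, null)
    val out = ByteArrayOutputStream()
    yuvImage.compressToJpeg(Rect(0, 0, width, height), 100, out)
    val bytes = out.toByteArray()
    val bitmap = BitmapFactory.decodeByteArray(bytes, 0, bytes.size)

    return TensorImage.fromBitmap(bitmap)
}
```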
Tying up the loose ends
Now that we have the classification code ready, we simply have to call it whenever we get a new camera preview frame. That is handled by the HotDogImageAnalyzer class, implementing the ImageAnalysis.Analyzer interface that I hinted at earlier.
All it does is call the classifier and forward the result to the score listener, implemented by our activity, which handles displaying the score as a percentage. Remember to close the imageProxy at the end to let CameraX know we’re done processing the current frame and are ready for the next one.
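Since the analyzer gist does not render here, this is a sketch of what the class can look like. The HotDogClassifier dependency, the toTensorImage() conversion helper, and the listener signature are assumptions based on the description above:

```kotlin
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy

class HotDogImageAnalyzer(
    private val classifier: HotDogClassifier,
    private val scoreListener: (Float) -> Unit, // implemented by the activity
) : ImageAnalysis.Analyzer {

    @ExperimentalGetImage
    override fun analyze(imageProxy: ImageProxy) {
        // Classify the frame (if present) and forward the hot dog score.
        imageProxy.image?.let { image ->
            scoreListener(classifier.getHotDogScore(image.toTensorImage()))
        }
        // Tell CameraX we are done with this frame and ready for the next one.
        imageProxy.close()
    }
}
```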
I won’t go into the details of wiring up the camera preview and coding the activity in this article. Rest assured, however, that I did my best to keep the UI on par with Dinesh’s brilliant SEEFOOD design.
And now for the grand finale, hot dog recognition in the wild:
The days of hot dogs being confused for not hot dogs are finally over.
Thanks for your time, and if you made it this far, I have to reward your commitment to the hot dog recognition cause with the full source code for the project. Of course, it is not meant to be a fully polished piece of software, but a starting point. (Apparently, asking people to clap under your article on Medium is a huge faux pas, so I totally won’t do it.)