Create custom image classification models with AutoML Vision Edge

Firebase
Firebase Developers
Oct 15, 2019

Posted by Sachin Kotwani

One of the most impressive uses of Machine Learning is giving computers the ability to “see”. This technology is used in a variety of applications, ranging from machines that sort vegetables to systems that power self-driving cars, and much more. More recently, advances in chips and ML runtimes have made it possible to run models that once required powerful hardware or cloud-based servers entirely on lower-powered devices, such as mobile phones.

Getting started with Machine Learning can be challenging for those without prior experience in the space. ML Kit, which is offered as part of Firebase, makes it easy to bring ML technology to mobile applications. It includes the following tools and services:

  • Base APIs with pre-trained models that are ready to use from your app (e.g. text recognition, face detection, object detection, etc.)
  • Custom model serving, to dynamically deploy your own custom TensorFlow Lite models to your users’ devices, and
  • AutoML Vision Edge, to easily create custom image classification models that run on-device based on your own training data (i.e. images). No Machine Learning experience required!

In this post we’ll focus on using AutoML Vision Edge to create a custom, on-device image classification model. We recently walked through this use case during the 2019 Firebase Summit, and you can watch the presentation here.

Let’s say we have a mobile app that includes a flow for users to submit a specific type of documentation. This is a common use case for banking and financial apps that require a form of ID before providing services. It would be great if the app could do some initial client-side validation, making sure the document is of an acceptable type (e.g. a driver’s license, business card, etc.) before it gets submitted to a back office for further processing. We could start with the pre-trained Image Labeling API, but the labels it returns would likely be too generic for our purposes (e.g. document, paper, smile). What we need is a custom model that can distinguish between the different types of documents relevant to us.

Driver’s license images. Source: Alabama Law Enforcement Agency; NJMVC

AutoML Vision Edge to the rescue!

Creating a custom image classification model with AutoML Vision Edge and ML Kit is straightforward. You don’t need any ML expertise, and you don’t even have to write a single line of code to generate the model. Here are the steps.

1. Get labeled training data. First, you’ll need to collect training images for each document type. These will be used to “teach” the machine learning model how to identify them. For best results, include multiple angles, resolutions, and backgrounds for each class. In our simple example, we created three categories: “Driver’s License”, “Business Card”, and “Other” (as a catch-all for random images). If you need help collecting images, you might find the open source “Custom Image Classifier” project useful.

2. Upload the data to the Firebase console. Next, place the images of each type of document in a separate named folder (e.g. all images for “business_card” are under a folder of that name, and so on). Then, zip all the folders into a single archive, which you will upload as a new dataset in the console.

Upload data to the Firebase console
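
If you have many images, packaging them by hand gets tedious. Below is a minimal Python sketch, assuming each immediate subfolder of a local directory is one label (for example `training_images/business_card/`). It writes every image into a zip archive under a `<label>/<filename>` path, so the folder names survive as the labels, and reports how many images each label got. The function name `build_dataset_zip`, the directory names, and the set of file extensions are hypothetical choices for illustration, not part of any Firebase tooling.

```python
import zipfile
from pathlib import Path

# Hypothetical set of image extensions to include; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_dataset_zip(root: Path, output: Path) -> dict[str, int]:
    """Zip every label folder under `root` into `output`.

    Each immediate subfolder of `root` is treated as one label
    (e.g. root/business_card/*.jpg). Returns the image count per label.
    """
    counts: dict[str, int] = {}
    with zipfile.ZipFile(output, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            # Keep only files whose extension looks like an image (case-insensitive).
            images = sorted(
                f for f in label_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
            for image in images:
                # Store as "<label>/<filename>" so the folder name stays the label.
                zf.write(image, arcname=f"{label_dir.name}/{image.name}")
            counts[label_dir.name] = len(images)
    return counts
```

You would call it as `build_dataset_zip(Path("training_images"), Path("dataset.zip"))` and upload the resulting archive in the console. Printing the returned counts is a quick way to spot a class that has far fewer images than the others before you spend time training.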

3. Train the model. Choose the model type, which trades off model size against accuracy, and the number of hours to train (larger datasets require longer training). See here for a rough guideline.

4. Evaluate the model. Once the model has been trained, review statistics such as accuracy and recall. If you’re not satisfied, you can retrain the model with more images and/or a longer training duration. You can even test the model in the console by uploading a test image.

ML Kit model on Firebase
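
The console computes the evaluation statistics for you, but it helps to know what they mean. Here is a small Python sketch using the standard per-label definitions: precision is the share of images predicted as a label that really have it, and recall is the share of images that really have a label that the model found. The `(true_label, predicted_label)` input format and the label names in the example are hypothetical; this is an illustration of the metrics, not how the console calculates them internally.

```python
from collections import Counter


def precision_recall(pairs: list[tuple[str, str]]) -> dict[str, tuple[float, float]]:
    """Per-label (precision, recall) from (true_label, predicted_label) pairs.

    precision = TP / (TP + FP): of images predicted as this label, how many were right.
    recall    = TP / (TP + FN): of images that truly have this label, how many were found.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    labels: set[str] = set()
    for true, pred in pairs:
        labels.update((true, pred))
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted label got a wrong image
            fn[true] += 1  # true label was missed
    result = {}
    for label in sorted(labels):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        result[label] = (
            tp[label] / predicted if predicted else 0.0,
            tp[label] / actual if actual else 0.0,
        )
    return result
```

For a document-validation flow, low recall on an accepted class means valid IDs get rejected, while low precision means unrelated images slip through, so it is worth checking both before shipping.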

5. Implement the model in your mobile application. Once you’re satisfied, use the provided client libraries for iOS or Android to run the model in your app. The model can be bundled with your app, hosted in the cloud, or both. Hosted models are downloaded to the user’s device the first time they are used, but inference still runs fully on-device.

Mobile application showing a driver’s license. Source: NJMVC
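
Once the model returns labels on the device, the app still has to decide whether to accept the document. The sketch below shows one way to write that decision, in Python for readability; in a real app you would port it to Kotlin or Swift around the result of the platform labeler call, which is not shown here. It assumes the results arrive as `(label, confidence)` pairs, and the label names `drivers_license`/`business_card` and the 0.7 threshold are hypothetical values to tune against your own evaluation data.

```python
# Hypothetical label names, matching the training folder names.
ACCEPTED_LABELS = {"drivers_license", "business_card"}
# Hypothetical confidence cut-off; tune it on your own evaluation data.
MIN_CONFIDENCE = 0.7


def is_acceptable_document(labels: list[tuple[str, float]]) -> bool:
    """Return True if the top prediction is an accepted type with enough confidence.

    `labels` is a list of (label, confidence) pairs, standing in for the
    platform-specific result objects returned by the on-device labeler.
    """
    if not labels:
        return False  # nothing recognized: ask the user to retake the photo
    label, confidence = max(labels, key=lambda lc: lc[1])
    return label in ACCEPTED_LABELS and confidence >= MIN_CONFIDENCE
```

Checking only the top label keeps the rule simple; a stricter variant could also require a margin between the first and second predictions before accepting.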

For more detailed implementation steps, take a look at this documentation.
