4 Mobile Machine Learning Demos to Inspire Your Next Project

Using the Fritz AI Studio to build mobile ML experiences from the ground up

Austin Kodra
Feb 19, 2020 · 11 min read

Introduction: The Mobile Machine Learning Lifecycle

From collecting a viable dataset and training a production-worthy model to gauging and improving its real-world performance (and beyond), there’s a whole lot to consider when working with machine learning on mobile.

For a closer look at the entire project lifecycle, you can download our free ebook.

The range of skill sets and expertise needed to take a project from an idea to an app in production is daunting. But there’s so much potential when it comes to embedding machine learning models in mobile apps.

This intersection of stark challenges and immense potential has been central to the development of Fritz AI Studio, our end-to-end platform for mobile machine learning. We’ll be using Fritz AI Studio to create and showcase 4 use cases that exemplify this potential:

  • Animal detector
  • Fingertip pose estimation
  • Logo detector
  • Bath bomb detector

We’ll dive into these use cases in more detail shortly, but first, let’s take a quick look at the core components of Fritz AI Studio that we used to build these on-device demos.

Fritz AI Studio: An Overview

Before we jump into the use cases promised in this blog’s title, let me offer a quick summary of each piece of Fritz AI Studio.

Specifically, the Studio includes a unified, webapp-based workflow with 3 primary components:

1. Dataset Generator

Simply put, the Dataset Generator creates accurately labeled image datasets programmatically, a process also known as synthetic data generation. That means ML teams don’t have to spend countless hours and exorbitant amounts of money manually capturing and annotating images.

While other synthetic data platforms focus on large-scale, server-side tasks and use cases, the Fritz AI Dataset Generator targets mobile compatibility. Additionally, the Generator produces data that more closely matches what mobile-deployed models will see, which leads to better performance on-device.
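
The Studio’s generator itself isn’t shown here, but the core idea behind synthetic data generation can be sketched in a few lines: paste labeled seed objects onto varied backgrounds and record exactly where they landed. Here’s a minimal, hypothetical illustration in Python; the file paths, label name, and value ranges are placeholders, not the Studio’s actual pipeline:

```python
import json
import random
from pathlib import Path

from PIL import Image

SEEDS = {"elephant": "seeds/elephant.png"}             # hypothetical RGBA cut-outs with transparent backgrounds
BACKGROUNDS = list(Path("backgrounds").glob("*.jpg"))  # hypothetical background photos

def generate_sample(label, seed_path, background_path, out_path):
    """Paste a cut-out seed object onto a background and record its bounding box."""
    fg = Image.open(seed_path).convert("RGBA")
    bg = Image.open(background_path).convert("RGBA")

    # Randomly scale the object so it appears at different sizes.
    scale = random.uniform(0.3, 0.8)
    fg = fg.resize((int(fg.width * scale), int(fg.height * scale)))

    # Pick a random position that keeps the object in frame
    # (assumes seed cut-outs are smaller than the backgrounds).
    x = random.randint(0, bg.width - fg.width)
    y = random.randint(0, bg.height - fg.height)
    bg.paste(fg, (x, y), mask=fg)  # the alpha channel acts as the paste mask
    bg.convert("RGB").save(out_path)

    # The annotation comes for free: we know exactly where we pasted the object.
    return {"image": str(out_path), "label": label, "bbox": [x, y, fg.width, fg.height]}

Path("generated").mkdir(exist_ok=True)
annotations = [
    generate_sample("elephant", SEEDS["elephant"], random.choice(BACKGROUNDS), f"generated/{i:05d}.jpg")
    for i in range(5000)
]
Path("annotations.json").write_text(json.dumps(annotations))
```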

2. Model Training

After a labeled dataset has been generated, it’s time to train a first version of a mobile-ready model. And with Fritz AI, we can do this quickly, without many of the usual model training pain points (e.g., advanced pre- and post-processing code, architecture selection, mobile-first optimization, and model conversion into mobile-ready formats like Core ML and TensorFlow Lite).

Additionally, Model Training is closely integrated with the Fritz SDK, allowing you to drop models directly into projects in-progress.
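
For context on what that conversion step involves, here’s roughly what exporting a trained Keras model to the two mobile formats mentioned above looks like when done by hand. This is a generic sketch, not the Fritz pipeline, and the MobileNetV2 stand-in simply takes the place of a real trained detector:

```python
import tensorflow as tf
import coremltools as ct

# Stand-in for a trained detector; in practice this would be the model
# produced by the training step described above.
trained_model = tf.keras.applications.MobileNetV2(weights=None)

# TensorFlow Lite: convert the Keras model and write out the flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(trained_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
with open("detector.tflite", "wb") as f:
    f.write(converter.convert())

# Core ML: convert to the classic neural-network format and save it.
# (Newer coremltools versions can also emit an ML Program .mlpackage.)
mlmodel = ct.convert(trained_model, convert_to="neuralnetwork")
mlmodel.save("Detector.mlmodel")
```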

3. Dataset Collection System (DCS)

More than likely, the first versions of our models will benefit from some tweaking around the edges. But instead of building an entirely new dataset from the ground up, the Dataset Collection System allows us to easily collect new data based on real-world model performance.

Essentially, developers need to know both what their models actually predict and what their users expect. Our Dataset Collection System captures both, creating a data feedback loop between engineers and app users.

Teams can then leverage this new ground-truth data to retrain their models and gain insights into how end users are actually using these ML features in the wild. More data — and more accurate data — means better models and better user experiences.
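
Fritz’s own collection API isn’t shown here, but the feedback loop boils down to storing, for each prediction a user sees, both what the model said and what the user indicates was actually there. Here’s a minimal sketch of the kind of record such a system might keep; the field names and structure are hypothetical:

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class Detection:
    label: str
    confidence: float
    bbox: List[float]  # [x, y, width, height], normalized to the image

@dataclass
class FeedbackRecord:
    image_id: str
    predictions: List[Detection]                  # what the model said
    user_annotations: Optional[List[Detection]]   # what the user says is actually there
    timestamp: float = field(default_factory=time.time)

def to_training_example(record: FeedbackRecord) -> Optional[dict]:
    """Turn user-corrected feedback into a ground-truth example for retraining."""
    if not record.user_annotations:
        return None  # nothing to learn from until someone annotates it
    return {
        "image_id": record.image_id,
        "annotations": [asdict(a) for a in record.user_annotations],
    }

record = FeedbackRecord(
    image_id="img_0042",
    predictions=[Detection("lion", 0.41, [0.20, 0.30, 0.40, 0.50])],
    user_annotations=[Detection("lion", 1.0, [0.22, 0.31, 0.38, 0.48])],
)
print(json.dumps(to_training_example(record), indent=2))
```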

Putting Fritz AI to the Test

Building this truly end-to-end solution is not a task to be taken lightly, so we recently ran an internal hackathon to put Fritz AI Studio to the test.

To help illuminate how Fritz AI Studio works and why it makes mobile ML workflows much simpler, I live-tweeted the event; you can check out that thread on Twitter.

Hackathon Structure

To help test all pieces of the Studio, we decided to create demos for several specific use cases. Currently, the Studio supports 2D pose estimation, object detection, image labeling, and image segmentation use cases, with support for more ML tasks in the works.

Here’s a quick summary of the projects we selected, presented as “user stories”:

  • [Object Detection] As a zoo, I can train an object detection model on zoo animals so I can build an interactive app for visitors that provides more info about the animals.
  • [Object Detection] As a content app looking to build brand engagement features, I have a model to detect and recognize different brands/logos (we used Nike for demo purposes).
  • [Pose Estimation] As an engineer building filters for a content creation/social video app (think Byte or TikTok), I can track the position of a person’s fingertips in order to trigger AI and/or AR effects.
  • [Object Detection] As a cosmetics company, I can train an object detection model that identifies and locates bath bombs.

As a note, for demo purposes, we only worked through the collection and retraining step (step 3) with the fingertip pose estimation use case.

Use Case 1: Animal Object Detection

The idea

An animal detection app built as an interactive companion experience to a day at the zoo. The object detection model identifies and locates a variety of animals, including:

  • Elephants
  • Panda bears
  • Ostriches
  • Giraffes
  • Lions
  • Rhinos
  • Anteaters

What this model could do in an app

We could imagine our on-device animal detection model as a core component of an immersive companion app for a local zoo. Using the app, zoo visitors could point their phone camera at a given animal to detect it and access a range of interactive features (a rough inference sketch follows the list below), including:

  • AR overlays of animal facts, trivia, explainer videos, and other fun informational content.
  • Promotional content for relevant zoo programming, upcoming shows, etc.
  • Engaging “games” for younger visitors (e.g., find a given list of animals and win a prize).
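
To make the “point your camera at an animal” flow a bit more concrete, here’s a rough sketch of how an exported TensorFlow Lite detector could be exercised. This is illustrative Python rather than app code, and the SSD-style output layout, file names, and label ordering are assumptions, not details of our actual export:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load the converted detector (e.g. the detector.tflite from the conversion sketch earlier).
interpreter = tf.lite.Interpreter(model_path="detector.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()

# Resize the camera frame to the model's expected input size and dtype.
_, height, width, _ = input_details["shape"]
frame = Image.open("zoo_photo.jpg").convert("RGB").resize((int(width), int(height)))
input_tensor = np.expand_dims(np.asarray(frame), axis=0).astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], input_tensor)
interpreter.invoke()

# Assumed SSD-style output layout: boxes, class indices, scores (depends on how the model was exported).
boxes = interpreter.get_tensor(output_details[0]["index"])[0]
classes = interpreter.get_tensor(output_details[1]["index"])[0]
scores = interpreter.get_tensor(output_details[2]["index"])[0]

# Hypothetical label map matching the animal classes listed above.
LABELS = ["elephant", "panda bear", "ostrich", "giraffe", "lion", "rhino", "anteater"]
for box, cls, score in zip(boxes, classes, scores):
    if score > 0.5:
        print(f"{LABELS[int(cls)]} ({score:.0%}) at {box}")  # e.g. drive an AR overlay from this box
```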

Hackathon results

  • Dataset Generation: We generated a synthetic dataset of 5000 labeled images from a set of 33 seed images. These seed and generated images included a variety of the animals mentioned above.
  • Model Training: We trained the initial version of this model on the dataset of 5000 synthetically generated images. We set the training budget to five hours, and early stopping ended training after one hour and eleven minutes. Here’s a quick look at the model’s performance:

Use Case 2: Brand/Logo Detection

The idea

A brand engagement mobile experience that identifies and locates particular brand logos. For the sake of this demonstration, the object detection model has been trained on the Nike logo, both on and off clothing/apparel items.

What this model could do in an app

We could imagine using our logo detection model in a number of ways to foster brand engagement and differentiate a given brand from its competitors. Specifically, our logo detector could unlock app features such as:

  • Product discounts: e.g., snap and upload a picture of a Nike logo or Nike apparel in the wild and receive a 10% discount on your next Nike store purchase.
  • AR overlays: Point the phone camera at a particular piece of Nike gear and see product info, available discounts, purchase in-app, etc.
  • Digital wish list: Allow users to build a digital wish list/shopping cart and explore Nike’s product catalog in-app.
  • Brand filters for social media: Allow users to create and engage with brand-based filters on popular social apps like Snapchat, Instagram, etc.

Hackathon results

  • Dataset Generation: We generated two synthetic datasets for this use case: one in which the clothing item bearing the logo is annotated (62 seed images; 3100 generated), and one in which only the logo itself is annotated (47 seed images; 2500 generated).
Left: Logo on clothes / Right: Logo only
  • Model Training: We experimented a bit with model training for this use case. We trained versions of the model using the logo-only dataset (2-hour budget, 34 minutes of actual training time), the logo-on-clothing dataset (5-hour budget, 50 minutes of actual training time), and both datasets combined (5-hour budget, 3 hours and 18 minutes of actual training time). Here’s a video that compares the performance of all three:

Use Case 3: Fingertip Pose Estimation

The idea

A mobile content creation experience that locates and tracks the movement of fingertips as a foundation for adding cool AR effects as a user moves their hands.

What this model could do in an app

Accurately estimating and tracking movement opens up a range of possibilities when it comes to building an engaging mobile experience with on-device machine learning. Specifically, we could use this ML feature to power our app with:

  • Creativity tools: Creative experiences that leverage fingertip estimation could include “magically” drawing on a canvas, creating AR effects for social content (e.g., shooting lightning bolts from your fingertips in your latest TikTok video), and more; a small keypoint-smoothing sketch follows this list.
  • Try-on experiences: Try on different nail polish colors, play with digital finger puppets to show support for your favorite sports team, and more. Try-on experiences have become a foundational way to build brand engagement using on-device ML.
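
One practical detail when driving AR effects from per-frame keypoints is smoothing: raw fingertip predictions jitter, so effects look better when anchored to a filtered position. Below is a small, self-contained sketch of exponential smoothing over predicted fingertip coordinates; the thresholds and field names are ours, not Fritz’s:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FingertipSmoother:
    """Exponentially smooth noisy per-frame fingertip keypoints before rendering effects."""
    alpha: float = 0.4               # higher = more responsive, lower = smoother
    min_confidence: float = 0.3      # ignore low-confidence keypoints
    _state: Optional[Tuple[float, float]] = None

    def update(self, x: float, y: float, confidence: float) -> Optional[Tuple[float, float]]:
        if confidence < self.min_confidence:
            return self._state       # keep the last good position
        if self._state is None:
            self._state = (x, y)
        else:
            px, py = self._state
            self._state = (self.alpha * x + (1 - self.alpha) * px,
                           self.alpha * y + (1 - self.alpha) * py)
        return self._state

# Example: feed noisy per-frame model outputs and anchor an AR effect at the smoothed result.
smoother = FingertipSmoother()
for frame_keypoint in [(0.50, 0.40, 0.9), (0.53, 0.38, 0.8), (0.90, 0.10, 0.1)]:
    position = smoother.update(*frame_keypoint)
    print("render effect at", position)
```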

Hackathon results

  • Dataset Generation: We generated a synthetic dataset of 5000 labeled images from a set of 31 seed images.
A snapshot of the generated dataset
  • Model Training: We trained the initial version of this model on the dataset of 5000 synthetically generated images. We set the training budget to five hours, and early stopping ended training after 45 minutes.
  • Dataset Collection and Retraining: After deploying to a demo iOS project and testing on real-world data, we used the DCS to collect just over 200 ground-truth images, annotated them in the webapp, and then added them to our initial synthetic dataset to retrain the model. Below is a side-by-side look at the initial model and the model retrained on the collected and annotated data:

Use Case 4: Bath Bomb Detection

The idea

Identify and locate various types of bath bombs, a popular beauty and wellness product that Wikipedia describes as “hard-packed mixtures of dry ingredients which effervesce when wet. They are used to add essential oils, scent, bubbles or color to bathwater.”

What this model could do in an app

Bath bombs, as well as other beauty and cosmetics products, have unique characteristics, including varying scents, textures, and more. As such, a bath bomb detector could be leveraged in a number of ways in a mobile app:

  • Product discounts and promo content: Snap a photo of a bath bomb to access a digital wish list and current product discounts and promotions, and to show off brand loyalty across different social channels.
  • Product information: Here again, we could implement augmented reality features that digitally display product information in-store, or when bath bombs are captured in the wild.
  • Brand engagement: Similar to our logo detector (above), our bath bomb detector could be integrated with a range of brand engagement features, such as creating bath bomb filters/stickers for social apps, building a collection of acquired bath bombs, easily sharing product reviews, and more.

Hackathon results

  • Dataset Generation: We generated a synthetic dataset of 3000 labeled images from a set of 15 seed images.
  • Model Training: We trained the initial version of this model on the dataset of 3000 synthetically generated images. We set the training budget to five hours, and early stopping ended training after an hour and twenty minutes. Here’s a look at how this model performs in the wild:

What’s Next?

In total, working through this process, from dataset generation to model retraining (for the fingertip pose use case), took us less than two days.

That’s four mobile machine learning models built from the ground up, covering four different use cases that serve as the foundation for immersive and unique mobile app experiences.

That’s a great start, but there’s still work to be done! Three out of the four models are first versions, and could be improved by collecting and annotating ground-truth data with the Dataset Collection System. We saw how just adding and annotating a couple hundred real-world images drastically improved model performance in our fingertip estimation demo.

Additionally, we could revisit our synthetic data generation process, adding seed images that represent more data diversity (e.g., different kinds of lions for our zoo animal detector).

Or, if we’re up for a technical challenge, we could adjust the hyperparameters of our synthetic generation to introduce different image augmentations that would also help diversify and improve our datasets.
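
As a concrete (and purely illustrative) example of the kind of augmentation knobs we might turn, here’s a small PIL-based sketch that perturbs a generated image’s rotation, brightness, and color saturation before it goes into the dataset. The specific ranges and file paths are ours, not the Studio’s actual settings:

```python
import random

from PIL import Image, ImageEnhance

def augment(image: Image.Image) -> Image.Image:
    """Apply a few random perturbations so the synthetic data covers more conditions."""
    # Small random rotation, e.g. a phone held at a slight angle.
    # Note: geometric transforms like this must also be applied to the bounding-box labels.
    image = image.rotate(random.uniform(-15, 15), expand=False)

    # Vary lighting: indoor exhibits vs. bright outdoor enclosures.
    image = ImageEnhance.Brightness(image).enhance(random.uniform(0.7, 1.3))

    # Vary color saturation to cover different cameras and conditions.
    image = ImageEnhance.Color(image).enhance(random.uniform(0.8, 1.2))
    return image

# Assumes images generated earlier (see the synthetic data sketch) live in generated/.
augmented = [augment(Image.open("generated/00001.jpg")) for _ in range(5)]
```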

Takeaways

My hope is that you leave this blog post with a few takeaways:

  1. Mobile machine learning is no longer a pipe dream. In a relatively short amount of time, we built 4 mobile-ready models that perform pretty well and have plenty of room to improve.
  2. Mobile machine learning isn’t limited to highly skilled teams with unlimited resources. We had a couple of engineers and a (mostly) non-technical team member build models from scratch and deploy them inside demo projects.
  3. Being able to collect ground-truth data, quickly annotate it, and use that data to retrain models is HUGE. Seeing this flow myself convinced me that it’s going to change the way mobile ML models are built and deployed. This is especially true given the ability to deploy models over the air with Fritz AI.

Getting Started

We’re just getting started, and we’ll undoubtedly be working hard to improve the UX, provide you with more helpful content, and introduce new functionality and use cases to Fritz AI Studio.

The Studio is now available to all developers, with subscription plans for various project requirements and budgets. We’re excited to see what you build.

