Going from two classes to three with Skafos.ai and TuriCreate

Miriam Friedel
Feb 20, 2019

A brief tutorial on Image Classification for iOS Developers

Update: 5/30/2019

The example in this post was written for use with our legacy Skafos platform. While much of the content will still be useful, as of 5/29/2019, the included code may not run as written.

Sign up for an account on our new platform, join our Slack community, and explore some example models and apps (https://github.com/skafos) to learn more.

Here, we will walk you through how to get started with Skafos Quickstart models, adding the right libraries, importing additional data, and creating labels so you can customize the model to suit your use case.

Introduction

Here at Skafos.ai, one thing we are passionate about is making machine learning more accessible to non-data scientists. Machine learning is not magic — it is simply a tool, and it is one that we should all feel empowered to use. But…what if you’ve never used ML before? Where do you even begin? It is exciting to think and talk about using the latest convolutional neural network to power your next app, but…what if you aren’t even sure what that is, or what data to use to get started?

In building our platform, we’ve been fortunate to have feedback from a group of phenomenally talented iOS developers. The feedback we’ve consistently gotten is: “I want to use ML, but I don’t know what I’m doing. I know I need to combine data to build a model but I don’t even know where to start.” To be sure, data ingestion and wrangling is a huge challenge when it comes to successfully leveraging machine learning. A model is only as good as the data that goes into it, and for those of us who aren’t fluent in building machine learning models, even knowing where to begin can be daunting.

In this tutorial, I am going to walk you through how to add data to an existing model, retrain it, and deliver the output to your phone, all using Skafos. Though seasoned data scientists are welcome to read and follow along, this tutorial is not designed for you. Are you an iOS developer who just needs to get started and understand how to iterate and train a model? Read on.

As part of the Skafos.ai platform, we’ve included a number of different Quickstart models and app templates to get you up and running quickly. Our CEO, Michael Prichard, has created a video tutorial of this process to show you how to get an ML-powered app up and running in minutes. In this tutorial, he demonstrates our Quickstart Image Classifier, which identifies cats and dogs.

Follow along as we modify this existing model to include rabbits. By the end, I want you to feel armed with two things:

  1. An understanding of how to combine data sources to use for model training
  2. An understanding of how to extend and modify one of our Quickstart projects and make it your own

Getting Started

The first thing we need to do is create a Quickstart project using a pre-built Image Classifier model. To do this, you can either follow along with Michael’s video or follow our Quickstart Guide for iOS developers. If you follow the Quickstart guide, be sure to choose Image Classifier from the list of available projects. At the completion of either the video or the Quickstart guide, you should have the Skafos Image classification app on your phone, and it will identify either cats or dogs.

Once you’ve created this project, you can access it upon login from the “All Projects” page. Click on the name of the project in your dashboard.

This will take you to the main project page. From there, click the Launch JupyterLab button. This will spin up a Skafos JupyterLab instance, and take you to the code that was used to build the cats + dogs model. From there, we will modify it to include rabbits.

When you first launch JupyterLab, you will see the following files in the left-hand pane.

Double click image_classifier.ipynb to open it. Note that this code was forked from github.com/skafos/TuriImageClassifier, but this copy is now yours.

Adding the right libraries

Because we are going to be pulling in additional data, we need to make two minor adjustments before proceeding. First, double click the requirements.txt file to open it. (It will appear in the left pane, as shown in the figure above.)

The following requirements are included:

skafossdk==1.1.9
turicreate==5.2.1
coremltools==2.1.0
numpy==1.16.1

Add the following line to the end of this file:

s3fs==0.1.6

Next, you will need to modify the second cell in the notebook, the one that imports the necessary libraries. Add import os and from s3fs.core import S3FileSystem, so the cell looks like this:

Now, you can run the first three cells in the notebook, in order, to install these needed libraries, import them, and initialize an instance of the Skafos SDK.
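Before moving on, it can be worth confirming that the new dependency actually installed. The snippet below is a small sketch and not part of the original notebook: missing_packages is a hypothetical helper name, and it relies only on Python's standard importlib module. The list of module names is an assumption based on the requirements file shown above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed from requirements.txt, plus the newly added s3fs.
required = ["turicreate", "coremltools", "numpy", "s3fs"]
print("Missing packages:", missing_packages(required))
```

An empty list means everything imported cleanly. If s3fs shows up as missing, re-run the install cell after saving requirements.txt.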

Importing Additional Data

The first thing that we need to do to change our model from dogs + cats to dogs + cats + rabbits is to include some photos of rabbits in the training data.

In the existing code, we pull zipped data from a public S3 bucket, and unzip it. This creates a copy of the PetImages data in this JLab environment, with two subfolders, Cats and Dogs. These are filled with images of (wait for it!) cats and dogs, respectively.

The first thing I want you to do is go ahead and delete this cell. Yes, really. We are going to replace this code with new code that takes images of dogs, cats, and rabbits from a public S3 bucket. The main reason for this is that the code above pulls in 25,000+ images, and I want us to use a smaller subset for this example. (Spoiler alert: the training will go faster.)

Before we begin, there are a couple of notes:

  • The Google Open Images Data Set is an excellent place to find data for training both image classification and object detection models. The rabbit images that I am using in this example came from this source. A helpful open source repo for pulling data from Google Open Images is the OIDv4 Toolkit. You are welcome to use other open source images as you wish, or take your own. This is one example.
  • For any image classifier, once you have identified a source of photos, you will need a place to put them. I recommend S3 buckets.
  • For purposes of this tutorial, I have created a public S3 bucket with a subset of the Google Open Images rabbit photos, as well as a subset of the Cats and Dogs available in the original PetImages data set. This S3 bucket is included in the code below, and you are welcome to pull data from it if you wish. Note that only a very small subset of the images is included for purposes of this tutorial. More images will improve model performance.

After you have deleted the code in the cell above, replace it with the following code:

This code will pull images of dogs, cats, and rabbits from the listed S3 bucket, and place them in the appropriate sub-directories of PetImages. After inserting this code, execute this cell.
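The download step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's original listing: the function names, the bucket path in the example call, and the Cat/Dog/Rabbit folder names are all placeholders. It uses s3fs's S3FileSystem(anon=True) for public, read-only access, along with its ls and get methods.

```python
import os


def open_public_bucket():
    """Open an anonymous (read-only) connection for public S3 data."""
    # Imported here so the helper below can be read and tested without s3fs.
    from s3fs.core import S3FileSystem
    return S3FileSystem(anon=True)


def download_images(fs, bucket_prefix, categories, dest="PetImages", limit=None):
    """Copy images for each category into dest/<category>/ and return counts."""
    counts = {}
    for category in categories:
        local_dir = os.path.join(dest, category)
        os.makedirs(local_dir, exist_ok=True)
        # Keep only image files and sort them so the selection is repeatable.
        remote_files = sorted(
            path for path in fs.ls(bucket_prefix + "/" + category)
            if path.lower().endswith((".jpg", ".jpeg", ".png"))
        )
        if limit is not None:
            remote_files = remote_files[:limit]
        for remote_path in remote_files:
            local_path = os.path.join(local_dir, os.path.basename(remote_path))
            fs.get(remote_path, local_path)
        counts[category] = len(remote_files)
    return counts


# Example call (hypothetical bucket path; substitute your own):
# fs = open_public_bucket()
# download_images(fs, "your-bucket/PetImages", ["Cat", "Dog", "Rabbit"], limit=500)
```

The limit argument is there for the same reason given earlier: a smaller subset of images keeps training time short while you are experimenting.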

I want to pause here and note that you could do this with images other than rabbits, or add a fourth category. You could also eliminate dogs and cats altogether and use three different animals (horses, cows, chickens?). The main point is that so far, all we have done is taken previously curated images and downloaded them to three different sub-directories. That’s it. You’ve got this.

Creating Image Labels

There is one more change we need to make to the existing code before we can train our new model. The next cell present should be the one that starts with:

data = tc.image_analysis.load_images('PetImages', with_path=True, ignore_failure=True)

This line of code loads all of the images into a TuriCreate SFrame. The second line of this code block is the one that assigns labels to each image, and does it based on the directory each image is in.

# From the path-name, create a label column. This labels each image as either a dog or a cat
data['label'] = data['path'].apply(lambda path: 'dog' if '/Dog' in path else 'cat')

Replace the existing code with the following:

data['label'] = data['path'].apply(lambda path: 'dog' if '/Dog' in path else ('cat' if '/Cat' in path else 'rabbit'))

This will make sure the rabbits are actually labeled as rabbits, because this label is the target for training.
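If you plan to add a fourth category later, the nested lambda gets hard to read. One possible alternative, sketched here as an assumption rather than as part of the Quickstart code, is a small named function driven by a lookup table. LABEL_RULES and label_from_path are hypothetical names, and the '/Rabbit' folder name assumes your rabbit images live in a sub-directory with that name.

```python
# Folder fragment -> label. Add one entry per new category.
LABEL_RULES = {
    "/Dog": "dog",
    "/Cat": "cat",
    "/Rabbit": "rabbit",
}


def label_from_path(path):
    """Map an image path to a class label based on the folder it sits in."""
    for fragment, label in LABEL_RULES.items():
        if fragment in path:
            return label
    # Unlike the one-line lambda, refuse to guess when no folder matches.
    raise ValueError("No label rule matches path: " + path)


# In the notebook this would stand in for the lambda:
# data['label'] = data['path'].apply(label_from_path)
```

The difference in behavior is deliberate: the lambda labels anything that is not a dog or a cat as a rabbit, while this version raises an error for an unexpected folder, so a stray directory cannot silently pollute the rabbit class.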

The final line of this cell randomly partitions the data into a training set (80%) and a test set (the remaining 20%) that is held out for model testing. It can be left as is, and you can go ahead and run this cell. For more information about train/test splits, this is a useful introduction.
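To make the idea of a random split concrete, here is a plain-Python sketch of the concept. It is not TuriCreate's implementation, and the function name and seed parameter are illustrative only. Each row is assigned to the training set with probability 0.8, so the split comes out at approximately, not exactly, 80/20.

```python
import random


def random_split(rows, fraction=0.8, seed=None):
    """Randomly assign each row to train (with probability `fraction`) or test."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        if rng.random() < fraction:
            train.append(row)
        else:
            test.append(row)
    return train, test


image_paths = ["img_%d.jpg" % i for i in range(1000)]
train_paths, test_paths = random_split(image_paths, fraction=0.8, seed=42)
print(len(train_paths), "training images,", len(test_paths), "test images")
```

The important property is that no image ends up in both sets, so the test images give an honest estimate of how the model handles photos it has never seen.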

That is it! You may now uncomment and run the rest of image_classifier.ipynb without any modifications. Yes, really. Because you have changed the input data, running the following code

model = tc.image_classifier.create(train_data, target='label')

now builds a model that identifies dogs, cats, and rabbits. This is one of the excellent features of the TuriCreate framework: it simplifies the process of going from binary to multi-class image classification.
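If you want to check how well the three-class model does before pushing it to your phone, a sketch along these lines may help. It assumes the held-out portion of the split is available as test_data (a name not confirmed by this post), and train_and_evaluate is a hypothetical wrapper. The calls themselves, tc.image_classifier.create and model.evaluate, are standard TuriCreate.

```python
def train_and_evaluate(train_data, test_data):
    """Train on the 'label' column, then report accuracy on held-out images."""
    # Imported inside the function so the sketch can be defined before install.
    import turicreate as tc

    model = tc.image_classifier.create(train_data, target="label")
    metrics = model.evaluate(test_data)
    print("Accuracy on held-out images:", metrics["accuracy"])
    # One row per (target_label, predicted_label) pair, with a count.
    print(metrics["confusion_matrix"])
    return model, metrics


# In the notebook (assuming the split produced train_data and test_data):
# model, metrics = train_and_evaluate(train_data, test_data)
```

The confusion matrix is the quickest way to see whether rabbits are being mistaken for cats or dogs, which is a hint that more rabbit images would help.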

Once your model is done (and this may take a bit of time, depending on the number of images you choose to include), uncomment and execute the final cell, to save your model to Skafos and automatically push it to your app.

Open the app on your phone, and snap some photos. Although you did not need to rebuild or change anything about your app, it can now identify cats, dogs, and rabbits.

Hopefully this has given you a sense of how to take an example model and data, make adjustments, and retrain it to give a different result. Machine learning is new, but it doesn’t have to be hard or confusing.

We’d love to have you join our Slack community and Skafos subreddit to continue the conversation.

Miriam Friedel

Mom, Data Scientist, Physics PhD. Currently: Senior Director, Machine Learning Engineering at Capital One. Previously: Director of Data Science at Skafos.