Object Detection with the Open Images Dataset, TuriCreate, and Skafos

One of the great things about trying to do Machine Learning these days is that there are myriad frameworks to choose from, most of them simplifying the model building process immensely. You no longer have to build a neural network from scratch or write a lot of code to separate out a validation data set; we live in a time of powerful tools and compute resources, and using them is fun.

Unfortunately, you still have to identify the right training data and wrangle that data into the right format to use these frameworks in the first place. This is still a giant pain, one that is particularly acute for object detection models, which require both images and bounding boxes. Oh, and also? The bounding box coordinates need to be specified in the right order and with the right coordinate system to work properly with the framework you are using.

This tutorial walks you through the process of taking data from Google’s Open Images Dataset and adjusting the bounding box coordinate system for use within the Turi Create Framework. We will do this using a nifty toolkit called OIDv4 Toolkit to retrieve the data we need.

For those of you not familiar, the Google Open Images Dataset is a free, open-source set of image data that can be downloaded and used to build machine learning models. In particular, they offer both bounding boxes and images for 600 different categories of objects, simplifying the process of getting started. Likewise, TuriCreate simplifies the development of custom machine learning models.

Fortunately for all of us, the OIDv4 Toolkit is a thing that exists. This easy-to-use repo allows you to download only the image categories you need, along with their bounding boxes.

The following code was written and executed using a Jupyter Notebook on Google Colab, but you can run it locally if desired, or forgo the notebook altogether.

First, you need to clone the OIDv4 Toolkit directly into your notebook and install the required libraries. You will also want to install TuriCreate.

!git clone https://github.com/EscVM/OIDv4_ToolKit.git
!pip install turicreate==5.4
!pip install -r ./OIDv4_ToolKit/requirements.txt

We recommend reviewing the excellent README provided by OIDv4_ToolKit to understand the toolkit’s full capabilities, as well as its directory structure. The latter is key for parsing the image data appropriately. In the snippet below, you pull up to 100 training images for each of three classes: coffee cups, pens, and computer monitors. More images are available, and if you remove the limit of 100, you’ll get the entire set of training images for these objects.

!python OIDv4_ToolKit/main.py downloader --classes 'Coffee cup' 'Pen' 'Computer monitor' --type_csv train --limit 100
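Once the download finishes, the files should land in a layout like the following (inferred from the ToolKit’s README; your class folders will match whatever names you requested):

```
OID/
└── Dataset/
    └── train/
        ├── Coffee cup/
        │   ├── <image_id>.jpg
        │   └── Label/
        │       └── <image_id>.txt
        ├── Pen/
        └── Computer monitor/
```

Each image gets a matching text file under Label/ containing its class name and bounding box coordinates, which is what we parse below.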

Once you’ve pulled in this data, you need to import the necessary libraries to process it.

# Import necessary libraries
import os
import json
import string
import pandas as pd
import turicreate as tc
# Enable the ability to view images in a cell
%matplotlib inline

The next bit of code is a function that is required to adjust the bounding box coordinates so that Turi Create reads them correctly. Turi Create requires coordinates to be passed as (height, width, x, y) where (x,y) are at the center of the bounding box. The bounding boxes as pulled by the OIDv4 Toolkit, however, are listed as (left, top, right, bottom). The function below shifts the bounding box coordinates so they are interpreted correctly by Turi Create.

# Helper function that builds our bounding boxes in a way that Turi Create can work with
def build_annotations(_tuple):
    label = _tuple[0]
    x_min = _tuple[1][0]  # left x
    x_max = _tuple[1][2]  # right x
    y_min = _tuple[1][1]  # top y
    y_max = _tuple[1][3]  # bottom y

    return {
        'coordinates': {
            'x': (x_min + x_max) / 2,  # center of the box
            'y': (y_min + y_max) / 2,
            'width': x_max - x_min,
            'height': y_max - y_min
        },
        'label': label
    }
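To make the coordinate shift concrete, here is the arithmetic on a single made-up box (all numbers hypothetical):

```python
# A hypothetical OID-style box in pixels: (left, top, right, bottom)
x_min, y_min, x_max, y_max = 10.0, 20.0, 110.0, 220.0

# Turi Create wants the box center plus its width and height
x = (x_min + x_max) / 2   # 60.0
y = (y_min + y_max) / 2   # 120.0
width = x_max - x_min     # 100.0
height = y_max - y_min    # 200.0
```

Note that the center coordinates and dimensions here are still in pixels; Turi Create works with pixel coordinates for object detection annotations.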

You now need to grab a list of image categories, and the corresponding paths to their bounding box coordinate files. Once that is done, you can apply the build_annotations function to shift each of the bounding boxes into the right coordinate space.

# Get the list of image categories
train_path = "./OID/Dataset/train/"
categories = os.listdir(train_path)

# Collect the path to each label file for every category
label_paths = []
for category in categories:
    label_paths.extend([train_path + category + "/Label/" + image
                        for image in os.listdir(train_path + category + "/Label")])

# For each image you downloaded, grab the coordinates and apply the
# build_annotations function to shift them into the Turi Create space
annotations_map = {}
for path in label_paths:
    # Get the image id, which serves as the key for this dictionary
    image_id = path.split("/")[-1].split(".txt")[0]
    annotations = open(path, "r").readlines()
    annotations_map[image_id] = list(map(build_annotations, [
        (" ".join(a.split(" ")[0:-4]),
         list(map(float, a.replace("\n", "").split(" ")[-4:])))
        for a in annotations
    ]))
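For reference, each line of a Label/*.txt file holds the class name followed by four pixel coordinates, and class names can themselves contain spaces, which is why the code above splits off only the last four fields. A standalone sketch of that parsing, with made-up values:

```python
# A made-up line as it might appear in a Label/*.txt file
line = "Coffee cup 10.0 20.0 110.0 220.0\n"

parts = line.replace("\n", "").split(" ")
label = " ".join(parts[0:-4])            # everything except the last four fields
coords = list(map(float, parts[-4:]))    # x_min, y_min, x_max, y_max

print(label, coords)  # Coffee cup [10.0, 20.0, 110.0, 220.0]
```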

Finally, read all of the images into a Turi Create SFrame, so you can use them with all of the modeling functions.

image_sframe = tc.image_analysis.load_images(train_path, recursive=True)
image_sframe['image_id'] = image_sframe['path'].apply(lambda x: x.split("/")[-1].split(".jpg")[0])
image_sframe['annotations'] = image_sframe['image_id'].apply(lambda x: annotations_map[x])

Want to check to make sure that all of this shifting worked properly? Good idea. Turi Create ships a draw_bounding_boxes utility that renders annotations onto their images; use it like this:

# Show an image and its bounding boxes
which_image_to_plot = 230  # Feel free to change this and look at other images (there should be 300 total)
boxed = tc.object_detector.util.draw_bounding_boxes(image_sframe['image'], image_sframe['annotations'])
boxed[which_image_to_plot].show()

The output will be the chosen image with its bounding boxes drawn on top.

That’s it! You can now use your Turi Create SFrame with its Object Detection function to build object detection models. Happy with what you have? Export to Core ML and integrate it into your app. We here at Skafos will help you deliver it.
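For reference, those last two steps look roughly like this (assuming the image_sframe built above; the model filename is a placeholder, and training will be slow without a GPU):

```python
# Train an object detector on the annotated SFrame built above
model = tc.object_detector.create(image_sframe, feature='image', annotations='annotations')

# Export to Core ML ('MyDetector.mlmodel' is a placeholder filename)
model.export_coreml('MyDetector.mlmodel')
```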