Loading Open Images V6 and custom datasets with FiftyOne

Datasets and their annotations are often stored in very different formats. FiftyOne allows for easy loading and visualization of any image dataset and labels.

Eric Hofesmann
Feb 10 · 11 min read

DataFrames are a standard way of storing tabular data with various tools that exist to visualize the data in different ways. Image and video datasets, on the other hand, do not have a standard format for storing their data and annotations. Nearly every dataset that is developed creates a new schema with which to store their raw data, bounding boxes, sample-level labels, etc.

I have been working on an open-source machine learning tool called FiftyOne that can help ease the pain of having to write custom loading, visualization, and conversion scripts whenever you use a new dataset. FiftyOne supports multiple dataset formats out of the box including MS-COCO, YOLO, Pascal VOC, and more. However, if you have a dataset format not provided out-of-the-box, you can still easily load it into FiftyOne manually.

Why would you want your data in FiftyOne? FiftyOne provides a highly functional App and API that will let you quickly visualize your dataset, generate interesting queries, find annotation mistakes, convert it to other formats, load it into a zoo of models, and more.

This blog post will walk you through how to load image-level classifications, object detections, segmentations, and visual relationships into FiftyOne, visualize them, and convert them to other formats. I’ll be using Open Images V6 which was released in February 2020 as a basis for this post since it contains all of these data types. If you are only interested in loading Open Images V6, you can check it out in the FiftyOne Dataset Zoo and load it in one line of code! If you have your own dataset that you want to load, adjust the code in this post to parse the format that your data is stored in.

Open Images V6

Open Images is a dataset released by Google containing over 9M images with labels spanning various tasks:

[Table of Open Images label types not shown; entries marked * are loaded in this post.]

These annotations were generated through a combination of machine learning algorithms followed by human verification on the test, validation, and subsets of the training splits. Versions of this dataset are also used in the Open Images Challenges on Kaggle.

Open Images V6 introduced localized narratives, which are a novel form of multimodal annotations consisting of a voiceover and mouse trace of an annotator describing an image. FiftyOne support for localized narratives is currently in the works.

A new way to download and evaluate Open Images!

[Updated May 12, 2021] After releasing this post, we collaborated with Google to support Open Images V6 directly through the FiftyOne Dataset Zoo. It is now as easy as this to load Open Images, data, annotations, and all:
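The snippet that originally followed is not preserved here, so the following is a minimal sketch of the zoo call. It assumes the zoo dataset name "open-images-v6" and the validation split; the wrapper function load_open_images is a hypothetical helper, used only so that FiftyOne is imported when the function is actually called.

```python
def load_open_images(split="validation"):
    """Download (on first use) and load one split of Open Images V6."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone.zoo as foz

    # "open-images-v6" is the name of the dataset in the FiftyOne Dataset Zoo
    return foz.load_zoo_dataset("open-images-v6", split=split)
```

Calling load_open_images() returns a FiftyOne dataset that can be passed to fo.launch_app() to view it in the App.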

With this implementation in FiftyOne, you can also specify any subset of Open Images with parameters like classes, split, max_samples, and more:
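As a rough illustration of those parameters, the sketch below collects a set of subset options and forwards them to the zoo loader. The specific classes ("Cat", "Dog"), the label types, and the sample count are example values chosen for this sketch, and load_open_images_subset is a hypothetical helper name.

```python
# Example subset settings; the class names and sample count are illustrative
SUBSET_KWARGS = {
    "split": "validation",
    "label_types": ["detections", "classifications"],
    "classes": ["Cat", "Dog"],
    "max_samples": 100,
}


def load_open_images_subset(**overrides):
    """Load only the samples matching SUBSET_KWARGS (plus any overrides)."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone.zoo as foz

    kwargs = {**SUBSET_KWARGS, **overrides}
    return foz.load_zoo_dataset("open-images-v6", **kwargs)
```

Only the images needed to satisfy the request are downloaded, which is much faster than syncing a full split.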

Open Images subset loaded in FiftyOne

Additionally, if you are training a model on Open Images, FiftyOne now supports Open Images style evaluation allowing you to produce the same mAP metrics used in the Open Images challenges. The benefit of using FiftyOne for this is that it also stores instance-level true positive, false positive, and false negative results allowing you to not rely only on aggregate dataset-wide metrics but actually get hands-on with your model results and find out how to best improve performance.
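A minimal sketch of what that evaluation call could look like is shown below. It assumes your dataset has model predictions in a field named "predictions" and ground truth in a field named "detections"; both field names and the helper name are assumptions for illustration.

```python
def evaluate_open_images_style(dataset, pred_field="predictions", gt_field="detections"):
    """Run Open Images-style detection evaluation and return the mAP."""
    results = dataset.evaluate_detections(
        pred_field,
        gt_field=gt_field,
        method="open-images",  # use the Open Images evaluation protocol
        eval_key="eval",       # per-object results are stored under this key
    )
    return results.mAP()
```

Because an eval_key is given, the true positive, false positive, and false negative results are saved on the dataset, so they can be filtered and inspected in the App afterwards.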

For more information check out this post or this tutorial!

Open Images Label Formats

The previous section shows the best way to load the Open Images dataset. However, FiftyOne also lets you easily load custom datasets. The next few sections show how to load a dataset into FiftyOne from scratch. We are using Open Images as the example dataset for this since it contains a rich variety of label types.

Note: The code in the following sections is meant to be adapted to your own datasets; you do not need it to load Open Images. Use the examples above if you are only interested in loading the Open Images dataset.

In this “Open Images Label Formats” section, we describe the format used by Google to store Open Images annotations on disk. We will use this information to write the parsers to load this dataset into FiftyOne in the next “Loading custom datasets into FiftyOne” section.

Downloading Data Locally

The AWS download links for the training split (513 GB), validation split (12 GB), and testing split (36 GB) can be found in the Open Images GitHub repository. Annotations for the tasks that you are interested in can be downloaded directly from the Open Images website.

We will be using samples from the test split for this example. You can download the entire test split (36 GB!) with the following commands:

pip install awscli
aws s3 --no-sign-request sync s3://open-images-dataset/test ./open-images/test/

Alternatively, I will be downloading just a few images from the test split further down in this post.

We will also need to download the relevant annotation files for each task that are all found here: https://storage.googleapis.com/openimages/web/download.html

Image-level Labels

Every image in Open Images can contain multiple image-level labels across hundreds of classes. These labels are split into two types, positive and negative. Positive labels are classes that have been verified to be in the image, while negative labels are classes that have been verified to not be in the image. Negative labels are useful because they are generally specified for classes that you may expect to appear in a scene but do not. For example, if there is a group of people in outfits on a field, you may expect there to be a ball. If there isn’t one, that would be a good negative label.

wget -P labels https://storage.googleapis.com/openimages/v5/test-annotations-human-imagelabels-boxable.csv

Below is a sample of the contents of this file:

ImageID,Source,LabelName,Confidence
000026e7ee790996,verification,/m/0cgh4,0
000026e7ee790996,verification,/m/04hgtk,0
...

We need the class list for both labels and detections:

wget -P labels https://storage.googleapis.com/openimages/v5/class-descriptions-boxable.csv

Below is a sample of the contents of this file:

/m/011k07,Tortoise
/m/011q46kg,Container
...

Image-level labels visualized in the FiftyOne App

Detections

Objects are localized and labeled with the same classes as the image-level labels. Additionally, each detection contains boolean attributes indicating if the object is occluded, truncated, representing a group of other objects, inside another object, or a depiction of the object (like a cartoon).

wget -P detections https://storage.googleapis.com/openimages/v5/test-annotations-bbox.csv

Below is a sample of the contents of this file:

ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
000026e7ee790996,xclick,/m/07j7r,1,0.071875,0.1453125,0.20625,0.39166668,0,1,1,0,0
000026e7ee790996,xclick,/m/07j7r,1,0.4390625,0.571875,0.26458332,0.43541667,0,1,1,0,0
...

Detections visualized in the FiftyOne App

Visual Relationships

Relationships are labeled between two object detections. For example, one object may be wearing another. The most common relationship is "is", which indicates that an object has some attribute (for example, a handbag that is leather). The annotations for these relationships include the bounding boxes and labels of both objects as well as the label for the relationship.

wget -P relationships https://storage.googleapis.com/openimages/v6/oidv6-test-annotations-vrd.csv
wget -P relationships https://storage.googleapis.com/openimages/v6/oidv6-attributes-description.csv

Below is a sample of the contents of the relationships file:

ImageID,LabelName1,LabelName2,XMin1,XMax1,YMin1,YMax1,XMin2,XMax2,YMin2,YMax2,RelationshipLabel
9553b9608577b74b,/m/04yx4,/m/017ftj,0.023404,0.985106,0.038344,0.981595,0.238298,0.759574,0.349693,0.529141,wears
819903f6353b60b5,/m/03m3pdh,/m/0dnr7,0.096875,1.000000,0.095833,1.000000,0.096875,1.000000,0.095833,1.000000,is
...

Visual relationships visualized in the FiftyOne App

Instance Segmentation

Segmentation masks are distributed as 16 zip files, each containing the masks for images whose IDs start with one character, 0-9 or a-f. In this example, we will only use images whose IDs start with 0. The commands below download just those masks; replace the 0 with 1-9 or a-f to download masks for other images.

These segmentation annotations are stored in a separate image for each object and also include the bounding box coordinates around the segmentation and the label of the segmentation.

wget -P segmentations https://storage.googleapis.com/openimages/v5/test-masks/test-masks-0.zip
unzip -d segmentations/masks segmentations/test-masks-0.zip
wget -P segmentations https://storage.googleapis.com/openimages/v5/test-annotations-object-segmentation.csv

Below is a sample of the contents of the segmentations file:

MaskPath,ImageID,LabelName,BoxID,BoxXMin,BoxXMax,BoxYMin,BoxYMax,PredictedIoU,Clicks
d0ed76e0533a914d_m01xyhv_cffd8afa.png,d0ed76e0533a914d,/m/01xyhv,cffd8afa,0.122966,0.958409,0.389892,0.998195,0.00000,
fba940ee0203b368_m08pbxl_13094a08.png,fba940ee0203b368,/m/08pbxl,13094a08,0.460938,0.551562,0.387500,0.572917,0.00000,
...

Instance segmentations visualized in the FiftyOne App

Preprocessing

We are only going to use a small subset of the dataset in this example to make it easy to follow along with. Additionally, since we want to load a lot of different types of annotations, we need to find some samples that are compatible with all of our labels.

Let's load the annotations from the CSV files we downloaded and parse them to find a subset of images we want to use.
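One way to do this, sketched here with only the standard library, is to group the rows of each CSV file by image ID and then intersect the sets of IDs. The helper names load_grouped and ids_with_all_labels are hypothetical, and the optional prefix argument reflects the fact that we only downloaded masks for images whose IDs start with 0.

```python
import csv
from collections import defaultdict


def load_grouped(csv_path, id_column="ImageID"):
    """Read a CSV file with a header row and group its rows by image ID."""
    grouped = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            grouped[row[id_column]].append(row)
    return dict(grouped)


def ids_with_all_labels(*grouped_annotations, prefix=""):
    """Return the sorted image IDs that appear in every annotation group."""
    if not grouped_annotations:
        return []
    common = set.intersection(*(set(g) for g in grouped_annotations))
    return sorted(i for i in common if i.startswith(prefix))


# Example usage with the files downloaded above:
# labels = load_grouped("labels/test-annotations-human-imagelabels-boxable.csv")
# detections = load_grouped("detections/test-annotations-bbox.csv")
# relationships = load_grouped("relationships/oidv6-test-annotations-vrd.csv")
# segmentations = load_grouped("segmentations/test-annotations-object-segmentation.csv")
# valid_ids = ids_with_all_labels(
#     labels, detections, relationships, segmentations, prefix="0"
# )
```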

We now have a list, valid_ids, of the images that have every annotation type we want to look at. Let's choose a subset of 100 of those and download the corresponding images following what is done in the official Open Images download script.

pip install boto3
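The sketch below picks a reproducible random subset and downloads each image from the public open-images-dataset S3 bucket with anonymous (unsigned) requests. The helper names, the random seed, and the output directory are assumptions for illustration; the key layout of split/image_id.jpg mirrors the sync command shown earlier.

```python
import os
import random

BUCKET_NAME = "open-images-dataset"


def choose_subset(valid_ids, k=100, seed=51):
    """Pick up to k image IDs at random, reproducibly for a given seed."""
    ids = sorted(valid_ids)
    return random.Random(seed).sample(ids, min(k, len(ids)))


def s3_key(image_id, split="test"):
    """Build the object key of an image inside the bucket."""
    return f"{split}/{image_id}.jpg"


def download_images(image_ids, split="test", out_dir="open-images/test"):
    """Download the given images without AWS credentials."""
    # Imported lazily; requires `pip install boto3`
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
    bucket = s3.Bucket(BUCKET_NAME)
    os.makedirs(out_dir, exist_ok=True)
    for image_id in image_ids:
        dest = os.path.join(out_dir, f"{image_id}.jpg")
        bucket.download_file(s3_key(image_id, split), dest)
```

With these helpers, download_images(choose_subset(valid_ids)) fetches the 100 images used in the rest of the post.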

The last thing we need is a mapping from the class and attribute IDs to their actual names.
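Both description files are two-column CSV files without a header row, as in the class list sample shown earlier, so a small reader is enough. This is a sketch; load_id_to_name is a hypothetical helper name.

```python
import csv


def load_id_to_name(csv_path):
    """Map IDs such as "/m/011k07" to readable names such as "Tortoise"."""
    with open(csv_path, newline="") as f:
        # Each row is `id,name`; skip any blank or malformed rows
        return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2}


# Example usage with the files downloaded above:
# classes_map = load_id_to_name("labels/class-descriptions-boxable.csv")
# attrs_map = load_id_to_name("relationships/oidv6-attributes-description.csv")
```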

Loading custom datasets into FiftyOne

You will first need to install FiftyOne through a simple pip command. It is recommended to work with FiftyOne in interactive Python sessions, so let's install that too.

pip install fiftyone
pip install ipython

After launching ipython the first step is to create a FiftyOne Dataset.

If you want this dataset to exist after exiting the Python session, set the persistent attribute to True. This lets us quickly load the dataset in the future.
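A minimal sketch of these two steps follows. The dataset name "open-images-v6-test" and the helper function are assumptions chosen for illustration; any unused name works.

```python
def create_dataset(name="open-images-v6-test", persistent=True):
    """Create an empty FiftyOne dataset, optionally kept across sessions."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone as fo

    dataset = fo.Dataset(name)
    # Persistent datasets are not deleted when the Python session ends
    dataset.persistent = persistent
    return dataset
```

In a later session, a persistent dataset can be retrieved by name with fo.load_dataset().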

We then need to create FiftyOne Samples for each image that contain the file path to the images as well as all label information that we want to import. For each label type, we will create a corresponding object in FiftyOne and add it as a field to our samples.

Adding image-level classification labels will utilize the fo.Classifications class. Detections, segmentations, and relationships can all use the fo.Detections class since it supports bounding boxes, masks, and also custom attributes assigned to each detection. These custom attributes can be used for things like IsOccluded in the detections or the two labels that a relationship is between.

The sections below outline how to create FiftyOne labels from the Open Images data we have loaded so far and then how to add them to your FiftyOne Dataset.

Classification Labels

Classification labels utilize the fo.Classification class. Since these are multi-label classifications, we will be using the fo.Classifications class to store multiple classification labels.

Additionally, we want to separate out the positive and negative labels (1 and 0 confidence respectively) into different classifications fields so we can view them separately in the App.
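The sketch below shows one way to do this split. It assumes each row is a dict parsed from the image-level labels CSV (with LabelName and Confidence columns) and that id_to_name maps class IDs to names; both helper names are hypothetical.

```python
def split_by_confidence(label_rows, id_to_name):
    """Split image-level label rows into positive and negative class names."""
    positive, negative = [], []
    for row in label_rows:
        # Fall back to the raw class ID if it is missing from the mapping
        name = id_to_name.get(row["LabelName"], row["LabelName"])
        if int(row["Confidence"]) == 1:
            positive.append(name)
        else:
            negative.append(name)
    return positive, negative


def to_classifications(names):
    """Wrap a list of class names in a multi-label fo.Classifications."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone as fo

    return fo.Classifications(
        classifications=[fo.Classification(label=name) for name in names]
    )
```

The two lists can then be converted separately and stored in two different fields of a sample, one for positive labels and one for negative labels.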

Object Detections

Similar to classifications, the fo.Detections class lets you store multiple fo.Detection objects in a list. We create a detection by defining the bounding box coordinates and class label of the object. We can then add any additional attributes that we want, like IsOccluded and IsTruncated.
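As a sketch of that conversion: Open Images stores boxes as relative XMin, XMax, YMin, YMax values, while FiftyOne expects relative [top-left-x, top-left-y, width, height]. The helper names are hypothetical, and passing the boolean flags as extra keyword arguments assumes a FiftyOne version that accepts custom attributes directly on fo.Detection.

```python
BOOL_ATTRS = ("IsOccluded", "IsTruncated", "IsGroupOf", "IsDepiction", "IsInside")


def to_fiftyone_box(xmin, xmax, ymin, ymax):
    """Convert Open Images corner coordinates to [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def parse_bool_attrs(row):
    """Read the 0/1 flag columns of a detection row as booleans."""
    return {attr: int(row[attr]) == 1 for attr in BOOL_ATTRS}


def to_detection(row, id_to_name):
    """Build one fo.Detection from a row of the bounding box CSV."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone as fo

    box = to_fiftyone_box(
        float(row["XMin"]), float(row["XMax"]),
        float(row["YMin"]), float(row["YMax"]),
    )
    label = id_to_name.get(row["LabelName"], row["LabelName"])
    return fo.Detection(label=label, bounding_box=box, **parse_bool_attrs(row))
```

All detections for one image can then be gathered with fo.Detections(detections=[...]).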

Visual Relationships

Relationships are best represented in FiftyOne through fo.Detections since a relationship contains a bounding box, relationship label, and object labels, all of which can be stored in a detection. We are going to have the bounding box of the relationship encompass the bounding boxes of both objects it pertains to. We add the labels of each object as additional custom fields to the detection.
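The encompassing box can be computed as the union of the two object boxes, as sketched below. The helper names and the custom attribute names Label1 and Label2 are assumptions for illustration, as is a FiftyOne version that accepts custom attributes as keyword arguments.

```python
def enclosing_box(box1, box2):
    """Union of two (xmin, xmax, ymin, ymax) boxes as [x, y, width, height]."""
    xmin = min(box1[0], box2[0])
    xmax = max(box1[1], box2[1])
    ymin = min(box1[2], box2[2])
    ymax = max(box1[3], box2[3])
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def to_relationship(row, id_to_name):
    """Build one fo.Detection from a row of the visual relationships CSV."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone as fo

    box1 = tuple(float(row[k]) for k in ("XMin1", "XMax1", "YMin1", "YMax1"))
    box2 = tuple(float(row[k]) for k in ("XMin2", "XMax2", "YMin2", "YMax2"))
    return fo.Detection(
        label=row["RelationshipLabel"],
        bounding_box=enclosing_box(box1, box2),
        # Store the names of the two related objects as custom attributes
        Label1=id_to_name.get(row["LabelName1"], row["LabelName1"]),
        Label2=id_to_name.get(row["LabelName2"], row["LabelName2"]),
    )
```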

Note that you could just as easily add the two objects that make up the relationship as individual detections. Not doing so was simply a design choice for this post.

Segmentations

We can once again use fo.Detections to store segmentations since a detection contains an optional mask argument that accepts a NumPy array and will scale it to the bounding box region. The segmentations in Open Images also contain a bounding box around the mask as well as the instance label, all of which is added to the detection objects.
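Since the mask is interpreted relative to the bounding box, a full-image mask needs to be cropped to the box before it is attached. The sketch below assumes the masks were unzipped to segmentations/masks, that each mask is a single-channel image covering the whole picture, and that NumPy and Pillow are installed; the helper names are hypothetical.

```python
import os


def crop_bounds(xmin, xmax, ymin, ymax, width, height):
    """Convert a relative box to pixel (top, bottom, left, right) indices."""
    return (
        int(ymin * height), int(ymax * height),
        int(xmin * width), int(xmax * width),
    )


def to_segmentation(row, id_to_name, mask_dir="segmentations/masks"):
    """Build one fo.Detection with an instance mask from a segmentation row."""
    # Imported lazily; requires fiftyone, numpy, and Pillow
    import numpy as np
    from PIL import Image
    import fiftyone as fo

    path = os.path.join(mask_dir, row["MaskPath"])
    # Force a single channel, then treat any nonzero pixel as foreground
    mask = np.array(Image.open(path).convert("L")) > 0
    height, width = mask.shape

    xmin, xmax = float(row["BoxXMin"]), float(row["BoxXMax"])
    ymin, ymax = float(row["BoxYMin"]), float(row["BoxYMax"])
    top, bottom, left, right = crop_bounds(xmin, xmax, ymin, ymax, width, height)

    return fo.Detection(
        label=id_to_name.get(row["LabelName"], row["LabelName"]),
        bounding_box=[xmin, ymin, xmax - xmin, ymax - ymin],
        mask=mask[top:bottom, left:right],  # mask cropped to the box region
    )
```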

Creating FiftyOne Samples

Now that we defined the functions to take in Open Images data and return FiftyOne labels, we can create samples and add these labels to them.

Samples only need a filepath to be instantiated and we can add any FiftyOne labels to a sample. Once the sample is created, we can add it to the dataset and continue until all of our data is loaded.
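A minimal sketch of this loop body follows. The helper names, the image directory, and the field names in the usage comment are assumptions for illustration; labels_by_field is expected to map field names to FiftyOne label objects built as described in the sections above.

```python
import os


def image_filepath(image_id, images_dir="open-images/test"):
    """Path of a downloaded image on disk."""
    return os.path.join(images_dir, f"{image_id}.jpg")


def build_sample(filepath, labels_by_field):
    """Create a fo.Sample and attach each label under its field name."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone as fo

    sample = fo.Sample(filepath=filepath)
    for field_name, label in labels_by_field.items():
        sample[field_name] = label
    return sample


# Example usage, one sample per image ID:
# sample = build_sample(
#     image_filepath(image_id),
#     {"positive_labels": pos, "negative_labels": neg, "detections": dets},
# )
# dataset.add_sample(sample)
```

When loading many images, collecting the samples in a list and calling dataset.add_samples() once is usually more efficient than adding them one at a time.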

Visualizing and Exploring

Once we have our data loaded into a FiftyOne dataset, we can launch the App and start exploring.

In the App, we can select which of the label fields that we want to view, look at individual samples in an expanded view, and also view the distributions of labels.

Opening a sample in the expanded view lets you visualize the attributes we added, like the labels of a relationship. For example, we can see that there are two Ride relationships between Man and Horse in the image below.

Being able to visualize our dataset easily lets us quickly spot check the data. For example, it appears that the Mammal label has an inconsistent meaning between different samples. Below are two images containing humans: one has Mammal as a negative_label and the other has Mammal as a positive_label.

Queries

One of the cutting-edge features that FiftyOne provides is the ability to interact closely with your dataset in code and in the App. This lets you write sophisticated queries that would otherwise require a large amount of scripting.

For example, say that we want to build a subset of Open Images containing close-up images of faces. We can create a view into the dataset that will let us get all detections that contain a Human face with a bounding box area greater than 0.2.
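Such a view could be sketched as follows, assuming the detections were stored in a field named "detections". Because boxes are stored as relative [x, y, width, height], the area is the product of the last two entries; bbox_area is a plain Python equivalent included only to make that arithmetic explicit, and both function names are hypothetical.

```python
def bbox_area(bounding_box):
    """Relative area of a [x, y, width, height] box (fraction of the image)."""
    _, _, width, height = bounding_box
    return width * height


def large_face_view(dataset, min_area=0.2, field="detections"):
    """View that keeps only large "Human face" detections."""
    # Imported lazily; requires `pip install fiftyone`
    from fiftyone import ViewField as F

    # Same computation as bbox_area(), expressed as a database query
    area = F("bounding_box")[2] * F("bounding_box")[3]
    is_large_face = (F("label") == "Human face") & (area > min_area)
    return dataset.filter_labels(field, is_large_face)


# Example usage with an App session opened via fo.launch_app(dataset):
# session.view = large_face_view(dataset)
```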

Updating the session object automatically updates what we see in the App. It’s looking good, but a couple of images contain large Human face boxes only because the box encompasses a crowd of people. We can filter the view further by keeping only the boxes whose IsGroupOf attribute is false.
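A sketch of that extra filter is shown below. It assumes IsGroupOf was stored as a boolean custom attribute directly on each detection; if your attributes are stored differently, the field path in the expression needs to change accordingly. The function name is hypothetical.

```python
def without_groups(view, field="detections"):
    """Drop detections that were flagged as a group of objects."""
    # Imported lazily; requires `pip install fiftyone`
    from fiftyone import ViewField as F

    # `== False` builds a query expression, so it must not be written as `not`
    return view.filter_labels(field, F("IsGroupOf") == False)  # noqa: E712


# Example usage, chaining onto an existing view:
# session.view = without_groups(session.view)
```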

Converting Formats

Once your data is in FiftyOne, you can export it in any of the formats that FiftyOne supports with just a couple of lines of code.

For example, if we want to export the detections that we added to MS-COCO format so that we can use the pycocotools evaluation on it, we can do so in one line of code in Python.
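A minimal sketch of that export, assuming the detections live in a field named "detections"; the helper name and the export directory are placeholders.

```python
def export_coco(dataset, export_dir, label_field="detections"):
    """Write the images and one detections field to disk in COCO format."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone as fo

    dataset.export(
        export_dir=export_dir,
        dataset_type=fo.types.COCODetectionDataset,
        label_field=label_field,
    )


# Example usage:
# export_coco(dataset, "/path/for/coco-export")
```

The same call works on a view, so a filtered subset can be exported in exactly the same way.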

The general formula for loading datasets

The easiest way to load your data is to follow a standard format for your annotations. For example, if you just finished annotating in CVAT, an open-source annotation tool, you can load the result into FiftyOne as easily as:
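The original snippet is not preserved here, so the following is a sketch of loading a directory exported in CVAT image format. The helper name is hypothetical and dataset_dir is a placeholder for wherever your export lives.

```python
def load_cvat(dataset_dir, name=None):
    """Load a dataset stored on disk in CVAT image format."""
    # Imported lazily; requires `pip install fiftyone`
    import fiftyone as fo

    return fo.Dataset.from_dir(
        dataset_dir=dataset_dir,
        dataset_type=fo.types.CVATImageDataset,
        name=name,
    )


# Example usage:
# dataset = load_cvat("/path/to/cvat-export", name="my-cvat-dataset")
```

Other supported formats follow the same pattern with a different dataset_type.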

Even if your data is in a custom format, it’s easy to manually build a FiftyOne dataset.

Once you’ve loaded your data, you can utilize the FiftyOne App to visualize and explore your dataset, use the FiftyOne Model Zoo to generate predictions on your data, export it in various formats, and more!

About Voxel51

High-quality, intentionally-curated data is critical to training great computer vision models. At Voxel51, we have over 25 years of CV/ML experience and care deeply about enabling the community to bring their AI solutions to life. That’s why we developed FiftyOne, an open-source tool that helps engineers and scientists to build high-quality datasets and models.

Want to learn more? Check us out at https://fiftyone.ai.

Voxel51

Developer tools for machine learning
