Loading Open Images V6 and custom datasets with FiftyOne
Datasets and their annotations are often stored in very different formats. FiftyOne allows for easy loading and visualization of any image dataset and labels.
DataFrames are a standard way of storing tabular data with various tools that exist to visualize the data in different ways. Image and video datasets, on the other hand, do not have a standard format for storing their data and annotations. Nearly every dataset that is developed creates a new schema with which to store their raw data, bounding boxes, sample-level labels, etc.
I have been working on an open-source machine learning tool called FiftyOne that can help ease the pain of having to write custom loading, visualization, and conversion scripts whenever you use a new dataset. FiftyOne supports multiple dataset formats out of the box including MS-COCO, YOLO, Pascal VOC, and more. However, if you have a dataset format not provided out-of-the-box, you can still easily load it into FiftyOne manually.
Why would you want your data in FiftyOne? FiftyOne provides a highly functional App and API that will let you quickly visualize your dataset, generate interesting queries, find annotation mistakes, convert it to other formats, load it into a zoo of models, and more.
This blog post will walk you through how to load image-level classifications, object detections, segmentations, and visual relationships into FiftyOne, visualize them, and convert them to other formats. I’ll be using Open Images V6 which was released in February 2020 as a basis for this post since it contains all of these data types. If you are only interested in loading Open Images V6, you can check it out in the FiftyOne Dataset Zoo and load it in one line of code! If you have your own dataset that you want to load, adjust the code in this post to parse the format that your data is stored in.
Open Images V6
Open Images is a dataset released by Google containing over 9M images with labels spanning various tasks:
- Image-level labels*
- Object bounding boxes*
- Visual relationships*
- Instance segmentation masks*
- Localized narratives
*Loaded in this post
These annotations were generated by machine learning models and then verified by humans on the test split, the validation split, and subsets of the training split. Versions of this dataset are also used in the Open Images Challenges on Kaggle.
Open Images V6 introduced localized narratives, which are a novel form of multimodal annotations consisting of a voiceover and mouse trace of an annotator describing an image. FiftyOne support for localized narratives is currently in the works.
A new way to download and evaluate Open Images!
[Updated May 12, 2021] After releasing this post, we collaborated with Google to support Open Images V6 directly through the FiftyOne Dataset Zoo. Loading Open Images, data and annotations included, is now as easy as this:
With this integration, you can also load any subset of Open Images by specifying parameters like label_types, classes, max_samples, and more:
Additionally, if you are training a model on Open Images, FiftyOne now supports Open Images-style evaluation, letting you produce the same mAP metrics used in the Open Images challenges. The benefit of using FiftyOne for this is that it also stores instance-level true positive, false positive, and false negative results, so you are not limited to aggregate dataset-wide metrics; you can get hands-on with your model's individual results and find out how best to improve performance.
Open Images Label Formats
The previous section shows the best way to load the Open Images dataset. However, FiftyOne also lets you easily load custom datasets. The next few sections show how to load a dataset into FiftyOne from scratch. We are using Open Images as the example dataset for this since it contains a rich variety of label types.
Note: The code in the following sections is meant to be adapted to your own datasets; it does not need to be used to load Open Images. Use the examples above if you are only interested in loading the Open Images dataset.
In this “Open Images Label Formats” section, we describe the format used by Google to store Open Images annotations on disk. We will use this information to write the parsers to load this dataset into FiftyOne in the next “Loading custom datasets into FiftyOne” section.
Downloading Data Locally
The AWS download links for the training split (513 GB), validation split (12 GB), and test split (36 GB) can be found in the Open Images GitHub repository. Annotations for the tasks you are interested in can be downloaded directly from the Open Images website.
We will be using samples from the test split for this example. You can download the entire test split (36 GB!) with the following commands:
pip install awscli
aws s3 --no-sign-request sync s3://open-images-dataset/test ./open-images/test/
Alternatively, further down in this post I download just a handful of images from the test split.
We will also need to download the relevant annotation files for each task that are all found here: https://storage.googleapis.com/openimages/web/download.html
Every image in Open Images can contain multiple image-level labels across hundreds of classes. These labels are split into two types, positive and negative. Positive labels are classes that have been verified to be in the image, while negative labels are classes that have been verified to not be in the image. Negative labels are useful because they are generally specified for classes that you might expect to appear in a scene but do not. For example, if there is a group of people in outfits on a field, you might expect there to be a ball. If there isn't one, Ball would be a good negative label.
wget -P labels https://storage.googleapis.com/openimages/v5/test-annotations-human-imagelabels-boxable.csv
Below is a sample of the contents of this file:
We need the class list for both labels and detections:
Below is a sample of the contents of this file:
Objects are localized and labeled with the same classes as the image-level labels. Additionally, each detection contains boolean attributes indicating if the object is occluded, truncated, representing a group of other objects, inside another object, or a depiction of the object (like a cartoon).
wget -P detections https://storage.googleapis.com/openimages/v5/test-annotations-bbox.csv
Below is a sample of the contents of this file:
Relationships are labeled between two object detections, for example, one object wearing another. The most common relationship is is, indicating that an object is some attribute (like a handbag is leather). The annotations for these relationships include the bounding boxes and labels of both objects as well as the label for the relationship.
wget -P relationships https://storage.googleapis.com/openimages/v6/oidv6-test-annotations-vrd.csv
wget -P relationships https://storage.googleapis.com/openimages/v6/oidv6-attributes-description.csv
Below is a sample of the contents of the relationships file:
Segmentation masks are downloaded through 16 zip files, each containing the masks for images whose IDs start with one of the 16 hexadecimal characters 0-f. In this example, we will only be using images starting with 0. The following command downloads just those masks; change the trailing 0 to download masks for other images.
Each segmentation mask is stored as a separate image per object; the annotations also include the bounding box coordinates around the mask and the label of the segmentation.
wget -P segmentations https://storage.googleapis.com/openimages/v5/test-masks/test-masks-0.zip
unzip -d segmentations/masks segmentations/test-masks-0.zip
wget -P segmentations https://storage.googleapis.com/openimages/v5/test-annotations-object-segmentation.csv
Below is a sample of the contents of the segmentations file:
We are only going to use a small subset of the dataset in this example to make it easy to follow along. Additionally, since we want to load several different types of annotations, we need to find samples that have all of our label types available.
Let's load the annotations from the CSV files we downloaded and parse them to find the subset of images we want to use.
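As a sketch, the parsing can be done with the standard csv module (the file paths assume the wget commands above; the helper names are our own):

```python
import csv

def ids_in(csv_file):
    """Set of ImageIDs appearing in an Open Images annotation CSV."""
    return {row["ImageID"] for row in csv.DictReader(csv_file)}

def intersect_ids(*id_sets):
    """Images that carry every annotation type we downloaded."""
    out = set(id_sets[0])
    for ids in id_sets[1:]:
        out &= ids
    return out

if __name__ == "__main__":
    # Paths follow the wget commands earlier in this post
    sets = []
    for path in [
        "labels/test-annotations-human-imagelabels-boxable.csv",
        "detections/test-annotations-bbox.csv",
        "relationships/oidv6-test-annotations-vrd.csv",
        "segmentations/test-annotations-object-segmentation.csv",
    ]:
        with open(path) as f:
            sets.append(ids_in(f))
    valid_ids = intersect_ids(*sets)
```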
We now have a list of valid_ids containing all of the images whose annotations we want to look at. Let's choose a subset of 100 of those and download the corresponding images, following what is done in the official Open Images download script.
pip install boto3
The last thing we need is a mapping from the class and attribute IDs to their actual names.
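The Open Images class and attribute files are headerless two-column CSVs mapping machine IDs (MIDs) to display names, so building the mapping is a one-liner (the inline sample rows are illustrative):

```python
import csv
import io

def load_class_map(csv_file):
    """Map Open Images MID class IDs (e.g. /m/011k07) to display names."""
    return {mid: name for mid, name in csv.reader(csv_file)}

# In practice: load_class_map(open("labels/class-descriptions-boxable.csv"))
sample = "/m/011k07,Tortoise\n/m/0120dh,Sea turtle\n"
classes_map = load_class_map(io.StringIO(sample))
```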
Loading custom datasets into FiftyOne
pip install fiftyone
pip install ipython
ipython

With FiftyOne installed and an interactive Python session open, the first step is to create a FiftyOne Dataset.
If you want this dataset to exist after exiting the Python session, set the persistent attribute to True. This lets us quickly load the dataset in the future.
We then need to create a FiftyOne Sample for each image that contains the file path to the image as well as all of the label information that we want to import. For each label type, we will create a corresponding object in FiftyOne and add it as a field on our samples.
Adding image-level classification labels will utilize the fo.Classifications class. Detections, segmentations, and relationships can all use the fo.Detections class, since it supports bounding boxes, masks, and custom attributes assigned to each detection. These custom attributes can be used for things like IsOccluded in the detections or the two labels that a relationship is between.
The sections below outline how to create FiftyOne labels from the Open Images data we have loaded so far and then how to add them to your FiftyOne Dataset.
Classification labels utilize the fo.Classification class. Since these are multi-label classifications, we will be using the fo.Classifications class to store multiple classification labels. Additionally, we want to separate the positive and negative labels (confidence 1 and 0, respectively) into different classifications fields so we can view them separately in the App.
Similar to classifications, the fo.Detections class lets you store multiple fo.Detection objects in a list. We create a detection by defining the bounding box coordinates and class label of the object. We can then add any additional attributes that we want, like IsOccluded.
Relationships are best represented in FiftyOne through fo.Detections, since a relationship contains a bounding box, a relationship label, and object labels, all of which can be stored in a detection. We are going to have the bounding box of the relationship encompass the bounding boxes of both objects it pertains to, and we add the labels of each object as additional custom fields on the detection. It should be noted that you could also add the two objects that make up the relationship as individual detections; not doing so was just a design choice for this post.
We can once again use fo.Detections to store segmentations, since a detection accepts an optional mask argument that takes a NumPy array and scales it to the bounding box region. The segmentations in Open Images also come with a bounding box around the mask as well as the instance label, all of which is added to the detection objects.
Creating FiftyOne Samples
Now that we defined the functions to take in Open Images data and return FiftyOne labels, we can create samples and add these labels to them.
A sample only needs a filepath to be instantiated, and we can add any FiftyOne labels to it. Once the sample is created, we add it to the dataset and continue until all of our data is loaded.
Visualizing and Exploring
In the App, we can select which of the label fields that we want to view, look at individual samples in an expanded view, and also view the distributions of labels.
Opening a sample in the expanded view lets you visualize the attributes we added, like the labels of a relationship. For example, we can see that there are two Ride relationships involving a Horse in the image below.
Being able to easily visualize our dataset lets us quickly spot-check the data. For example, it appears that the Mammal label has an inconsistent meaning across samples. Below are two images containing humans; one has Mammal as a negative_label and the other has Mammal as a positive_label.
One of the cutting-edge features that FiftyOne provides is the ability to interact closely with your dataset in code and in the App. This lets you write sophisticated queries that would otherwise require a large amount of scripting.
For example, say that we want to build a subset of Open Images containing close-up images of faces. We can create a view into the dataset containing all detections with the label Human face and a bounding box area greater than 0.2.
The session object returned by fo.launch_app(dataset) automatically updates what we see in the App. It's looking good, but a couple of images contain large Human face boxes only because they encompass a crowd of people. We can further filter the view by excluding detections whose IsGroupOf attribute is set.
Once your data is in FiftyOne, you can export it in any of the formats that FiftyOne supports with just a couple of lines of code.
The general formula for loading datasets
The easiest way to load your data is if your annotations already follow a standard format. For example, if you just finished annotating in the open-source tool CVAT, you can load the results into FiftyOne as easily as:
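A sketch of that load (the directory path is hypothetical and would point at your CVAT export of images plus XML labels):

```python
import fiftyone as fo

# Hypothetical directory containing a CVAT image export
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/cvat-export",
    dataset_type=fo.types.CVATImageDataset,
)
```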
Even if your data is in a custom format, it’s easy to manually build a FiftyOne dataset.
Once you’ve loaded your data, you can utilize the FiftyOne App to visualize and explore your dataset, use the FiftyOne Model Zoo to generate predictions on your data, export it in various formats, and more!
High-quality, intentionally curated data is critical to training great computer vision models. At Voxel51, we have over 25 years of CV/ML experience and care deeply about enabling the community to bring their AI solutions to life. That's why we developed FiftyOne, an open-source tool that helps engineers and scientists build high-quality datasets and models.
Want to learn more? Check us out at https://fiftyone.ai.