Introducing FiftyOne: A Tool for Rapid Data & Model Experimentation

A new (and open source!) tool for your machine learning toolbox

Brian Moore


Exploring a labeled dataset with predictions from an object detection model in FiftyOne.

The Backstory

My co-founder Jason and I started Voxel51 in 2017 with the vision of building tools that enable CV/ML engineers to tackle the hardest problems in computer vision. We started that journey by participating in the NIST Public Safety Innovation Accelerator Program, which was created to incubate new technologies with the potential to transform the future of public safety.

Over the next two years, we set out to translate our academic research on image/video understanding — over 250 papers and 30 years of experience — into a scalable platform for developing and deploying ML models that process visual data. The platform made it easy to take a model trained on images and deploy it to process video, efficiently and at scale. We trained a variety of models ourselves on road scene videos for tasks such as vehicle recognition, road sign detection, and human activity recognition. With some early successes under our belt, we raised a round of venture capital, built a small team, and began pilots with industry partners to onboard their CV/ML teams to our platform.

During this process, we learned a surprising fact:

There is a serious lack of tooling available to rapidly experiment with data and models

We found ourselves building our own in-house tools to solve tasks like:

  • Wrangling datasets into a common format for training/evaluation
  • Choosing a diverse set of images to annotate
  • Balancing datasets across classes and visual characteristics
  • Validating the correctness of human annotations
  • Visualizing model predictions
  • Finding and visualizing failure modes of models

Each time we encountered a new task, we wrote more custom scripts, massaged our datasets in similar-but-different ways, and generally found ourselves spending way more time wrangling data than doing actual science to improve our models.

The ML Lifecycle. There are many great tools for annotation, model training, and experiment tracking, but no good solutions for dataset analysis and model evaluation… That’s our focus.

After cross-referencing our experience with CV/ML teams in dozens of other companies, both small and large, as well as academic groups, we learned that others were experiencing the same pain: they needed a tool that would enable them to rapidly experiment with their data and models.

Getting Closer To Your Data

Training great machine learning models requires high quality data. Academic education in CV/ML is an excellent way to develop knowledge of model architectures, tips & tricks for training models, etc. However, most research treats the dataset (often a standard dataset like ImageNet or COCO) as a static black box.

In our experience, the limiting factor of performance on real problems is not the model, but rather the dataset. The more data you have for training, the better, right? End of story. Well, sort of:

Nothing hinders the success of machine learning systems more than poor-quality data
- Jason Corso, CEO of Voxel51 and Professor of CV/ML

As we reported in a recent blog post, over 1/3 of the false positives of state-of-the-art models on the Open Images dataset are actually due to annotation error! Scientists are busy tweaking model architectures to squeeze a few extra points of mAP out of a dataset whose limiting factor is annotation quality.

“False positive” predictions from a state-of-the-art object detection model. The predictions are, in fact, reasonable; the ground truth annotation schema was not handled correctly. Source.
Error analysis of a state-of-the-art object detection model on the Open Images V4 test set. 36% of the “false positive” predictions by the model were actually due to annotation error! Source.

A consistent theme arose from our customer discovery efforts: the diversity of training data and the accuracy of the labels associated with that training data are critically important to developing a high-performance ML system.

That makes sense, but don’t ML engineers already know this?

Not nearly as much as we thought they did.

We found that CV/ML engineers focus primarily on choosing/tweaking model architectures and tuning hyperparameters. These are all important, but guess what? In our experience and that of many of the highest performing ML teams we spoke with, the biggest breakthroughs came by focusing on the data used to train the systems.

So, why aren’t CV/ML engineers acting on this insight?

Lack of tooling.

Seriously. We found that, while the teams and organizations with the most ML experience have built their own tools to make it easier to visualize their data, they only built those tools because they had no other choice. And smaller teams with less computer vision expertise? Most do not have the resources to build tools in-house (they are correctly focused on getting to market as fast as possible), and many do not yet realize the importance of getting closer to their data.

Introducing FiftyOne

We built FiftyOne to help CV/ML engineers and scientists spend dramatically less time wrangling data and more time focusing on the science of building better datasets and better models. The vignettes below give a taste of what I’m talking about.

Loading Data With Ease

FiftyOne makes it quick to load and visualize data in a variety of common (or custom) formats. While any ML engineer can do this by hand, it takes real effort to properly wrangle the various data/annotation formats out there into a standard format they can work with. FiftyOne removes that effort and meets engineers where they’re already working: in Python.

With FiftyOne, it takes two lines of code to load an image dataset with labels and display them visually in the FiftyOne App.

You are two lines of code away from infinite scrolling bliss.
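
To give a sense of the wrangling that loading replaces, here is a minimal standalone sketch in plain Python. It is not FiftyOne's API. It converts COCO-style annotations, which store boxes as absolute-pixel `[x, y, width, height]`, into a per-image list of labeled boxes with coordinates relative to the image size. The function name `coco_to_samples` and the output layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def coco_to_samples(coco):
    """Group COCO-style annotations by image and normalize boxes to [0, 1].

    `coco` is a dict with "images", "annotations", and "categories" lists,
    as in the COCO detection format. The output layout is an assumption
    made for this sketch, not a FiftyOne data structure.
    """
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    images = {im["id"]: im for im in coco["images"]}

    detections = defaultdict(list)
    for ann in coco["annotations"]:
        im = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # absolute pixels: top-left x, y, width, height
        detections[im["file_name"]].append(
            {
                "label": categories[ann["category_id"]],
                "bounding_box": [
                    x / im["width"],
                    y / im["height"],
                    w / im["width"],
                    h / im["height"],
                ],
            }
        )

    # Include images without annotations so no sample is silently dropped
    return {im["file_name"]: detections[im["file_name"]] for im in coco["images"]}


coco = {
    "images": [
        {"id": 1, "file_name": "road.jpg", "width": 200, "height": 100},
        {"id": 2, "file_name": "empty.jpg", "width": 64, "height": 64},
    ],
    "categories": [{"id": 7, "name": "car"}],
    "annotations": [
        {"image_id": 1, "category_id": 7, "bbox": [50, 25, 100, 50]},
    ],
}
samples = coco_to_samples(coco)
```

Every annotation format needs its own variant of this glue code, which is exactly the kind of repeated effort a common loading layer is meant to absorb.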

Powerful Dataset Analysis

FiftyOne does more than just visualize your data. It also provides powerful dataset analysis utilities. For example, look how easy it is to rank the images in your dataset by visual uniqueness, which surfaces visually similar images at the low end of the ranking:

Calculate and sort by visual uniqueness in two lines of code with FiftyOne.
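
For intuition about what a uniqueness score can look like, the sketch below scores each sample by its mean distance to its nearest neighbors in an embedding space, then sorts by that score. This is one simple heuristic assumed for illustration; the article does not describe how FiftyOne computes uniqueness, and the function name `uniqueness_scores` and the toy 2-D "embeddings" are hypothetical.

```python
import math


def uniqueness_scores(embeddings, k=1):
    """Score each embedding by its mean distance to its k nearest neighbors.

    Scores are divided by the largest score, so 1.0 marks the most isolated
    sample. Assumes at least two embeddings. This is an illustrative
    heuristic, not FiftyOne's implementation.
    """
    raw = []
    for i, a in enumerate(embeddings):
        dists = sorted(
            math.dist(a, b) for j, b in enumerate(embeddings) if j != i
        )
        nearest = dists[:k]
        raw.append(sum(nearest) / len(nearest))
    top = max(raw)
    return [r / top if top > 0 else 0.0 for r in raw]


# Toy 2-D "embeddings": two close neighbors and one outlier
embeddings = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
scores = uniqueness_scores(embeddings)

# Sample indices ordered from most to least unique
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In practice the embeddings would come from a pretrained vision model rather than hand-written tuples, and the low end of the ranking is where near-duplicate images tend to collect.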

Interactive Model Evaluation

When working with image/video datasets, some tasks are best performed visually, while others are best performed programmatically. We built FiftyOne to enable seamless handoff between the App and the Python library. You can load a dataset from code, search and filter it in the App to identify particular samples/labels, and then access those samples of interest back in Python.

One important instance of this workflow is evaluating model predictions. Quantitative measures such as confusion matrices or mAP scores don’t tell the whole story of a model’s performance. You need to visualize the failure modes of your model to understand what actions to take to remove the glass ceiling on the model’s performance.

With FiftyOne, you can perform complex operations on your data, such as evaluating the quality of a particular class at a given confidence threshold for your model’s predictions, and then visualizing the worst performing samples in the App:

A typical workflow in FiftyOne: using the Python library to perform a complex operation on a dataset and then visualizing the results in the App. In this case, we’re analyzing the worst performing samples for an object detection model: those with the most false positives. Visualizing these failure modes can lead to critical insights, such as whether the model or the human annotations are at fault.
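
The programmatic half of that workflow can be sketched in plain Python. The snippet below is not FiftyOne's API: it keeps predictions of one class above a confidence threshold, matches them greedily to ground truth boxes by IoU, counts the unmatched predictions as false positives, and orders samples from worst to best. The data layout, the function names, and the 0.5 thresholds are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def count_false_positives(preds, truths, label, min_conf=0.5, iou_thresh=0.5):
    """Count predictions of `label` above `min_conf` that match no ground truth.

    Predictions are matched greedily in descending confidence order, and
    each ground truth box can be matched at most once.
    """
    gt = [t["box"] for t in truths if t["label"] == label]
    kept = [p for p in preds if p["label"] == label and p["confidence"] >= min_conf]
    used = set()
    false_positives = 0
    for p in sorted(kept, key=lambda p: p["confidence"], reverse=True):
        best, best_iou = None, iou_thresh
        for i, box in enumerate(gt):
            overlap = iou(p["box"], box)
            if i not in used and overlap >= best_iou:
                best, best_iou = i, overlap
        if best is None:
            false_positives += 1
        else:
            used.add(best)
    return false_positives


dataset = {
    "a.jpg": {
        "truths": [{"label": "car", "box": [0, 0, 10, 10]}],
        "preds": [
            {"label": "car", "confidence": 0.9, "box": [0, 0, 10, 10]},
            {"label": "car", "confidence": 0.8, "box": [50, 50, 10, 10]},
            {"label": "car", "confidence": 0.7, "box": [1, 1, 10, 10]},
            {"label": "car", "confidence": 0.2, "box": [80, 80, 5, 5]},
        ],
    },
    "b.jpg": {
        "truths": [{"label": "car", "box": [5, 5, 20, 20]}],
        "preds": [{"label": "car", "confidence": 0.95, "box": [5, 5, 20, 20]}],
    },
}

fp_counts = {
    name: count_false_positives(s["preds"], s["truths"], "car")
    for name, s in dataset.items()
}
# Samples with the most false positives come first
worst_first = sorted(fp_counts, key=fp_counts.get, reverse=True)
```

The samples at the front of `worst_first` are the ones worth opening visually, since a count alone cannot say whether the model or the annotation is wrong.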

Less Wrangling, More Science

While we’ve heard that visualizing datasets in the App and working with the Python library are already useful on their own to CV/ML engineers, we’re also building out features in FiftyOne to power common workflows that are important but challenging to implement today:

  • Converting between dataset formats
  • Curating diverse and representative datasets
  • Selecting video frames for annotation and training of image models
  • Automatically finding label mistakes
  • Evaluating model predictions
  • Identifying and visualizing failure modes in your models

Ready to jump in? Check out the easy-to-follow tutorials and recipes in the FiftyOne Docs to learn how to execute these tasks on your datasets.

Oh, and getting started with FiftyOne is a breeze!

We Believe In Open Core Software

At Voxel51, we’re strong believers in the virtues of open source software. As a point of reference, many of the most popular tools in the ML ecosystem — TensorFlow, PyTorch, Apache Spark, and MLflow, to name a few — are open source. We believe this is not a coincidence: the best way to build a community around a developer tool is to make the project as open and transparent as possible. Making a project free and open source removes barriers to entry and allows individual developers to directly evaluate the merits of a tool, cast a vote of approval for the project with their code (and GitHub stars), and evangelize the tool within their organizations.

That’s why we released FiftyOne open source on GitHub with a permissive license. We’re committed to making the best tools for rapid data and model experimentation openly available for all CV/ML engineers to use. Issues, feature requests, and pull requests are welcome!

Although the core FiftyOne library will always be free and open source, we know that professional organizations and other advanced users have unique requirements for adopting tools more broadly and integrating them into their production workflows. That’s why we’re pursuing an open core model for FiftyOne. But more on that soon…

For now, we’re excited to see the amazing CV/ML-powered solutions that are built using FiftyOne!