FiftyOne — A Productive Tool for Cognomen ML & AI Developers
In this guest blog post, the Cognomen team shares their experience using FiftyOne and working with the Voxel51 team.
ML Challenges
Nearly every business working in artificial intelligence struggles to bring its AI solutions to market. Common obstacles include model performance problems (especially regressions and model drift) and difficulty distilling high-quality datasets from a data lake.
Key challenges ML engineers face along the way include working with oversimplified abstractions of their data, model limitations caused by human inputs, and over-reliance on qualitative visuals to interpret data. Data visualization is a critical part of the ML lifecycle, and a key challenge is understanding your data in an unobtrusive way. High-quality datasets that make it possible to train better models more quickly are always in demand, so ML practitioners should always be on the lookout for dataset analysis and curation tools.
About Cognomen
At Cognomen, we are building a next-generation, edge-based IoT ecosystem with deep learning models trained to run in real time on edge devices. Our mission is to empower every human being on the planet and make their lives easier through our products, which are built on emerging technologies.
FiftyOne Bootcamp Experience
Eager to expand our technical toolkit and find a solution that would make our data visualizations stand out from the crowd (with minimal effort on our part), we took part in a FiftyOne Bootcamp organized by the Voxel51 team, including CEO Jason Corso and CTO Brian Moore. Our goal was to learn best practices for using FiftyOne to level up our computer vision work.
The Voxel51 team brings more than 25 years of experience developing machine learning models and tools for computer vision. FiftyOne is a Python-based tool that helps machine learning and computer vision engineers and scientists curate better datasets. With FiftyOne, we could rapidly gain insight into model performance by visualizing samples overlaid with dynamic, queryable fields such as ground truth and predicted labels, dataset splits, and much more. Finding annotation mistakes by hand is not feasible at scale. With the advanced functionality of the FiftyOne Brain, however, we could automatically identify possible label mistakes in our datasets.
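As a rough illustration of how possible label mistakes can be surfaced automatically, the sketch below scores each sample by how confidently a model's prediction disagrees with its annotated label. It then flags the most suspicious samples for human review. This is a simplified heuristic, not the FiftyOne Brain's algorithm; FiftyOne exposes its own approach as `fiftyone.brain.compute_mistakenness()`. The `Sample` class, its field names, and the 0.8 threshold are hypothetical.

```python
# Simplified, hypothetical heuristic for flagging possible label mistakes.
# It is NOT the FiftyOne Brain's mistakenness algorithm; it only illustrates
# the idea that a confident prediction which disagrees with the label is suspicious.
from dataclasses import dataclass


@dataclass
class Sample:
    filepath: str       # hypothetical image path
    ground_truth: str   # annotated class label
    prediction: str     # model's predicted class label
    confidence: float   # model's confidence in its prediction, in [0, 1]


def mistake_score(sample: Sample) -> float:
    """Return a score in [0, 1]; higher means the label looks more suspicious."""
    if sample.prediction == sample.ground_truth:
        # Agreement: the more confident the model, the less suspicious the label.
        return (1.0 - sample.confidence) / 2.0
    # Disagreement: the more confident the model, the more suspicious the label.
    return (1.0 + sample.confidence) / 2.0


def likely_mistakes(samples, threshold=0.8):
    """Return samples scoring above `threshold`, most suspicious first."""
    flagged = [s for s in samples if mistake_score(s) > threshold]
    return sorted(flagged, key=mistake_score, reverse=True)
```

For example, suppose one sample labeled "dog" is predicted "cat" with confidence 0.97 and another labeled "bird" is predicted "plane" with confidence 0.90. Both are flagged. A disagreement at confidence 0.55 scores 0.775 and stays below the threshold.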
Benefits of using FiftyOne
FiftyOne's powerful dataset exploration lets us connect to data anywhere, in any format, and easily search, filter, and sort it. FiftyOne's core library provides a structured yet dynamic representation for exploring datasets, and we can efficiently query and manipulate datasets by adding custom tags, model predictions, and more. For example, we can easily add the most valuable samples to our training datasets to get the largest improvement in our models. The FiftyOne Brain is a library of machine learning-powered capabilities that provide insights into our datasets and recommend changes that lead to measurably better model performance.
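To make the tag, filter, and sort workflow concrete, here is a toy in-memory sketch. It picks the highest-scoring samples that are not yet in the training set and tags them `"train"`. In FiftyOne itself, this is done through dataset views, for example with `sort_by()`, `match_tags()`, and `tag_samples()`. The plain-dict samples, the `view()` and `add_to_training()` helpers, and the uniqueness numbers below are all hypothetical stand-ins, not FiftyOne code or real Brain output.

```python
# Toy, hypothetical sketch of tagging / filtering / sorting samples.
# Plain dicts stand in for FiftyOne samples; the "uniqueness" values are made up.

def view(samples, predicate=lambda s: True, sort_key=None, reverse=False, limit=None):
    """Return a new list of matching samples, optionally sorted and truncated.

    The input list itself is not reordered.
    """
    selected = [s for s in samples if predicate(s)]
    if sort_key is not None:
        selected.sort(key=lambda s: s[sort_key], reverse=reverse)
    return selected if limit is None else selected[:limit]


def add_to_training(samples, k, score_field="uniqueness"):
    """Tag the k highest-scoring samples not yet in training with "train"."""
    candidates = view(
        samples,
        predicate=lambda s: "train" not in s["tags"],
        sort_key=score_field,
        reverse=True,
        limit=k,
    )
    for s in candidates:
        s["tags"].add("train")
    return candidates


samples = [
    {"filepath": "img_001.jpg", "label": "car", "uniqueness": 0.91, "tags": {"train"}},
    {"filepath": "img_002.jpg", "label": "person", "uniqueness": 0.35, "tags": set()},
    {"filepath": "img_003.jpg", "label": "car", "uniqueness": 0.78, "tags": set()},
    {"filepath": "img_004.jpg", "label": "truck", "uniqueness": 0.66, "tags": set()},
]
```

Calling `add_to_training(samples, 2)` tags `img_003.jpg` and `img_004.jpg`, the two highest-scoring samples not already tagged for training. `img_001.jpg` is skipped because it already carries the tag.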
“Data visualization capability is the foundation for deep learning AI solutions. At Cognomen, our data scientists are using Voxel51’s FiftyOne tool, which has significantly improved our productivity as well as given us an edge in the business” — Kalpit Singh, CEO
With FiftyOne, we can automatically find and remove near-duplicate images from our datasets. FiftyOne also recommends the most unique samples in our data, so we can start model training with a high-quality bootstrapped training set. Its image uniqueness tool can also analyze and extract insights from raw (unlabeled) datasets. FiftyOne datasets let us easily load, modify, and visualize images, videos, annotations, and model predictions along with any related labels (classifications, detections, etc.), all in a format the FiftyOne App can display. Because FiftyOne stores datasets in a lightweight non-relational database, we can scale to large datasets without being constrained by our machines' RAM, even when working with video. FiftyOne natively imports datasets from disk in a variety of common formats and supports custom formats as well.
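The near-duplicate and uniqueness ideas above both come down to distances between image embeddings. The sketch below shows one simple way to compute them. It assumes each image has already been mapped to an embedding vector by some pretrained model (hypothetical here). It greedily drops samples that sit within a cosine-distance threshold of one already kept, and it scores uniqueness as the mean distance to a sample's nearest neighbors. This is a plain nearest-neighbor heuristic swapped in for illustration. It is not FiftyOne's exact formula; FiftyOne's own entry point is `fiftyone.brain.compute_uniqueness()`. The function names and the 0.05 threshold are hypothetical.

```python
# Simplified sketch: near-duplicate removal and uniqueness ranking from embeddings.
# Embeddings are assumed to come from some pretrained image model (not shown);
# the scoring is a plain nearest-neighbor heuristic, not FiftyOne's exact method.
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are nonzero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def find_near_duplicates(embeddings, thresh=0.05):
    """Greedily keep a sample unless it is within `thresh` of one already kept.

    `embeddings` maps sample id -> vector. Returns (kept_ids, duplicate_ids).
    """
    kept, duplicates = [], []
    for sid, emb in embeddings.items():
        if any(cosine_distance(emb, embeddings[k]) < thresh for k in kept):
            duplicates.append(sid)
        else:
            kept.append(sid)
    return kept, duplicates


def uniqueness(embeddings, k=1):
    """Score each sample by its mean cosine distance to its k nearest neighbors."""
    scores = {}
    for sid, emb in embeddings.items():
        dists = sorted(
            cosine_distance(emb, other)
            for oid, other in embeddings.items()
            if oid != sid
        )
        scores[sid] = sum(dists[:k]) / k
    return scores
```

With toy 2-D embeddings where `a_copy` is almost identical to `a`, `find_near_duplicates()` flags `a_copy`, and `uniqueness()` ranks the isolated vectors highest. The pairwise loops are O(n²), which is fine for a sketch. For large datasets, an approximate nearest-neighbor index would be the usual choice.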
We now work with our datasets much faster, which has saved us countless days of wrestling with opaque machine learning problems. We highly recommend FiftyOne. It has streamlined our workflow for loading, visualizing, curating, and analyzing datasets, as well as for evaluating and improving models. We also now engage directly with members of the Voxel51 team through dedicated Slack channels whenever we have questions.
Summary
At Cognomen, we strive for efficient, scientific methods to improve our models' performance, and Voxel51 has delivered exactly that! FiftyOne has given us the edge to deliver a complete solution to our clients by helping us build high-quality datasets and models with ease.
About Voxel51
High-quality, intentionally curated data is critical to training great computer vision models. At Voxel51, we have over 25 years of CV/ML experience and care deeply about enabling the community to bring their AI solutions to life. That's why we developed FiftyOne, an open-source tool that helps engineers and scientists build high-quality datasets and models.
Want to learn more? Check us out at https://voxel51.com.