fastdup: One Year Strong and Still Going!

Dickson Neoh
Published in Visual Layer
May 17, 2023

Can you believe it? It’s been a year since we launched fastdup — a free tool for image & video data analysis.

Not only that: one year into running our company, Visual Layer, we recently raised $7.5M to help enterprises working with massive visual datasets!

This funding marks a significant milestone in our journey and brings us one step closer to achieving our vision of revolutionizing the way we interact with visual data. — Amir Alush, co-founder

We were featured in TechCrunch, which covered our journey to where we are today.

Danny Bickson & Amir Alush, co-founders of Visual Layer, the company behind fastdup.

The reason users like fastdup is that it is easy to use, scalable and accurate for a large variety of datasets. We believe the problem is completely unsolved, especially in the billion images and videos regime. — Danny Bickson, co-founder.

What is fastdup?

If you’re new here, you might be wondering: what is fastdup?

fastdup is a tool that can analyze your computer vision dataset in an unsupervised way.

With fastdup, you can find common issues in your computer vision dataset such as:

1. Duplicates / near-duplicates.
2. Anomalies.
3. Wrong labels.
4. Corrupted images.
5. Blurry images.
6. Overly dark / bright images.
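For readers curious what checks like the last two involve, here is a minimal sketch, not fastdup’s implementation. It assumes a grayscale image given as a list of rows with pixel values from 0 to 255, flags images whose mean brightness falls outside hypothetical thresholds, and uses the variance of a 4-neighbour Laplacian as a blur score (few sharp edges means low variance). The function names and threshold values are illustrative assumptions.

```python
from __future__ import annotations

from statistics import mean


def mean_brightness(img: list[list[float]]) -> float:
    """Average pixel value of a grayscale image (0 = black, 255 = white)."""
    return mean(p for row in img for p in row)


def laplacian_variance(img: list[list[float]]) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels.

    Sharp edges produce large positive and negative responses, so a low
    variance suggests a blurry (or featureless) image. Needs at least 3x3 pixels.
    """
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    m = mean(responses)
    return mean((r - m) ** 2 for r in responses)


def quality_flags(img, dark=40.0, bright=215.0, blur=100.0) -> list[str]:
    """Return issue labels for one image; the thresholds are hypothetical defaults."""
    flags = []
    level = mean_brightness(img)
    if level < dark:
        flags.append("dark")
    elif level > bright:
        flags.append("bright")
    if laplacian_variance(img) < blur:
        flags.append("blurry")
    return flags


# A uniform, nearly black image is both too dark and featureless.
print(quality_flags([[10] * 5 for _ in range(5)]))  # prints ['dark', 'blurry']
```

The thresholds would need tuning per dataset: a Laplacian-variance cutoff that suits busy street scenes may wrongly flag intentionally smooth images such as clear skies.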

These issues are common in large computer vision datasets, yet they are largely ignored. fastdup works on both labeled and unlabeled datasets, so we’ve got you covered.
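To illustrate how anomalies and wrong labels can be surfaced with or without labels, here is a sketch of a common nearest-neighbour approach; it is an assumed technique for illustration, not a description of fastdup’s internals. It assumes each image has already been turned into an embedding vector by some model (not shown), treats an image as an outlier when even its most similar neighbour falls below a hypothetical cosine-similarity threshold, and flags a label as suspicious when a strict majority of its k nearest neighbours carry a different label.

```python
from __future__ import annotations

import math
from collections import Counter


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(embs, i, k):
    """The k most similar other items to item i, as (similarity, index) pairs."""
    scored = [(cosine(embs[i], embs[j]), j) for j in range(len(embs)) if j != i]
    scored.sort(reverse=True)
    return scored[:k]


def outliers(embs, threshold=0.5):
    """Items whose closest neighbour is still dissimilar; labels are not needed."""
    return [i for i in range(len(embs)) if nearest(embs, i, 1)[0][0] < threshold]


def suspicious_labels(embs, labels, k=3):
    """Items whose label disagrees with a strict majority of their k nearest neighbours."""
    flagged = []
    for i in range(len(embs)):
        votes = Counter(labels[j] for _, j in nearest(embs, i, k))
        majority, count = votes.most_common(1)[0]
        if majority != labels[i] and count > k // 2:
            flagged.append(i)
    return flagged


# Toy 2-D "embeddings": two tight clusters; item 3 sits with the cats but is labeled "dog".
cats = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.15], [0.97, 0.2]]
dogs = [[0.0, 1.0], [0.1, 0.99], [0.15, 0.98], [0.2, 0.97]]
labels = ["cat", "cat", "cat", "dog"] + ["dog"] * 4
print(suspicious_labels(cats + dogs, labels))  # prints [3]
print(outliers(cats + dogs + [[-1.0, -1.0]]))  # prints [8]
```

The outlier check never reads the labels, which is why this style of analysis also applies to unlabeled data. The brute-force neighbour search is quadratic in the number of images; at real dataset sizes an approximate nearest-neighbour index would take its place.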

We recently found that even widely used computer vision datasets such as ImageNet and LAION contain duplicates, broken images, and wrong labels.

Example near-duplicates identified in the MS-COCO (160K images) and ImageNet-21K (11.5M images) datasets. A record-breaking 1.2M duplicates were identified in ImageNet-21K! Additional information can be found in our GitHub repo.
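The post doesn’t say how fastdup finds near-duplicates, so the sketch below swaps in a classic and much simpler technique, average hashing: shrink a grayscale image to an 8×8 grid by block averaging, set one bit per cell that is brighter than the grid mean, and compare hashes by Hamming distance. The image representation (a list of rows of 0 to 255 values, at least 8×8 pixels), the function names, and the distance cutoff are all assumptions for illustration.

```python
from __future__ import annotations

from itertools import combinations


def average_hash(img: list[list[float]], size: int = 8) -> int:
    """64-bit average hash of a grayscale image (list of rows, at least size x size pixels)."""
    h, w = len(img), len(img[0])
    cells = []
    for by in range(size):
        for bx in range(size):
            # Average the block of pixels that maps onto grid cell (by, bx).
            block = [
                img[y][x]
                for y in range(by * h // size, (by + 1) * h // size)
                for x in range(bx * w // size, (bx + 1) * w // size)
            ]
            cells.append(sum(block) / len(block))
    avg = sum(cells) / len(cells)
    bits = 0
    for c in cells:  # the first cell ends up as the most significant bit
        bits = (bits << 1) | int(c > avg)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_near_duplicates(images: dict[str, list[list[float]]], max_distance: int = 5):
    """Pairs of image names whose hashes differ in at most max_distance bits."""
    hashes = {name: average_hash(img) for name, img in images.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(hashes), 2)
        if hamming(hashes[a], hashes[b]) <= max_distance
    ]


horizontal = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 3 for v in row] for row in horizontal]  # uniformly brightened copy
vertical = [[y * 16 for _ in range(16)] for y in range(16)]
print(find_near_duplicates({"a.jpg": horizontal, "b.jpg": brighter, "c.jpg": vertical}))
# prints [('a.jpg', 'b.jpg')]
```

A uniform brightness change shifts every cell and the mean by the same amount, so the hash does not change; that is what makes it useful for near-duplicates. It is far less robust to crops, rotations, and re-framing than learned embeddings.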

These are some of the issues fastdup addresses. One of its standout features is its incredible performance.

Guess what? You don’t even need a GPU to run it! fastdup runs efficiently on CPUs, even on a lightweight 2-core Google Colab instance!
fastdup’s highly efficient graph engine can handle up to 400M images on a single CPU machine.
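The post doesn’t detail the graph engine, but one standard way to turn pairwise similarity scores into groups of duplicates is to treat images as nodes, keep only edges above a similarity threshold, and take connected components. Below is a minimal union-find sketch of that general idea; the edge format (name, name, similarity), the function names, and the 0.9 threshold are hypothetical and not taken from fastdup.

```python
from __future__ import annotations


class UnionFind:
    """Disjoint-set forest over hashable node names, with path compression."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # unseen nodes start as their own root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # point every node on the path at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def duplicate_components(edges, threshold: float = 0.9) -> list[list[str]]:
    """Group nodes joined by edges with similarity >= threshold.

    edges: iterable of (name_a, name_b, similarity). Nodes that never appear in a
    kept edge are not returned, so every group has at least two members.
    """
    uf = UnionFind()
    for a, b, sim in edges:
        if sim >= threshold:
            uf.union(a, b)
    groups: dict[str, list[str]] = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), []).append(node)
    return sorted(sorted(g) for g in groups.values())


edges = [("a", "b", 0.95), ("b", "c", 0.93), ("d", "e", 0.97), ("c", "d", 0.40)]
print(duplicate_components(edges))  # prints [['a', 'b', 'c'], ['d', 'e']]
```

Path compression keeps repeated lookups cheap, so the grouping step scales roughly with the number of edges kept. This only illustrates why a component-based approach can be efficient; it says nothing about how fastdup itself reaches 400M images.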

All you need to get started is 3 lines of code:

In fastdup version 1.0, we’ve made it easier for anyone to get started. Here’s the TL;DR:

  • Clean & simple API: The new API is simpler to use.
  • Native Windows support: Windows now has first-class, full-feature support in fastdup.
  • Amazing documentation: New and improved fastdup documentation.
  • Sleek galleries: New and improved galleries to get a better view of your data.
  • Extensive label support: Improved support for handling image and bounding box labels.
  • Additional image format support: Apple’s HEIC/HEIF and 16-bit grayscale TIFF.
  • Support for Python 3.10.
  • Fully backward compatible with the old API.

More in our GitHub repo.

Wrapping Up

All in all, we couldn’t be more grateful for, or more thrilled by, the wild ride of the past year.

None of this would have been possible without the incredible support and contributions from our amazing community. Your feedback, bug reports, and feature suggestions have helped shape fastdup into the robust tool it is today.

Thank you for being part of this journey!

If fastdup has made your life easier in any way over the past year, please help us spread the word and share your experience with other users. On behalf of the fastdup team, we’d like to thank you for helping us get to where we are today.

Here’s to an incredible year with fastdup and many more to come!

