10 Best Open Source Datasets for Computer Vision in 2024

Mariia Krasavina
Published in CVAT.ai
11 min read · Apr 26, 2024

Leveraging open-source datasets is essential for developing and testing computer vision models. Here are 10 significant datasets covering a variety of computer vision tasks, such as object detection, image classification, segmentation, and more.

Common Objects in Context (COCO)

Description: The Common Objects in Context (COCO) dataset is a comprehensive collection featuring a wide array of objects, ranging from everyday items like cars and bicycles to more specific categories such as umbrellas and sports equipment. It was developed to address the shortcomings of previous datasets by incorporating a richer context, a wider variety of object categories, and a greater number of instances per category.

The COCO dataset is extensively employed for a variety of computer vision tasks, such as object detection, semantic segmentation, superpixel segmentation, keypoint detection, and image captioning — with each image having five associated captions.

It boasts a diverse collection of images and annotations, with over 330K images (more than 200K of which are labeled), 1.5 million object instances across 80 categories, and 250,000 people annotated with keypoints.

It’s important to note that while COCO’s annotations are highly regarded and widely utilized, the quality can be inconsistent and may not suit every application.
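COCO distributes its labels as one large JSON file per split, with parallel `images`, `annotations`, and `categories` arrays. The sketch below walks that structure with a tiny made-up annotation standing in for a real file; the IDs, file name, and box values are illustrative only:

```python
# Minimal, illustrative annotation in the COCO JSON layout
# (real files are one large JSON document per split).
coco = {
    "images": [{"id": 1, "file_name": "000000001.jpg", "width": 640, "height": 480}],
    "annotations": [
        # bbox follows COCO's [x, y, width, height] convention, in pixels
        {"id": 10, "image_id": 1, "category_id": 2, "bbox": [100, 120, 50, 80]},
    ],
    "categories": [{"id": 2, "name": "bicycle", "supercategory": "vehicle"}],
}

def annotations_for_image(data, image_id):
    """Collect (category name, bbox) pairs for one image id."""
    names = {c["id"]: c["name"] for c in data["categories"]}
    return [
        (names[a["category_id"]], a["bbox"])
        for a in data["annotations"]
        if a["image_id"] == image_id
    ]

print(annotations_for_image(coco, 1))  # [('bicycle', [100, 120, 50, 80])]
```

In practice the `pycocotools` package provides indexed access to the same structure, but the underlying JSON is simple enough to read directly.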

History: Introduced in 2014, the COCO dataset was designed to advance object recognition technology. Although the dataset itself has not seen regular updates in terms of new images, its annotations and functionality continue to be refined and extended through annual challenges and competitions.

Licensing: The COCO dataset is available under the Creative Commons Attribution 4.0 License, permitting both academic and commercial usage provided proper attribution is maintained.

Official Site: https://cocodataset.org/

ImageNet

Description: ImageNet is an extensive image database organized according to the WordNet hierarchy, in which each meaningful concept is described by a “synonym set,” or “synset.” WordNet contains more than 100,000 synsets, the majority of them nouns (over 80,000), and ImageNet aims to provide approximately 1,000 images for each synset to accurately depict the concept. The images undergo rigorous quality control and are human-annotated for precision. In total, ImageNet offers tens of millions of carefully labeled and organized images covering the wide range of concepts in the WordNet system.

ImageNet has been instrumental in the development of computer vision technologies, notably through its sponsorship of the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This challenge significantly advanced the capabilities of image recognition and deep learning.

History: Launched in 2009 by a team of researchers at Stanford University, ImageNet was conceived to build a large-scale database of annotated images to boost advancements in computer vision. The project has had a profound impact on the field, especially through the ImageNet Challenge that ran until 2017. Although these challenges are no longer active, the ImageNet dataset continues to be a vital tool in the computer vision community, despite not being frequently updated with new images.

Licensing: While ImageNet does not hold the copyrights for the images, it compiles and annotates a precise list of web images for each WordNet synset. As such, ImageNet is available for academic and non-commercial research under specific usage terms that require proper attribution.

Official Site: http://www.image-net.org/

PASCAL VOC

Description: PASCAL VOC is a well-known dataset and benchmarking initiative aimed at improving visual object recognition. It provides a substantial dataset and tools on a specialized platform, serving as a critical resource for the computer vision community.

Developed to present a varied collection of images that capture the complexity of the world, PASCAL VOC is essential for crafting more effective object recognition models. This dataset has become a foundational element in computer vision, driving major progress in technologies related to image classification. The challenges associated with PASCAL VOC have been pivotal in motivating researchers to improve the precision, efficiency, and dependability of automated image understanding and classification. The dataset has been invaluable in advancing various areas such as instance segmentation, image classification, person pose estimation, object detection, and person action classification.
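VOC stores its labels as one XML file per image, with each object carrying a class name and a pixel-coordinate bounding box. A minimal parser sketch using only the standard library, with a fabricated annotation standing in for a real `Annotations/*.xml` file:

```python
import xml.etree.ElementTree as ET

# Illustrative snippet in the PASCAL VOC XML layout; the file name
# and coordinate values here are made up.
VOC_XML = """
<annotation>
  <filename>2008_000001.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>200</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return a list of (class name, (xmin, ymin, xmax, ymax)) boxes."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(int(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text, coords))
    return boxes

print(parse_voc(VOC_XML))  # [('person', (48, 30, 200, 300))]
```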

History: Developed in 2005, the PASCAL VOC project was created to provide a standardized dataset for image recognition and object detection tasks. It became well-known through its annual challenges, which significantly moved the field forward until its conclusion in 2012. Despite the end of these yearly challenges, the PASCAL VOC dataset continues to be a valuable asset for computer vision researchers, even though it no longer receives updates with new data.

Licensing: PASCAL VOC is available under terms that support academic and research-oriented projects, following guidelines that promote the ethical and responsible utilization of the dataset. Additionally, the VOC dataset includes images sourced from the “Flickr” website; for more details, refer to the “Flickr” terms of use.

Official Site: http://host.robots.ox.ac.uk/pascal/VOC

Cityscapes

Description: The Cityscapes dataset was specifically developed to enhance the visual understanding and analysis of urban environments. It features a diverse collection of stereo video sequences from street scenes in 50 different cities. This dataset is notable for its high-quality, pixel-accurate annotations across 5,000 frames, and it also includes 20,000 frames with coarse annotations. Cityscapes significantly exceeds the scope of previous projects in this area, providing a unique and extensive resource for researchers and developers focused on urban settings.

Cityscapes aims to bridge the gap in the availability of urban-centric datasets that are crucial for advancing autonomous vehicle technology and urban scene analysis. It provides a rich repository of annotated images designed for semantic understanding of urban scenes, which has spurred substantial progress in analyzing complex city environments. This has furthered the development of algorithms that can interact with and interpret urban settings with greater sophistication.
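Cityscapes ground truth comes as per-pixel label-ID masks, which are conventionally remapped to a smaller set of training IDs before training a segmentation model. A minimal sketch of that remapping with a lookup table; only a few classes from the official mapping (defined in the cityscapesScripts labels module) are shown, and the tiny mask is fabricated:

```python
import numpy as np

# A small subset of the Cityscapes id -> trainId mapping; 255 marks
# pixels to ignore during training.
ID_TO_TRAIN_ID = {0: 255,   # unlabeled
                  7: 0,     # road
                  26: 13}   # car

# Build a lookup table so the remap is a single vectorized indexing op.
lut = np.full(256, 255, dtype=np.uint8)
for label_id, train_id in ID_TO_TRAIN_ID.items():
    lut[label_id] = train_id

# A toy 2x3 mask standing in for a decoded *_labelIds.png frame.
mask = np.array([[7, 7, 26],
                 [0, 26, 7]], dtype=np.uint8)
train_mask = lut[mask]
print(train_mask.tolist())  # [[0, 0, 13], [255, 13, 0]]
```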

History: Introduced in 2016, the Cityscapes dataset supports detailed research into urban scenes, particularly for segmentation tasks that demand accurate pixel and object identification. It remains a vital resource in the field, aiding developers and researchers in improving technologies such as those used in autonomous vehicles.

Licensing: The Cityscapes dataset is provided for academic and non-commercial research purposes.

Official Site: https://www.cityscapes-dataset.com/

KITTI

Description: The KITTI dataset is famous in autonomous driving research, providing a rich resource for various computer vision tasks related to automotive technology. It is designed around real-world driving scenarios and covers key areas including stereo vision, optical flow, visual odometry, and 3D object detection and tracking.

KITTI was established to fill the void in automotive vision datasets, aimed at enhancing the field of autonomous driving. It offers an in-depth look at the complexities of real-world driving conditions, presenting a diversity and depth that surpass previous datasets.
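KITTI's object labels are plain-text files, one line per object, whose space-separated field layout is documented in the object development kit readme. A small parser sketch over a fabricated sample line:

```python
# Field layout for one line of a KITTI object label file
# (training/label_2/*.txt), per the development-kit readme.
FIELDS = ["type", "truncated", "occluded", "alpha",
          "left", "top", "right", "bottom",   # 2D box (pixels)
          "height", "width", "length",        # 3D dimensions (m)
          "x", "y", "z",                      # location in camera coords (m)
          "rotation_y"]

def parse_kitti_line(line):
    """Parse one label line into a dict; all fields but 'type' are floats."""
    parts = line.split()
    rec = {"type": parts[0]}
    rec.update({k: float(v) for k, v in zip(FIELDS[1:], parts[1:])})
    return rec

# Fabricated sample line for illustration.
sample = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
obj = parse_kitti_line(sample)
print(obj["type"], obj["left"], obj["z"])  # Car 587.01 46.7
```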

History: Introduced in 2012, the KITTI dataset played an important role in advancing autonomous driving technologies. This dataset was developed through a collaboration between the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. Although the KITTI dataset does not receive frequent updates, it continues to be a vital resource for researchers and developers in the field of automotive technology.

Licensing: The KITTI dataset is made available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License that supports academic research and technological development, promoting its use among scholars and developers in the autonomous driving community.

Official Site: http://www.cvlibs.net/datasets/kitti

VGGFace2

Description: VGGFace2 is a substantial dataset containing approximately 3.31 million images across 9131 classes, each class representing a unique individual. This dataset is utilized for a range of computer vision tasks including face detection, face recognition, and landmark localization. It features a diverse collection of images that showcase a wide range of demographic characteristics such as age, pose, lighting, ethnicity, and profession. This diversity ensures a robust framework for the development and testing of algorithms aimed at achieving a human-like understanding of faces.

The dataset includes images of faces from well-known public figures to everyday individuals from various professions, greatly enhancing the depth and real-world applicability of face recognition technologies.

History: VGGFace2 was developed by the Visual Geometry Group at the University of Oxford and launched in 2017 as an expansion of the original VGGFace dataset. It is not updated regularly as it was released as a static collection intended for academic research and development purposes.

Licensing: VGGFace2 supports both academic research and non-commercial use, as detailed on its website.

Official Site: https://paperswithcode.com/dataset/vggface2-1

CIFAR-10 & CIFAR-100

Description: The CIFAR-10 and CIFAR-100 datasets are subsets of the larger 80 Million Tiny Images collection, assembled by researchers Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. These datasets are designed to aid the analysis of real-world imagery. CIFAR-10 contains 60,000 color images of 32x32 pixels each, spread across 10 categories, with each category containing 6,000 images. The dataset is divided into 50,000 training images and 10,000 testing images, covering a variety of subjects such as animals and vehicles.

In contrast, CIFAR-100 builds upon this by offering 100 categories, each with 600 images, totaling the same 60,000 images but with a more granular categorization. It assigns 500 images for training and 100 for testing per category. Additionally, CIFAR-100 classifies its categories into 20 supercategories, with each image labeled with both a “fine” label for its specific category and a “coarse” label that indicates its supercategory.
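In the python version of the datasets, each image is stored as a flat row of 3,072 bytes in channel-major order: the first 1,024 values are the red channel, the next 1,024 green, and the last 1,024 blue, each in row-major 32x32 order. A sketch of reshaping one such row into a height x width x channel array, using a synthetic row in place of a real unpickled batch:

```python
import numpy as np

# Synthetic stand-in for one row of the unpickled batch's "data" array;
# real rows are uint8 pixel values.
row = np.arange(3072, dtype=np.int32)

# reshape to (channel, height, width), then move channels last
img = row.reshape(3, 32, 32).transpose(1, 2, 0)

print(img.shape)            # (32, 32, 3)
print(img[0, 0].tolist())   # [0, 1024, 2048]: R, G, B offsets of the first pixel
```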

These datasets aim to advance the field of image recognition by providing a detailed and diverse collection of images that surpasses what previous datasets offered. They support the development of algorithms capable of differentiating and recognizing a wide variety of object types, thus enhancing computer vision towards a more human-like perception.

History: Developed by researchers at the University of Toronto, the CIFAR-10 and CIFAR-100 datasets were launched in 2009. Since their introduction, they have primarily served as benchmarks within the academic community and have not undergone regular updates.

Licensing: Both CIFAR-10 and CIFAR-100 are freely available for academic and educational use, under a license that supports their wide use in research and development within the field of image recognition (licensing information can be found on the official site).

Official Site: https://www.cs.toronto.edu/~kriz/cifar.html

IMDB-WIKI

Description: The IMDB-WIKI dataset was created to overcome the limitations of existing small to medium-sized face image datasets, which typically lack extensive age data and seldom feature more than several tens of thousands of images. Drawing from the IMDb website, its developers compiled a list of the top 100,000 actors, systematically gathering their birth dates, names, genders, and all associated images.

Similarly, profile images and the same types of metadata were collected from Wikipedia pages. By assuming that images featuring a single face likely represent the actor and trusting the accuracy of the timestamps and birth dates provided, a real biological age was assigned to each image. As a result, the IMDB-WIKI dataset includes 460,723 face images from 20,284 celebrities indexed on IMDb, complemented by an additional 62,328 images from Wikipedia, culminating in a total of 523,051 images that are particularly useful for facial recognition training.
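The age assignment described above amounts to a simple calculation from the photo timestamp and birth date. A sketch of that idea; the mid-year cutoff below is an illustrative refinement, not the dataset's exact code:

```python
from datetime import date

def assign_age(dob: date, photo_taken_year: int) -> int:
    """Approximate biological age when the photo was taken.

    Only the year of the photo is assumed known, so the photo is treated
    as taken mid-year (an assumption made here for illustration).
    """
    return photo_taken_year - dob.year - (1 if (dob.month, dob.day) > (7, 1) else 0)

print(assign_age(date(1970, 3, 14), 2009))  # 39
print(assign_age(date(1970, 10, 2), 2009))  # 38
```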

History: The IMDB-WIKI dataset was developed by researchers at ETH Zurich in 2015. Since its introduction, it has not been regularly updated, yet it remains a significant resource in the realm of computer vision and facial recognition research.

Licensing: The IMDB-WIKI dataset can be used only for non-commercial and research purposes (licensing information can be found on the official site).

Official Site: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

Open Images Dataset by Google

Description: The Open Images Dataset, developed by Google, stands out as one of the most expansive and meticulously detailed public image datasets currently available. It is tailored to accommodate the diverse needs associated with computer vision applications. Spanning a wide assortment of categories, from ordinary objects to complex scenes and actions, this dataset aims to surpass prior collections by providing a comprehensive range of detailed annotations across a wide spectrum of subjects.

Essential for various computer vision endeavors such as image classification, object detection, visual relationship detection, and instance segmentation, the Open Images Dataset serves as a valuable resource for enhancing machine learning models.

Diving into specifics, the dataset includes:

  • 15,851,536 bounding boxes across 600 object classes,
  • 2,785,498 instance segmentations in 350 classes,
  • 3,284,280 annotations detailing 1,466 types of relationships,
  • 675,155 localized narratives that offer rich, descriptive insights,
  • 66,391,027 point-level annotations over 5,827 classes, showcasing the dataset’s depth in granularity,
  • 61,404,966 image-level labels spanning 20,638 classes, highlighting the dataset’s broad scope,
  • An extension that further enriches the collection with 478,000 crowdsourced images categorized into over 6,000 class descriptions.
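Box annotations in Open Images are distributed as large CSV files, with coordinates normalized to the [0, 1] range rather than stored in pixels. A sketch of converting them back to pixel values; the sample below uses a subset of the real column set, and its ID and coordinate values are fabricated:

```python
import csv
import io

# Inline stand-in for a slice of an Open Images box-annotation CSV.
SAMPLE = """ImageID,LabelName,XMin,XMax,YMin,YMax
000026e7ee790996,/m/07j7r,0.0712,0.1458,0.2061,0.3911
"""

def boxes_in_pixels(csv_text, width, height):
    """Convert normalized box coordinates to pixel values."""
    out = []
    for r in csv.DictReader(io.StringIO(csv_text)):
        out.append((r["LabelName"],
                    round(float(r["XMin"]) * width),
                    round(float(r["XMax"]) * width),
                    round(float(r["YMin"]) * height),
                    round(float(r["YMax"]) * height)))
    return out

print(boxes_in_pixels(SAMPLE, 1024, 768))  # [('/m/07j7r', 73, 149, 158, 300)]
```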

History: The Open Images Dataset, created by Google, was first launched in 2016. It has undergone consistent updates, culminating in Version 7, released in 2022, which added the point-level labels noted above. This latest version includes improved annotations and broadened categories, designed to aid in the development of more precise and varied computer vision models.

Licensing: The annotations are licensed by Google LLC under CC BY 4.0 license. The images are listed as having a CC BY 2.0 license. Both licenses support academic research and commercial use, promoting its application across a wide array of projects and developments in the field of computer vision.

Official Site: https://storage.googleapis.com/openimages/web/index.html

SUN Database: Scene Categorization Benchmark

Description: The SUN dataset is an extensive and meticulously curated collection designed for the identification and categorization of diverse scenes. It stands out for its broad spectrum of environments, including everything from indoor settings to wide outdoor locales. This diversity addresses the gap in scene datasets, which typically focus more on object detection than scene variety. The SUN Database aims to enhance our comprehension of complex scenes and their contexts by providing a multitude of scene types along with detailed annotations.

Essential for various computer vision applications, such as scene classification, scene layout analysis, and object detection across different contexts, the SUN dataset includes over 130,000 images spanning more than 900 scene types, each richly annotated to facilitate precise scene recognition.

History: Developed by teams at MIT and Brown University, the SUN dataset was initially introduced in 2010. Although it has not received regular updates since its debut, it continues to be a crucial resource in the computer vision community, aiding in the advancement of scene-understanding technologies.

Licensing: The SUN Database is distributed under terms that permit academic research, provided there is proper attribution to the creators and the dataset itself.

Official Site: https://vision.princeton.edu/projects/2010/SUN/

Conclusion

We sincerely hope this article has been informative and beneficial for your model training and everyday computer vision activities. If you haven’t found exactly what you need, please continue to follow us on our social media channels. We are committed to sharing insights on how to develop, annotate, and manage your personalized dataset that meets your specific requirements.

Stay curious, keep annotating!
