Datasets for Machine Learning in Autonomous Vehicles

Surya Gutta · Published in Analytics Vidhya · Jul 13, 2021

Datasets with multiple sensor modalities (LiDAR, RADAR, Stereo Camera, Thermal Camera, etc.)

Car vector created by upklyak — www.freepik.com

A wide variety of sensors are used in autonomous vehicles, and this diversity of sensing modalities helps in different weather conditions. The following is a list of popular autonomous driving datasets published to date.

Note: Since new datasets are released every couple of months, I will update this page whenever a popular one comes out.

Name: ONCE Dataset (One millioN sCenEs) - Huawei Corp.

Published Year: 2021

Sensor Type: Camera, LiDAR

Recording Area: China

Description: The ONCE (One millioN sCenEs) dataset can be used for 3D object detection in autonomous driving scenarios. It consists of 1 million LiDAR scenes and 7 million corresponding camera images. The data was selected from 144 driving hours, 20x longer than the largest 3D autonomous driving datasets available (e.g., nuScenes and Waymo), and was collected across a range of areas, time periods, and weather conditions. It has 15k fully annotated scenes with 5 classes (Car, Bus, Truck, Pedestrian, Cyclist). Every labeled and unlabeled scene is tagged with one of 3 weather conditions (sunny, cloudy, rainy) and one of 4 time periods (morning, noon, afternoon, night). The per-scene information (weather, period, timestamp, pose, calibration, and annotations) is stored in a single JSON file.
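As a rough illustration of that layout, such a per-scene JSON file can be read with the Python standard library alone. This is a minimal sketch; the directory layout and key names below (meta_info, calib, frames) are assumptions for illustration, not the official ONCE schema:

```python
import json
from pathlib import Path

# Hypothetical path to one scene's metadata; the real layout depends
# on the ONCE release you download.
scene_json = Path("once/data/000027/000027.json")

with scene_json.open() as f:
    scene = json.load(f)

# Inspect the per-scene fields described above (key names are assumptions).
meta = scene.get("meta_info", {})
print(meta.get("weather"), meta.get("period"))
print(list(scene.get("calib", {})))   # per-sensor calibration entries
print(len(scene.get("frames", [])))   # frames with pose and annotations
```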

Dataset Download and Paper(s): Dataset, Paper

Name: All-In-One Drive (AIODrive)

Published Year: 2020

Sensor Type: Camera, LiDAR, Radar

Recording Area: NA (synthetic data)

Description: AIODrive is a large-scale synthetic perception dataset with high-density, long-range point clouds, providing comprehensive sensors, annotations, and environmental variations. It has:

  1. Eight sensor modalities (RGB, Stereo, Depth, LiDAR, SPAD-LiDAR, Radar, IMU, GPS)
  2. Annotations for all mainstream perception tasks (e.g., detection, tracking, trajectory prediction, segmentation, depth estimation)
  3. Rare driving scenarios such as adverse weather and lighting, crowded scenes, high-speed driving, violation of traffic rules, and accidents

Its LiDAR and SPAD-LiDAR sensors produce high-density, long-range point clouds, with about ten times the density and a larger sensing range than a Velodyne-64.

Dataset Download and Paper(s): Dataset, Paper

Name: Ford Multi-AV Seasonal dataset

Published Year: 2020

Sensor Type: Camera, LiDAR

Recording Area: United States (Michigan)

Description: This multi-agent seasonal dataset was collected by a fleet of Ford autonomous vehicles on different days and times during 2017–18. The vehicles were manually driven on a route in Michigan covering a mix of driving scenarios, including the Detroit Airport, freeways, city centers, a university campus, and suburban neighborhoods. The dataset captures the seasonal variation in weather, lighting, construction, and traffic conditions experienced in dynamic urban environments.

Dataset Download and Paper(s): Dataset, Paper

Name: Dense Depth for Autonomous Driving (DDAD) — Toyota Research Institute

Published Year: 2020

Sensor Type: Camera, LiDAR

Recording Area: United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba)

Description: DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long-range (up to 250 m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360-degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting.
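Since DDAD is a depth estimation benchmark, a typical evaluation step compares predicted depth against the LiDAR-projected ground truth. Below is a minimal NumPy sketch of the standard absolute relative error metric; the 250 m cap matches the benchmark's stated range, while the array shapes and names are illustrative assumptions:

```python
import numpy as np

def abs_rel_error(pred_depth, gt_depth, max_depth=250.0):
    """Absolute relative depth error, a standard depth-estimation metric.
    Ground-truth pixels with no LiDAR return are stored as zero."""
    valid = (gt_depth > 0) & (gt_depth <= max_depth)
    pred = np.clip(pred_depth[valid], 1e-3, max_depth)
    gt = gt_depth[valid]
    return np.mean(np.abs(pred - gt) / gt)

# Toy usage with random arrays standing in for a real frame.
gt = np.random.uniform(0.0, 250.0, size=(384, 640))
pred = gt + np.random.normal(0.0, 1.0, size=gt.shape)
print(f"AbsRel: {abs_rel_error(pred, gt):.4f}")
```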

Dataset Download and Paper(s): Dataset, Paper

Name: PandaSet

Published Year: 2020

Sensor Type: Camera, LiDAR

Recording Area: United States (San Francisco, El Camino Real from Palo Alto to San Mateo)

Description: PandaSet combines Hesai’s best-in-class LiDAR sensors with Scale AI’s high-quality data annotation. PandaSet features data collected using a forward-facing LiDAR with image-like resolution (PandarGT) as well as a mechanical spinning LiDAR (Pandar64). The collected data was annotated with a combination of cuboid and segmentation annotation (Scale 3D Sensor Fusion Segmentation).
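A minimal sketch of reading a sequence with the pandaset-devkit (github.com/scaleapi/pandaset-devkit); the class and attribute names below follow my reading of that devkit and should be verified against the version you install, and the data root is a placeholder:

```python
from pandaset import DataSet

dataset = DataSet("/data/pandaset")      # root folder of the download
seq = dataset["002"]                     # sequences are keyed by ID
seq.load()                               # loads lidar, cameras, annotations

pc = seq.lidar[0]                        # first sweep as a pandas DataFrame
print(pc[["x", "y", "z", "i"]].head())   # point coordinates and intensity
print(seq.cuboids[0].head())             # cuboid annotations for frame 0
```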

Dataset Download and Paper(s): Dataset

Name: Canadian Adverse Driving Conditions (CADC)

Published Year: 2020

Sensor Type: Camera, LiDAR

Recording Area: Canada (Waterloo)

Description: The Canadian Adverse Driving Conditions (CADC) dataset was collected with the Autonomoose autonomous vehicle platform, based on a modified Lincoln MKZ. The dataset, collected during winter within the region of Waterloo, Canada, is the first autonomous vehicle dataset to focus specifically on adverse driving conditions. It contains 7,000 frames of annotated data, collected in a variety of winter weather conditions, from 8 cameras (Ximea MQ013CG-E2), a LiDAR (VLP-32C), and a GNSS+INS system (Novatel OEM638). The sensors are time-synchronized and calibrated, with the intrinsic and extrinsic calibrations included in the dataset. LiDAR frame annotations that represent ground truth for 3D object detection and tracking have been provided by Scale AI.

Dataset Download and Paper(s): Dataset, Paper

Name: A2D2: Audi Autonomous Driving Dataset

Published Year: 2020

Sensor Type: Camera, LiDAR, Bus data

Recording Area: Germany (Gaimersheim, Munich, and Ingolstadt)

Description: The dataset consists of simultaneously recorded images and 3D point clouds, together with 3D bounding boxes, semantic segmentation, instance segmentation, and data extracted from the automotive bus. The sensor suite consists of six cameras and five LiDAR units, providing full 360-degree coverage. The recorded data is time-synchronized and mutually registered, and all sensor signals carry UTC timestamps.
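As a sketch, the sensor configuration that makes this mutual registration possible ships as a JSON file. The file name cams_lidars.json and the key layout below follow my reading of the public A2D2 tutorial; treat them as assumptions to verify against your download:

```python
import json
import numpy as np

# Load the A2D2 sensor configuration (path and keys are assumptions).
with open("a2d2/cams_lidars.json") as f:
    config = json.load(f)

# Each sensor view is defined by an origin and x/y axes in the
# vehicle frame, from which the extrinsic transform can be built.
view = config["cameras"]["front_center"]["view"]
origin = np.array(view["origin"])
x_axis = np.array(view["x-axis"])
print(origin, x_axis)
```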

Dataset Download and Paper(s): Dataset, Paper

Name: A*3D Dataset

Published Year: 2019

Sensor Type: Camera, LiDAR

Recording Area: Singapore

Description: The A*3D dataset consists of RGB images and LiDAR data with significant diversity in scene, time, and weather. It features high-density images (≈10 times more than the pioneering KITTI dataset), heavy occlusions, and a large number of night-time frames (≈3 times the nuScenes dataset), addressing gaps in existing datasets and pushing autonomous driving research toward more challenging, highly diverse environments. The data collection covers the whole of Singapore, including highways, neighborhood roads, tunnels, urban, suburban, and industrial areas, HDB car parks, the coastline, etc.

Dataset Download and Paper(s): Dataset, Paper

Name: EuroCity Persons (ECP)

Published Year: 2019

Sensor Type: Camera, LiDAR

Recording Area: Europe (31 cities in 12 countries)

Description: The EuroCity Persons dataset provides a large number of highly diverse, accurate, and detailed annotations of pedestrians, cyclists, and other riders in urban traffic scenes. The images for this dataset were collected on board a moving vehicle in 31 cities of 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets used previously for benchmarking. The dataset furthermore contains a large number of person orientation annotations (over 211,200).

Dataset Download and Paper(s): Dataset, Paper

Name: Oxford RobotCar Dataset

Published Year: 2019 and 2016

Sensor Type: 2019 - Camera, Radar, LiDAR. 2016 - Camera, LiDAR

Recording Area: UK (Oxford)

Description: 2019 - The Oxford Radar RobotCar dataset can be used for researching scene understanding with Millimetre-Wave FMCW scanning radar data. The target application is autonomous vehicles, where this modality is robust to environmental conditions such as fog, rain, snow, or lens flare, which typically challenge other sensor modalities such as vision and LiDAR. The data was gathered in January 2019 over thirty-two traversals of a central Oxford route, spanning a total of 280 km of urban driving, and encompasses a variety of weather, traffic, and lighting conditions.

2016 - The original dataset release consisted of over 20 TB of vehicle-mounted monocular and stereo imagery, 2D and 3D LiDAR, as well as inertial and GPS data collected over a year of driving in Oxford, UK. More than 100 traversals of a 10 km route were performed over this period to capture scene variation over a range of timescales, from the 24-hour day/night illumination cycle to long-term seasonal variations.

Dataset Download and Paper(s): 2019 dataset, 2019 Paper, 2016 dataset, 2016 Paper

Name: Waymo Open Dataset

Published Year: 2021 and 2019

Sensor Type: Camera, LiDAR

Recording Area: United States (San Francisco, Mountain View, Los Angeles, Detroit, Seattle, Phoenix)

Description: The Waymo Open Dataset was first launched in August 2019 with a perception dataset comprising high-resolution sensor data and labels for 1,950 segments. In March 2021, Waymo expanded it to also include a motion dataset comprising object trajectories and corresponding 3D maps for 103,354 segments. The Waymo Open Dataset includes a wide variety of environments, objects, and weather conditions (downtown, suburban, daylight, night time, pedestrians, cyclists, construction, diverse weather).
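The perception segments are distributed as TFRecord files of protobuf Frame messages. The sketch below follows the pattern from the official waymo-open-dataset tutorial; the segment file name is a placeholder, and the API should be checked against your installed waymo-open-dataset version:

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2

# Placeholder segment name; use a file from your download.
segment = "segment-XXXX_with_camera_labels.tfrecord"
dataset = tf.data.TFRecordDataset(segment, compression_type="")

for data in dataset.take(1):
    frame = dataset_pb2.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    # Per-frame context carries the environment metadata described above.
    print(frame.context.stats.time_of_day)        # e.g. "Day", "Night"
    print(len(frame.images), len(frame.lasers))   # cameras and LiDARs
```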

Dataset Download and Paper(s): Dataset, 2021 Paper, 2019 Paper

Name: Lyft Level 5 Dataset

Published Year: 2019

Sensor Type: Camera, LiDAR, Radar

Recording Area: United States (Palo Alto)

Description: The dataset can be used for motion prediction and contains over 1,000 hours of data, collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California, over a four-month period. It consists of 170,000 scenes, each 25 seconds long, capturing the perception output of the self-driving system, which encodes the precise positions and motions of nearby vehicles, cyclists, and pedestrians over time. On top of this, the dataset contains a high-definition semantic map with 15,242 labeled elements and a high-definition aerial view over the area. The map provides context about traffic agents and their motion, with over 4,000 manually annotated semantic elements, including lane segments, pedestrian crosswalks, stop signs, parking zones, speed bumps, and speed humps. The scenes capture real-world elements, including vehicles, pedestrians, intersections, and multi-lane traffic.
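The prediction data is distributed in zarr format and is typically read with the l5kit library (github.com/woven-planet/l5kit). A minimal sketch, with placeholder paths and an API to verify against your installed l5kit version:

```python
import os
from l5kit.data import ChunkedDataset, LocalDataManager

# Point l5kit at the root of the download (placeholder path).
os.environ["L5KIT_DATA_FOLDER"] = "/data/lyft"
dm = LocalDataManager(None)

# Scenes are stored as chunked zarr datasets.
zarr_dataset = ChunkedDataset(dm.require("scenes/sample.zarr")).open()
print(zarr_dataset.scenes.shape)   # number of 25-second scenes
print(zarr_dataset.agents.shape)   # agent observations across all frames
```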

Dataset Download and Paper(s): Dataset, Paper

Name: Argoverse

Published Year: 2019

Sensor Type: Camera, LiDAR

Recording Area: United States (Pittsburgh, Miami)

Description: Argoverse was collected by a fleet of autonomous vehicles in Pittsburgh and Miami. The Argoverse 3D Tracking dataset includes 360-degree images from 7 cameras with overlapping fields of view, 3D point clouds from long-range LiDAR, 6-DOF pose, and 3D track annotations. It provides forward-facing stereo imagery. The Argoverse Motion Forecasting dataset includes more than 300,000 5-second tracked scenarios with a particular vehicle identified for trajectory forecasting. Argoverse is the first autonomous vehicle dataset to include “HD maps” with 290 km of mapped lanes with geometric and semantic metadata. It provides rich semantic information about road infrastructure and traffic rules.
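A minimal sketch of loading one forecasting scenario with the argoverse-api (github.com/argoai/argoverse-api); the module path, attribute names, and data root below are assumptions to check against the version you install:

```python
from argoverse.data_loading.argoverse_forecasting_loader import (
    ArgoverseForecastingLoader,
)

# Placeholder path to the extracted forecasting split.
loader = ArgoverseForecastingLoader("/data/argoverse/forecasting/train")

seq = loader[0]                # one 5-second tracked scenario
print(seq.agent_traj.shape)    # (50, 2): 5 s at 10 Hz, x/y positions
print(len(loader))             # number of scenarios in the split
```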

Dataset Download and Paper(s): Dataset, Paper

Name: nuScenes dataset

Published Year: 2019

Sensor Type: Camera, LiDAR, Radar

Recording Area: United States (Boston), Singapore

Description: nuTonomy scenes (nuScenes) carries the full autonomous vehicle sensor suite: 6 cameras, 5 radars, and 1 LiDAR, all with a full 360-degree field of view. nuScenes comprises 1,000 scenes, each 20 s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. The data is from Boston (Seaport and South Boston) and Singapore (One North, Holland Village, and Queenstown), two cities known for their dense traffic and highly challenging driving situations. There is diversity across locations in terms of vegetation, buildings, vehicles, road markings, and right- versus left-hand traffic.
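A minimal sketch of browsing a scene with the nuscenes-devkit (pip install nuscenes-devkit), mirroring the official tutorial; the data root is a placeholder:

```python
from nuscenes.nuscenes import NuScenes

# Point the devkit at the extracted mini split (placeholder path).
nusc = NuScenes(version="v1.0-mini", dataroot="/data/sets/nuscenes",
                verbose=True)

scene = nusc.scene[0]                        # one 20-second scene
sample = nusc.get("sample", scene["first_sample_token"])
print(sample["data"].keys())                 # one token per sensor channel
for ann_token in sample["anns"][:3]:         # a few 3D box annotations
    ann = nusc.get("sample_annotation", ann_token)
    print(ann["category_name"], ann["size"])
```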

Dataset Download and Paper(s): Dataset, Paper

Name: BLVD: Building A Large-scale 5D Semantics Benchmark for Autonomous Driving

Published Year: 2019

Sensor Type: Camera, 3D LiDAR

Recording Area: China (Changshu)

Description: BLVD, a large-scale 5D semantics benchmark, aims to provide a platform for the tasks of dynamic 4D (3D + temporal) tracking, 5D (4D + interactive) interactive event recognition, and intention prediction. The BLVD dataset contains 654 high-resolution video clips comprising 120k frames extracted from Changshu, Jiangsu Province, China, where the Intelligent Vehicle Proving Center of China (IVPCC) is located. The frame rate is 10 fps for both RGB data and 3D point clouds. All frames are fully annotated, yielding in total 249,129 3D annotations, 4,902 independent individuals for tracking (with an overall length of 214,922 points), 6,004 valid fragments for 5D interactive event recognition, and 4,900 individuals for 5D intention prediction. These tasks cover four kinds of scenarios depending on object density (low and high) and light conditions (daytime and nighttime).

Dataset Download and Paper(s): Dataset, Paper

Name: H3D - Honda 3D Dataset

Published Year: 2019

Sensor Type: Camera, LiDAR

Recording Area: United States (San Francisco)

Description: The Honda Research Institute 3D Dataset (H3D) is a large-scale full-surround 3D multi-object detection and tracking dataset collected using a 3D LiDAR scanner. H3D comprises 160 crowded and highly interactive traffic scenes with a total of 1 million labeled instances in 27,721 frames. With its unique dataset size, rich annotations, and complex scenes, H3D is intended to stimulate research on full-surround 3D multi-object detection and tracking. It is gathered from the HDD dataset, a large-scale naturalistic driving dataset collected in the San Francisco Bay Area. H3D consists of: 1) a full 360-degree LiDAR dataset (dense point clouds from a Velodyne-64); 2) 160 crowded and highly interactive traffic scenes; 3) 1,071,302 3D bounding box labels; 4) 8 common classes of traffic participants (manually annotated at 2 Hz and linearly propagated to 10 Hz); and 5) benchmarks of state-of-the-art 3D-only detection and tracking algorithms.

Dataset Download and Paper(s): Dataset, Paper

Name: ApolloScape

Published Year: 2019

Sensor Type: Camera, LiDAR

Recording Area: China

Description: ApolloScape contains much larger and richer labeling than previous datasets, including a holistic semantic dense point cloud for each site, stereo imagery, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instances, and highly accurate locations for every frame in various driving videos from multiple sites, cities, and times of day. The dataset contains 140K+ annotated images with lane annotations. For 3D object detection, it annotates 3D bounding boxes of objects in 6K+ point clouds. It consists of data from 4 regions in China in various weather conditions.

Dataset Download and Paper(s): Dataset, Paper

Name: DBNet

Published Year: 2018

Sensor Type: Camera, LiDAR

Recording Area: China

Description: DBNet is a large-scale dataset for driving behavior research. It includes aligned video, point clouds, GPS, and driver behavior (vehicle speed and steering-wheel angle), capturing 1,000 km of real-world driving data. The LiDAR-Video dataset provides large-scale, high-quality point clouds scanned by a Velodyne laser, videos recorded by a dashboard camera, and standard drivers' behaviors.

Dataset Download and Paper(s): Dataset, Paper

Name: KAIST multispectral dataset (2018) and KAIST Multispectral Pedestrian dataset (2015)

Published Year: 2018 and 2015

Sensor Type: 2018 - Camera (Visual and Thermal), LiDAR. 2015 - Camera (Visual and Thermal)

Recording Area: South Korea (Seoul)

Description: 2018: The dataset provides different perspectives of the world captured in coarse time slots (day and night), in addition to fine time slots (sunrise, morning, afternoon, sunset, night, and dawn). A thermal imaging camera enables all-day perception for autonomous systems. Toward this goal, the authors developed a multi-sensor platform that supports a co-aligned RGB/thermal camera, RGB stereo, 3D LiDAR, and inertial sensors (GPS/IMU), along with a related calibration technique.

2015: A multispectral pedestrian dataset that provides well-aligned color-thermal image pairs, captured by beam-splitter-based special hardware. The color-thermal dataset is as large as previous color-based datasets and provides dense annotations, including temporal correspondences. With this dataset, the team introduced multispectral ACF, an extension of aggregated channel features (ACF) that handles color-thermal image pairs simultaneously. Multispectral ACF reduces the average miss rate of ACF by 15%.

Dataset Download and Paper(s): 2018 Dataset, 2018 Paper, 2018 Paper download option-2, 2015 Dataset, 2015 Paper

Name: FLIR ADAS Dataset

Published Year: 2018

Sensor Type: Camera (Visual and Thermal)

Recording Area: United States (Santa Barbara)

Description: The dataset features a compilation of more than 10,000 annotated thermal images of people, cars, other vehicles, bicycles, and dogs in daytime and nighttime scenarios. It was primarily captured on streets and highways in Santa Barbara, California, US, with clear-sky conditions both day and night. Annotations exist for the thermal images based on the COCO annotation scheme; however, no annotations exist for the corresponding visible images.
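Because the thermal annotations follow the COCO scheme, they can be read directly with pycocotools; the annotation file path below is a placeholder to adjust to your download:

```python
from pycocotools.coco import COCO

# Placeholder path to the thermal annotation file of one split.
coco = COCO("FLIR_ADAS/train/thermal_annotations.json")

cat_ids = coco.getCatIds()
print([c["name"] for c in coco.loadCats(cat_ids)])   # e.g. person, car, dog

img_ids = coco.getImgIds()
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0]))
print(len(anns), "annotations in the first image")
```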

Dataset Download and Paper(s): Dataset

Name: KITTI

Published Year: 2015, 2013, and 2012

Sensor Type: Camera, LiDAR

Recording Area: Germany (Karlsruhe)

Description: For the tasks of stereo, optical flow, visual odometry, 3D object detection, and 3D tracking, the authors equipped a standard station wagon with two high-resolution color and grayscale video cameras. Accurate ground truth is provided by a Velodyne laser scanner and a GPS localization system. Datasets were captured by driving around the mid-size city of Karlsruhe, in rural areas, and on highways. Up to 15 cars and 30 pedestrians are visible per image. Besides providing all data in raw format, the authors extracted benchmarks for each task.
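As a small illustration, KITTI's raw Velodyne scans are flat float32 binaries with four values per point (x, y, z, reflectance) and can be loaded with plain NumPy, per the KITTI raw-data documentation; the path below is a placeholder:

```python
import numpy as np

def load_velodyne_scan(path):
    """Load a KITTI Velodyne .bin scan as an (N, 4) array of
    x, y, z, reflectance values."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Placeholder path into an extracted KITTI split.
scan = load_velodyne_scan("kitti/training/velodyne/000000.bin")
print(scan.shape)   # (N, 4)
print(scan[:3])     # first three points
```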

Dataset Download and Paper(s): Dataset, 2012 Paper, 2012 Paper download option-2, 2013 Paper-1, 2013 Paper-1 download option-2, 2013 Paper-2, 2013 Paper-2 download option-2, 2015 Paper, 2015 Paper download option-2

Thank you for reading! Please 👏 and follow me if you liked this post, as it encourages me to write more!
