Activeloop Hub is hosting the nuScenes dataset for autonomous driving

Published in Activeloop, Sep 10, 2019

At Snark, we have released an open-source package called Hub for managing large-scale datasets. The package lets you work with large arrays stored in the cloud or on other remote storage as if they were local numpy arrays. You can read more in the previous blog post. The package can be installed through pip.

pip install hub
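To make the idea of treating remote arrays like local numpy arrays more concrete, here is a minimal sketch of lazy, chunked access. It is not Hub's implementation: the class `LazyChunkedArray` and the function `fake_remote_fetch` are hypothetical names invented for illustration, and the "remote storage" is simulated in memory.

```python
import numpy as np


class LazyChunkedArray:
    """Toy stand-in for a remote array: chunks along axis 0 are loaded on demand."""

    def __init__(self, shape, chunk_len, fetch_chunk):
        self.shape = tuple(shape)
        self.chunk_len = chunk_len
        self._fetch_chunk = fetch_chunk  # hypothetical callable: chunk index -> ndarray
        self._cache = {}
        self.fetch_count = 0  # counts simulated downloads

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch supports integer indices only")
        if index < 0:
            index += self.shape[0]
        if not 0 <= index < self.shape[0]:
            raise IndexError("index out of range")
        chunk_id, offset = divmod(index, self.chunk_len)
        if chunk_id not in self._cache:
            # Only the chunk that contains the requested frame is fetched.
            self._cache[chunk_id] = self._fetch_chunk(chunk_id)
            self.fetch_count += 1
        return self._cache[chunk_id][offset]


def fake_remote_fetch(chunk_id, chunk_len=4, frame_shape=(2, 3)):
    """Pretend to download one chunk; real code would read from cloud storage."""
    start = chunk_id * chunk_len
    frames = [np.full(frame_shape, start + i, dtype=np.uint8) for i in range(chunk_len)]
    return np.stack(frames)


arr = LazyChunkedArray(shape=(12, 2, 3), chunk_len=4, fetch_chunk=fake_remote_fetch)
frame = arr[5]       # triggers a fetch of chunk 1 only
same_chunk = arr[6]  # served from the cached chunk, no new fetch
print(frame.shape, arr.fetch_count)
```

The point of the sketch is that indexing looks like ordinary array indexing while only the needed chunk is transferred.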

nuScenes Dataset

Recently Aptiv published a large-scale autonomous driving dataset to facilitate research. It contains a comprehensive autonomous vehicle (AV) sensor suite, including data from 6 cameras, 1 LIDAR, 5 RADAR, GPS & IMU. In addition, it was annotated at 2 Hz with 1.1 million 3D bounding boxes from 23 classes, with 8 attributes, such as visibility, activity, and pose. There are 1000 scenes each with a duration of 20s captured in Boston and Singapore [1].
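As a quick sanity check on these figures, the short calculation below derives the number of annotated frames and the average number of boxes per annotated frame. It assumes every scene lasts exactly 20 s and is annotated at 2 Hz for its full duration; the variable names are illustrative only.

```python
scenes = 1000
scene_duration_s = 20
annotation_rate_hz = 2
total_boxes = 1_100_000

# Annotated frames across the whole dataset
annotated_frames = scenes * scene_duration_s * annotation_rate_hz
# Average number of 3D boxes per annotated frame
boxes_per_frame = total_boxes / annotated_frames
# Total driving time covered by the scenes, in hours
total_hours = scenes * scene_duration_s / 3600

print(annotated_frames)       # 40000
print(boxes_per_frame)        # 27.5
print(round(total_hours, 2))  # 5.56
```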

For the train-val split, there are 10 files of around 30GB each. They need to be downloaded from nuScenes.org and uncompressed, totaling 400GB of data. In addition, Aptiv open-sourced a devkit for working with the data efficiently.

Accessing through Hub

You can access the dataset without the hassle of downloading and uncompressing files. Before downloading the data, please check nuScenes’s terms of use.

> import hub
> nuscenes = hub.dataset(name='aptiv/nutonomy:v1.0-trainval')
> nuscenes[0] 
# output a list of arrays corresponding to each sensor

To access the data of a single sensor, list the available sensors, pick one by name, and index its first frame:

> nuscenes.keys.keys()
dict_keys(['RADAR_FRONT', 'LIDAR_TOP', ..., 'CAM_FRONT_LEFT'])
> nuscenes['CAM_FRONT_LEFT']
<hub.marray.array.HubArray object at 0x10f347cf8>
> nuscenes['CAM_FRONT_LEFT'].shape
[400000, 900, 1600, 3]
> nuscenes['CAM_FRONT_LEFT'][0].mean()
107.05682847222222
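The reported shape also shows why reading one frame at a time matters. The sketch below estimates the raw size of a single frame and of the whole camera array, assuming one byte per value (uint8 pixels); the dtype is an assumption, since the post does not state it.

```python
import math

shape = [400000, 900, 1600, 3]  # shape reported for CAM_FRONT_LEFT
bytes_per_value = 1             # assumption: uint8 image data

frame_bytes = math.prod(shape[1:]) * bytes_per_value
total_bytes = math.prod(shape) * bytes_per_value

print(frame_bytes / 1e6)   # 4.32 (MB per frame)
print(total_bytes / 1e12)  # 1.728 (TB for the full array)
```

Under that assumption, a single frame is a few megabytes while the full array is far larger than typical local memory, so fetching only the indexed slice is the practical way to work with it.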

Limitations

The following limitations of the Hub package will be addressed in future releases.

Dynamic Shapes: Radar and Lidar data are point clouds. Since the number of points varies from frame to frame, we used the maximum number of points as the array shape and padded the remainder with zeros.
Relational Schemas: The dataset also includes a relational schema to index the data. Relational schemas are not yet natively supported by Hub, so for now the package serves only as a container for storing arrays.
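The zero-padding approach described under Dynamic Shapes can be sketched with numpy as follows. This is an illustration only, not the upload script: `pad_point_clouds` and `strip_padding` are hypothetical helpers, the toy clouds have 4 values per point, and the `lengths` array is an addition of this sketch (the post only mentions padding with zeros).

```python
import numpy as np


def pad_point_clouds(clouds):
    """Stack variable-length (n_i, d) point clouds into one zero-padded array."""
    max_points = max(c.shape[0] for c in clouds)
    dims = clouds[0].shape[1]
    padded = np.zeros((len(clouds), max_points, dims), dtype=clouds[0].dtype)
    lengths = np.zeros(len(clouds), dtype=np.int64)
    for i, cloud in enumerate(clouds):
        padded[i, : cloud.shape[0]] = cloud
        lengths[i] = cloud.shape[0]
    return padded, lengths


def strip_padding(padded_cloud):
    """Drop all-zero rows; assumes no real point has every field equal to zero."""
    keep = np.any(padded_cloud != 0, axis=1)
    return padded_cloud[keep]


# Two toy clouds with 3 and 5 points
clouds = [
    np.ones((3, 4), dtype=np.float32),
    2 * np.ones((5, 4), dtype=np.float32),
]
padded, lengths = pad_point_clouds(clouds)
recovered = strip_padding(padded[0])
print(padded.shape, lengths.tolist(), recovered.shape)
```

When reading padded point clouds, the padding rows have to be removed before use, either by filtering all-zero rows as above or by keeping the true lengths alongside the array.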

Next steps

Besides addressing the limitations mentioned above, we are planning to include labels and bounding boxes so that you can directly use PyTorch and TensorFlow data loaders to train deep learning models. You can find deep learning benchmarks here.

In case you are looking to use Hub for your own datasets, we also provide the script for uploading nuScenes here.

Acknowledgment

We would like to thank Holger Caesar for his support and feedback. Separately, thanks to Aptiv for allowing us to host the dataset.

[1] Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Krishnan A, Pan Y, Baldan G, Beijbom O. nuScenes: A multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027. 2019 Mar 26.