Snark Hub is hosting nuScenes dataset for autonomous driving

Sep 10, 2019 · 2 min read

At Snark, we have released an open-source package called Hub for managing large-scale datasets. The package lets you work with large arrays stored in the cloud or on remote storage as if they were local NumPy arrays. You can read more in the previous blog post. The package can be installed through pip:

pip install hub
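
To get a feel for the idea of an array that lives elsewhere but is indexed like a local one, NumPy's memmap offers a local analogue: the data sits on disk and only the slices you index are read. This is a sketch of the concept only, not how Hub is implemented; the file name and the small shape are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical small stand-in for a large image array: 8 frames of 90x160 RGB.
shape = (8, 90, 160, 3)
path = os.path.join(tempfile.mkdtemp(), "frames.dat")

# Create the on-disk array and fill it so that frame i holds the value i.
frames = np.memmap(path, dtype=np.uint8, mode="w+", shape=shape)
for i in range(shape[0]):
    frames[i] = i
frames.flush()
del frames

# Reopen read-only; data is only read when a slice is indexed.
lazy = np.memmap(path, dtype=np.uint8, mode="r", shape=shape)
first_frame = np.asarray(lazy[0])    # touches only frame index 0
frame3_mean = float(lazy[3].mean())  # touches only frame index 3
```

The indexing pattern (`lazy[0]`, `lazy[3].mean()`) is the same one used on the Hub arrays later in this post.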

nuScenes Dataset

Recently, Aptiv published a large-scale autonomous driving dataset to facilitate research. It contains data from a comprehensive autonomous vehicle (AV) sensor suite: 6 cameras, 1 LIDAR, 5 RADAR, GPS, and IMU. In addition, it is annotated at 2 Hz with 1.1 million 3D bounding boxes from 23 classes, with 8 attributes such as visibility, activity, and pose. There are 1000 scenes, each with a duration of 20s, captured in Boston and Singapore [1].

For the train-val split, there are 10 files, each around 30GB in size. They need to be downloaded and uncompressed, totaling 400GB of data. In addition, Aptiv open-sourced a devkit for working with the data efficiently.

Accessing through Hub

You can access the dataset without the hassle of downloading and uncompressing files. Before accessing the data, please check nuScenes's terms of use.

> import hub
> nuscenes = hub.dataset(name='aptiv/nutonomy:v1.0-trainval')
> nuscenes[0]
# outputs a list of arrays, one per sensor

To access the data of a single sensor, list all sensors, pick one, and index its first frame:

> nuscenes.keys.keys()
dict_keys(['RADAR_FRONT', 'LIDAR_TOP', ..., 'CAM_FRONT_LEFT'])
> nuscenes['CAM_FRONT_LEFT']
<hub.marray.array.HubArray object at 0x10f347cf8>
> nuscenes['CAM_FRONT_LEFT'].shape
[400000, 900, 1600, 3]
> nuscenes['CAM_FRONT_LEFT'][0].mean()
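
To see why indexing a single frame is preferable to pulling the whole array, the shape above can be turned into a rough size estimate. The sketch below assumes one byte per channel (uint8); the dtype is not stated in this post, so treat the numbers as an order-of-magnitude estimate of the uncompressed array.

```python
import numpy as np

# Shape reported above for nuscenes['CAM_FRONT_LEFT'].
shape = (400000, 900, 1600, 3)

# Assumption: one byte per channel (uint8); the post does not state the dtype.
bytes_per_value = np.dtype(np.uint8).itemsize

# Size of a single frame and of the full array, in bytes.
frame_bytes = int(np.prod(shape[1:], dtype=np.int64)) * bytes_per_value
total_bytes = shape[0] * frame_bytes

print(f"one frame:  {frame_bytes / 1e6:.2f} MB")
print(f"all frames: {total_bytes / 1e12:.2f} TB")
```

Under that assumption, one frame is a few megabytes while the full camera array runs to terabytes, so reading `[0]` transfers a tiny fraction of the data.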


The limitations of the Hub package will be addressed in future releases.

Next steps

Besides addressing the limitations mentioned above, we are planning to include labels and bounding boxes so that you can directly use PyTorch and TensorFlow data loaders to train deep learning models. You can find deep learning benchmarks here.

In case you are looking to use Hub for your own datasets, we also provide the script for uploading nuScenes here.


We would like to thank Holger Caesar for his support and feedback. Separately, thanks to Aptiv for allowing us to host the dataset.

[1] Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Krishnan A, Pan Y, Baldan G, Beijbom O. nuScenes: A multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027. 2019 Mar 26.




Activeloop Blog

