Extending Activeloop Hub capabilities to handle Waymo Open Dataset

Apr 1, 2020 · 3 min read

Too much time is spent on setting up the data. With well-designed data pipelines, you can iterate on machine learning experiments rapidly and reach highly accurate models much faster.

We are releasing simple yet powerful Python-native access to the Waymo Open Dataset [1]. The Hub package lets you stream any chunk of the data to your local machine. It can be used not only for fast exploration and visualization but also for training machine learning models directly.

Waymo Open Dataset

Waymo published one of the very first large-scale autonomous driving datasets for the research community. It includes a high-quality multimodal sensor dataset that covers a wide variety of environments. The goal is to help researchers make advances in 2D and 3D perception, including scene understanding and behavior prediction.

The data is about 2 TB after compression. To access it the standard way, you download 89 tar files, uncompress them into 1950 .tfrecords files, and then load the data with the waymo_open_dataset package. Snark Hub simplifies access to this data and significantly reduces download and access time.

Snark Hub

At Snark, we have released an open-source package called Hub to manage large-scale datasets. The package lets you represent large arrays in the cloud or on remote storage as if they were local NumPy arrays. We want to simplify access to the data for exploration and ML training purposes.
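The idea of treating remote data like a local array can be illustrated with a toy example. The sketch below is not Hub's implementation: `LazyChunkedArray` and `fake_remote_fetch` are hypothetical names invented here, and the "remote" storage is simulated in memory. It only shows the general pattern of fetching a chunk when an index first touches it and reusing it afterwards.

```python
from typing import Callable, Dict, List


class LazyChunkedArray:
    """Toy stand-in for a remote array: rows are fetched one chunk at a time, on demand."""

    def __init__(self, num_rows: int, chunk_rows: int,
                 fetch_chunk: Callable[[int], List[list]]):
        self.num_rows = num_rows
        self.chunk_rows = chunk_rows
        self._fetch_chunk = fetch_chunk          # hypothetical: downloads one chunk by id
        self._cache: Dict[int, List[list]] = {}  # chunks already downloaded
        self.fetch_count = 0                     # number of simulated downloads

    def __len__(self) -> int:
        return self.num_rows

    def __getitem__(self, row: int) -> list:
        if not 0 <= row < self.num_rows:
            raise IndexError(row)
        # Work out which chunk holds the row and where the row sits inside it.
        chunk_id, offset = divmod(row, self.chunk_rows)
        if chunk_id not in self._cache:
            self._cache[chunk_id] = self._fetch_chunk(chunk_id)
            self.fetch_count += 1
        return self._cache[chunk_id][offset]


def fake_remote_fetch(chunk_id: int) -> List[list]:
    # Stands in for a network request; returns 4 rows of 3 values each.
    start = chunk_id * 4
    return [[r, r, r] for r in range(start, start + 4)]


arr = LazyChunkedArray(num_rows=1000, chunk_rows=4, fetch_chunk=fake_remote_fetch)
row = arr[10]         # triggers one simulated download (the chunk with rows 8 to 11)
same_chunk = arr[11]  # served from the cached chunk, no new download
print(row, same_chunk, arr.fetch_count)
```

Only the chunk that contains the requested rows is ever "downloaded"; the other 249 chunks are never touched.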


Get Access

To access the data, you will need to register at Waymo Open and accept the license agreement. As noted on their website, it may take up to 2 business days to be granted access to their Google Cloud Storage Bucket.

Then authenticate with Google Cloud in your terminal by running:

gcloud auth application-default login
gcloud init

Install the Python package by running:

pip3 install hub==0.5

Enjoy simple yet powerful access to the data inside your Python script:

import hub
waymo = hub.gs('waymo_open_dataset_snark').connect()
ds = waymo.dataset_open('v1/training')
ds['images'].shape # [158361, 5, 1280, 1920, 3]
ds['images'][0,0].mean() # 106.92709309895834
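To get a feel for why streaming matters here, the shape above can be turned into a rough size estimate. This is a back-of-the-envelope sketch that assumes each pixel value is stored as one byte (8-bit); the post does not state the dtype, so treat the result as an estimate of the uncompressed size only.

```python
from math import prod

# Shape reported by ds['images'].shape in the example above.
images_shape = (158361, 5, 1280, 1920, 3)
bytes_per_value = 1  # assumption: 8-bit pixel values

# Size of the whole array versus a single 1280 x 1920 x 3 image.
total_bytes = prod(images_shape) * bytes_per_value
single_image_bytes = prod(images_shape[2:]) * bytes_per_value

print(f"full images array: {total_bytes / 1e12:.2f} TB")
print(f"one image: {single_image_bytes / 1e6:.2f} MB")
```

Under that assumption, indexing one image moves a few megabytes of raw pixels, while materializing the full array would take several terabytes.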

To visualize a single image without downloading the rest of the data, simply run:

import hub
from PIL import Image

waymo = hub.gs('waymo_open_dataset_snark').connect()
camera = waymo.array_open('v1/training/images')
# Save the 5 images stored at index 10000 as JPEG files
for i in range(0, 5):
    img = camera[10000, i]
    Image.fromarray(img, 'RGB').save(f'image-{i}.jpg')

You can also go beyond images and access laser point clouds and labels.

# Open the dataset
ds_train = waymo.dataset_open('v1/training')
ds_val = waymo.dataset_open('v1/validation')
# Get all arrays from the dataset

In the same way, you can access and explore the v1.2 version, used for the Waymo dataset challenge, by changing the version in the path.

> ds_train = waymo.dataset_open('v1.2/training')
> print(ds_train.paths.keys())
dict_keys(['labels', 'lasers_camera_projection', 'images', 'lasers_range_image'])
> ds_train['lasers_range_image'].shape
[158081, 5, 2, 200, 2650, 4]
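When working through an array this large, it is natural to read it in batches along the first axis. The helper below is a minimal sketch in plain Python; `batch_ranges` is a hypothetical name, not part of Hub, and it only computes index ranges without touching any data.

```python
from typing import Iterator, Tuple


def batch_ranges(num_items: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover num_items in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_items, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, num_items)


# Small demonstration with 10 items and batches of 4.
ranges = list(batch_ranges(10, 4))
print(ranges)  # [(0, 4), (4, 8), (8, 10)]
```

With the dataset above, `num_items` would be the first entry of the reported shape (158081). Whether a Hub array accepts a slice such as `[start:stop]` is an assumption here, since the post only shows integer indexing, so a loop over the individual indices in each range is the safer starting point.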

You can also access the domain adaptation datasets:

> waymo.dataset_open('v1.2/domain_adaptation/training')
> waymo.dataset_open('v1.2/domain_adaptation/training/unlabeled')

Next Step

We plan to provide future tutorials to let you directly train machine learning models while streaming the data through data pipelines from Hub.


Thanks to Waymo for hosting the data backend.

[1] Sun, Pei, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo et al. “Scalability in Perception for Autonomous Driving: An Open Dataset Benchmark.” arXiv preprint arXiv:1912.04838 (2019).

