Unlocking Self-Driving Research: The Lyft Level 5 Perception Dataset and Competition

Published in Woven Planet Level 5, Jul 23, 2019

By Luc Vincent, EVP Autonomous Technology

At Level 5, we believe self-driving technology presents a rare opportunity to improve the quality of life in our communities. Avoidable collisions and vehicle emissions are choking our cities, while infrastructure strains under rapid urban growth.

Meanwhile, self-driving tech could save two lives every single minute, and it is essential to combating climate change. Solving the autonomous vehicle challenge is not just an option; it’s a necessity.

Self-driving is too big — and too important — an endeavor for any one team to solve alone. Transportation serves all of us, and we should all be invested in the next step of its evolution. That is why today, I’m excited to announce that Lyft is releasing a subset of our autonomous driving data, the Level 5 Dataset, and we will be sponsoring a research competition.

A Dataset to Accelerate Innovation

The Level 5 Dataset is the largest publicly released dataset of its kind. It includes over 55,000 human-labeled 3D annotated frames, a drivable surface map, and an underlying HD spatial semantic map to contextualize the data.

Academic research accelerates innovation, but it requires costly data that is out of reach for most academic teams. Sensor hardware must be built and properly calibrated, a localization stack is needed, and an HD semantic map must be created. Only then can you unlock higher-level functionality like 3D perception, prediction, and planning.

At Lyft Level 5, we’ve been perfecting our hardware and autonomy stack for the last two years. We want to share some of the data we have collected or derived along the way, to help level the playing field for all researchers interested in autonomous technology.

This dataset allows a broad cross-section of researchers to contribute meaningfully to downstream research in self-driving technology, and this is only the first step. We are committed to democratizing access to this tech, and we will release additional data as we continue on our journey.

Highlights From Our Team’s Progress

The efforts to deliver this dataset are just a small piece of the incredible work the Lyft Level 5 team is doing to advance the development of autonomous vehicles. We are currently operating an ongoing self-driving employee shuttle, and our fleet is accruing tens of thousands of autonomous miles to train our system.

We are already iterating on the third generation of Lyft’s self-driving car and have built a cutting-edge perception suite, patenting a new sensor array and a proprietary ultra-high dynamic range (100+ dB) camera.

*Lyft Level 5’s Latest Perception Suite*

Because HD mapping is crucial to autonomous vehicles, our teams in Munich and Palo Alto have been building high-quality lidar-based geometric maps as well as high-definition semantic maps that are used by the autonomy stack.

Meanwhile, our team in London (formerly Blue Vision Labs) has been hard at work unlocking the scale of the Lyft fleet to build high-quality, cost-effective geometric maps, using only a camera phone to capture the source data. This effort is essential for us to build large-scale mapping and data collection infrastructure, a unique “Lyfty” advantage that our team is leveraging.

Dataset and Competition Details

Collaboration doesn’t stop with our partners — we want to incentivize research to advance self-driving vehicles further. To do so, we will be launching a competition for individuals to train algorithms on the dataset.

For reference, the Lyft Level 5 Dataset includes:

Over 55,000 human-labeled 3D annotated frames;
Data from 7 cameras and up to 3 lidars;
A drivable surface map; and,
An underlying HD spatial semantic map (including lanes, crosswalks, etc.)

To ensure compatibility with existing work that may have been done using the nuScenes dataset, we’ve chosen to reuse the nuScenes format.
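For readers who have not worked with the nuScenes format, here is a minimal sketch of the idea behind it: the data is stored as flat JSON tables whose records point at each other through string tokens. The table and field names below follow the public nuScenes schema, but the records, tokens, and values are made up for illustration, and the helper functions are hypothetical rather than part of any official loader.

```python
import json
from collections import defaultdict

# Hypothetical records mimicking two nuScenes tables. Values are invented.
SAMPLES = json.loads("""[
  {"token": "s1", "timestamp": 1000, "scene_token": "sc1", "prev": "", "next": "s2"},
  {"token": "s2", "timestamp": 1200, "scene_token": "sc1", "prev": "s1", "next": ""}
]""")

ANNOTATIONS = json.loads("""[
  {"token": "a1", "sample_token": "s1", "instance_token": "i1",
   "translation": [10.0, 2.0, 0.5], "size": [1.9, 4.5, 1.6],
   "rotation": [1.0, 0.0, 0.0, 0.0]},
  {"token": "a2", "sample_token": "s1", "instance_token": "i2",
   "translation": [-4.0, 7.5, 0.9], "size": [0.6, 0.7, 1.8],
   "rotation": [1.0, 0.0, 0.0, 0.0]},
  {"token": "a3", "sample_token": "s2", "instance_token": "i1",
   "translation": [11.5, 2.1, 0.5], "size": [1.9, 4.5, 1.6],
   "rotation": [1.0, 0.0, 0.0, 0.0]}
]""")


def index_by_token(records):
    """Build a token -> record lookup for one table."""
    return {record["token"]: record for record in records}


def group_annotations(annotations):
    """Group 3D box annotations by the sample (keyframe) they belong to."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["sample_token"]].append(ann)
    return grouped


def iter_scene(samples_by_token, first_token):
    """Walk a scene in time order by following each sample's `next` token."""
    token = first_token
    while token:  # an empty string marks the end of the scene
        sample = samples_by_token[token]
        yield sample
        token = sample["next"]


samples_by_token = index_by_token(SAMPLES)
anns_by_sample = group_annotations(ANNOTATIONS)

# Count annotated boxes per frame, in temporal order.
boxes_per_frame = {
    sample["token"]: len(anns_by_sample.get(sample["token"], []))
    for sample in iter_scene(samples_by_token, "s1")
}
for token, count in boxes_per_frame.items():
    print(token, count)
```

The same token-lookup pattern extends to the other tables in the schema, such as sensor readings and ego poses, which is what lets tooling written for one nuScenes-style dataset be reused on another.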

Our dataset makes it possible for researchers to work on a variety of problems, including prediction of agents over time; scene depth estimation from cameras with lidar as ground truth; object detection in 3D over the semantic map; scene segmentation using lidar and semantic maps; agent behavior classification; and many more.
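To make one of these problems concrete, here is a rough sketch of how lidar can serve as depth ground truth for a camera. It assumes the lidar points have already been transformed into the camera's coordinate frame (z pointing forward) and uses a plain pinhole model. The function name, the intrinsics, and the sample points are all hypothetical and are not taken from the dataset.

```python
import math


def sparse_depth_from_lidar(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D points (camera frame, metres) into a sparse depth map.

    Returns a dict mapping (u, v) pixel coordinates to depth in metres.
    When several points land on one pixel, the nearest one is kept.
    """
    depth = {}
    for x, y, z in points_cam:
        if z <= 0:
            continue  # point is behind the camera
        # Pinhole projection: scale by focal length, shift by principal point.
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if (u, v) not in depth or z < depth[(u, v)]:
                depth[(u, v)] = z
    return depth


# Hypothetical intrinsics and points, chosen only to exercise each branch.
points = [
    (0.0, 0.0, 10.0),   # lands on the principal point
    (0.0, 0.0, 5.0),    # same pixel, closer, so it replaces the point above
    (1.0, 0.5, 10.0),   # lands right of and below the principal point
    (10.0, 0.0, 10.0),  # projects outside the image
    (0.0, 0.0, -1.0),   # behind the camera
]
depth_map = sparse_depth_from_lidar(
    points, fx=100.0, fy=100.0, cx=50.0, cy=40.0, width=100, height=80
)
```

A depth estimation model would then be scored only on the pixels present in `depth_map`, since lidar returns cover a small fraction of the image.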

We have segmented this dataset into training, validation, and testing sets — we will release the validation and testing sets once the competition opens.

There will be $25,000 in prizes, and we’ll be flying the top researchers to the NeurIPS Conference in December, as well as allowing the winners to interview with our team. Stay tuned for specific details of the competition!

In the meantime, if you are interested in helping us build the future of transportation, check out our open positions in Palo Alto, San Francisco, London, and Munich.


Written by Woven Planet Level 5