Unlocking Access to Self-Driving Research: The Lyft Level 5 Dataset and Competition
By Luc Vincent, EVP Autonomous Technology
At Lyft, we believe self-driving technology presents a rare opportunity to improve the quality of life in our communities. Avoidable collisions, single-occupant commuters, and vehicle emissions are choking our cities, while infrastructure strains under rapid urban growth.
Our path forward to solve these challenges is clear: build the world’s best transportation and offer a viable alternative to car ownership. That means an efficient ecosystem of connected transit, bikes, scooters, and shared rides from drivers as well as self-driving cars. Solving the autonomous vehicle challenge is not just an option; it’s a necessity.
Logan Green, Lyft’s founder and CEO, has committed to making this vision a reality: “Not only can self-driving tech save two lives every single minute, it is essential to combat climate change by allowing people to ditch their cars for shared electric transportation. Lyft is committed to leading this transportation revolution.”
Self-driving is too big — and too important — an endeavor for any one team to solve alone. Transportation serves all of us, and we should all be invested in the next step of its evolution.
That is why today, I’m excited to announce that Lyft is releasing a subset of our autonomous driving data, the Level 5 Dataset, and we will be sponsoring a research competition.
A Dataset to Accelerate Innovation
The Level 5 Dataset is the largest publicly released dataset of its kind. It includes over 55,000 human-labeled 3D annotated frames, a drivable surface map, and an underlying HD spatial semantic map to contextualize the data.
Academic research accelerates innovation, but it requires costly data that is out of reach for most academic teams. Sensor hardware must be built and properly calibrated, a localization stack is needed, and an HD semantic map must be created. Only then can you unlock higher-level functionality like 3D perception, prediction, and planning.
At Lyft Level 5, we’ve been perfecting our hardware and autonomy stack for the last two years. We want to share some of the data we have collected or derived along the way, to help level the playing field for all researchers interested in autonomous technology.
This dataset allows a broad cross-section of researchers to contribute meaningfully to downstream research in self-driving technology, and this is only the first step. We are committed to democratizing access to this tech, and we will release additional data as we continue on our journey.
Highlights From Our Team’s Progress
The efforts to deliver this dataset are just a small piece of the incredible work the Lyft Level 5 team is doing to advance the development of autonomous vehicles. We currently operate a self-driving employee shuttle, and our fleet is accruing tens of thousands of autonomous miles to train our system.
We are already iterating on the third generation of Lyft’s self-driving car and have built a cutting-edge perception suite, patenting a new sensor array and a proprietary ultra-high dynamic range (100+ dB) camera.
Because HD mapping is crucial to autonomous vehicles, our teams in Munich and Palo Alto have been building high-quality lidar-based geometric maps as well as high-definition semantic maps that are used by the autonomy stack.
Meanwhile, our team in London (formerly Blue Vision Labs) has been hard at work unlocking the scale of the Lyft fleet to build high-quality, cost-effective geometric maps, using only a camera phone to capture the source data. This effort is essential for us to build large-scale mapping and data collection infrastructure, a unique “Lyfty” advantage that our team is leveraging.
Finally, we know collaboration is crucial to bringing self-driving technology to our communities, so Lyft’s autonomous platform team has been deploying partner vehicles on the Lyft network. With our partner Aptiv, we have provided over 50,000 self-driving rides to Lyft passengers in Las Vegas, making this the largest paid commercial self-driving service in operation. Waymo vehicles are also now available on the Lyft network in Arizona, expanding the opportunity for our passengers to experience self-driving rides.
Dataset and Competition Details
Collaboration doesn’t stop with our partners — we want to incentivize research to advance self-driving vehicles further. To do so, we will be launching a competition for individuals to train algorithms on the dataset.
For reference, the Lyft Level 5 Dataset includes:
- Over 55,000 human-labeled 3D annotated frames;
- Data from 7 cameras and up to 3 lidars;
- A drivable surface map; and
- An underlying HD spatial semantic map (including lanes, crosswalks, etc.).
To ensure compatibility with existing work that may have been done using the nuScenes dataset, we’ve chosen to reuse the nuScenes format.
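As a rough illustration of what reusing the nuScenes format implies, the sketch below links keyframes ("samples") and 3D annotations through string tokens, then follows one object across a scene. The field names (`token`, `next`, `sample_token`, `instance_token`, `translation`) reflect the nuScenes schema as commonly documented and should be checked against the released files. The records, token values, and helper functions are hypothetical and are not taken from the Level 5 Dataset.

```python
from collections import defaultdict

# Toy records shaped like nuScenes-style tables. Every token and value here
# is made up for illustration only.
samples = [
    {"token": "s1", "scene_token": "scene-a", "timestamp": 1000, "prev": "", "next": "s2"},
    {"token": "s2", "scene_token": "scene-a", "timestamp": 1200, "prev": "s1", "next": ""},
]
annotations = [
    {"token": "a1", "sample_token": "s1", "instance_token": "car-1", "translation": [10.0, 2.0, 0.5]},
    {"token": "a2", "sample_token": "s2", "instance_token": "car-1", "translation": [12.0, 2.0, 0.5]},
    {"token": "a3", "sample_token": "s2", "instance_token": "ped-7", "translation": [3.0, -1.0, 0.9]},
]


def index_by_token(records):
    """Map each record's token to the record itself."""
    return {record["token"]: record for record in records}


def group_annotations(records):
    """Group annotation records by the sample (keyframe) they belong to."""
    by_sample = defaultdict(list)
    for ann in records:
        by_sample[ann["sample_token"]].append(ann)
    return dict(by_sample)


def walk_scene(first_token, sample_index):
    """Yield samples in time order by following `next` links until an empty token."""
    token = first_token
    while token:
        sample = sample_index[token]
        yield sample
        token = sample["next"]


def track_instance(instance_token, first_token, sample_index, anns_by_sample):
    """Collect (timestamp, translation) pairs for one object across a scene."""
    track = []
    for sample in walk_scene(first_token, sample_index):
        for ann in anns_by_sample.get(sample["token"], []):
            if ann["instance_token"] == instance_token:
                track.append((sample["timestamp"], ann["translation"]))
    return track


sample_index = index_by_token(samples)
anns_by_sample = group_annotations(annotations)
track = track_instance("car-1", "s1", sample_index, anns_by_sample)
```

Because every table is keyed by token, tooling written against this layout for one dataset can in principle be pointed at another that follows the same schema.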
Our dataset makes it possible for researchers to work on a variety of problems, including prediction of agents over time; scene depth estimation from cameras with lidar as ground truth; object detection in 3D over the semantic map; scene segmentation using lidar and semantic maps; agent behavior classification; and many more.
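To make one of these problems concrete, here is a minimal sketch of how lidar can serve as sparse ground truth for camera depth estimation. It assumes the lidar points have already been transformed into the camera frame using the sensor calibration, and it uses an ideal pinhole camera with no lens distortion. The function names and the intrinsics in the usage lines are hypothetical, not values from the dataset.

```python
import math


def project_to_sparse_depth(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D points (camera frame: x right, y down, z forward, in meters)
    onto the image plane with a pinhole model.

    Returns a dict mapping (u, v) pixel coordinates to the nearest depth seen
    at that pixel. Pixels with no lidar return are simply absent.
    """
    depth = {}
    for x, y, z in points_cam:
        if z <= 0:
            continue  # point is behind the camera
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            # Keep the closest return when several points hit the same pixel.
            if (u, v) not in depth or z < depth[(u, v)]:
                depth[(u, v)] = z
    return depth


def masked_mean_abs_error(predicted, sparse_truth):
    """Mean absolute depth error, evaluated only where lidar gave a value.

    `predicted` is a 2D list indexed as predicted[v][u].
    """
    errors = [abs(predicted[v][u] - z) for (u, v), z in sparse_truth.items()]
    return sum(errors) / len(errors) if errors else float("nan")


# Hypothetical intrinsics and points, for illustration only.
points = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, 10.0), (0.0, 0.0, -1.0)]
truth = project_to_sparse_depth(points, fx=100.0, fy=100.0, cx=50.0, cy=40.0,
                                width=100, height=80)
constant_prediction = [[6.0] * 100 for _ in range(80)]
error = masked_mean_abs_error(constant_prediction, truth)
```

The error is computed only at pixels that received a lidar return, since most image pixels have no depth measurement to compare against.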
We have segmented this dataset into training, validation, and testing sets — we will release the validation and testing sets once the competition opens.
There will be $25,000 in prizes. We’ll also fly the top researchers to the NeurIPS Conference in December and give the winners the opportunity to interview with our team. Stay tuned for specific details of the competition!
In the meantime, if you are interested in helping us build the future of transportation, check out our open positions in Palo Alto, San Francisco, London, and Munich.