
Rethinking Maps for Self-Driving

Lyft Level 5
Oct 15, 2018 · 11 min read

What is the map?

HD maps are not new. Several mapping and self-driving companies already produce and consume them. However, it is still early days in terms of how these maps are built, how rich the information they contain is, and how accurate they are. Companies are iterating quickly to improve their HD maps, and as a result there is little standardization between providers and consumers.

HD Map Principles

There are four main principles that capture how we define the map and go about building it.

Mapping Vehicles

We build HD maps using a fleet of cars equipped with state-of-the-art sensors such as HD cameras, lidar, radar, GPS, and inertial measurement units (IMUs). Early on we chose to keep the sensor configuration and hardware build the same as those used by our self-driving cars. This constraint makes it easy for us to ensure that the HD maps we build work correctly with the downstream autonomy software stack, which supports self-driving functions such as localization, perception, prediction, and planning. In our experience, sharing cars interchangeably between mapping and autonomy feature development gets us better utilization of our autonomy fleet while keeping fleet management overhead low. Similarly, having a single hardware SKU makes it simpler for us to manage hardware development.

Our map build process also uses the same data collection logs as autonomy feature development. This enables us to reuse every mile collected by the cars to build better maps. Data collection runs for map building don’t need the car to be engaged in autonomy mode, that is, driving by itself. However, we do keep many of the autonomy services running in passive mode, especially localization and perception, because their outputs help greatly in subsequent map build processing. For example, perception output helps remove dynamic objects that aren’t part of the HD map, and localization match errors help us quickly detect which parts of the map are stale.
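As a rough illustration of how perception output can help strip dynamic objects from mapping data, the sketch below drops lidar points that fall inside detected object boxes. The `Box` type, the axis-aligned box format, and `remove_dynamic_points` are hypothetical names chosen for this example and are not taken from our pipeline; real detections would typically be oriented boxes with timestamps.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box around a detected dynamic object (assumed format)."""
    min_corner: Point
    max_corner: Point

    def contains(self, p: Point) -> bool:
        # A point is inside when every coordinate lies within the box extents.
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.min_corner, p, self.max_corner))


def remove_dynamic_points(points: List[Point], detections: List[Box]) -> List[Point]:
    """Keep only the points that fall outside every detected dynamic object."""
    return [p for p in points if not any(box.contains(p) for box in detections)]


# Four lidar points; the middle two lie on a detected car.
points = [(0.0, 0.0, 0.0), (5.0, 1.0, 0.5), (5.2, 1.1, 0.8), (20.0, 3.0, 0.0)]
car = Box(min_corner=(4.0, 0.0, 0.0), max_corner=(8.0, 2.0, 1.5))
static_points = remove_dynamic_points(points, [car])
```

Only the points outside the detection survive and would be passed on to map building.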

A Layered Map

Geometric map

Geometric Map Layer

The geometric map layer contains 3D information about the world. This information is organized in very high detail to support precise calculations. Raw sensor data from lidar, various cameras, GPS, and IMUs is processed using simultaneous localization and mapping (SLAM) algorithms to first build a 3D view of the region explored during the mapping data collection run. The outputs of the SLAM algorithm are an aligned, dense 3D point cloud and a very precise trajectory of the path the mapping vehicle took. In the figure, the vehicle trajectory is shown in pink, and each 3D point is colored using the colors observed for that point in the corresponding camera images.

The 3D point cloud is post-processed to produce derived map objects that are stored in the geometric map. Two important derived objects are the voxelized geometric map and the ground map. The voxelized geometric map is produced by segmenting the point cloud into voxels as small as 5cm x 5cm x 5cm. During real-time operation, the voxelized geometric map is the most efficient way to access point cloud information, and it offers a good trade-off between accuracy and speed.

To build the ground map, segmentation algorithms identify the 3D points in the point cloud that belong to the ground, defined as the drivable surface part of the map. These ground points are used to build a parametric model of the ground in small sections. The ground map is key for aligning the subsequent layers of the map, such as the semantic map.
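To make the voxelization step concrete, here is a minimal sketch that buckets points into 5cm cubes and keeps one centroid per voxel. The function name `voxelize`, the centroid summary, and the dictionary layout are assumptions made for illustration; a production voxel map would likely store richer per-voxel statistics in a more compact structure.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float = 0.05) -> Dict[VoxelKey, Point]:
    """Group points into cubic voxels and return the centroid of each voxel."""
    buckets: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis; floor keeps negatives consistent.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    # Summarize every voxel by the mean of the points that landed in it.
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in buckets.items()
    }


cloud = [(0.01, 0.01, 0.01), (0.02, 0.03, 0.04), (0.12, 0.01, 0.01)]
voxels = voxelize(cloud)  # the first two points share voxel (0, 0, 0)
```

Looking up a voxel by its integer key is a constant-time dictionary access, which is one reason a voxel grid is convenient for real-time queries compared with scanning the raw point cloud.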

Semantic map

Semantic Map Layer

The semantic map layer builds on the geometric map layer by adding semantic objects. Semantic objects include various 2D and 3D traffic objects, such as lane boundaries, intersections, crosswalks, parking spots, stop signs, and traffic lights, that are used for driving safely. These objects carry rich metadata, such as speed limits and turn restrictions for lanes. While the 3D point cloud might contain all of the pixels and voxels that represent a traffic light, it is in the semantic map layer that a clean 3D object identifying the 3D location and bounding box of the traffic light and its various components is stored.

We use a combination of heuristics, computer vision, and point classification algorithms to generate hypotheses for these semantic objects and their metadata. On its own, the output of these algorithms isn’t accurate enough to produce a high-fidelity map, so human operators post-process these hypotheses using rich visualization and annotation tools to both validate their quality and fix any misses. For example, to identify traffic lights, we first run a traffic light detector on the camera images. Visual SLAM is then used to process multiple camera images and get a coarse 3D location for the traffic light. Lidar points in the local neighborhood of this location are matched and processed to produce the bounding box and orientation of the traffic light and its sub-components.

We also employ heuristics for solving simpler problems. One area where we’ve found heuristics to be useful is in the generation of lane hypotheses, yield relationships, and connectivity graphs at intersections. There is a lot of structure in how these are set up for roads, especially since local laws ensure consistency. Feedback from the human curation and quality assurance steps is used to keep these heuristics up to date.
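One way to picture semantic objects and their metadata is as small typed records. The sketch below is illustrative only: the class names, field names, and units (`Lane`, `TrafficLight`, `speed_limit_mps`, and so on) are assumptions for this example, not the schema we actually use.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Lane:
    """A lane with boundaries, metadata, and connectivity (assumed schema)."""
    lane_id: str
    left_boundary: List[Point3D]
    right_boundary: List[Point3D]
    speed_limit_mps: float
    allowed_turns: List[str] = field(default_factory=list)
    successor_ids: List[str] = field(default_factory=list)


@dataclass
class TrafficLight:
    """A clean 3D object: location, bounding box, orientation, components."""
    light_id: str
    center: Point3D
    size: Point3D          # bounding box extents along x, y, z in meters
    heading_rad: float
    bulbs: List[str]
    controls_lane_ids: List[str]


def lights_for_lane(lane_id: str, lights: List[TrafficLight]) -> List[TrafficLight]:
    """Return the traffic lights that govern the given lane."""
    return [light for light in lights if lane_id in light.controls_lane_ids]


lane = Lane("lane_1", [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)],
            [(0.0, 3.5, 0.0), (30.0, 3.5, 0.0)],
            speed_limit_mps=11.0, allowed_turns=["straight", "right"],
            successor_ids=["lane_2"])
light = TrafficLight("tl_1", (31.0, 1.7, 5.0), (0.4, 0.4, 1.1), 3.14,
                     ["red", "yellow", "green"], ["lane_1"])
governing = lights_for_lane(lane.lane_id, [light])
```

Keeping relationships such as `controls_lane_ids` and `successor_ids` explicit is what lets downstream code answer questions like "which light applies to my lane?" without touching the point cloud.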

Map Priors Layer

The map priors layer contains derived information about dynamic elements and about human driving behavior. Information here can pertain to both the semantic and geometric parts of the map. For example, the order in which traffic lights at an intersection cycle through their states, e.g. (red, protected-left, green, yellow, red) or (red, green, protected-left, yellow, red), and the amount of time spent in each state are encoded in the map priors layer. Time of day and day of week are used as keys to support multiple settings. These priors are approximate and serve as hints to the onboard autonomy systems.

Another example is parking priors. These are used by the prediction and planning systems to determine object velocities and make appropriate decisions. Parking priors are represented as polygonal regions on the lanes, with metadata that captures the probability of encountering a parked vehicle at that location in the lane. When the AV encounters a stationary vehicle in a map region with a high parking prior, it will more aggressively explore plans that route the AV around the vehicle and demote plans that queue the AV up behind it. Similarly, knowing where people normally park allows perception systems to be more alert to opening car doors and to detected pedestrians, who might be getting in and out of cars.

Unlike information in the geometric and semantic layers of the map, the information in the map priors layer is designed to be approximate and to act as hints. Autonomy algorithms commonly consume these priors as model inputs or features and combine them with other real-time information.
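The parking prior idea can be sketched as a polygon with an attached probability plus a simple lookup. Everything named here (`ParkingPrior`, `parked_probability_at`, `preferred_plan`, the 0.05 default, and the 0.5 threshold) is a hypothetical choice for illustration; as noted above, a real planner would treat the prior as one feature among many rather than a hard switch.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point2D = Tuple[float, float]


@dataclass
class ParkingPrior:
    """Polygonal lane region with the probability of finding a parked vehicle."""
    polygon: List[Point2D]
    parked_probability: float

    def contains(self, p: Point2D) -> bool:
        # Ray casting: count how many polygon edges a ray toward +x crosses.
        x, y = p
        inside = False
        n = len(self.polygon)
        for i in range(n):
            x1, y1 = self.polygon[i]
            x2, y2 = self.polygon[(i + 1) % n]
            if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


def parked_probability_at(p: Point2D, priors: List[ParkingPrior],
                          default: float = 0.05) -> float:
    """Look up the prior for a location, falling back to an assumed default."""
    for prior in priors:
        if prior.contains(p):
            return prior.parked_probability
    return default


def preferred_plan(p: Point2D, priors: List[ParkingPrior],
                   threshold: float = 0.5) -> str:
    """Favor passing a stationary vehicle where parking is likely."""
    return "pass_around" if parked_probability_at(p, priors) >= threshold else "queue_behind"


curbside = ParkingPrior([(0.0, 0.0), (10.0, 0.0), (10.0, 2.0), (0.0, 2.0)], 0.8)
plan_in_zone = preferred_plan((5.0, 1.0), [curbside])
plan_elsewhere = preferred_plan((20.0, 1.0), [curbside])
```

A stationary vehicle inside the curbside polygon leads to the passing plan, while the same vehicle elsewhere in the lane leads to queuing behind it.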

Real-Time Knowledge Layer

The real-time layer is the topmost layer in the map and is designed to be both readable and writable. It is the only layer in the map designed to be updated while the map is in use by an AV serving a ride. It contains real-time traffic information such as observed speeds, congestion, and newly discovered construction zones. The real-time layer is designed to support gathering and sharing real-time global information across a whole fleet of AVs.
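The read/write distinction between layers can be illustrated with a small sketch in which the lower layers are exposed as read-only views and only the real-time layer accepts writes. The `LayeredMap` class, its method names, and the string keys are assumptions for this example, not a description of the actual onboard map interface.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping


class LayeredMap:
    """Lower layers are read-only at runtime; the real-time layer is writable."""

    def __init__(self, geometric: Mapping[str, Any], semantic: Mapping[str, Any],
                 priors: Mapping[str, Any]) -> None:
        # MappingProxyType gives a read-only view over a private copy.
        self.geometric = MappingProxyType(dict(geometric))
        self.semantic = MappingProxyType(dict(semantic))
        self.priors = MappingProxyType(dict(priors))
        self.real_time: Dict[str, Any] = {}

    def report(self, key: str, observation: Any) -> None:
        """Record something this vehicle observed while driving."""
        self.real_time[key] = observation

    def merge_fleet_updates(self, updates: Mapping[str, Any]) -> None:
        """Fold in observations shared by other vehicles in the fleet."""
        self.real_time.update(updates)


hd_map = LayeredMap(
    geometric={"tile_0": "voxel data"},
    semantic={"lane_1": {"speed_limit_mps": 11.0}},
    priors={"lane_1/parking": 0.8},
)
hd_map.report("lane_1/observed_speed_mps", 6.5)
hd_map.merge_fleet_updates({"lane_7/construction_zone": True})

try:
    hd_map.semantic["lane_2"] = {}  # writes to lower layers are rejected
    semantic_is_read_only = False
except TypeError:
    semantic_is_read_only = True
```

Separating the mutable state this way keeps the carefully curated layers stable during a ride while still letting fleet-wide observations flow in.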

Real-time knowledge

Lyft Level 5

Revolutionizing cars, reshaping the future.

Lyft Level 5

Written by

Level 5 is Lyft’s self-driving division in Palo Alto, Munich and London. We’re hiring! lyft.com/level5
