Semantic Maps for Autonomous Vehicles

Lyft Level 5
May 1 · 9 min read

By Kris Efland and Holger Rapp, Engineering Managers, Lyft Level 5

Maps are fundamental to Lyft. They’re critical not just for connecting riders and drivers, but for operating safely and scaling our business. As we look ahead to the future of transportation, we see our multi-modal network of human-driven cars, bikes, scooters, and autonomous vehicles all working together to provide the world’s best transportation. In order to realize that vision, we invest heavily in mapping and recognize that the complexity will grow along with our business. At Level 5, our high-definition (HD) Maps are crucial for helping our autonomous vehicles (AVs) operate safely and enabling them to behave naturally for riders and other vehicles on the road.

A map for an autonomous car has an unprecedented quality bar: navigational maps used by today’s cars and smartphones have a precision requirement of meters, but a map on a self-driving car must be orders of magnitude more accurate. In our last mapping blog post, we outlined how we structure our autonomous maps in layers. While the geometric layer allows the vehicle to localize itself accurately in the map, the semantic map helps it stay in its lane and adhere to established social or cultural norms so that it operates in a way that others expect. It gives our AVs an understanding of how all the roads, lanes, crosswalks, and traffic lights are interconnected, and it’s used to determine how the AV and everyone else on the road should behave.

Road Graph Layer

At the foundation of our semantic map is the road network graph. This represents all of the road segments and their interconnections: how many lanes there are, what direction they travel, and which roads connect to which. It also represents the yield relationships between roads and lanes, so that our AVs are able to safely stop at intersections or crosswalks for cross traffic. These are complex properties that can change through other layers: the state of a traffic light influences which lanes you need to yield to, and some lanes toggle between one-way and two-way depending on the time of day. Though much more feature-rich, this layer is the most similar to the map for in-car navigation. The planning function in autonomy uses the road network graph to determine the coarse path from A to B, which helps the AV mitigate risk by avoiding complex intersections or roads that have a high speed limit.
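To make the coarse planning step concrete, here is a minimal sketch of a shortest-path search over a toy road graph that penalizes high-speed segments. The graph, the `coarse_route` function, the speed threshold, and the penalty factor are all hypothetical illustrations, not Level 5's planner; the search itself is plain Dijkstra.

```python
import heapq

# Hypothetical road graph: node -> list of (neighbor, length_m, speed_limit_kph).
ROAD_GRAPH = {
    "A": [("B", 500, 40), ("C", 300, 100)],
    "B": [("D", 500, 40)],
    "C": [("D", 300, 100)],
    "D": [],
}


def coarse_route(graph, start, goal, max_speed_kph=60, penalty=5.0):
    """Dijkstra search where segments faster than max_speed_kph cost
    `penalty` times their length (an assumed way to express risk)."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, cost
        if node in settled:
            continue
        settled.add(node)
        for neighbor, length_m, speed_kph in graph[node]:
            factor = penalty if speed_kph > max_speed_kph else 1.0
            heapq.heappush(
                queue, (cost + length_m * factor, neighbor, path + [neighbor])
            )
    return None, float("inf")


# With the default penalty the longer, slower route via B wins.
path, cost = coarse_route(ROAD_GRAPH, "A", "D")
```

Setting `penalty=1.0` turns the risk term off, and the same call returns the shorter route through the high-speed road at C.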

Lane Geometry Layer

Above the road graph is our centimeter-precise lane geometry. This is a set of polygons that represents the individual lane markings on the road surface and the street-level rules the vehicle will obey. This data also contains properties not usually found in navigational maps: color of lines, areas that allow lane changes, speed bumps, and stop lines are all represented. Knowledge of these properties allows the vehicle to perform safe yet complex maneuvers, as well as make intelligent decisions about what other vehicles or pedestrians will do. While the road graph allows for high-level planning (How do I get from San Francisco to Seattle?), the lane geometry layer allows for more local and detailed path planning: How do I turn the steering wheel and use the pedals to get through this intersection? Are there environmental limitations to be considered, such as one-way streets?
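As a rough illustration of how lane polygons and their properties might be queried, the sketch below stores each lane as a polygon plus a left-boundary marking type and looks up the lane under a given position with a standard ray-casting test. The `Lane` record, its field names, and the marking labels are assumptions made for this example, not the actual map schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    lane_id: str
    polygon: tuple       # ((x, y), ...) vertices in the map frame, meters
    left_boundary: str   # assumed labels, e.g. "dashed_white", "double_yellow"


def contains(polygon, x, y):
    """Ray-casting point-in-polygon test: count edge crossings to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lane_at(lanes, x, y):
    """Return the first lane whose polygon contains (x, y), or None."""
    for lane in lanes:
        if contains(lane.polygon, x, y):
            return lane
    return None


def may_change_left(lane):
    """Assumed rule: only a dashed white divider permits a lane change."""
    return lane.left_boundary == "dashed_white"


LANES = [
    Lane("right", ((0, 0), (50, 0), (50, 3.5), (0, 3.5)), "dashed_white"),
    Lane("left", ((0, 3.5), (50, 3.5), (50, 7), (0, 7)), "double_yellow"),
]
```

A planner could then ask `may_change_left(lane_at(LANES, 10, 1))` before considering a merge. A production map would use a spatial index rather than a linear scan over lanes.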

Once the AV is on the road, it makes real-time decisions that may alter that initial route. The vehicle’s behavioral planner uses the dashed white lane dividers to decide when to send control commands to the vehicle’s throttle and steering to safely change lanes or merge with the flow of traffic. The perception system detects other traffic participants and classifies them (e.g. there is a large object next to me and I think it is a truck). The prediction function then uses this information, map location, and velocity to make predictions about what that truck will do in the next few milliseconds through the next 10–30 seconds. With geometry for sidewalks and crosswalks, the AV can correctly identify a pedestrian, place them on the sidewalk, and predict that they intend to cross the street at the crosswalk along the vehicle’s route. The AV would then slow to a stop before the crosswalk to let them pass. All of these systems work together many times a second to ensure a safe and comfortable ride.

Semantic Features and Map Priors

At the top levels are the semantic features and map priors (see layered map). Semantic features are traffic lights, crosswalks, and road signs. Map priors are areas in the map where we expect to see something, with an associated probability. These two aspects of the map allow the vehicle to make decisions about how it and other objects will behave. In the road graph and lane geometry layers, the semantic relationships define how the lanes work together: where you can turn, where you need to stop, and which lane you need to be in when going from A to B. The semantic features take this a step further by giving the AV more context about the environment and, importantly, about the dynamic objects that are moving around the vehicle.

Traffic lights, as an example, are pretty straightforward. In the map, they are 3D shapes that represent where traffic light fixtures are located, what direction they’re facing, and what lanes they control or apply to. They also contain information about which arrows control turns and which are simple solid circles that regulate through traffic. However, traffic lights are not static. The perception and planning systems on the vehicle use the traffic light information in the map to locate each traffic light and determine what state it is in before making a decision: Is it a green or yellow light? Is there a pedestrian in or out of the crosswalk? These are the important distinctions that the semantic features allow, and they directly impact how the vehicle will dynamically react.

Map priors are similar but allow even more nuance. These represent derived or observed information that is encoded into the map. Using the case of traffic lights, the prior layer contains the order in which an individual traffic light fixture cycles through each state (red, green arrow, green, yellow, then red again) and how long each state lasts. Does the traffic light have the same dwell time per state at all hours of the day, or is it green longer during rush hour to allow more traffic through major intersections? Notionally, the priors layer can represent areas where we anticipate seeing a particular class of objects or some observed behavior with geospatial and temporal specificity. As another example, we may have a prior layer near your city’s dog park that gives the self-driving car a hint that it should watch out for dogs, especially on Saturday mornings between 8 AM and 11 AM.
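One way to picture a traffic light prior is as an ordered list of states with dwell times, selected by time of day. The sketch below is a hypothetical encoding: the `SignalCyclePrior` class, the dwell times, and the rush-hour windows are invented for illustration and are not taken from the real map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalCyclePrior:
    # Ordered (state, dwell_seconds) pairs; all values here are made up.
    phases: tuple

    def cycle_length(self):
        return sum(dwell for _, dwell in self.phases)

    def expected_state(self, seconds_since_cycle_start):
        """Walk the phases in order to find the state expected at time t."""
        t = seconds_since_cycle_start % self.cycle_length()
        for state, dwell in self.phases:
            if t < dwell:
                return state
            t -= dwell
        return self.phases[-1][0]


OFF_PEAK = SignalCyclePrior(
    (("red", 30), ("green_arrow", 10), ("green", 25), ("yellow", 5))
)
# Assumed rush-hour variant: same order, longer green.
RUSH_HOUR = SignalCyclePrior(
    (("red", 30), ("green_arrow", 10), ("green", 45), ("yellow", 5))
)


def prior_for_hour(hour):
    """Pick a cycle prior by hour of day (the windows are assumptions)."""
    is_rush = hour in range(7, 10) or hour in range(16, 19)
    return RUSH_HOUR if is_rush else OFF_PEAK
```

A prior like this only supplies an expectation. As described above, the perception system still has to observe the light's actual state before the vehicle acts on it.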

With our map priors layer, our AVs now have the hints to avoid an area altogether or be more cautious. At the highest level, priors can also give our fleet insight into social or cultural norms that aren’t necessarily spelled out on a road sign but are inferred through observation. For example, there may be a center turn lane where cars tend to make U-turns due to restrictions at the neighboring intersections. With this information, the AV can preemptively merge to the outside lane and pay closer attention when a vehicle gets into that center lane, as it may be turning around.

A final example for differentiating between semantic features and map priors is parking. As a semantic feature, a parking spot designates an area that the vehicle can’t drive through and where cars come to a stop. As a prior, parking may represent an area of the street that is safe to drive on, yet also indicates that parked cars have been seen in the area before. These two elements together allow the AV to make much more nuanced decisions when navigating our complex system of roads and behave like an experienced driver.

Semantic Map Construction

The construction of the semantic map follows a few basic principles:

  1. All data must be aligned to the geometric map through which our AV can place itself.
  2. Data from our own AVs and fleet is the most trusted source of information.
  3. Whenever useful, we’d like to leverage pre-existing data (e.g. navigational maps) and build on top of it.

Our map starts by taking our existing navigational map data as our baseline. A tremendous amount of engineering goes into keeping all individual data imports fresh and the base map up to date and correct. This forms the basis of our road graph layer. Building on top of this with our own fleet and AV data allows us to create the geometric layer for localization, the lane geometry layer, and most of the semantic features. We use an evolving set of techniques to construct these next layers. With our AV sensor data, we use computer vision and machine learning to identify lane markings, traffic lights, road signs, and other elements and triangulate their position to place them three-dimensionally in our map. Then, by analyzing the trajectories that our vehicles take or observations they make about what other vehicles do, we can concretely refine things like turn restrictions, traffic light patterns, or driver behavior. These object trajectories are inputs that refine the map and allow us to improve the lane geometry, lane connectivity, traffic restrictions, and map priors layers with continuous information. The priors are distilled from the full set of historic data we have available, and our statistical models improve over time with exposure to more scenarios. This combination not only helps us converge on an AV-quality map over a shorter period of time, but also helps our autonomy system function better as we encode more substantive and timely information into the map.

The AV’s sensors are well calibrated, and we use the full set of sensors to create our map: data from GPS, IMU, lidar, and cameras is processed through both lidar and visual SLAM pipelines to create crisp geometric maps. The lidar scans and images are processed to produce a true 3D representation of the road surface and surrounding area and features. The geometric map allows us to precisely localize the semantic data’s position. This correspondence between the elements we observe in the world and our map lets us warp all layers into the same frame of reference, the geometric map frame. This is important because it ensures consistent alignment across all of our map layers, down to individual traffic lights and crosswalks.
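The idea of bringing a layer into the geometric map frame from matched observations can be sketched in its simplest form as a 2D rigid alignment: given corresponding points in a layer's frame and in the map frame, solve for the rotation and translation that best map one onto the other in a least-squares sense. This is a deliberate simplification (the real warp may be non-rigid and three-dimensional), and the function names are hypothetical.

```python
import math


def estimate_rigid_2d(src, dst):
    """Least-squares rotation angle and translation taking src onto dst.
    Both are equal-length lists of (x, y) points in correspondence."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    s_dot = 0.0
    s_cross = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - csx, y - csy, u - cdx, v - cdy  # center both sets
        s_dot += x * u + y * v
        s_cross += x * v - y * u
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty


def apply_rigid_2d(theta, tx, ty, point):
    """Rotate a point by theta, then translate it by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)


# Toy correspondences: a layer rotated by 90 degrees and shifted by (2, 3).
layer_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
map_points = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, tx, ty = estimate_rigid_2d(layer_points, map_points)
```

Once the transform is estimated from a handful of matched features, `apply_rigid_2d` can move every other element of that layer into the same frame.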

Human curation on the lane geometry layer

The final important function in our map build process is a persistent feedback loop of human curation and quality controls to ensure that our map is accurate to the centimeter-level precision we require. In addition to helping us build portions of our map automatically, our algorithmic techniques also help us smartly identify errors in the map and where to focus the attention of our human operators for final refinement and quality control (QC). Our rich 2D and 3D tools allow our operators to flag data source errors, call out logical errors in our heuristics or algorithms, and then make final adjustments to the map. Once our QC processes complete, we move to simulation and automated testing across the full mapping and autonomy stack — all before a vehicle ever gets on the road. After passing simulation tests we move to live road testing following Level 5’s established testing protocols to ensure that all processes and rules are strictly followed. Once final road testing passes, the map is approved for deployment to our fleet.

All of the information we collect and produce when processing our fleet data feeds into an automated and iterative cycle. This helps us continuously improve and measure our map and the processes and algorithms we use to build it. We are then perpetually producing a better map that we measure against ground-truth on an automated basis.

The Road Ahead

Over time, as vehicle sensors improve and AVs deploy in more cities, the map becomes a focal point for improving the effectiveness of transportation as a whole. Our map building pipeline is designed with this future of data overabundance in mind. We think iteratively, and with more and more observations, our maps are becoming richer in features and more precise over time.

While a lot in map building is working well, a lot is still to be built. At Level 5, we are equal parts thrilled and challenged by the road ahead. We are always looking for new friends and colleagues who are excited by the challenge of tomorrow’s transportation. If you are interested in autonomous mapping, please take a look at our careers page. We’d love to talk to passionate people who can help us make it happen!

Lyft Level 5

Written by

Level 5 is Lyft’s self-driving division in Palo Alto, Munich and London. We’re hiring! lyft.com/level5