
Understanding Localization in Self-Driving Cars — and its Technology

An Introduction to Self-Driving Cars (Part 3)

MT
May 31 · 11 min read

Previously, we discussed why self-driving cars are the future and how we got to where we are today by looking at the history of transportation. We then dove into the backbone of the self-driving software itself, building an HD map, in the second part of this Introduction to Self-Driving Cars series.

High-definition maps are critical to any driverless vehicle program because they enable accurate localization and help with sensor perception and path planning, all of which improve safety and comfort for the driver and riders. Without this map data, it is almost impossible for an automated driving system to accurately localize itself and anticipate the road ahead.

What is localization?

“In theory, localization isn’t necessary, as long as the vehicle’s perception system can figure out everything in the environment. However, since it puts a lot of responsibility on the perception system, localization is used to greatly simplify the perception tasks.” — David Silver

With localization, we can pinpoint a vehicle's location to within 10 centimeters on a map. This high level of accuracy enables a self-driving car to understand its surroundings and establish a sense of the road and lane structures. Precise localization allows a vehicle to detect when a lane is forking or merging, plan lane changes, and determine lane paths even when markings aren't clear.

How does localization work?

Localization fundamentally works by comparing what the car's sensors see with the car's actual location on the map. To put it simply, step by step:

  1. The vehicle's sensors measure the distance between ego (the vehicle itself) and static objects around it, like trees, walls, road signs, etc.
  2. Distances and directions to the static objects are measured in ego's coordinate frame. As in the image below, the head of the car always points forward, along X'.
  3. Ego also compares the semantic landmarks that its sensors identify to the landmarks present in the HD map. This comparison requires transforming data between the sensors' coordinate frame and the map's coordinate frame, all to within 10 cm accuracy; a minimal sketch of this transform follows the image below.
[2] Ego’s heading always points forward. Image credits: Udacity’s Self-Driving Fundamentals featuring Apollo
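To make step 3 concrete, here is a minimal sketch, assuming a simple 2D world, of how a landmark detected in ego's frame can be transformed into the map's frame once the vehicle's pose (position and heading) on the map is known. The function name and numbers are illustrative, not from any particular autonomy stack:

```python
import numpy as np

def ego_to_map(point_ego, vehicle_pose):
    """Transform a landmark from the ego frame into the map frame.

    point_ego    -- (x', y') of the landmark as measured by the sensors, meters
    vehicle_pose -- (x, y, theta): vehicle position and heading in the map frame
    """
    x, y, theta = vehicle_pose
    # Rotate by the vehicle's heading, then translate by its map position.
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    return rotation @ np.asarray(point_ego) + np.array([x, y])

# A landmark 10 m ahead and 2 m to the left of a car located at map
# coordinates (100, 50) and heading 90 degrees (facing "up" in the map).
print(ego_to_map((10.0, 2.0), (100.0, 50.0, np.pi / 2)))  # ~[98. 60.]
```

Localization is then the inverse problem: finding the vehicle pose that makes the transformed sensor observations line up with the map's landmarks.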

Technology: how it works and its challenges

Image source: Novatel

Components like radar, LiDAR, and cameras provide the distances to objects surrounding the vehicle. If the exact locations of the surrounding objects are known, fusing the data from this hardware yields the absolute vehicle location with assistance from the HD map.

Keywords: Global Navigation Satellite System (GNSS), Inertial Measurement Unit (IMU), and Light Detection And Ranging Sensor (LiDAR)

In addition to HD maps, let's take a more detailed look at the technology behind localization: GNSS, IMU, LiDAR, and cameras.

1. Global Navigation Satellite System (GNSS)

The most widely used application of GNSS is the Global Positioning System, better known as GPS. GPS is a system of 30+ navigation satellites circling the Earth, a program started by the U.S. government in 1973 with the aim of offering satellite-based navigation anywhere on Earth. Based on data from NASA Space Place, these satellites, approximately 20,000 kilometers above the Earth's surface, constantly send out signals, which are picked up by a GPS receiver such as the one in our phones. Once the receiver calculates its distance from three or more GPS satellites (with a fourth satellite used to resolve the receiver's clock error), it can figure out where we are.

The receiver measures how long each satellite's signal took to arrive and converts that travel time into a range: distance = time × speed of light (c ≈ 3×10⁸ m/s).
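To illustrate the idea, here is a toy trilateration sketch with made-up satellite positions and travel times; it omits the receiver clock-bias term that a real GPS receiver must also solve for (which is what the fourth satellite enables):

```python
import numpy as np
from scipy.optimize import least_squares

C = 3e8  # speed of light, m/s

# Made-up satellite positions (meters, Earth-centered-style frame) and the
# signal travel times they would produce for a receiver at `true_receiver`.
satellites = np.array([
    [20.2e6,    0.0,    0.0],
    [   0.0, 20.2e6,    0.0],
    [   0.0,    0.0, 20.2e6],
    [  12e6,   12e6,   12e6],
])
true_receiver = np.array([1.0e6, 2.0e6, 0.5e6])
travel_times = np.linalg.norm(satellites - true_receiver, axis=1) / C

def residuals(position):
    # Predicted range to each satellite minus the measured range
    # (distance = travel time * c).
    return np.linalg.norm(satellites - position, axis=1) - travel_times * C

estimate = least_squares(residuals, x0=np.zeros(3)).x
print(np.round(estimate))  # recovers ~[1000000, 2000000, 500000]
```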

Now think of a car with the same kind of GPS receiver. Would it be safe to localize its position based on GPS alone? GPS alone isn't enough, as it has some drawbacks:

  1. Inaccurate positioning depending on where we are, with typical accuracy of ~4.9 m (16 ft). Environmental factors can add further noise to GPS localization, which explains why our phones' maps perform poorly in dense urban areas with tall buildings, such as Manhattan.
  2. Low update frequency: 10 hertz, or 10 updates per second, which means it takes 0.1 seconds for GPS to give an update. Since autonomous cars move quickly, they need to update their position more often than this.
GPS and base stations as ground truth (Image: Land Information New Zealand)

To reduce these drawbacks and minimize error, a Real-Time Kinematic (RTK) positioning technique is brought into the equation, as shown in the image above.

RTK involves setting up several base stations on the ground, where each base station has a known ground-truth location (x1, y1). Comparing this with the location the GPS reports for the station (x2, y2) gives the measurement error, which is then sent to nearby GPS receivers, such as a car, so they can adjust their own measurements.

Overall, RTK uses signals from a nearby fixed base station to measure these errors and transmit them to the vehicle. The use of an RTK correction network can allow for accuracies within 10 centimeters on average. However, there is still room for improvement in accounting for errors that arise when the transmitted signals from space are affected by imprecision in the satellite orbits, satellite clock errors, and atmospheric disturbances.
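Here is a deliberately simplified sketch of that correction idea, closer to differential GPS than to full RTK (which also exploits carrier-phase measurements); all positions are illustrative:

```python
import numpy as np

# Surveyed ground-truth position of a base station (x1, y1), in meters.
base_truth = np.array([500.0, 800.0])
# Position the base station's own GPS receiver currently reports (x2, y2).
base_reported = np.array([501.8, 798.9])

# The shared measurement error is computed at the station...
correction = base_truth - base_reported

# ...and broadcast to nearby receivers such as a car, which sees nearly the
# same satellite and atmospheric errors and can therefore apply the same fix.
vehicle_raw_fix = np.array([620.4, 910.2])
vehicle_corrected = vehicle_raw_fix + correction
print(vehicle_corrected)  # -> [618.6 911.3]
```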

2. Inertial Measurement Unit (IMU)

An Inertial Measurement Unit (IMU) is a sensor device that directly measures a vehicle’s three linear acceleration components and three rotational rate components (and thus its six degrees of freedom).

In contrast to cameras, LiDAR, radar, and ultrasonic sensors, an IMU requires no information or signals from outside the vehicle. This independence from the environment makes the IMU a core technology for both safety and sensor fusion.

An IMU measures the forces of acceleration (gravity and motion) and the angular rates of the vehicle. When combined with a GNSS receiver, an IMU can provide a complete positioning solution that accurately determines the vehicle's position and attitude. When the GNSS signal isn't available, the IMU measures the vehicle's motion and estimates its position, a technique known as dead reckoning, until the GNSS receiver can reacquire the satellites and recalculate the position.
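A minimal 2D dead-reckoning sketch of that idea, with hypothetical accelerometer and gyroscope readings, might look like the following; a real system integrates all six degrees of freedom and compensates for sensor bias:

```python
import numpy as np

def dead_reckon(pose, speed, accel, yaw_rate, dt):
    """Propagate the pose through one IMU sample.

    pose     -- (x, y, heading) from the last good GNSS fix
    speed    -- current forward speed, m/s
    accel    -- forward acceleration from the accelerometer, m/s^2
    yaw_rate -- turn rate around the vertical axis from the gyroscope, rad/s
    dt       -- IMU sample period, s
    """
    x, y, heading = pose
    heading += yaw_rate * dt               # integrate gyro -> orientation
    speed += accel * dt                    # integrate accel -> speed
    x += speed * np.cos(heading) * dt      # integrate speed -> position
    y += speed * np.sin(heading) * dt
    return (x, y, heading), speed

# One second of straight driving at 100 Hz while GNSS is unavailable.
pose, speed = (0.0, 0.0, 0.0), 10.0
for _ in range(100):
    pose, speed = dead_reckon(pose, speed, accel=0.5, yaw_rate=0.0, dt=0.01)
print(pose)  # roughly (10.25, 0.0, 0.0), i.e. x ~ v*t + a*t^2/2
```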

An IMU consists of two core parts:

  • An accelerometer, which outputs linear acceleration signals on three axes in space.
  • A gyroscope, which outputs angular velocity signals on three axes in space.
Image source: Science Direct

The IMU helps provide "localization" data: information about where the car is. Software implementing driving functions combines this information with map data and perception-stack data that tell the car about the objects and features around it, which we will discuss further in the next part of this series. The perception stack is essentially the brains behind an autonomous vehicle. We can think of it as a collection of hardware and software components consolidated into a platform that can handle end-to-end vehicle automation.

Perception stack keywords: perception, data fusion, cloud/OTA, localization, behavior, control and safety.

One of the many challenges with an IMU is that its motion error grows with time, so we have to combine the IMU with the low-update-frequency GNSS/GPS to compensate for the IMU's accumulating location error. The combination of the two does create a more robust localization result, but it still won't solve the localization problem completely.
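As a rough schematic of that complementary behavior, the toy loop below lets an IMU-style prediction drift and periodically pulls it back with a low-rate GNSS fix using a simple blend; a production system would use a Kalman filter (described later in this article) rather than the fixed blending factor assumed here:

```python
import numpy as np

rng = np.random.default_rng(0)
true_pos = np.array([0.0, 0.0])
estimate = np.array([0.0, 0.0])
GNSS_EVERY = 10   # one GNSS fix per 10 IMU steps (e.g. 10 Hz vs. 100 Hz)
ALPHA = 0.8       # how strongly a GNSS fix pulls the estimate back

for step in range(1, 101):
    motion = np.array([0.1, 0.0])                  # true motion this step
    true_pos = true_pos + motion
    # IMU-style prediction: integrated motion plus slowly accumulating drift.
    estimate = estimate + motion + rng.normal(0.0, 0.02, 2)
    if step % GNSS_EVERY == 0:
        gnss_fix = true_pos + rng.normal(0.0, 0.5, 2)  # noisy but drift-free
        estimate = (1 - ALPHA) * estimate + ALPHA * gnss_fix

print(true_pos, estimate)  # the estimate stays anchored instead of drifting away
```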

3. Light Detection And Ranging Sensor (LiDAR)

LiDAR, which stands for Light Detection and Ranging, is an active remote sensing method that uses light in the form of a pulsed laser to measure ranges (variable distances) to the vehicle's surroundings. These light pulses, combined with other sensor data, can generate precise, three-dimensional information about the shape of objects and landmarks surrounding the car, as well as their surface characteristics, as shown below.

LiDAR can create a visual map of its surroundings. Image: Automotive World.

A LiDAR instrument principally consists of a laser, a scanner, and a specialized GPS receiver (NOAA, U.S. Department of Commerce). LiDAR sensors continuously rotate and generate thousands of laser pulses per second. These high-speed laser beams are emitted continuously across the vehicle's 360-degree surroundings and are reflected by the objects in their way.

With the use of complex machine learning algorithms, the data received through this activity is converted into real-time 3D graphics, often displayed as 3D images or 3D maps of the surrounding objects, also in a 360-degree field of view, which helps the car drive on various road types and in various physical conditions.

LiDAR-based systems are highly accurate at detecting objects and recognizing 3D shapes, even at longer distances (~100–200 meters). LiDAR's 3D mapping capability also helps in differentiating between cars, pedestrians, trees, and other objects, while also calculating and sharing details of their velocity in real time.

To localize a vehicle with LiDAR, point cloud matching is used: comparing LiDAR scans against the HD map yields the vehicle's global position. Iterative Closest Point (ICP) [Besl & McKay, 1992] is often used to minimize the error when aligning points from sensor scans with those from the HD map. As the name suggests, the idea is to iteratively match each scan point to its closest map point and re-estimate the alignment. The alignment converges if the starting positions are close enough; a minimal sketch follows the image below.

Iterative Closest Point (ICP). Image source: https://cs.gmu.edu/~kosecka/cs685/cs685-icp.pdf
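Here is a minimal 2D point-to-point ICP sketch; a production LiDAR localizer works on dense 3D point clouds and more robust variants, but the match/align/iterate loop is the same. The toy map and scan below are made up for illustration:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(scan, map_points, iterations=20):
    """Align a 2D `scan` (Nx2) to `map_points` (Mx2) with point-to-point ICP."""
    src = scan.copy()
    for _ in range(iterations):
        # 1. Match each scan point to its closest map point.
        _, idx = cKDTree(map_points).query(src)
        matched = map_points[idx]
        # 2. Best-fit rigid transform for these matches (Kabsch / SVD).
        src_c, dst_c = src.mean(axis=0), matched.mean(axis=0)
        H = (src - src_c).T @ (matched - dst_c)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:     # guard against a reflection solution
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        # 3. Apply the transform and iterate until the alignment converges.
        src = src @ R.T + t
    return src

# Toy map: two perpendicular "walls"; the scan is the same shape seen from
# a slightly wrong pose (rotated 0.05 rad and shifted by (0.2, -0.1)).
wall_x = np.column_stack([np.linspace(0, 10, 50), np.zeros(50)])
wall_y = np.column_stack([np.zeros(50), np.linspace(0, 10, 50)])
map_pts = np.vstack([wall_x, wall_y])
theta = 0.05
R0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
scan = map_pts @ R0.T + np.array([0.2, -0.1])
aligned = icp_2d(scan, map_pts)
print(np.abs(aligned - map_pts).max())  # should be near zero after convergence
```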

While this gives LiDAR localization the advantage of robustness given HD maps and sensor data, one of the main difficulties of using LiDAR is constructing the HD map and keeping it constantly updated, since transient elements like pedestrians, cars, and bikes will have changed by the next time the vehicle passes through. Another disadvantage is that the laser's wavelength can be affected by poor weather and temperature, as a degraded signal-to-noise ratio affects the sensors in the LiDAR detector. Overall, LiDAR is highly expensive and requires more space to mount on cars, so it also tends to make self-driving cars look a little bulkier.

These disadvantages are some of the reasons LiDAR is used in coordination with cameras in self-driving cars, which brings us to our next point.

4. Camera(s)

Camera images are indeed the easiest type of data to collect. To use them for vehicle localization, 3D map and GPS data are also incorporated for visual matching, for example against lane lines or other detected objects.

While cameras can be a more reliable vision system than LiDAR, they lack LiDAR's ability to measure range directly. That's why companies like Tesla, which do not depend on LiDAR in their vehicles' Autopilot system, use other sensors such as radar and ultrasonic sensors to detect range and distance.

It's important to note that while image data is easy to obtain, the primary disadvantage of cameras is their lack of 3D information and the resulting need to rely on a 3D map.

Object detection with computer vision. Image source: TDS

Localization techniques

Now that we have a better understanding of the technology providing the main sources of data and input for sensor integration, let's look further into various methods of robot localization in the self-driving space. Bayes, histogram, Kalman, and particle filters are some of the most widely used algorithms for vehicle localization. At a high level:

1. Bayes Filter uses recursive Bayesian estimation and the Markov assumption to estimate an unknown probability density function (PDF) recursively over time. It mainly consists of two steps:

Discrete Bayes Filter — pseudocode (source)
  • First, we project the belief from the previous time step to the current time step (the prediction step).
  • Second, we update the projected belief with the new evidence (the measurement update).

2. Histogram Filter is similar to the Bayes filter, but is used when the state space is continuous rather than finite. The histogram filter decomposes a continuous state space into finitely many bins or regions and runs the discrete Bayes filter over them, serving as an approximate inference tool for continuous state spaces. A minimal sketch of this predict/update cycle follows below.
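Here is a minimal sketch of that discrete predict/update cycle in a toy 1D cyclic corridor, in the spirit of the pseudocode above; the world layout and sensor probabilities are made up for illustration:

```python
import numpy as np

# A cyclic 1D corridor of five cells; the car can only sense its cell's color.
world = np.array(['green', 'red', 'red', 'green', 'green'])
belief = np.full(5, 0.2)      # uniform prior: the car has no idea where it is

P_HIT, P_MISS = 0.6, 0.2      # sensor model: P(reading | cell color)

def update(belief, measurement):
    """Measurement update: reweight each cell by the evidence, then normalize."""
    likelihood = np.where(world == measurement, P_HIT, P_MISS)
    posterior = belief * likelihood
    return posterior / posterior.sum()

def predict(belief, step):
    """Prediction: project the belief forward by the (cyclic) motion."""
    return np.roll(belief, step)

belief = update(belief, 'red')    # the car senses 'red'
belief = predict(belief, 1)       # the car moves one cell to the right
belief = update(belief, 'green')  # the car senses 'green'
print(belief.round(3))            # probability mass for each cell
```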

Kalman Filter uses cumulative probability distribution function in determining position and velocity of the vehicle (Source: Bzarg)

3. Kalman Filter [Thiele, Swerling, and Kalman] is also used for continuous state-space problems. It is typically preferred because it has a relatively simple form and requires only a small amount of computational power. In a Kalman filter, the vehicle's state, such as its position and velocity, is estimated using sensor outputs.

The filter is very powerful in several respects: it supports estimation of past, present, and even future states, in a way that minimizes the mean of the squared error, and it can do so even when the precise nature of the modeled system is unknown [G. Welch and G. Bishop, 2004].
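A minimal 1D constant-velocity Kalman filter sketch might look like the following; the noise covariances here are illustrative guesses, and a real vehicle filter tracks many more state variables:

```python
import numpy as np

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])  # motion model: constant velocity
H = np.array([[1.0, 0.0]])             # we only measure position
Q = np.eye(2) * 1e-3                   # process noise covariance (a guess)
R = np.array([[0.25]])                 # measurement noise covariance (a guess)

x = np.array([[0.0], [0.0]])           # initial [position, velocity] estimate
P = np.eye(2)                          # initial uncertainty

def kalman_step(x, P, z):
    # Predict: project the state and its uncertainty forward in time.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: blend the prediction with the new measurement z.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Feed in noisy position measurements of a car moving at 5 m/s.
rng = np.random.default_rng(0)
for t in range(50):
    z = 5.0 * t * dt + rng.normal(0.0, 0.5)
    x, P = kalman_step(x, P, z)
print(x.ravel())  # estimated [position, velocity], roughly [24.5, 5.0]
```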

4. Particle Filter, or Sequential Monte Carlo [Liu and Chen, 1998], uses a set of particles (also called samples) to represent the posterior distribution of some stochastic process given noisy and/or partial observations. Particle filtering is especially useful for localization using LiDAR.

Say we start with a map containing known landmarks, which could be houses, buildings, and lamp posts. We also know the exact location of these landmarks within the map. (1) Particle filtering initializes N particles placed randomly around the map, where each particle is a possible position of the car. Then at each time step t, the car moves, and its sensors report the velocity and heading of that motion. (2) We move each particle by the same distance and direction as the car, which we call the prediction step. (3) Using LiDAR, we then measure the distance of the car to the known landmarks on the map; this is called the measurement step. (4) Finally, each particle is weighted by how well its expected distances to the landmarks match the actual measurements, and the particles are resampled in proportion to those weights, so the particle cloud gradually concentrates around the car's true position. A minimal sketch of these steps follows below.
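Here is a minimal sketch of those four steps for a position-only particle filter (no heading, and made-up landmarks and noise levels), using the log-weight trick for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(0)
landmarks = np.array([[20.0, 20.0], [80.0, 30.0], [50.0, 90.0]])  # known map
true_pos = np.array([40.0, 40.0])
N, SENSOR_SIGMA = 1000, 0.5

# (1) Initialize N particles randomly around the 100 m x 100 m map.
particles = rng.uniform(0.0, 100.0, size=(N, 2))

for _ in range(10):
    # (2) Prediction: move every particle the way the car moved, plus noise.
    motion = np.array([1.0, 0.5])
    true_pos = true_pos + motion
    particles = particles + motion + rng.normal(0.0, 0.2, size=(N, 2))
    # (3) Measurement: LiDAR-style distances from the car to each landmark.
    measured = (np.linalg.norm(landmarks - true_pos, axis=1)
                + rng.normal(0.0, SENSOR_SIGMA, 3))
    # (4) Weight particles by how well their expected distances match,
    # then resample in proportion to the weights.
    expected = np.linalg.norm(landmarks[None, :, :] - particles[:, None, :], axis=2)
    log_w = -0.5 * ((expected - measured) ** 2).sum(axis=1) / SENSOR_SIGMA**2
    log_w -= log_w.max()                      # for numerical stability
    weights = np.exp(log_w)
    weights /= weights.sum()
    particles = particles[rng.choice(N, size=N, p=weights)]

print(particles.mean(axis=0), true_pos)  # the particle cloud tracks the car
```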

The localization techniques above use fused sensor and map data to sense the objects in the vehicle's surroundings. The vehicle is then able to measure its distance from these objects (trees, buildings, humans, other vehicles, and so on) to paint a picture of the surroundings and locate itself.

Using high-precision measurements, the vehicle is able to improve its localization within the GPS estimate. As the vehicle moves, so does the GPS estimate, and the vehicle has to repeat the process of sensing, gathering evidence, and improving upon the new prior.

Concluding thoughts

Robots use surprisingly simple but powerful algorithms to find out where they are on a map, a problem called localization. I hope this article gave you a better understanding of localization: how it works at a high level, some of its common techniques, and the technology behind it, along with the challenges that persist to date.
