Tracking pedestrians for self-driving cars

Chapter 4: Doing cool things with data!

A self-driving car needs a map of the world around it as it drives. It must be able to track pedestrians, cars, bikes and other moving objects on the road continuously. In this article I will talk through a technique called the Extended Kalman Filter, which is used by Google's self-driving car to track moving objects on the road.

Below is a video of a car tracking a pedestrian in a simulator. LIDAR measurements are red circles, RADAR measurements are blue circles and estimation markers are green triangles. It is interesting to see how much more accurate LIDAR is than RADAR. With some tuning, the Kalman Filter's Root Mean Square Error can be brought down to 0.09.

Kalman Filter in action

The full C++ code behind the video above is on my GitHub.
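For readers unfamiliar with the Root Mean Square Error quoted above, here is a minimal sketch of how it could be computed for one state component (for example, position x across all time steps). The function name `rmse` and the use of plain `std::vector<double>` are my assumptions for illustration, not necessarily how the project computes it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root Mean Square Error between estimated and ground-truth values of one
// state component, taken over every time step.
// Hypothetical helper for illustration only.
double rmse(const std::vector<double>& estimates,
            const std::vector<double>& ground_truth) {
    // Both sequences must be non-empty and have the same length.
    assert(!estimates.empty());
    assert(estimates.size() == ground_truth.size());

    double sum_sq = 0.0;
    for (std::size_t i = 0; i < estimates.size(); ++i) {
        const double err = estimates[i] - ground_truth[i];
        sum_sq += err * err;  // accumulate squared error
    }
    // Mean of the squared errors, then the square root.
    return std::sqrt(sum_sq / static_cast<double>(estimates.size()));
}
```

A lower value means the filter's estimates stay closer to the true trajectory on average.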

Before going into the Kalman Filter, I want to touch on the three main types of sensors used in a car: cameras (front and rear), LIDAR and RADAR.

Car Sensors — Camera, LIDAR and RADAR

LIDAR uses lasers for measurement and generates a point cloud of the world around it, providing the car with fairly accurate position x and position y values. It detects objects in the vicinity of the car (20–40 m) with very high accuracy. However, LIDAR is less accurate in poor weather conditions or when the sensor gets dirty. A LIDAR point cloud looks like this:

LIDAR point cloud

RADAR, on the other hand, is less accurate but provides an estimate of both the position and the velocity of an object. The velocity is estimated using the Doppler effect, as seen in the image below. RADAR can detect objects up to 200 m from the car. It is also less affected by weather conditions.
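As a rough illustration of the Doppler idea, the sketch below converts a measured frequency shift into a speed along the line of sight using the textbook relation v = c * Δf / (2 * f0) for a signal that travels to the object and back. The function name, the sign convention and the 77 GHz carrier in the usage note are assumptions for illustration; a real sensor's processing chain is far more involved.

```cpp
#include <cassert>

// Speed of light in metres per second.
constexpr double kSpeedOfLight = 299792458.0;

// Radial (line-of-sight) speed in m/s from a Doppler shift.
// Assumed convention: a positive shift means the object is approaching.
// The factor 2 accounts for the round trip of the reflected signal.
// Hypothetical helper for illustration only.
double radial_speed_from_doppler(double doppler_shift_hz, double carrier_hz) {
    assert(carrier_hz > 0.0);
    return kSpeedOfLight * doppler_shift_hz / (2.0 * carrier_hz);
}
```

With an illustrative 77 GHz carrier, a 1 kHz shift corresponds to roughly 1.95 m/s. Note that this gives only the component of velocity towards or away from the sensor, not the full velocity vector.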

RADAR estimating velocity of a car using Doppler effect

A Kalman Filter is an algorithm that tracks the position and velocity of a moving pedestrian over time and also quantifies the uncertainty associated with them. It has two main steps: a prediction step, which predicts where the pedestrian is likely to be at the next time step assuming they move with constant velocity, and an update step, which uses sensor data (LIDAR and RADAR) to correct that estimate. These two steps then repeat continuously.

Let’s go through this in a bit more detail:

First we define the state of the pedestrian as a 4-dimensional vector: position_x, position_y, velocity_x, velocity_y. Here are the main steps in a Kalman Filter:

  1. Initialize the state of the pedestrian at t=0. We do this by reading the first measurement from our sensor and using it to estimate the pedestrian’s position and velocity.
  2. Prediction Step: at t=1 we estimate where the object is likely to be, assuming constant motion (fixed velocity with no acceleration), using simple equations of physics:

     px(t+1) = px(t) + vx(t) * dt    (dt is the time elapsed between the two steps)
     vx(t+1) = vx(t)

     The y position and velocity estimates are updated in the same way.

  3. Measurement Update Step: now we read the measurements from RADAR and LIDAR and use them to update the estimate of the pedestrian’s state calculated in step 2. The new state after the measurement step is the starting point for the next prediction step.
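Steps 1 and 2 above can be sketched in a few lines of C++. The `State` struct and the function names are hypothetical, and starting with zero velocity is my assumption for a first measurement that carries position only. The measurement update in step 3 also needs the uncertainty of the state, so it is left out of this sketch.

```cpp
#include <cassert>

// State of the pedestrian: position and velocity along x and y.
struct State {
    double px;
    double py;
    double vx;
    double vy;
};

// Step 1: initialise from the first position measurement.
// Assumption: velocity is unknown at this point, so it starts at zero.
State initialize(double measured_px, double measured_py) {
    return State{measured_px, measured_py, 0.0, 0.0};
}

// Step 2: constant-velocity prediction over dt seconds.
// Positions advance by velocity * dt; velocities stay unchanged.
State predict(const State& s, double dt) {
    assert(dt >= 0.0);
    return State{s.px + s.vx * dt, s.py + s.vy * dt, s.vx, s.vy};
}
```

For example, a pedestrian at (1, 2) moving with velocity (0.5, -1) is predicted to be at (2, 0) two seconds later, with the same velocity.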

This process is repeated continuously to update the current state as shown in the diagram below:

One other thing to note is that in the Kalman filter, the pedestrian's state is represented as a Gaussian (bell curve) with a mean and a covariance. Here is what 1D and 2D Gaussians look like:

1D Gaussian
2D Gaussian
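To make the mean-and-covariance representation concrete, here is a minimal sketch of a filter for a single axis, with state (position, velocity) and a 2x2 covariance. It assumes a sensor that reports position directly, and it uses a deliberately simplified process noise (the same variance `q` added to both diagonal entries). The struct, the function names, `q` and `r` are all hypothetical; a full tracker would use 4x4 matrices and a matrix library rather than hand-expanded algebra.

```cpp
#include <cassert>

// One axis of the pedestrian's state, stored as a Gaussian:
// mean (p, v) and 2x2 covariance P.
struct AxisFilter {
    double p;        // mean position
    double v;        // mean velocity
    double P[2][2];  // covariance of (p, v)
};

// Prediction: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
// Assumption: Q = q * I, a simplified process noise model.
void predict_axis(AxisFilter& f, double dt, double q) {
    assert(dt >= 0.0 && q >= 0.0);
    f.p += f.v * dt;  // velocity stays constant
    // F P F^T expanded by hand for the 2x2 case.
    const double p00 = f.P[0][0] + dt * (f.P[0][1] + f.P[1][0]) + dt * dt * f.P[1][1];
    const double p01 = f.P[0][1] + dt * f.P[1][1];
    const double p10 = f.P[1][0] + dt * f.P[1][1];
    const double p11 = f.P[1][1];
    f.P[0][0] = p00 + q;
    f.P[0][1] = p01;
    f.P[1][0] = p10;
    f.P[1][1] = p11 + q;
}

// Update with a position measurement z of variance r, i.e. H = [1, 0].
void update_axis(AxisFilter& f, double z, double r) {
    assert(r > 0.0);
    const double y = z - f.p;          // innovation (measurement residual)
    const double s = f.P[0][0] + r;    // innovation variance
    const double k0 = f.P[0][0] / s;   // Kalman gain for position
    const double k1 = f.P[1][0] / s;   // Kalman gain for velocity
    f.p += k0 * y;
    f.v += k1 * y;
    // P = (I - K H) P expanded by hand.
    const double p00 = (1.0 - k0) * f.P[0][0];
    const double p01 = (1.0 - k0) * f.P[0][1];
    const double p10 = f.P[1][0] - k1 * f.P[0][0];
    const double p11 = f.P[1][1] - k1 * f.P[0][1];
    f.P[0][0] = p00;
    f.P[0][1] = p01;
    f.P[1][0] = p10;
    f.P[1][1] = p11;
}
```

Calling `predict_axis` and then `update_axis` once per incoming measurement gives the repeating predict/update cycle. Prediction widens the position variance `P[0][0]`, and each measurement narrows it again.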

Here is an interesting question: why do we need a prediction step if we have a sensor measurement? Wouldn’t it be best to just update the state with the sensor (LIDAR/RADAR) measurement? Those should be pretty accurate, right?

Not really! Measurements have uncertainty too. The sensor manufacturer specifies the measurement noise to expect from the equipment. Local environmental conditions like rain or fog can also make sensors less accurate.

It is good practice not to trust the measurement blindly. When we combine the two sources of information, the Gaussians for the prediction and the measurement get multiplied, and something remarkable happens: the resulting uncertainty is lower than that of either one alone, as shown below.
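The effect is easy to check in one dimension. The sketch below multiplies two 1D Gaussians (renormalised) using the standard closed-form result; the struct and function names are hypothetical.

```cpp
#include <cassert>

// A 1D Gaussian described by its mean and variance.
struct Gaussian1D {
    double mean;
    double variance;
};

// Product of two 1D Gaussians, renormalised to a Gaussian.
// The new mean is a variance-weighted average of the two means, and the
// new variance is smaller than either input variance.
Gaussian1D fuse(const Gaussian1D& a, const Gaussian1D& b) {
    assert(a.variance > 0.0 && b.variance > 0.0);
    const double sum = a.variance + b.variance;
    return Gaussian1D{
        (b.variance * a.mean + a.variance * b.mean) / sum,
        a.variance * b.variance / sum};
}
```

For instance, fusing a prediction with mean 10 and variance 4 with a measurement with mean 12 and variance 4 gives mean 11 and variance 2: the estimate lands between the two, and it is more certain than either input.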

The Kalman Filter equations are a bit involved; you can check them on Wikipedia.

I also built an Unscented Kalman Filter, which can accommodate non-linear motion and is more accurate in predicting the state of a pedestrian. My GitHub also has the Unscented Kalman Filter built in C++.

Overall, building my first Kalman Filter and tracking objects with it was a great experience, and I am very happy with the outcome.


PS: I live in Toronto and I am looking to switch careers into deep learning. If you like my post and can connect me to anyone, I will be grateful :). My email is


Udacity Self Driving Cars Nano Degree: I thank Udacity for giving me the chance to be part of their new Self Driving Car program. It has been a very interesting journey. Most of the code I used was suggested in the classroom lectures.