Extended Kalman Filters for Dummies

Raúl Serrano
9 min read · Aug 18, 2017


Starting from Wikipedia:

“Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by using Bayesian inference and estimating a joint probability distribution over the variables for each timeframe.”

Let’s say that “Bayesian inference” has to do with statistics: its goal is to make predictions using all the information currently available, revising them as new information arrives. This statement already captures the main idea behind Kalman filters.

It is considered a sensor fusion algorithm because it combines inputs from several different sensors, producing an estimate that is better than the one obtained from any single measurement.

It has to deal with the uncertainty of the sensor noise as well as with external factors.

The Kalman filter produces an estimate of the state of the system by averaging the predictions about the state and the new measurements. It uses a weighted average that gives more weight to the data we are more certain about.

It works recursively, but it doesn’t need the whole history of measurements, just the last “best guess”.

There is another concept called the Kalman gain, defined as the relative importance given to the measurements versus the current state estimate. We can tune this gain depending on the results we want to obtain. With a high gain the system places more weight on the most recent measurements, so it follows them faster and more responsively. With a lower gain, the system pays more attention to the predictions.

To summarize: a gain close to one results in a jumpy estimated trajectory, while a gain close to zero smooths out the noise but decreases responsiveness.
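
This trade-off is easy to see with a toy one-dimensional filter that uses a *fixed* gain K (a real Kalman filter computes K at every step; the signal and noise values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 10.0                                      # the constant we estimate
measurements = true_value + rng.normal(0.0, 1.0, 50)   # noisy sensor readings

def filter_with_gain(zs, K, x0=0.0):
    """Blend each measurement z into the running estimate x with fixed gain K."""
    x, estimates = x0, []
    for z in zs:
        x = x + K * (z - x)   # K near 1: trust the sensor; K near 0: trust the model
        estimates.append(x)
    return np.array(estimates)

smooth = filter_with_gain(measurements, K=0.1)   # low gain: smooth but sluggish
jumpy = filter_with_gain(measurements, K=0.9)    # high gain: responsive but noisy

# The high-gain track jumps around far more from step to step:
print(np.std(np.diff(smooth)) < np.std(np.diff(jumpy)))  # True
```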

For the calculations, “the state estimate and covariances are coded into matrices to handle the multiple dimensions involved in a single set of calculations”. In other words, since we have multiple dimensions it is tricky to work with separate scalar functions, so to simplify the process we arrange the data into matrices.

Let’s imagine a car that is detecting a pedestrian; this would be its processing flow. I’m going to assume that our car has two sensors, a lidar and a radar.

Taken from Udacity SDC Nanodegree
  • First the system receives a measurement of the pedestrian’s position relative to the car.
  • The system initializes the pedestrian position based on the first measurement.
  • After a time period Δt the car receives another sensor measurement.
  • As we are working with an Extended Kalman Filter we assume that the velocity is constant, therefore we calculate the next position as the current position plus velocity·Δt. This step is called Predict.
  • Now the model compares the predicted location with what the sensor says. This is called Update. The Kalman filter will give more importance to the predicted location or to the measured location depending on the uncertainty of each one.
  • The car keeps predicting and updating the position of the pedestrian.

It can happen that the car receives measurements from different sensors at the same time. In that case the system performs one Update right after the other, using the same Prediction for both Updates.

Taken from Udacity SDC Nanodegree
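
The whole flow can be sketched as a small driver loop. The class and method names here are assumptions of mine, not a fixed API; the point is that both sensors share the same Predict step, and two simultaneous measurements simply produce a zero Δt for the second one:

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    sensor: str        # "lidar" or "radar"
    z: list            # raw measurement vector
    timestamp: float   # seconds

def process_measurement(kf, m, t_prev):
    """Run one predict/update cycle for a single incoming measurement."""
    dt = m.timestamp - t_prev   # Δt since the last measurement (0 if simultaneous)
    kf.predict(dt)              # project the state forward by Δt
    if m.sensor == "lidar":
        kf.update_lidar(m.z)    # linear Kalman update
    else:
        kf.update_radar(m.z)    # EKF update through the nonlinear radar model
    return m.timestamp          # becomes t_prev for the next call
```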

Let’s talk now about the maths and physics. In the Prediction step we assume that at every time step the pedestrian keeps moving at the same velocity, so the next state can be described with this equation:

x′ = Fx + ν

Since we don’t know exactly how the pedestrian behaved during this last time step (they may have accelerated or changed direction), our uncertainty increases:

P′ = FPFᵀ + Q (Fᵀ denotes the transpose of F)

As a reminder, F is the transition matrix (the one that deals with time steps and constant velocities) and Q is the process covariance matrix (the one that deals with the uncertainty). There is a relation between the uncertainty and the elapsed time: the longer we predict without a new measurement, the bigger Q becomes and the less accurate the prediction. The process noise can be written as ν ∼ N(0, Q), meaning a Gaussian distribution with zero mean and covariance Q.

There is another term, Bu, called the control input, that turns our equation into:

x′ = Fx + Bu + ν

We are not going to pay much attention to it. Let’s just say it would let us predict the position taking into account known inputs, such as the acceleration of the car. For our pedestrian model we set Bu to zero, since we can’t measure the pedestrian’s acceleration, and we account for it through the process noise instead.
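
The two prediction equations translate directly into code. This is a minimal sketch with numpy; the small 1D example at the bottom uses made-up numbers:

```python
import numpy as np

def predict(x, P, F, Q):
    """Prediction step: x' = F x, P' = F P Fᵀ + Q.
    ν has zero mean, so it doesn't shift x; it only inflates P through Q."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

# 1D position/velocity example: after dt = 1 s at 2 m/s, position moves to 2.
dt = 1.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])
x = np.array([0.0, 2.0])          # [position, velocity]
x_pred, P_pred = predict(x, np.eye(2), F, 0.1 * np.eye(2))
print(x_pred)                     # [2. 2.]
```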

Later, in the Update step, we compare where we think the pedestrian is with the sensor information:

y = z − Hx′

x′ is where we predict the object to be after time Δt.

Multiplying by the measurement matrix H drops the velocity information from the vector x′. Now we can compare our belief of reality with the lidar position measurement.

Using matrices lets us handle multiple dimensions in a single set of calculations. It also allows for a representation of the linear relationships between the different state variables (such as position, velocity, and acceleration). Here is where the matrices start to show up:

So first we calculate the time interval, and then we estimate the 2D position and the 2D velocity. We are assuming that the velocity is constant between time intervals.

The State Transition Matrix contains the information for estimating the position and velocity, in matrix form:

Taken from Udacity SDC Nanodegree

Remember, time intervals are not constant:

Taken from Udacity SDC Nanodegree
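
In code, the state transition matrix for the state vector [px, py, vx, vy] has to be rebuilt with the current Δt every time a measurement arrives, precisely because the intervals are not constant. A minimal sketch:

```python
import numpy as np

def make_F(dt):
    """Constant-velocity transition matrix for the state [px, py, vx, vy]."""
    return np.array([[1.0, 0.0, dt, 0.0],
                     [0.0, 1.0, 0.0, dt],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

# With Δt = 0.1 s, each position advances by its velocity times 0.1:
x = np.array([0.0, 0.0, 2.0, 1.0])   # at the origin, moving at (2, 1) m/s
print(make_F(0.1) @ x)               # [0.2 0.1 2.  1. ]
```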

As the time interval increases, we add more uncertainty about our position and velocity to the state covariance P.

We can divide the state transition model into two parts: a deterministic part and a stochastic part. The stochastic part has to do with calculating the process covariance matrix. This photo may help you to see the relation between both parts:

From my notes

As you can see, the noise vector can be split up into a non-random matrix (4x2) and a random vector (2x1).

The covariance matrix is about expectations, since we don’t yet know what the noise vector is going to be. We can define it as:

Taken from Udacity SDC Nanodegree

The process covariance is equal to the expectation of the noise times the noise transposed. As I showed before, the noise can be decomposed as G times a.

Taken from Udacity SDC Nanodegree

As G doesn’t contain random variables, we can pull it outside the expectation. We are then left with three statistical moments:

  • The expectation of ax², the variance of the acceleration in x.
  • The expectation of ay², the variance of the acceleration in y.
  • The expectation of the product ax·ay. As ax and ay are assumed to be uncorrelated, this expectation is zero.

When we multiply all the matrices we obtain:

Taken from Udacity SDC Nanodegree
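
Putting the pieces together, Q = G · E[aaᵀ] · Gᵀ can be computed like this (the acceleration variances below are placeholders; in practice you choose them to match how much the tracked object can actually accelerate):

```python
import numpy as np

def make_Q(dt, var_ax, var_ay):
    """Process covariance Q = G E[a aᵀ] Gᵀ for the constant-velocity model."""
    G = np.array([[dt**2 / 2, 0.0],
                  [0.0, dt**2 / 2],
                  [dt,        0.0],
                  [0.0,        dt]])
    E_aaT = np.diag([var_ax, var_ay])   # off-diagonal terms are zero:
                                        # ax and ay are assumed uncorrelated
    return G @ E_aaT @ G.T

Q = make_Q(dt=0.1, var_ax=9.0, var_ay=9.0)
print(Q[0, 0])   # dt⁴/4 · var_ax ≈ 0.000225
```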

We are going to obtain the data from two kinds of measurements (sensors):

  • Laser Measurements
  • Radar Measurements

Laser measurements give us the current position of the object, but not its velocity. On the other hand, laser measurements have higher resolution.

Radar measurements go further and give us the velocity information as well, although it is a little bit tricky: we are used to working with Cartesian coordinates, but this measurement comes in polar coordinates.

This introduces a new kind of matrix, H. Let’s have a look at the next image.

Taken from Udacity SDC Nanodegree

The left side of the image represents a standard Kalman filter: you just need to multiply H·x′ to get the predicted position. On the right side the process is more complex: you need to compute the mapping manually to convert from Cartesian coordinates to polar coordinates.

Let’s start with the laser measurements. The lidar measurement comes in the following form:

Taken from Udacity SDC Nanodegree

During the Update step we need to compare this measurement with the predicted one. If you remember, the predicted state vector has this form:

Taken from Udacity SDC Nanodegree

So we need to get rid of the velocity component. It is easy to think of a matrix that performs that task, the H matrix:

The prime notation (′) means that you have already computed the Prediction step but you haven’t done the measurement Update yet.

But these measurements have their own covariance as well. Each component, px and py, is affected by random noise. Our noise vector ω has the same dimensions as the measurement vector z. If we multiply the noise vector by its transpose and take the expectation, we get a new covariance matrix, R:

Taken from Udacity SDC Nanodegree

The zeros off the diagonal mean that the px and py noises are not correlated with each other. You may wonder where these noise values come from: they are provided by the sensor manufacturer.
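
A minimal sketch of the full lidar update, using the standard Kalman filter equations. The R values below are placeholders standing in for the manufacturer’s numbers:

```python
import numpy as np

H = np.array([[1.0, 0.0, 0.0, 0.0],    # keep px, drop vx and vy
              [0.0, 1.0, 0.0, 0.0]])   # keep py
R = np.array([[0.0225, 0.0],           # placeholder lidar noise variances
              [0.0,    0.0225]])

def update_lidar(x, P, z):
    """Standard Kalman update against a lidar position measurement z."""
    y = z - H @ x                         # residual: measurement minus prediction
    S = H @ P @ H.T + R                   # residual covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ y                     # blend prediction and measurement
    P_new = (np.eye(len(x)) - K @ H) @ P  # updated, reduced uncertainty
    return x_new, P_new

x, P = np.zeros(4), np.eye(4)
x_new, P_new = update_lidar(x, P, np.array([1.0, 1.0]))
# The estimate moves most of the way toward the measurement, since R is small:
print(x_new[:2])
```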

Now let’s have some fun: radar measurements are the other part of the story. Unlike the laser, the radar can measure radial velocity.

Taken from Udacity SDC Nanodegree

To make this simpler to understand, let’s plot our system. The origin of the plot represents our car, and the point represents the pedestrian.

Taken from Udacity SDC Nanodegree

x always points in the direction of movement of the car, and y to its left. Rho is the distance from the origin to the pedestrian, the bearing phi is the angle between rho and the x direction, and rho dot is the rate of change of rho (the radial velocity).

There is no H matrix, as there was for the lidar measurement, that maps the state vector x into polar coordinates. We need to do the mapping manually, converting from Cartesian to polar coordinates.

Taken from Udacity SDC Nanodegree
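
The mapping h(x′) from the Cartesian state to the polar measurement space can be written out directly (a sketch; it assumes the object is not exactly at the origin, where rho would be zero):

```python
import numpy as np

def h_radar(x):
    """Map the state [px, py, vx, vy] to the radar space [rho, phi, rho_dot]."""
    px, py, vx, vy = x
    rho = np.sqrt(px**2 + py**2)         # range: distance to the object
    phi = np.arctan2(py, px)             # bearing, in (-pi, pi]
    rho_dot = (px * vx + py * vy) / rho  # radial velocity (requires rho > 0)
    return np.array([rho, phi, rho_dot])

print(h_radar(np.array([3.0, 4.0, 1.0, 0.0])))  # rho = 5.0, rho_dot = 0.6
```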

Since this mapping is nonlinear, we linearize the system using a first-order Taylor expansion.

Taken from Udacity SDC Nanodegree
From my notes

What the Taylor expansion does is approximate the nonlinear function with a linear one around a starting point, in this case the mean (which is zero).

But watch out! As we are working with several dimensions, the partial derivative turns into a Jacobian matrix with this shape:

Taken from Udacity SDC
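
Computing Hj by hand gives the closed form below, which has to be re-evaluated at the current state estimate on every radar update. A sketch with a guard for the singular case px = py = 0:

```python
import numpy as np

def radar_jacobian(x):
    """Jacobian Hj of [rho, phi, rho_dot] w.r.t. the state [px, py, vx, vy]."""
    px, py, vx, vy = x
    c1 = px**2 + py**2
    if c1 < 1e-9:                        # rho ~ 0: the derivatives blow up
        raise ValueError("Jacobian undefined at the origin")
    c2 = np.sqrt(c1)                     # rho
    c3 = c1 * c2                         # rho cubed
    return np.array([
        [px / c2,                       py / c2,                       0.0,     0.0],
        [-py / c1,                      px / c1,                       0.0,     0.0],
        [py * (vx * py - vy * px) / c3, px * (vy * px - vx * py) / c3, px / c2, py / c2],
    ])
```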

Let’s summarize the difference between Kalman Filters and Extended Kalman Filters:

  • The H matrix in the Kalman filter will be replaced by Hj (the Jacobian).
  • To calculate y, the h function is used instead of the H matrix.
  • To calculate x′, the prediction update function f is used instead of the F matrix.
  • The F matrix will be replaced by Fj when calculating P′.

The matrices with a j subscript are Jacobians. As the linearization point changes, we have to recompute the Jacobians at every time step.

Taken from Udacity SDC

This is my personal overview of Kalman filters. I may have made typos or conceptual errors throughout the article; if so, all feedback is welcome.


Raúl Serrano

Self Driving Cars. Building my career @Udacity and @Uc3m. I design my own gadgets @Hackaday