The math behind Extended Kalman Filtering

*featuring my Garmin ⌚

Sasha Przybylski
17 min read · Jan 9, 2024

If you clicked on this article, you’re probably wondering what the heck an extended Kalman filter is. We’ll get to that. But first, I want to tell you why you should care about them.

Hey! I also made a video on the same topic, so if you prefer that format, feel free to check it out!

Since I’m working in battery tech, let’s use an EV as an example. Perhaps the most classic would be Tesla EVs, which also happen to be working on autonomous driving.

Autonomous vehicles rely on a ton of sensor information in order to get a sense of what their environment consists of. If the Tesla misinterprets its sensor information, or if the sensors malfunction, it could be fatal.

Take something as simple as GPS. In a canyon or somewhere else with sparse signal, the Tesla wouldn’t have very accurate sensor data, and so it may draw inaccurate conclusions about where it is. 🗺️

There are countless examples where we have the exact same situation as with this Tesla — where we need to predict the state or location of something, given potentially unreliable measurements. Airplanes, computer graphics, and so on. The question becomes, how can we get more reliable predictions despite having unreliable data?

Kalman filters are one of the most effective and well-known solutions to this kind of problem. 🚀

I’m working on estimating battery state of health using extended Kalman filtering. I wanted to build a simple Kalman filter to learn more, so I decided it would be interesting to make one that predicted the path of my bike. 🚲 In this article, I’ll break down the math and show you how I built my own!

Enter Kalman Filters

They’re used in scenarios where you need to predict the state of a system given noisy measurements. Let’s break that down, using an airplane flying as an example. To predict the state of the system would be to say “I think we will be at coordinates 8°20'53.7”N 80°10'16.0"W in 5 hours”.

Or it could be simpler, and just say “I think we’re here at this moment.” “State of the system” just means what the system is doing/where the system is at x point in time.

And then having noisy measurements means you don’t fully trust your measurements, or the sensors giving you those measurements. So going back to our Tesla example, there are a gazillion different sensors, and perhaps some of them are glitchy. These sensors could give us inaccurate information, which would mean that any decision we made “based on the data” would likely be a bad decision.

Obviously we don’t want the Tesla to crash because we didn’t realize it was right beside a massive tree and on a path to impact it. And so one of the potential solutions here is to use a Kalman filter.

At a very high level, what the Kalman filter does is:

  1. Makes a prediction
  2. Updates that prediction based on a measurement

And it returns its best estimate for the state of the system.

Kalman Filter vs. Extended Kalman Filter

I also want to note something that I haven’t yet explained. I’ve been talking about both extended Kalman filters and Kalman filters, but I never made the difference between the two clear.

The simplest way to explain the difference is that a Kalman filter is used for linear data only. So, switching modes of transport here (bear with me), let’s say I was out biking. If I biked in a perfectly straight line in only one direction, and that’s it, a Kalman filter would work great! Because this is linear data; if you plotted it you’d see a line that very clearly fits the classic y = mx + b.

However, I find biking in a perfectly straight line to be quite boring, and also not very realistic. What if I see a friendly-looking dog across the street? I would obviously need to pause my bike ride, cross the street, and go pet the dog. 🐕

If we graphed that, it wouldn’t be linear, and so we need an extended Kalman filter. So for this project, I’ll be working with an extended Kalman filter because my bike route was non-linear.

State space model + Observation model = Extended Kalman Filter

Before getting right into the thick of these equations (because there are a lot of them, not gonna lie), I want to give you a conceptual explanation of an extended Kalman filter. Here’s a visual representation:

We use a state space model to predict the state, then we feed that value into an observation model. The observation model then predicts what measurements we should see if we are, in fact, at that state.

Then, we take a measurement of the system (at the timestep we are predicting for, so the next timestep), and compare those predicted measurements with the actual ones.

Based off that, we calculate something called the Kalman gain, which is essentially how much we trust both the state space model and the actual sensor measurements. We then combine those two estimates to form one that is more accurate than either of them alone.
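If it helps to see that flow as code, here’s a tiny one-dimensional toy version of the loop, just to show the shape of it (the numbers are made up; the real thing uses the matrices we’ll build below):

```python
# toy 1-D version of the predict -> compare -> blend loop (made-up numbers)
state, P = 0.0, 1.0          # current estimate and how uncertain we are about it
velocity, dt = 2.0, 1.0      # what we know about how the system moves
measurement, R = 2.5, 0.5    # noisy sensor reading and its uncertainty
Q = 0.1                      # how much we expect the model itself to drift

# 1. prediction step: the "state space model" guesses the next state
state_pred = state + velocity * dt
P_pred = P + Q

# 2. compare the predicted measurement with the actual one
#    (here the state maps directly to the measurement)
residual = measurement - state_pred

# 3. Kalman gain: how much we trust the model vs. the sensor
K = P_pred / (P_pred + R)

# 4. blend the two into one estimate that's better than either alone
state = state_pred + K * residual
P = (1 - K) * P_pred
print(state, P)
```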

here’s an even better image (although there is some math)

Now it’s time for the math. 🚀

Prediction Step

1. State Space Model

The first thing that goes on inside that black box that is an extended Kalman filter (EKF), is something called a state space model. We use this to predict the future state of the system given what we know about the system. This is part of the prediction step.

In the land of Kalman Filters, time is measured in discrete intervals. This means time doesn’t flow continuously; it moves forward in fixed steps, so instead of every possible instant you only ever deal with timestep 1, timestep 2, and so on.

Time is represented as t = 1, t = 2, t = 3, and so on. The “1” basically means that this is the first time interval. These intervals aren’t necessarily a “logical” unit, in that they don’t have to equal something such as one minute. They can be set to anything, but the important thing is that they are all the same size. Time is represented by t, or in some literature, k.

Going back to the state space model, a Kalman filter will take the state at t = 1 (which is the current state, so it’s known). It then adds what it thinks will be the change in state, given what it knows about the system.

So for example, if I was riding my bike at 20km/hr, that can give a pretty good sense of where I will be in, say, 3 minutes. We can rearrange the simple formula velocity = distance / time, to solve for how far I am likely to have travelled.
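For example: 20 km/hr for 3 minutes (0.05 hours) works out to distance = velocity × time = 20 × 0.05 = 1 km of travel.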

Then we add that change in state to the previous state, which gives us the theoretical final state. It is very similar to taking 3 (your initial state) and adding 2 (your change in state) to get 5 (your final state). 3 + 2 = 5.

This is what a state space equation looks like (or at least the one I’m using for my project).
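Roughly speaking, it’s something along these lines. I’m sketching the state as latitude, longitude and elevation, with a speed v, heading θ and vertical speed w driving the change; the exact terms depend on how you set your model up:

f1: lat_t = lat_t-1 + v·cos(θ)·dt
f2: lon_t = lon_t-1 + v·sin(θ)·dt
f3: elev_t = elev_t-1 + w·dt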

This shows us that the final state, as defined by 3 variables, equals the current state (again, 3 variables) plus the change in state. That dt value stands for delta time, and is there so we can predict further into the future. To go from t = 1 to t = 2, your t value is increasing by 1. So that “change in state” parameter is being multiplied by one. But if we want to predict the state at t = 3, we now have to multiply our change in state variable by 2 (because 1 + 2 = 3).

So then we’d get 3 + 2(2) = 7.

The problem with this equation is that it’s non-linear (see all those sines and cosines?), which makes it challenging to work with. So instead, we want to convert it into this form, which looks a lot more like the linear y = mx + b.
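That target form is:

x_t = A·x_t-1 + B·u_t-1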

There are a few steps to get there. First, there are quite a few variables. Let’s start by defining them.

  • x_t = the state of the system at time t (in the equation it’s bolded because it’s the whole state vector, not a single coordinate). When x has a subscript of t-1, it is the state at the previous timestep.
  • A = a matrix that describes how the system operates by itself. So if I was biking at a constant speed and there were no external factors (ex a cute dog), how would the system evolve?
  • B = a matrix that shows how the system is affected by control inputs. Control inputs are essentially an outside factor that, when changed, will affect the state. For example, if I started to pedal really fast, then my speed would change, which would affect the evolution of the system.
  • u = the control inputs.

x_t-1 we will know, because that’s the previous state. x_t we are trying to predict. u_t-1 we should also know, because these are the control inputs, for example the pilot steering the plane. In my case, I made my control input speed.

However, we don’t know what A and B are. So we have to calculate the matrices. And to do that, we need to calculate the Jacobian.

Solving the A matrix

The Jacobian is essentially a matrix of partial derivatives. A derivative is “the rate of change of a line”, or the rate of change of one variable with respect to another (because points on a line are defined by x and y). The best explanation I’ve seen on this is (unsurprisingly) by 3Blue1Brown. You can check it out here.

Essentially, we are looking at a curve as a bunch of little lines, and then looking at the slopes of those lines. The derivative is what happens when the little lines get smaller and smaller. This way, we are essentially linearizing it, because we are working with very small sections that we can use y = mx + b on.

A partial derivative is merely scaling up that 2-dimensional idea into additional dimensions. In our particular case, where the state is defined by 3 variables, it’s 3 dimensions. We are taking the partial derivative of the state vector at time t with respect to the state vector at time t-1.

This basically means we are seeing how the state vector at t changes with respect to the state vector at t-1. When we looked at the state space model, we set the outputs equal to f1, f2, and f3. So now we can put those values back into our Jacobian like so:
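Using the sketch model from earlier, that Jacobian is every partial derivative of f1, f2 and f3 with respect to each state variable:

A = ∂f/∂x_t-1 =
[ ∂f1/∂lat   ∂f1/∂lon   ∂f1/∂elev ]
[ ∂f2/∂lat   ∂f2/∂lon   ∂f2/∂elev ]
[ ∂f3/∂lat   ∂f3/∂lon   ∂f3/∂elev ]

Each f only depends on its own previous state variable (the change in state comes entirely from the control inputs), so every entry off the diagonal is 0 and every entry on the diagonal is 1.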

Unfortunately, after all that effort, we get something super simple: an identity matrix. (this is not always the case)

An identity matrix is a matrix where everything is zeros except for the diagonal, which are ones. If you think of matrices as describing the transformations you did to a graph, then the identity matrix represents the graph that has undergone no transformations. Check out this video for a more detailed explanation!

Solving the B matrix

Ok, so we’ve solved A. Let’s solve B now. B describes how the control inputs will affect the state of the system. And so we calculate another Jacobian, this time calculating the partial derivatives of the state vector at t with respect to the control inputs at t.

This time, we get something a bit more exciting than an identity matrix:
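With the assumed control inputs from my sketch (speed v and vertical speed w, with θ as the heading), it comes out to something like this; your exact B depends on how you defined your control inputs:

B = ∂f/∂u =
[ cos(θ)·dt    0  ]
[ sin(θ)·dt    0  ]
[ 0            dt ]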

We’ve now defined all the variables that make up our state space model. Sub everything in, and we get:

v and w are just the control inputs, so for example speed

So basically all this equation is doing is describing the evolution of the system mathematically. The problem here is that this is based off what we know about the system. But sometimes that can be wrong, because something could have happened that the model didn’t know about. Say, using the example of my bike, I saw that cute dog and so I slowed down. Then the prediction for my future state would be wrong, because my “change in state” value was less than the model thought it was.

And of course, we want accuracy. Can’t have any planes crashing, or any glitched GPS Garmin maps. So we need to calculate exactly how wrong our estimate was.

2. State Covariance Matrix

The State Covariance matrix is a matrix that shows us how accurate the estimate we just made is. This helps us calculate the Kalman gain (how much we trust each answer) in a future step.

The state covariance matrix is given by this formula:
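P_k|k-1 = F_k · P_k-1|k-1 · F_kᵀ + Q_k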

To break it down, let’s first define what each variable is.

P_k|k-1 = state covariance matrix

F_k = the A matrix (which we just calculated as an identity matrix!)

The little superscript “T” means that this matrix is being transposed. Essentially this means we’re swapping rows for columns. But since it’s an identity matrix, and the rows and columns are already symmetrical, transposing it doesn’t have an effect. And so that value still ends up equaling an identity matrix.

Q_k = the process noise covariance matrix. Process noise accounts for the unpredicted changes in the system’s dynamics over time — this is kind of a Kalman filter’s way of accounting for those unpredictable changes. Covariance is basically how much two variables change together. The reason we have Q_k as a covariance matrix is to reflect the fact that changes in one aspect of the system may be related to changes in another.

We calculate Q_k by calculating the covariance of each variable in relation to each other variable, like this:
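With latitude, longitude and elevation as the three state variables, that matrix is:

Q_k =
[ cov(lat, lat)    cov(lat, lon)    cov(lat, elev)  ]
[ cov(lon, lat)    cov(lon, lon)    cov(lon, elev)  ]
[ cov(elev, lat)   cov(elev, lon)   cov(elev, elev) ]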

The breakdown of the “cov” function looks like this:
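cov(X, Y) = Σ (Xi − X̄)(Yj − Ȳ) / (n − 1)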

Where:

  • Xi = the values of the X-variable
  • Yj = the values of the Y-variable
  • X̄ = the mean (average) of the X-variable
  • Ȳ = the mean (average) of the Y-variable
  • n = the number of data points

For the entries on the diagonal, we are actually calculating the covariance of a variable with itself (for example, cov(x, x)). This can be simplified to var(x), and so our Q_k value ends up looking like this:
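If you’d rather let numpy do this for you, its cov function builds exactly this kind of matrix. Here lats, lons and elevs are placeholder arrays standing in for your recorded data, and this is just one way to choose Q:

```python
import numpy as np

# placeholder recorded data, one value per timestep (swap in your own route)
lats  = np.array([49.2608, 49.2610, 49.2613, 49.2617])
lons  = np.array([-123.1139, -123.1141, -123.1144, -123.1146])
elevs = np.array([70.0, 70.5, 71.0, 71.2])

# rows = variables, columns = timesteps
states = np.vstack([lats, lons, elevs])

# np.cov computes the covariance of every row against every other row, so Q is a
# 3x3 matrix whose diagonal entries are var(lat), var(lon) and var(elev)
Q = np.cov(states)
```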

P_k-1|k-1 = slightly different than P_k|k-1. This is the state covariance matrix for the previous timestep. The “|” roughly means “given”, so P_k-1|k-1 is the covariance at time k-1 given what we knew at time k-1. For the first iteration, we set this value to an identity matrix with 0.1s on the diagonal instead of 1s. You’ll see later how we can improve the accuracy of our extended Kalman filter by calculating P_k|k at each step, rather than just using an identity matrix.

If we sub all those values into our P_k|k-1 equation, we get:

So a quick recap: we predicted the state using a state space model, then found the inaccuracy of that prediction. Now we move on to the update phase of Kalman filtering.

Update Step

1. Measurement Residual

The Kalman filter updates the estimate we just made based off a measurement. So at this point, we take a measurement for the state of the system at the time we’re trying to predict (k).

But remember, we don’t fully trust the measurement. So here, we use the observation model. It’s like having someone tell you that they think they saw a cute dog 2 blocks away. Obviously, this is far too serious of a matter to leave to chance, so you’ll have to go double-check yourself. The telling of the cute dog is like the observation model (what you expect to see), and the seeing for yourself is the actual measurements. 🐕

The observation model will tell us what measurements we can expect, if we were actually at that state. So for example, if the predicted state was 49°15′39″N 123°06′50″W, then I would expect to get those coordinates as my measurement.

Something to note here is that it’s not always that simple. In this example, the state maps directly to the measurement: the state is defined by latitude and longitude, and the measurements are also latitude and longitude. (This is important because it defines our H value as just an identity matrix).

We are calculating y tilde k (the y with the squiggly line on top), which is the difference between the actual sensor observations and the predicted sensor observations. In math talk, that looks like this:
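ỹ_k = z_k − h(x̂_k|k-1)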

Where:

ỹ_k = the measurement residual

z_k = the actual sensor measurements we took

h = the observation matrix. It defines how the state vector (x_k) is transformed to obtain the predicted measurement (z_k)​. This ends up being an identity matrix, because our state is defined by latitude, longitude, and elevation, and our measurements are also in this same form. The state maps directly to the measurement.

x̂_k|k-1 = the predicted state estimate at time k, given everything we knew up to time k-1 (this is the value that came out of our state space model). That hat symbol above x means “estimated.”

h(x̂_k|k-1)​ is our observation model! It gives us the predicted sensor measurements, and the equation as a whole returns y tilde k, which is the difference between our predicted and actual sensor measurements.

2. Innovation Covariance

In the last step we calculated y tilde k, which is the difference between the actual and the predicted sensor observations. In this step, we’re going to calculate the covariance matrix S_k.

S_k represents the uncertainty of our y tilde k value, in the form of a matrix. So it shows the uncertainty associated with the difference between the predicted and actual sensor measurements. The smaller the S_k, the more certain the Kalman filter is about the accuracy of the predicted measurement.

This error could be caused by an inaccurate mathematical model for our observation model, inaccurate sensor measurements, changes in the environments, or process noise.

We use this S_k value when we go to calculate the Kalman gain, the next step! Here’s the S_k equation:
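S_k = H_k · P_k|k-1 · H_kᵀ + R_k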

H_k = our identity matrix from the previous step, and again that superscript of T just means we are transposing it, which in this case has no effect.

P_k|k-1= state covariance matrix (which we calculated in the prediction step)

R_k = a covariance matrix (the same kind of thing as S_k) but for the measurement noise. Since I didn’t include any noise values in my example, we will just set this to another identity matrix.

Near Optimal Kalman Gain

We are getting SO CLOSE!! This is the last step before we get our predicted state estimate. The Kalman gain is basically how much we trust both the predicted sensor observations and the actual sensor observations when it comes to estimating the state.

So it looks at the uncertainty in both the predicted state (our P_k|k-1 value) and the measurements (S_k), and weighs them against each other.

So if there is a ton of sensor noise, then the prediction for the state will be mostly dependent on the predicted measurements, and the K_k (Kalman gain) will be close to 0. And if we trust the sensor measurements quite a bit (because they’re fairly reliable) then we get a high Kalman gain. And again, the reliability is determined by the covariance matrices S_k and P_k|k-1.

This is the Kalman gain equation, and as you can see we’ve already calculated all those variables, so it’s simply a matter of subbing them in.
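K_k = P_k|k-1 · H_kᵀ · S_k⁻¹ (the ⁻¹ means we take the inverse of S_k)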

Update State Estimate

This is the step where we finally pull everything together and get our best estimate for the state. 🥳

To recap: we predicted the state based on what we knew about the system, and calculated the error in that prediction. Then we predicted the measurements we should have gotten at that state, calculated the difference between those predicted measurements and the actual sensor observations, and then calculated the error in that difference. Using the error of our first prediction and the calculated error of our measurements, we calculated the Kalman gain.

Here’s the equation to calculate our updated state estimate:
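x̂_k|k = x̂_k|k-1 + K_k · ỹ_k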

x̂_k|k = our final estimate

x̂_k|k-1 = our estimate from the state space model

K_k = the Kalman gain

ỹ_k = the difference between our predicted and actual sensor observations.

This is the value that is returned!

Update Covariance of the State Estimate

But wait — there’s one more step. Since the Kalman filter is going to iterate through every single data point, we can use some of the values we calculated now to help improve the accuracy in the future.

When we were calculating P_k|k-1 (error in state space model estimate), I said we would just set P_k-1|k-1 (error in state space model estimate at previous timestep) to an identity matrix with 0.1s. Now, we can actually calculate that value, so for the next iteration we’re using more accurate numbers.
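Here’s the update formula, using the variables defined below:

P_k|k = (I − K_k · H_k) · P_k|k-1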

P_k|k = the updated error of the state space model estimate after using some of the sensor information

I = an identity matrix

K_k = the Kalman gain

H_k = the measurement Jacobian, aka an identity matrix in our case

P_k|k-1 = uncertainty associated with the predicted state (from our state space model).

The Code

After going through all the theory and concepts behind it, I want to show you what I actually built. In a nutshell, I wanted to use data from my Garmin watch (so running and biking routes) inside an extended Kalman filter. Here’s the result I got, which I’m pretty happy with.

if you zoom in you can see that it deviates a little in some spots!!

The code lines up pretty closely with the math that we just went through, so I’m not going to walk through it again. There is just more defining of variables — for example since so many things end up being identity matrices, they are just defined as that.

If you want to try this out on your own run/bike, I’ll add the code here, and if you check out my video I show you how to extract the data. I’m sure it’s a similar process for an Apple Watch or other fitness trackers. I’m actually curious to see how well it works!!

Oh one more caveat — make sure you have all the libraries that are listed at the top installed. You can do this by typing:

pip install "insert library name here"

Here’s the code :)
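To give you a feel for how the pieces fit together, here’s a stripped-down sketch of the main loop. It assumes you’ve already parsed your watch data into arrays of latitude, longitude, elevation, speed and heading, it treats latitude/longitude changes as directly proportional to speed (a simplification), and it uses the same choices as above (A, H and R as identity matrices, Q computed from the data). Treat it as a starting point, not the exact script:

```python
import numpy as np

def run_ekf(lats, lons, elevs, speeds, headings, dt=1.0):
    """Stripped-down EKF sketch over GPS data: state = [lat, lon, elevation]."""
    measurements = np.column_stack([lats, lons, elevs]).astype(float)

    # process noise covariance Q, computed from the recorded data itself
    Q = np.cov(measurements.T)

    A = np.eye(3)        # state Jacobian (worked out to an identity matrix)
    H = np.eye(3)        # state maps directly to the measurements
    R = np.eye(3)        # measurement noise covariance (kept as an identity matrix)
    P = np.eye(3) * 0.1  # initial state covariance

    x = measurements[0].copy()  # start at the first measurement
    estimates = [x.copy()]

    for k in range(1, len(measurements)):
        v, theta = speeds[k - 1], headings[k - 1]

        # ----- prediction step -----
        # control Jacobian B for this timestep (sketch model: speed + heading,
        # with the second control input treated as vertical speed)
        B = np.array([[np.cos(theta) * dt, 0.0],
                      [np.sin(theta) * dt, 0.0],
                      [0.0, dt]])
        u = np.array([v, 0.0])               # [speed, vertical speed (unknown, so 0)]
        x_pred = A @ x + B @ u               # state space model
        P_pred = A @ P @ A.T + Q             # state covariance

        # ----- update step -----
        z = measurements[k]                  # actual sensor measurement
        y = z - H @ x_pred                   # measurement residual
        S = H @ P_pred @ H.T + R             # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)  # near-optimal Kalman gain

        x = x_pred + K @ y                   # updated state estimate
        P = (np.eye(3) - K @ H) @ P_pred     # updated state covariance

        estimates.append(x.copy())

    return np.array(estimates)
```

Then it’s just a matter of calling run_ekf() on the arrays you pulled out of your watch data and plotting the result next to the raw GPS track.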

If you stuck with me throughout all that math, huge congrats. It was a lot XD. But I do hope you’ve learned something!! Building this out took a lot of debugging, but it was worth it :)

Thanks so much for reading, I’ll see you in the next one! 🚀


Sasha Przybylski

16 y/o TKS activator who writes about material science, batteries, and anything else that strikes me as interesting :)