Correcting GPS Locations of Cars using Particle Filters

Juan Mancilla Caceres
aiincube-engineering
10 min read · Jul 9, 2020

In this post, I will attempt to explain Particle Filters (PF) and their application to correcting GPS measurements. In my experience, most sources online provide correct and in-depth explanations, but usually not in an accessible way; see, for example, the next paragraph containing Wikipedia’s definition:

“Particle Filters are a set of Monte Carlo algorithms used to solve filtering problems arising in signal processing and Bayesian statistical inference. A filtering problem consists in estimating the state of a dynamical system when partial observations are made. The objective is to compute the posterior distributions of the states of some Markov process… The term was first coined in 1996 by Del Moral in reference to mean field interacting particle methods…”

My goal is to help a general audience understand PF in practical terms, but before we unravel what PF are and how we can use them, we should consider why we want to correct GPS locations in the first place. In Figure 1, the red dots show the actual locations of GPS readings from devices (e.g., phones, cars, etc.). These readings usually consist of a latitude and longitude (as well as other metadata such as time, speed, and altitude). In most consumer devices, this location is within 10 m (approx. 30 ft) of the actual location. As you can see in the image, this means that the raw data coming out of GPS devices may place you in the middle of a building, or in the water, and not necessarily where you actually are. This can get worse when the device is close to obstacles such as large buildings. The task in this case is to find the actual location of the device, based on all the available information, and to place the reading in the location where it actually occurred (or at least, the location where it most probably occurred).

Figure 1: Red dots show real life locations from a car with a GPS, green dots are the output of the particle filter with the location corrected.

One might think that snapping each point to the nearest street would be sufficient, but that is not the case. As can be seen in Figure 2, snapping some of the points around the corner to their nearest street would imply that the moving object made a quick U-turn, which is unlikely: the object was most likely simply turning the corner. The insight here is that we can guess the actual trajectory of the object by looking at where it has been before and where it will be later.

Figure 2: Example of why snapping locations to the closest street is not enough. Yellow line shows the inferred trajectory if we simply snap each red dot to the closest street.

The task of predicting the state of a system (e.g., the location of a car) using previous states is called filtering, and PF are just one of many algorithms that can be used to solve this problem. Formally, the input to a filter is a sequence of noisy sensor measurements from a dynamical system, and the goal is to estimate the state of the system at time t using all observations from the start.

One can think of a dynamical system in terms of states and observations. At each time step, the system moves from state to state following a particular pattern (usually represented as a function or a probability distribution) called the transition model, while a sensor emits a measurement based on the current state, which may contain noise or follow another specific pattern, traditionally called the sensor or observation model. See Figure 3 for a graphical representation.

Figure 3: Example of a Dynamical System. S_t represents the state of the system at time t which is not directly observable, O_t is the observation at time t. f_t and h_t are the transition and observation models respectively.

Following the symbols in Figure 3, the state of the system at time t can be computed as

s_t = f_t(s_t-1) + v_t

where v_t is noise. Notice that here we are making a somewhat strong (but common) assumption that the current state depends only on the previous one; this is called the (first-order) Markov Assumption.

The filtering problem can then be specified as computing the following probability in an iterative way:

P(s_t | o_1, …, o_t-1) = Σ over all s_t-1 of P(s_t | s_t-1) × P(s_t-1 | o_1, …, o_t-1)

which can be read as “In order to compute the posterior probability of being in state s_t given all the observations from the beginning up to the previous time step, we only need to compute the probability of being in the previous state s_t-1 (this is the iterative part) times the probability of moving from s_t-1 to s_t, for all possible s_t-1 states”.
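To make the recursion concrete, here is a small sketch (my own, with made-up states and numbers) of that prediction step for a discrete state space:

```python
# Prediction step of the filtering recursion for a discrete state space.
# States "A"/"B" and all probabilities here are invented for illustration.

def predict(prior, transition):
    """P(s_t) = sum over s_prev of P(s_prev) * P(s_t | s_prev)."""
    posterior = {}
    for s_prev, p_prev in prior.items():
        for s_t, p_move in transition[s_prev].items():
            posterior[s_t] = posterior.get(s_t, 0.0) + p_prev * p_move
    return posterior

prior = {"A": 0.6, "B": 0.4}
transition = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.2, "B": 0.8}}
posterior = predict(prior, transition)
print({s: round(p, 3) for s, p in posterior.items()})  # {'A': 0.5, 'B': 0.5}
```

Note that the result is again a proper distribution over states, ready to be fed back in at the next time step.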

In theory, this value can be computed exactly using the Forward Algorithm, but only in very specific cases can it be done efficiently (e.g., when the state space is small and discrete, or when the distributions are all Gaussian, which is the Kalman Filter setting). In practice, most of the interesting cases have a state space that is too large for efficient computation, or the distributions have to be approximated (e.g., through sampling).

It is in these cases that PF come into play. The general idea is to represent the probability distribution by a set of randomly chosen weighted samples (called particles); see Figure 4. The randomly chosen part is what makes this a Monte Carlo algorithm.

Figure 4: Visualization of the sampling done to represent the posterior probability (red line). The small dots below the line are the samples, the dots above the line represent the samples weighted by their importance to the representation of the distribution (the larger the more important).

This representation has several advantages: 1) it can represent any arbitrary distribution (the more particles, the more accurate the representation), and 2) it allows us to compute the desired values more efficiently by limiting the search space, because instead of storing the whole distribution, we only store a list of N particles.

One of the underlying concepts that is very important for PF is Importance Sampling, which is the technique to estimate a probability distribution through samples. The basic idea is that, since we don’t know the real distribution, we need to sample from a distribution that is expected to be similar to the one we are looking for and assign more importance to the samples that are most representative of the real distribution.
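As a toy illustration of this idea (my own, with made-up numbers): suppose we want the mean of a target distribution p that we can evaluate but prefer not to sample from directly. We sample from an easy uniform proposal q instead and weight each sample by p(x)/q(x):

```python
import random

# Importance sampling sketch: estimate the mean of target p by sampling
# from a uniform proposal q and weighting samples by p(x)/q(x).
random.seed(0)

p = {0: 0.1, 1: 0.3, 2: 0.6}        # target distribution (true mean = 1.5)
q = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}  # proposal: uniform over the same values

samples = random.choices(list(q), weights=list(q.values()), k=100_000)
weights = [p[x] / q[x] for x in samples]
estimate = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
print(round(estimate, 1))  # close to the true mean 1.5
```

The samples drawn from q that land on values where p is large get large weights, which is exactly the “assign more importance to the most representative samples” idea.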

Concretely, if we want to approximate P(s), the approximate value of P(s) is the fraction of particles with value s, which means that many states will have P(s)=0. In addition, we weigh each particle according to the probability of the actual observation given the particle’s state s, i.e., w(s)=P(o|s).

Because of this weighting, the weights will not sum up to one, which is a problem if we want them to represent probabilities. To fix this, we can either normalize (i.e., rescale the values so that they sum up to one) or resample the particles (drawing with replacement) according to their weights. Doing this ensures that the particles most representative of the actual distribution tend to survive to the next iteration, while the less representative ones disappear. In the end, the particle with the highest probability will contain the expected state.
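Here is a minimal sketch of the normalize-then-resample step (the weights are made up, chosen to match the shape of the dinosaur example later in the post):

```python
import random

# Normalize particle weights, then resample with replacement so that
# high-weight particles tend to survive. Values are illustrative.
random.seed(1)

particles = [(0, 0), (0, 1), (1, 1), (1, 0)]
weights = [0.75, 0.1, 0.05, 0.1]  # unnormalized, w(s) = P(o|s)

total = sum(weights)
normalized = [w / total for w in weights]  # now sums to one

# Draw with replacement: each draw picks (0, 0) with probability 0.75.
resampled = random.choices(particles, weights=normalized, k=len(particles))
print(resampled)
```

`random.choices` draws with replacement, so a single high-weight particle can (and usually does) appear several times in the resampled set.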

Here is a summary of the steps we described above:

In order to predict the state of a system at time t, do:

At step = 1 (since we have no idea what the real state of our system is, we throw particles everywhere, i.e., sample N particles according to the expected distribution):

Sample the possible initial states s_i following P(s_i|o_i)
Compute the weights w_i as a function of P(o_i|s_i)
Resample N particles according to w_i (i.e., particles with higher weight should be more likely to be selected)

At step > 1:

Compute the weights w_i as a function of P(o_i|s_i) for the current particles
Apply the transition model as a function of P(s_i+1|s_i)
Resample N particles according to w_i

Do this for t steps and return the particle with the highest weight.
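The steps above can be sketched as a generic loop (an illustrative sketch, not production code; the caller supplies the three model functions, and the toy models below are made up):

```python
import random

def particle_filter(observations, init_sample, transition_sample,
                    obs_likelihood, n_particles=1000, seed=0):
    rng = random.Random(seed)
    # Step 1: no idea where the system is, so throw particles everywhere.
    particles = [init_sample(rng) for _ in range(n_particles)]
    for obs in observations:
        # Weight each particle by P(o | s).
        weights = [obs_likelihood(obs, s) for s in particles]
        # Apply the transition model: sample each particle's next state.
        moved = [transition_sample(s, rng) for s in particles]
        # Resample N particles according to the weights (with replacement).
        particles = rng.choices(moved, weights=weights, k=n_particles)
    # Return the most common particle as the state estimate.
    return max(set(particles), key=particles.count)

# Toy models: an object on cells 0..9 that mostly stays put,
# observed with known noise.
def init_sample(rng):
    return rng.randint(0, 9)

def transition_sample(s, rng):
    return s if rng.random() < 0.9 else max(0, min(9, s + rng.choice([-1, 1])))

def obs_likelihood(obs, s):
    return 0.8 if obs == s else (0.1 if abs(obs - s) == 1 else 0.01)

print(particle_filter([5, 5, 5, 5], init_sample, transition_sample, obs_likelihood))  # 5
```

After a few observations of cell 5, nearly all surviving particles sit on cell 5, so the most common particle is the estimate.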

It is important to notice that, in practice, a lot of the “magic” in making PF work is in the transition and observation models. You will need to think very carefully about them in order to get the best possible results. For the particular case of correcting GPS measurements, the transition model needs to encode how objects move from one street to another (e.g., given that your belief is that a car is at the corner of 2nd Street and 5th Avenue, what is the set of streets that the same car could be on at the following measurement?). In a similar way, the observation model needs to encode what we know about the precision of GPS readings relative to actual locations (e.g., that the actual position may be within a 10 m radius of the reading).

Let’s work through a small example that would definitely not warrant a particle filter, but that will help us illustrate each of the steps above. Imagine an object (the small velociraptor in Figure 5) moving on a 2x2 grid. We want to track the movement of our dinosaur every hour. We know that he may move any way he wants, but from previous experiments (or from knowledge provided by famous dinosaur expert Dr. Alan Grant) we know that he moves mostly in a clockwise fashion. Also, we have a blurry CCTV system that allows us to see where the dinosaur currently is, with some known error.

Figure 5: Object (small dinosaur) moving in a 2x2 grid. The task is to track the movement of the object at specific time steps. The blue dots represent our original 4 particles. The numbers represent the coordinates of each cell of the grid.

What we just described are the transition and observation models. Specifically, let’s assume the following transition model:

╔═════════════╦═══════╦═════════════╗
║ Moving From ║  To   ║ Probability ║
╠═════════════╬═══════╬═════════════╣
║    (0,0)    ║ (0,1) ║     0.8     ║
║    (0,0)    ║ (1,0) ║     0.2     ║
║    (0,1)    ║ (1,1) ║     0.8     ║
║    (0,1)    ║ (0,0) ║     0.2     ║
║    (1,1)    ║ (1,0) ║     0.8     ║
║    (1,1)    ║ (0,1) ║     0.2     ║
║    (1,0)    ║ (0,0) ║     0.8     ║
║    (1,0)    ║ (1,1) ║     0.2     ║
╚═════════════╩═══════╩═════════════╝
All other possible movements have zero probability.

The observation model will be the following: if the object is at position (i,j), then we will get the observation (i,j) with 0.75 probability, (i’,j) with 0.1, (i,j’) with 0.1, and (i’,j’) with 0.05. The primed indices refer to any value different from the actual index.
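For concreteness, here is one way the two models might be encoded (a sketch; the structure and names are mine, the numbers are taken from the table and paragraph above):

```python
# Transition model: for each cell, the possible destinations and their
# probabilities (all other moves have zero probability).
TRANSITION = {
    (0, 0): {(0, 1): 0.8, (1, 0): 0.2},
    (0, 1): {(1, 1): 0.8, (0, 0): 0.2},
    (1, 1): {(1, 0): 0.8, (0, 1): 0.2},
    (1, 0): {(0, 0): 0.8, (1, 1): 0.2},
}

def obs_likelihood(obs, state):
    # P(obs | state): 0.75 if both indices match, 0.1 if exactly one
    # index is wrong, 0.05 if both are wrong.
    wrong = sum(1 for o, s in zip(obs, state) if o != s)
    return (0.75, 0.1, 0.05)[wrong]

print(obs_likelihood((0, 0), (0, 0)))  # 0.75
print(obs_likelihood((1, 1), (0, 0)))  # 0.05
```

Counting mismatched indices is just a compact way to cover the (i,j), (i’,j), (i,j’), and (i’,j’) cases in one expression.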

At step 1, we have no idea where the dinosaur is, so we will create one particle for each cell in the grid with equal probability (blue dots in Figure 5). I.e.:

s1 = (0,0) with w1 = 0.25
s2 = (0,1) with w2 = 0.25
s3 = (1,1) with w3 = 0.25
s4 = (1,0) with w4 = 0.25

where sᵢ is the identifier of the particle and wᵢ is its weight.

Our sensor reports that the dinosaur is at (0,0) with 0.75 probability, at (0,1) with 0.1 probability, at (1,0) with 0.1, and at (1,1) with 0.05 (see Figure 6). Therefore, when we apply our sensor model, we get the following weights for each particle:

Figure 6: Example of what our sensor sees. The darker the image, the more certain it is that the object location is that particular cell.

s1 = (0,0) with w1 = 0.75
s2 = (0,1) with w2 = 0.1
s3 = (1,1) with w3 = 0.05
s4 = (1,0) with w4 = 0.1

If we want actual probabilities, we can normalize these values, but that is not necessary at this point. This result basically tells us that we believe that the original location of our dinosaur is (0,0). Now let’s see what happens an hour later…

From the second step onward, we will keep only 3 particles per iteration (in practice you would keep thousands). In this case, we will keep s₁, s₂, and s₃ and forget about s₄; those three particles are everything we carry over from the previous step.

We first assume that all three remaining particles have the same weight, and therefore for the second step we have:

s1 = (0,0) with w1 = 0.33
s2 = (0,1) with w2 = 0.33
s3 = (1,1) with w3 = 0.33

We now apply the transition model by multiplying each particle’s weight by the probability of moving from the cell specified in that particle to each possible destination. We can immediately apply the observation model as well, which gives us 0.75 for (0,1), 0.1 for (0,0), 0.1 for (1,1), and 0.05 for (1,0).

Therefore, in this case, we end up with 6 possible values:

s11 = (0,0) to (0,1): w11 = 0.33*0.8*0.75=0.198
s12 = (0,0) to (1,0): w12 = 0.33*0.2*0.05=0.0033
s21 = (0,1) to (1,1): w21 = 0.33*0.8*0.1=0.0264
s22 = (0,1) to (0,0): w22 = 0.33*0.2*0.1=0.0066
s31 = (1,1) to (1,0): w31 = 0.33*0.8*0.1=0.0264
s32 = (1,1) to (0,1): w32 = 0.33*0.2*0.1=0.0066

This means that our system believes that the dinosaur is now at cell (0,1) with 0.74 probability (obtained after normalizing the values).

If we would like to continue tracking the movement of our dinosaur, we would simply repeat the last steps: sample 3 particles from the 6 above (according to the weight), apply the transition function and the observation function and continue as needed.

The above example may be oversimplified, but in reality it is not that different from correcting the GPS locations of cars: just think of the cells in the grid as actual streets and the observations as latitude and longitude pairs. The trick (and Ai Incube’s secret sauce) is in how we define the transition and observation models.

This blog post was meant to be a quick introduction to Particle Filters. There are many different variations and optimizations that can be done, but the general principle stays the same. I hope this was clarifying and entertaining to all readers.
