Map Matching and the Processing Raw GPS Data on an Industrial Scale

Published in

Bright Box — Driving to the future

7 min readApr 24, 2018

Any measuring device, whether analog or digital, displays results with a certain level of inaccuracy and noise. The inaccuracy of a GPS sensor system depends on the inaccuracy of the sensor itself, and also on factors such as landscape, movement speed, the number of satellites and its location.

In our app we provide the user with the opportunity to examine driving directions in detail. If the app were to display raw, unfiltered data, those directions would take the driver off the road. The route it showed would run through buildings and bodies of water, some points on the route would be a long way from neighboring points, and sections of the might could even be missing.

I think everybody knows that there are solutions on the market that offer map matching services. Map matching processes coordinates and produces sets of coordinates that are tied to the actual road. But no map matching services can understand the specific nature of your data, and the results from processing raw data may not always be the best. That’s why we’ve developed a solution that makes it possible to process sensor data and match it to a road map with maximum accuracy.

Input Data

At the beginning of the development process the OBD dongle transmitted telematics data to the server in the form of points, each with the following parameters:

Coordinates obtained from the GPS sensor
Speed obtained from the vehicle
Engine ROMs obtained from the vehicle
Time of the specific point

Points were transmitted according to the following algorithm: once every three seconds or 20 meters. Note that the transmission algorithm is not ideal. However, within the framework of this specific problem we decided not to change it. It should be self-evident that using a better algorithm to transmit the points can only improve the results. The OBD dongle also collected data such as heading (direction of movement) and dilution of precision (the GPS’s accuracy level, broadly speaking). But this data was not sent to the server.

The Algorithm

The algorithm we applied can be divided into two parts:

Data filtering
Map matching
Partial processing

Filtering

No one knows the specific nature of your data better than you do. We decided to make use of our knowledge and do the data filtering on the service side. Let’s take a look at a simple sample route:

The graph shows the speed obtained from the vehicle by the dongle and the speed calculated from the GPS. As you can see, there are lots of deviations on the graph, both upwards and downwards. A standard filtering algorithm comes to mind at once — a Kalman filter or alpha-beta filter. And those are indeed what we tried first. But those kinds of filters didn’t work as well as one might hope. There are several reasons why not, including the low frequency and absolute diversity of the data (making it hard to get a single modified algorithm with the right accuracy) and unacceptable incidences of transformations of incoming data. A very simple linear filter for speed performed much better in testing. The essence of the algorithm is very simple: for all neighboring points we calculate the speed according to the GPS and compare it with the speed obtained from the vehicle. If the difference is greater than the acceptable level of inaccuracy, we throw out one of the points.

Here’s the pseudocode:

As a result of the filtering, we get data without deviations — but the points still don’t fit onto the road. But before we go to the map matching process we need to thin down our data some more — there’s no point passing ten points that all lie on a straight line when a straight line can be defined by only two points. All the same, there’s no point getting worked up over filtering, because the map matching service might mistake our data for noise. To remove unwanted points, we used the Ramer-Douglas-Peucker algorithm, or rather a slightly modified version of it.

For a group of points ordered in time, we calculate the distance for all points along an arc connecting the first and last points in the group. If the distance from each point is less than a certain value E, we return only the first and last points from the entire group. Otherwise we divide the group into two at the point that is furthest from the arc, then repeat the procedure.

Map Matching

Since the maps for our mobile apps and portals are provided by Mapbox, we decided to use their map matching service. However, we immediately encountered a limitation: there can be no more than 100 points per query. Even taking into account filtering and using the RDP algorithm to reduce the number of points, the average number of points on our routes is still 250 points. That meant that we needed to process them in batches — batches that had to be end-to-end, because in some cases straightforward batching would lead to processing errors. The batching algorithm is as follows:

N: number of points to be processed end-to-end
Split the route into batches of no more than 100-N

3. Then process the first batch

4. Take the last N points of the processed data and the second batch

5. Continue until the final batch

6. The final result consists of Processed1, Processed2, Processed3, and Processed4.

The next problem we encountered was determining the precision of our data. The Mapbox API requires the precision of passed data to be specified, even though the documentation lists that parameter as optional. And if you don’t pass a value, a default value will be inserted. The relevant parameter is “gps_precision.” This is what the documentation has to say about it:

“An integer in meters indicating the assumed precision of the used tracking device. Use higher numbers (5–10) for noisy traces and lower numbers (1–3) for clean traces. The default value is 4.”

However, we had never passed that particular data from the dongle since the service was developed. Given the sparsity of data, using algorithms to determine the noise level proved impossible. Therefore, we processed a certain volume of routes and tried to find the parameters that fit best. But how can you pick the best parameter for a thousand routes? Doing it manually would be unreasonable. It actually turned out to be possible to automate the process due to the specific nature of the results with a poorly-chosen precision parameter. If the parameter value is too low, sections of the route can get lost; if it is too high, you get superfluous loops added to the route.

The DTW algorithm is a good fit for identifying cases of this sort. After processing 1,000 routes and comparing the results using the DTW algorithm we determined the best results are obtained when the precision parameter is set to either 3, 6, or 10 (in different cases).

Finally, in the absence of GPS data about precision, we processed the data three times in parallel with gps_precision values of 3, 6, and 10, then chose the best result. We also planned to add a function for the dongle to send a dilution parameter, which significantly improves the quality of the service’ functionality.

Partial Processing

The end points of routes are often in places where no road appears on the map (parking lots, spaces between buildings, etc.). In such cases map matching algorithms do not work perfectly, and it would be desirable to be able to tell as accurately as possible where the user has parked their car. To solve this problem, we started processing not the entire route, but only the inner points — we throw out the beginning and end of the route, then we add the beginning and end from the original route to the processed data. The beginning and end are identified like this: we take either 5% of the route or the part before the vehicle’s speed reached 10 kph.

The Result

Finally, we’d like to demonstrate the results of our route processing.

Now the route almost always follows the road. Intense strong noise has vanished, and the missing parts of route have been restored (roundabouts and bridges).