Carlos HG
6 min read · Jan 2, 2022
Map-Matching — Photo by Amir Saboury on Unsplash

A few years ago, I was in a meeting with several coworkers at a transportation company, and someone raised the following problem:

“It takes a long time to process the monthly fuel efficiency report of the entire fleet (distance travelled / fuel consumed). The fuel information is not the problem; the problem is that we need to physically download the odometer information from every vehicle, and that takes a lot of time. Why don’t we use the GPS information of the fleet to get the distance travelled? We have it in real time,” he said.

Everyone said it was a bad idea because that distance is not accurate. At the time I had no idea about any spatial data processing techniques. Later, I started to learn data processing techniques, Python and all this new data science stuff, paying special attention to spatial processing. Eventually I discovered that the problem mentioned in that meeting is all about map-matching, so this article offers a proper solution with a practical example using this technique.

There is a lot of information about map-matching on the internet; Uber and Lyft have also shared very good articles and conference talks about it. Of course there is a mathematical approach to this topic, but don’t worry: this is a 100% practical project with fewer than 60 lines of code.

I will use VSCode and Docker with a Valhalla image; the code is in Python with libraries like pandas, geopandas and folium. I will include references to the great articles that helped me out with this project. The four main steps are:

Similarly, the code can be divided into the following steps:

  1. rawGPS_points

We are not going to work with real GPS data; instead we are going to create some pseudo-random GPS points that will be grouped together as a linestring in GeoJSON format. I’m from Mexico, so I will pick some points to describe a route in Tampico (where I live). For this we can use the website geojson.io: choose the option to draw a polyline and, when you are done, choose Save/GeoJSON.

I picked a route from the stadium to a nearby HEB supermarket and made two files: a “low resolution” one with few samples and a “high resolution” one with lots of samples. In both cases I tried to add some noise (samples some distance away from the road).

  • trip_lowres.geojson
  • trip_hires.geojson

3. Docker — Valhalla

Valhalla is an open-source routing engine made up of several library modules; Meili is the module for map matching. I followed the instructions in these links to get Valhalla running with Docker on my Windows computer:

If you want to replicate my project, remember to download Mexico’s road data from the Geofabrik server.

2. Python code

Let’s get started with the six steps of the Python code. Notice that the code processes one trip at a time; after you process both trips, you can compare them. I’ll be using the low-resolution trip file in the following code.

2.a Import libraries

Just make sure you have installed all the required libraries beforehand. The last import, decode_functions, is a Python file that defines the function decode; we will talk about it later.

2.b Read & format gps info

The code reads the GeoJSON file and creates the following geodataframes (gdf):

  • gdf_rawGPS_linestring — just a linestring with the raw trip
  • gdf_rawGPS_points — each point of the raw trip
  • gdf_rawGPS — a gdf with the linestring and the points of the trip

Additionally, the code creates a dataframe (df) that will be passed to Valhalla:

  • df_rawGPS_points — each point of the raw trip but in df format
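The reading step can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the coordinates are invented, the helper name `linestring_to_points` is hypothetical, and the file is assumed to be a FeatureCollection whose first feature is a LineString, which is what geojson.io typically saves for a single polyline.

```python
import json

# Hypothetical GeoJSON content; the coordinates are invented for illustration.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [-97.8700, 22.2500],
          [-97.8690, 22.2512],
          [-97.8675, 22.2530]
        ]
      }
    }
  ]
}
"""


def linestring_to_points(geojson_text):
    """Return the vertices of the first LineString as lat/lon records."""
    data = json.loads(geojson_text)
    geometry = data["features"][0]["geometry"]
    if geometry["type"] != "LineString":
        raise ValueError("expected a LineString geometry")
    # GeoJSON stores positions as [lon, lat]; the records name both explicitly.
    return [{"lat": lat, "lon": lon} for lon, lat, *_ in geometry["coordinates"]]


points = linestring_to_points(SAMPLE_GEOJSON)
print(points[0])
```

A list of records like this can be loaded with `pd.DataFrame(points)`, which gives a table playing the same role as df_rawGPS_points.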

2.c Valhalla request

The code makes the Valhalla-Meili request using df_rawGPS_points, as mentioned, plus some extra parameters configured in meili_tail, for example search_radius (how far from each point to search the road network for candidate roads). For more information, see the Valhalla documentation.
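To make the request step more concrete, here is a hedged sketch of how such a request body could be assembled and sent. It is not the article's listing: the URL and port, the helper names, the radius value and the choice of `costing`, `shape_match` and `format` are all assumptions to adapt to your own setup, and the radius is placed under `trace_options` as the Valhalla documentation describes.

```python
import json
import urllib.request

# Assumed local endpoint; adjust host and port to your own Valhalla container.
VALHALLA_URL = "http://localhost:8002/trace_route"


def build_trace_request(points, search_radius=50):
    """Build the JSON body for a map-matching request.

    `points` is a list of {"lat": ..., "lon": ...} records.
    """
    body = {
        "shape": points,
        "costing": "auto",            # assumed: the trip was made by car
        "shape_match": "map_snap",    # ask Meili to snap noisy points to roads
        "format": "osrm",             # assumed: returns matchings and tracepoints
        "trace_options": {"search_radius": search_radius},  # metres
    }
    return json.dumps(body)


def post_trace_request(body, url=VALHALLA_URL):
    """Send the request and return (status_code, parsed_json)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError for non-2xx answers.
    with urllib.request.urlopen(request) as response:
        return response.status, json.load(response)


# Hypothetical points, only to show the shape of the payload.
example_body = build_trace_request(
    [{"lat": 22.2500, "lon": -97.8700}, {"lat": 22.2512, "lon": -97.8690}]
)
print(example_body)
```

Only `build_trace_request` runs here; `post_trace_request` needs a running Valhalla service, so call it once your container is up.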

2.d Read & format Valhalla response

The code expects a valid trip result (status_code=200). Given a valid trip, Valhalla returns, among other things, two main pieces of data:

  • matchings/geometry — a linestring describing the map-matched trip. The linestring is in encoded polyline format; to decode it, we will use the function decode. So you will need an extra Python file named decode_functions.py in the same directory as your main Python file.
  • tracepoints — for every raw point Valhalla receives in the request, it returns a map-matched point
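The two items above can be pulled out of the parsed response with a few lines. This is a sketch under assumptions: the sample dictionary is hypothetical and heavily trimmed, the layout is assumed to follow the OSRM-style format (each tracepoint carries a `location` of [lon, lat], and an unmatched point may appear as null), and `extract_matching` is an illustrative name.

```python
# Hypothetical response fragment; a real one carries many more fields.
sample_response = {
    "matchings": [{"geometry": "encoded-polyline-string"}],
    "tracepoints": [
        {"location": [-97.8701, 22.2501]},
        None,  # assumed: a raw point that could not be matched
        {"location": [-97.8689, 22.2513]},
    ],
}


def extract_matching(response):
    """Return the encoded route geometry and the matched points."""
    encoded_route = response["matchings"][0]["geometry"]
    matched_points = [
        {"lon": tp["location"][0], "lat": tp["location"][1]}
        for tp in response["tracepoints"]
        if tp is not None  # skip points without a match
    ]
    return encoded_route, matched_points


encoded_route, matched_points = extract_matching(sample_response)
print(len(matched_points), "matched points")
```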

The code reads the response data, processes it and creates the following gdf:

  • gdf_MapMatchingRoute_linestring — a linestring of the map-matched route
  • gdf_MapMatchingRoute_points — the points (vertices) of the previous linestring
  • gdf_MapMatchingRoute — a gdf with the linestring and the points of the map-matched route
  • gdf_mapmatchedGPS_points — a gdf with the tracepoints

The code for decode_functions.py:
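The original listing is not reproduced in this copy. As a stand-in, here is a sketch of the standard encoded-polyline decoding algorithm. It assumes six decimal digits of precision by default, which is what the Valhalla documentation describes for its shapes; the `precision` parameter is an addition that makes the function easy to check against five-digit strings. It returns [lon, lat] pairs.

```python
def decode(encoded, precision=6):
    """Decode an encoded polyline string into a list of [lon, lat] pairs."""
    factor = 10 ** precision
    coordinates = []
    lat = lon = 0
    index = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # first the latitude delta, then the longitude delta
            result = 0
            shift = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:  # no continuation bit: this value is complete
                    break
            # The lowest bit carries the sign; the rest is the magnitude.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        # Each value is stored as an offset from the previous point.
        lat += deltas[0]
        lon += deltas[1]
        coordinates.append([lon / factor, lat / factor])
    return coordinates


# Widely published five-digit test string, decoded with precision=5.
print(decode("_p~iF~ps|U_ulLnnqC_mqNvxq`@", precision=5))
```

For a Valhalla geometry you would call `decode(encoded_string)` and keep the default precision, assuming the service was not asked for a five-digit shape.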

For more information about decoding a route shape, see the following link:

2.e Raw & map-matching routes — draw map

The code creates a folium map m with a fixed center and zoom, so if you want to try another route you need to adjust both parameters as your route requires. The map shows the following layers:

  • rawGPS_points — linestring & points of the raw trip (red)
  • MapMatching_rawGPS_points — tracepoints (white)
  • MapMatching_Route — linestring & points of the map-matching route (green)

The folium map is saved as mapmatching.html in the main directory.
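If you would rather not hard-code the center, one option is to derive it from the points themselves. The sketch below is only a suggestion and not part of the original project; the helper name `center_and_bounds` and the sample points are hypothetical.

```python
def center_and_bounds(points):
    """Return ([lat, lon] center, [[south, west], [north, east]]) for the points."""
    lats = [p["lat"] for p in points]
    lons = [p["lon"] for p in points]
    south, north = min(lats), max(lats)
    west, east = min(lons), max(lons)
    # Midpoint of the bounding box, in the [lat, lon] order folium expects.
    center = [(south + north) / 2, (west + east) / 2]
    return center, [[south, west], [north, east]]


# Hypothetical points, for illustration only.
center, bounds = center_and_bounds(
    [{"lat": 22.0, "lon": -98.0}, {"lat": 22.5, "lon": -97.5}]
)
print(center, bounds)
```

With folium, `center` can be passed as `location` to `folium.Map`, and `m.fit_bounds(bounds)` lets the map choose a zoom level that shows the whole route.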

2.f Raw & map-matching routes — calculate distance

The code uses the geometry_length function to calculate the haversine distance of the raw trip and the map-matching route, and prints both as text output in VSCode.
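For readers who want to see what a haversine length calculation involves, here is a self-contained sketch. It is not the library function mentioned above: it assumes a spherical Earth with a mean radius of 6,371,008.8 m, and the names `haversine_m` and `path_length_m` are illustrative.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical assumption)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the valid range.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def path_length_m(coords):
    """Sum the segment lengths of a list of (lon, lat) pairs."""
    return sum(haversine_m(*a, *b) for a, b in zip(coords, coords[1:]))


# One degree of longitude along the equator, split into two segments.
print(round(path_length_m([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)])))
```

Running the same function over the raw linestring and over the decoded map-matched linestring gives two lengths that can be compared directly.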

Results & conclusions

Putting together both the “low-res” and “high-res” raw trips (red) and their respective map-matching routes (green).

Since I made this hypothetical trip and I know my city, I define the low-resolution map-matched route as the correct one.

For this experiment the map-matching results are very satisfactory, at least for me. Obviously in a real scenario, the results will depend on, among other factors:

  • Precision, accuracy & sampling frequency of the gps trackers installed in the vehicles.
  • Urban and rural surroundings, such as skyscrapers or mountains near the road network.
  • Quality of the OpenStreetMap data for the location where you are doing map-matching.

Without a doubt, map-matching the GPS points before calculating the travelled distance is better than calculating it from the raw points.

As a final comment, while I was finishing this article I started to think that the terms “low resolution” and “high resolution” are better described as “low-frequency sampled” and “high-frequency sampled” routes. Any questions are welcome, and I will try to answer them the best way I can.

The github repository for this project:
