MapMatching and Shortest Path Routing at Bazaar

Published in

Bazaar Engineering

6 min readJun 21, 2022

There are hundreds of Bazaar Riders(BRs) on the street to fulfill thousands of orders every day. Bazaar uses BRs phone's GPS data to track and record BR journeys for vendor invoicing.

Why do we need map-matching and shortest path routing?

To fulfill the orders on a daily basis Bazaar has a last-mile logistics network all across Pakistan. The main cost of the last mile logistics is the fuel cost which fluctuates very frequently due to global market rates. So a slight increase in the kilometers will have a great effect on the delivery cost. Previously, we calculate the distance provided by the drivers which most of the time is inflated due to:

Drivers deviate from the planned route due to unforeseen circumstances like road blockages and narrow roads
The vehicle meter reading is not correct

We introduced GPS tracking of BRs by their phones but the GPS data that we get is often noisy and does not match the road. To get a clearer picture of this raw location data, we run it through an algorithm that takes raw locations and returns more accurate locations that are on the road network. This process is called map matching. Also, there are some missing location pings in between so we send it to the Shortest Path Routing algorithm to get the driving route between two points to complete the rider's journey.

Initial Challenges we faced?

The first challenge we faced was we have to store BR's location pings and most of the BRs use the internet while delivering the order. Storing rider's location pings was relatively easy as we integrate our open source location SDK in Bazaar Rider App. The next question that popped up was that BRs have low-end smartphones with less memory storage and low battery backup so how much location data we should keep per ride and what should be the interval of location ping retrieval.

The first Solution we implemented …

As I stated above we first implemented our location SDK on our Bazaar Rider App and for the second challenge, we kept the location data per order delivery and 30 seconds time interval for the location ping retrieval hence on every order delivery the location data is synced on our backend for distance computation.

On the Backend, distance calculation was kept simple and we computed distance by calculating Haversine Distance between the location pings and adding up all at every location sync call.

Challenges that popped up after the first implementation …

We had assumed that mobile GPS works very good and will provide us with accurate location pings so distance calculation will be an easy task but after rolling out the solution following were the results:

1. Location Pings were good when BR was moving towards the delivery location but at the order location because it took some minutes to deliver orders the location pings were scattered.

Figure 1: Scattered Location Pings at Delivery Location

2. In a city like Lahore the road network is so congested that in the city center the buildings are very near to each other and the roads are very narrow between the buildings. So GPS sometimes works and sometimes it doesn't and location pings look like below in figure:2.

Figure 2: Anomalous Location Ping Due to Mobile Tower ping provider

The anomalous ping is from the network tower. So three location providers provide location data to the mobile:

Mobile GPS(mobile internal GPS)
Network (in case of GPS not working location of the nearest network provider tower is fetched)
Fused(a mixture of both network and mobile GPS location)

3. Missing location pings due to app kill from the background. Below in figure:3, the location data is only available where BR has opened the application to do the order delivery marking while other times the app is removed from the background to save battery.

Figure 3: Missing Location Data Due to app kill

How do we overcome these issues?

The second challenge was the easiest to solve we simply filtered out the points that are provided by network providers and also applied a filter on location accuracy to filter higher precision location pings.

For the first and last challenges, we used map-matching to point the raw location ping to the road network and the shortest path routing algorithm to calculate the distance between two locations.

For map-matching and shortest path routing, we used open-sourced OpenStreetMap. Firstly Pakistan’s map file in protobuf format is parsed and the road network is stored in a graph format in the form of nodes and edges.

How shortest path routing is found?

As we now represented the OpenStreetMap file to the graph data structure we can easily apply any shortest path algorithm to find the distance between two points. We used the Dijkstra algorithm as it reduces the time complexity as compared to other shortest path algorithms.

What is map matching?

Map-matching is the process of mapping raw GPS locations to road segments on a road network to estimate a vehicle’s route.

Solving Map matching with Hidden Markov Model(HMM)

We used a commonly used state space model for map-matching the discrete-state Hidden Markov Model (Newson & Krumm [1], Map Matching @ Uber). In this system, we generate candidates by looking at the closest points on the road segment and using the Viterbi algorithm to find the most likely sequence of hidden states.

The simple working of the model is that we first take input observations that are our raw GPS pings and find which road segment could have produced that road segment. This step is called the candidate selection step. The commonly used technique for this is the K-Nearest Neighbour selection of the road segment adjacent to the raw GPS ping. After candidate selection, each GPS observation is projected to the candidate road segment. After this, we can set up our HMM.

All the GPS pings are the observations in our HMM and all the candidate road segments that we identified are the hidden states corresponding to our GPS observations. Next, we compute the emission probability:

Emission probability is the probability of observing a vehicle traversing a road segment. The probability will be higher when raw GPS pings are closer to the road segment and lower as you are further away from the road segment. After this step transition probability is computed:

The transition probability is the probability of a vehicle traversing from one road segment to another. So basically it is the difference between the haversine distance between the projections and the shortest path distance between the projections so the value closer to zero will more likely be the feasible route than with the bigger value of transition probability. The transition probability is computed for every pair of candidates between every consecutive GPS observation.

After all these inputs in our HMM we then feed them to the Viterbi algorithm to find the most likely sequence of hidden states corresponding to our input GPS pings.

The results after map-matching and shortest path routing are great and can be visualized below:

Before Map Matching and After Map Matching

BR Journey Pings Before and After Shortes Path Routing and MapMatching

Conclusion

After implementing map-matching and shortest path routing we found our distance calculation and rider journey recording has improved greatly. Distance recording is also automated hence on every tour completion the distance is calculated by our service asynchronously.

Special thanks to Hasnain, Ateeb, and the entire Operations team for helping us put this service into production!

References

[1] Newson P. & Krumm J., “Hidden Markov map matching through noise and sparseness,” Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems GIS 09, 336, 2009