How routing works: 4 simple algorithms

Bastian Buch
Omio Engineering
Published in
12 min readDec 11, 2019

At Omio, we have a vision to help people find the best travel options from any location to any location, regardless of how many travel modes or providers they need to use to get there. Rome2rio, who we recently joined forces with, shares that vision.

In this article we want to outline the basic algorithmic concepts of how we are able to connect almost any transport mode of the world to find possible connections between almost any two locations on the planet, a feature not even Google Maps provides.

For example, let’s search for how to travel from the small city Cáceres in Brazil to the russian city of Tulun, not too far away from the Baikal lake. While Google and other providers are not able to find any connection at all, we can craft up to 6 different travel routes within a few hundred milliseconds. How is that possible?

Traveling from Cáceres in Brazil to Tulun in Russia. Right: Google Maps Routing, Left: Rome2Rio.

With this article, we won’t outline how the overall routing system is designed or implemented. We also won’t talk about how the data is structured and how we connect different sets and sources of data. Let’s rather focus on the most basic and foundational element of how the routing engine works: The algorithm.

R2R uses an optimized version of the so called A* algorithm, which tries to find the best paths between two known locations by travelling from both locations towards the other using heuristics. But before explaining in detail how this works, let’s look into two basic path finding algorithms that are easier to understand and form the foundation for A*: The Depth First Search Algorithm (DFS) and the Breadth First Search Algorithm (BFS).

The scenario

All of these algorithms are so called graph traversal algorithms and also the R2R is based on high dimensional graphs. However, in order to make it easier to understand we will use a simplified view: A two dimensional map with 4x6 positions. This could be thought of as a very simple map, with roads and two cities A and B. We want to find the best (shortest) way to get from city A to city B:

A map representation we’ll use to describe routing concepts.

Let’s say those boxes without highways are called blocked boxes — areas we cannot use for travel:

White boxes won’t be considered as possible locations.

Even if the representation as a map is easier to understand, we still have a graph:

The map described before as a graph

In order to find a path from A to B, all three algorithms (BFS, DFS and A*) basically start form A, look around to find unvisited, unblocked boxes, and then walk towards those unvisited boxes, look again, walk again until B is reached. The decision which box to visit next is the basic differentiator between our three approaches and can lead to different outcomes in terms of path length and performance: DFS chooses one path and walks it all the way to the end, if there is a dead end it takes the next path until the target node was found. BFS walks all paths in parallel by only walking to the next node for each possible direction.

Let’s start with the DFS approach

The aim of DFS algorithm is to traverse the graph in such a way that it tries to go far from the root node. Stack is used in the implementation of the depth first search. Lets see how this works in our scenario using both representations:

DFS to walk from A to B

As you can see, following DFS we would choose one of the three walkable neighbors of A. Let’s say we go “west” to (4,2) and follow that path. At (3,1) there is a crossing and we choose to go “west” again to (3,0). This is a dead end as it has no neighbors. So we go back to (3,1) and follow the second walkable neighbor (2,1) and follow the path all the way to (0,6) — which is our target node.

The algorithm for this is fairly simple:

  • Step 1: Push the root node into the Stack.
  • Step 2: Loop until Stack is empty.
  • Step 3: Pop a node of the Stack
  • Step 4: If the node equals target, finish
  • Step 5: Mark the node as visited
  • Step 6: For each unvisited child of the node
  • Step 6.1: push child to the node

The DFS as simple java code:

// a simple implementation of DFS (just pure traversal)
public void dfs(Node source, Node target) {
Stack stack = new Stack();
stack.push(source);
while(!stack.isEmpty()) {
Node n = (Node)stack.pop();
n.setVisited(true)
if(target.equals(n)) return;
for(Node unvisited : n.getUnvistedChildren()) {
stack.push(unvisited);
}
}
}

This does not track the actual path towards the target nor does it return anything, when the target was found as the code shall only outline the different steps needed to traverse the graph.

The benefit of using DFS is that you (most likely) will need less iterations in order to find a valid path to the target. But it is not guaranteed that this path is the optimal one. As you can see in our example: It only takes 12 iterations to find a path, but this path has 10 hops. Using BFS we need 19 iterations to find a path that only has 6 hops. Let’s see how that works.

BFS

The BFS algorithm is more pessimistic about finding a good path on a first try and in a way “investigates” all possible paths at the same time. You can almost think of it as water flowing into a system of different tunnels. It also does not flow tunnel by tunnel but fills everything at the same time.

This is a very different approach for traversing the graph nodes. The goal of the BFS algorithm is to traverse a graph as close as possible to a root node. Queue is used in the implementation of the breadth first search. Let’s see how BFS traversal works with respect to our scenario:

We again start at A (4,3) and look around if there are any unvisited children ((4,2),(3,3),(4,4)). We then visit any of those children and remember all unvisited children of those children, which we then visit to repeat the process until one of the nodes we visit is the target node. Let’s see how those steps are structured, again it is fairly simple:

  • Step 1: Push the root node into the Queue.
  • Step 2: Loop until the Queue is empty.
  • Step 3: Remove the node from the Queue.
  • Step 3: If any of the unvisited children of the node is the target, finish
  • Step 4: Mark the unvisited children of the node as visited and insert the unvisited children of those children in the queue.

A basic implementation of BFS in Java:

// a simple implementation of the BFS algorithm (just pure traversal)
public void bfs(Node source, Node target) {

//BFS uses Queue data structure
Queue queue = new Queue();
queue.add(source);

while(!queue.isEmpty()) {
Node n = (Node)queue.remove();

n.setVisited(true);
if(target.equals(n)) return;

for(Node unvisited : n.getUnvistedChildren()) {
queue.add(unvisited);
}
}
}

As you can see, this is pretty much the same code as we’ve seen for DFS except the data structure is now a queue instead of a Stack. Same like with DFS, this does not track the actual path towards the target nor does it return anything, when the target was found.

The benefit of using BFS is that you will find the best path between the two nodes. But the cost is high, as you might need to check every single node in the graph.

Now, in the real world we have quite some meta-data about the map, the locations, the edges, and most importantly we know about our destination. This is where the A* Algorithm comes to the stage. It finds paths from one location to a known destination based on meta-data (or heuristics) that helps to forecast which next step might be the best towards our destination.

The A* Algorithm

As summarized above the A* algorithm uses heuristics to forecast the final “effort” to get from any location on the map to the target. This can be as simple as the minimal amount of nodes to visit before getting to the target plus the amount of nodes already traveled: At (3,3) in our example, we’ve taken 1 step plus we are at least 6 steps away from B — so the estimated cost at (3,3) is 7. Using this, we would have a clear indication for each location, about how promising it is. So it says something like: Hey I don’t know anything about roads, mountains, rivers etc. but I have a GPS and a compass and can calculate the distance as the crow flies:

With BFS the “frontier” of our search expands in all directions. This is reasonable if you would like to find a path to all or many locations. With A* we let the frontier expand more towards the target then in other directions. Based on the heuristics — as described above- every possible next step gets a priority property, which will be considered within the algorithm. In our example this is just the minimum distance to the target to keep it simple, but in a real implementation you should also take into account the costs that already occured: So for example, if the algorithm visits (2,4) this minimum future distance (or cost) is 4 and the past cost is 3 — so the actual value is 7.

For real routing scenarios, the cost for going from one location to another, and from one location to the target would also not be always the same: You would have highways, small roads, and even dirt roads. Taking into account both the actual cost + the estimated cost, we ‘prefer’ to travel on highways, but we explore dirt roads if they are competitive. That is, if the dirt road cost + estimated super-highway cost is less than all other possible paths, then we will explore the dirt road.

So let’s look at each step:

  • Step 1: Push the root node into the prioritized Queue, with a priority of 0 (cost = priority)
  • Step 2: Loop until the Queue is empty.
  • Step 3: Get the minimum cost within the queue (highest priority).
  • Step 4: Remove that node from the Queue and visit it
  • Step 3: if the node is the target finish.
  • Step 4: for any unvisited children of the node, calculate the cost and add to the queue.

A basic implementation of A* in Java:

// a simple implementation of the A* algorithm (just pure traversal)
public void astar(Node source, Node target) {

//A* uses PriorityQueue data structure
PriorityQueue queue = new PriorityQueue();
queue.add(source, 0);

while(!queue.isEmpty()) {
Float c = queue.getMinPriority();
Node n = (Node)queue.removeMin();

n.setVisited(true);
if(target.equals(n)) return;

for(Node unv : n.getUnvistedChildren()) {
queue.add(unv, c+1+estimateCost(unvisited, target));
}
}
}

Let’s meet in between: The bi-directional A* Algorithm

So, one might ask, if you know about where the target is and have access to some data that allows you to forecast costs for going from any location to the target, why don’t you also start at the target and walk towards the start using the same mechanics? You might meet in the middle, combine your experiences (which path you walked) — and voila, you have found an optimal path in collaboration…

Yeah, the software engineer says: This is possible. We just use two queues, iterate both at the same time using the priority approach and check if one of the queues found the target (if started from start) or the start (if started from target) or if both queues contain the same child it is a collision and we can combine both paths into one which is the optimal one.

The first thing we need to do is replicating the heuristics to also work for the other direction. In the picture below we added the estimated costs for going towards A (dotted circle outline):

Then we follow the following steps to find an optimal path between A and B:

  • Step 1: Push the root node into the prioritized Queue, with a priority of 0 (cost = priority), Push the target node into a second prioritized Queue, with a priority of 0.
  • Step 2: Loop until one of the two Queues is empty.

Forward Step:

  • Step 3: Get the minimum cost within the first queue.
  • Step 4: Remove that node from the Queue and visit it
  • Step 5: if the node is the target finish.

Backward Step:

  • Step 5: Get the minimum cost within the first queue.
  • Step 6: Remove that node from the Queue and visit it
  • Step 7: if the node is the target finish.
  • Step 8: If node the node from the forward step and the node from the backward step are the same, they collide and you found the best path. End.
  • Step 9: for any unvisited children of both nodes, calculate the cost and add to one of the queues.

So once again, how does this change the traversal in our scenario?

A basic implementation of bi-directional A* in Java:

// a simple implementation of the A* algorithm (just pure traversal)
public void biastar(Node source, Node target) {

// A* uses two PriorityQueue data structures
PriorityQueue fQueue = new PriorityQueue();
fQueue.add(source, 0);

PriorityQueue bQueue = new PriorityQueue();
bQueue.add(target, 0);

while(!fQueue.isEmpty() && !bQueue.isEmpty()) {

// forwards step
Float fC = fQueue.getMinPriority();
Node fN = (Node)fQueue.removeMin();

fN.setVisited(true);
if(target.equals(fN)) return;

// backwards step
Float bC = bQueue.getMinPriority();
Node bN = (Node)bQueue.removeMin();

bN.setVisited(true);
if(source.equals(bN)) return;

// collision
if(bN.equals(fN)) return;

// enqueue next steps
for(Node unv : fN.getUnvistedChildren()) {
fQueue.add(unv, fC+1+estimateCost(unvisited, target));
}

for(Node unv : bN.getUnvistedParents()) {
bQueue.add(unv, bC+1+ estimateCost(source, unvisited));
}
}
}

This optimized algorithm does not give better results in our scenario. The only difference is that it does not follow any wrong path for more than 1 step while the on-directional A* follows one bad path for two steps. However, the amount of iterations is the same.

For more complex graphs this works quite amazing, because it cuts of paths that are not relevant:

The schema above shows that in order to get from A to B and you would only start at A it might be that you consider all green and orange locations, if you would start from B to get to A you might visit green and blue locations. When you do both at the same time to meet midway, you will only see the green locations.

Summary

Routing Algorithms can be modeled using straight forward graph traversal / graph search algorithms that are fairly simple to implement. We explained the DFS and BFS approach that are pretty different in terms of behavior. The implementation is almost the same. The only difference is that for DFS you use Stack and for BFS you go with Queue. Adding some priority (actual costs + estimated future costs) to each node to be visited that is used to determine the order in which BFS visits nodes brings us to the A* algorithm. This can be optimized by starting the traversal from the Starting and the Target at the same time to detect a collision: This is called bi-directional A*.

Of course this is not the entire truth, as there are a lot of additions that further optimize the system: For example we do not load all modes into the graph but load them on the fly, the cost function for calculating priorities has some assumptions that might or might not work. But in general, that’s how routing works (at least in theory).

~

Enjoyed the read? Omio is hiring! If you would like to be part of our team and solve great problems, make sure you visit our jobs page and follow us on Medium.

--

--