How does Google Maps predict the ETA with high precision?

Karteek Menda
Dec 15, 2023

Hello Aliens…..

It’s been some time since my last blog, as I have been busy with work and studies. Now that I have some free time, I will be posting blogs regularly.

In this blog, I will try to give you an idea of how Google Maps estimates travel time.

Source: Google Maps

Let's say I want a route from Tempe Lake to Phoenix-Mesa Gateway Airport. Google Maps estimates that it will take around 33 minutes to reach the destination. How does it do this?

So, we can break this route down into road segments, find the time taken to traverse each segment, and sum those times to get the ETA.

What is a possible solution for this?

Possible Solution 1:

Maybe we can segment the entire route into road segments, estimate the time to traverse each one, and sum those times to get the ETA. One solution could be a Feed Forward Neural Network: the input is an encoded representation of a road segment, and the output is a single neuron that gives the time to traverse it. We can pass every road segment through the network in the same way and, at the end, sum the predicted times to get the total ETA.
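Here is a minimal sketch of that idea in PyTorch; the feature size, network width, and dummy inputs are all assumptions for illustration:

```python
import torch
import torch.nn as nn

# Assumption: each road segment is encoded as a fixed-size feature vector
# (e.g., length, road type, speed limit), here 16 numbers.
SEGMENT_FEATURES = 16

# A small feed-forward network: segment features in, traversal time out.
segment_time_model = nn.Sequential(
    nn.Linear(SEGMENT_FEATURES, 32),
    nn.ReLU(),
    nn.Linear(32, 1),  # single output neuron: predicted seconds for this segment
)

# One route = a batch of road segments; predict each segment's time, then sum.
route_segments = torch.randn(10, SEGMENT_FEATURES)  # 10 segments, dummy features
segment_times = segment_time_model(route_segments)  # shape: (10, 1)
eta = segment_times.sum()
print(f"Predicted ETA: {eta.item():.1f} seconds")
```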

Downsides of the above solution: Even though this sounds easy, there is a problem. Feed Forward Neural Networks assume the samples are independent of each other. But with road segments, traffic on one road can easily influence the traffic and ETA of the neighboring roads, which clearly indicates that there is some dependence between the samples.

Possible Solution 2:

As the road segments are dependent on each other, we can use a network that can handle sequence data and parallelization. So, Transformers come to our rescue. We could use just the encoder: feed in all the road segments simultaneously and get their corresponding embedding vectors, pass each embedding vector through a Feed Forward Neural Network to get the time for that segment, and sum those times to get the total ETA.
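As a rough sketch, here is that setup with PyTorch's built-in transformer encoder; all dimensions and inputs are illustrative assumptions:

```python
import torch
import torch.nn as nn

EMBED_DIM = 16  # assumed size of each road-segment encoding

# Transformer encoder: lets every segment attend to all others in the route.
encoder_layer = nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Per-segment regression head: contextual embedding -> traversal time.
time_head = nn.Sequential(nn.Linear(EMBED_DIM, 32), nn.ReLU(), nn.Linear(32, 1))

route = torch.randn(1, 10, EMBED_DIM)  # 1 route, 10 segments, dummy encodings
contextual = encoder(route)            # each embedding now sees the whole route
eta = time_head(contextual).sum()      # sum the per-segment times
print(f"Predicted ETA: {eta.item():.1f} seconds")
```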

Downsides of the above solution: We need a lot of data. Transformers have no built-in understanding of how road segments are related to each other; they need to learn this relationship from scratch from a ton of examples. To get there, transformers will take a lot of compute power, a lot of time, and a lot of data.

But we have arrived at the key insight: roads adjacent to each other are more related than roads that are nowhere near each other.

Possible Solution 3:

The most intuitive way to capture this knowledge about roads is with graphs. A graph consists of vertices and edges. In this context, let us consider every road segment a vertex; if two road segments are connected to each other, their corresponding nodes are joined by an edge. We can essentially create such a graph for all of Tempe, or for any section of any city you want.
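For instance, here is a toy road graph built with plain Python dictionaries; the segment names are made up for illustration:

```python
# Toy road graph: every road segment is a vertex, and an edge connects two
# segments that physically join. Segment names are invented for illustration.
road_graph = {
    "mill_ave":        ["rio_salado_pkwy", "university_dr"],
    "rio_salado_pkwy": ["mill_ave"],
    "university_dr":   ["mill_ave", "rural_rd"],
    "rural_rd":        ["university_dr"],
}

# A segment's neighbors are exactly the segments whose traffic can
# directly influence it.
print(road_graph["university_dr"])  # ['mill_ave', 'rural_rd']
```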

The goal now is to learn the embeddings for the nodes, i.e., the road segments, and this is done by an iterative process called Message Passing.

Let me give you a simpler idea of what message passing is.

Consider a graph with just two nodes, A and B, both pointing to a third node C. We are going to do message passing to C. This involves two steps.

  1. Create the message: take the nodes inbound to C and apply some operation (here, addition of A and B) to their vectors.
  2. Update the state: take the message vector from step 1 and the current vector of C, and pass them into another function (again, addition) to get the new state of C. In the end, we just need a single output vector that represents the embedding of C.
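Here are those two steps written out for node C, with plain vector addition standing in for both functions; the embeddings are dummy values:

```python
import torch

# Current state vectors of the three nodes (dummy 4-dimensional embeddings).
a = torch.tensor([1.0, 0.0, 2.0, 1.0])
b = torch.tensor([0.0, 1.0, 1.0, 3.0])
c = torch.tensor([2.0, 2.0, 0.0, 1.0])

# Step 1: create the message for C by aggregating its inbound neighbors
# (here, plain addition of A and B).
message = a + b

# Step 2: update C's state by combining the message with C's current vector
# (again, plain addition).
c_new = c + message
print(c_new)  # the new embedding of C after one round of message passing
```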

But the summation operations we are using here might be a little too weak to actually get the embeddings that we need.

So, we can potentially use transformers instead. We take the two neighbor vectors, nodes A and B, along with information about the connecting edges, AC and BC, and feed all of these embeddings into the transformer encoder, which produces their encodings simultaneously. Next, the transformer decoder uses these encodings together with the current vector of node C to produce the next state (new state) of C. This is message passing for node C, and only for a single time step.

This process happens simultaneously for all the nodes in the graph, because each node's next state depends only on the previous states of itself and its neighbors. In this way, every node's neighbors are encoded into its embedding, and we can run this multiple times to get better and better embeddings.

So, after the first round of message passing, every node knows about its neighbors; after the second, every node knows about its neighbors' neighbors; after the third, even those nodes' neighbors are encoded into every node's embedding; and so on. After n rounds of message passing, we have the road network embeddings.
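Put together, n rounds over a whole graph might look like this sketch, again with simple sum aggregation standing in for the learned functions:

```python
import torch

# Toy graph: node -> list of neighbor nodes.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}
states = {n: torch.randn(4) for n in graph}  # random initial embeddings

N_ROUNDS = 3  # after round k, each node has seen its k-hop neighborhood
for _ in range(N_ROUNDS):
    # All messages are computed from the previous states simultaneously.
    messages = {n: sum(states[nb] for nb in graph[n]) for n in graph}
    states = {n: states[n] + messages[n] for n in graph}

print(states["A"])  # A's embedding now encodes its 3-hop neighborhood
```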

Now, since the layout of the roads doesn't change much, we can store these road embeddings in a look-up dictionary that we can use at inference time.

These road vectors now contain information about the location of the road itself and its relationship with the other roads. But we might need some additional information to get more accurate results, such as live traffic, speed limits, and accidents. All of this can be incorporated too.

This is where we pass all of this information into a feed-forward neural network and get the most up-to-date time to traverse that segment.
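Conceptually, something like this; the feature sizes and the dummy values are assumptions:

```python
import torch
import torch.nn as nn

EMBED_DIM = 16  # assumed size of the precomputed road-segment embedding
LIVE_DIM = 3    # e.g., current traffic level, accident flag, speed limit

# Final regressor: static embedding + live features -> traversal time.
eta_head = nn.Sequential(
    nn.Linear(EMBED_DIM + LIVE_DIM, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

segment_embedding = torch.randn(EMBED_DIM)      # from the look-up dictionary
live_features = torch.tensor([0.7, 0.0, 45.0])  # dummy real-time values
x = torch.cat([segment_embedding, live_features])
print(f"Segment time: {eta_head(x).item():.1f} seconds")
```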

So, to summarize, predicting the ETA works like this:

  1. Determine the start and end points on your map.
  2. Use a routing algorithm (e.g., Dijkstra or A*) to find a path.
  3. Determine the road segments along the path.
  4. During the training phase, we stored all the road segment embeddings in a look-up dictionary. So, during inference, all we need to do is reference each road segment by its ID in that dictionary and fetch the corresponding embedding.
  5. Query real-time features such as traffic, accidents, and speed limits, and pass them along with the embedding vector into a feed-forward neural network to get the expected time to traverse that segment.
  6. Sum all of the times to get the ETA from start to finish.
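Stitching those steps together, the inference pipeline might look like this; every callable and name here is a hypothetical placeholder, not Google's actual API:

```python
def predict_eta(start, end, find_path, embedding_lookup, get_live_features,
                predict_segment_time):
    """Hypothetical end-to-end ETA pipeline following the steps above.
    All four callables/containers are placeholders to be supplied."""
    segment_ids = find_path(start, end)       # steps 1-3: route -> segment IDs
    total_seconds = 0.0
    for seg_id in segment_ids:
        embedding = embedding_lookup[seg_id]  # step 4: precomputed embedding
        live = get_live_features(seg_id)      # step 5: traffic, accidents, ...
        total_seconds += predict_segment_time(embedding, live)
    return total_seconds                      # step 6: summed ETA
```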

I would strongly recommend that the Aliens visit Part 1: Sentimental Analysis Using BERT to get a detailed understanding of the transformer architecture. I plan to share additional blog posts covering topics such as robotics, drive-by-wire vehicles, machine learning, and deep learning.

Stay tuned.

Happy Learning…….

Thanks for reading! If you liked this article, do 👏 it. If you want to connect with me on LinkedIn, please click here.

This is Karteek Menda.

Signing Off
