Vehicle Location Prediction with Markov Chain Models

Ebru Odok
Data X: Hondezvous
Published in
4 min readApr 28, 2021

Written by Adam Huth, Isabel Zavian, and Ebru Odok for the Hondezvous team.

In our last article, we discussed how we processed our data and overcame challenges to successfully pull in over 35 million rows of data. From there, we developed a KNN model for vehicle dwell-time predictions and used DBScan for creating dwell time clusters. In this article, we will explore our journey with building our location prediction model.

Figuring out which model to implement for vehicle location predictions was extremely challenging. After our preliminary research, we originally decided to create a Hidden Markov Model, but within the first few weeks of the semester we opted to instead build a SARIMA (Seasonal Auto Regressive Integrated Moving Average) model due to its ability to change predictions based on different times throughout the year. However, upon further research and multiple failed attempts to implement a SARIMA with the dataset, we concluded that a SARIMA model would not be appropriate for our dataset, because we preprocessed our dataset to be event-based, whereas SARIMA models take in TimeSeries data. Furthermore, we wanted our model to consider multiple columns (latitude, longitude, time of day, day of week, etc), yet all of the SARIMA implementations we had seen only used one column as an input for predictions.

We concluded that although nobody on the team had prior experience with implementing Markov models, it was ultimately the most intuitive and commonly implemented approach for location predictions. Markov models work by having a set of states, for which each pair of states i, j in the set has an associated probability of moving from state i to state j. To apply a Markov model to our data, we initially set location clusters as unique states in the Markov chain and calculated the probabilities of moving from the current location to a different location for each car. To process the data for the Markov model, we dropped all rows with no dwell time durations in order to only use completed trips, and kept latitude and longitude for only start and end locations.

An example of a Markov Chain for an arbitrary car and each state from the set of states. In this example, if the car is at home it has a 0.3 probability of going to the grocery store.

For the model architecture, we first set a buffer in order to prevent an excessive amount of coordinates that were generally in the same area (e.g. driveway vs the street in front of a house) and grouped these locations under one state. After we found the set of all unique states (read: locations) a car visited, we wrote a transition function that calculates the probability of the car starting in location i and ending in location j for all locations in the set of states. To do this, we utilized the prob140 data science library, created for UC Berkeley’s Data140 course.

A Markov Chain transition matrix for one of the cars in the dataset. The car has a probability 0.025 of going to state 1 from state 4.

After we calculated the transition probabilities for every state, we wrote a function that took in a car’s ID number and returned the next most likely states, and their corresponding probabilities, given its last recorded state in the dataset.

Although we are nearing the finish line of this project, we still have several steps left to complete in order to finalize our system. Our next steps for the Markov model includes specifying the states; we are planning to add additional variables to our states (e.g. time of day, day of week, etc) to achieve more accurate probabilities and overall better model performance. However, the team is mindful that this change could cause model overfitting, especially if our new variables are too specific. Overfitting would be detrimental to our model’s performance as it won’t be able to predict as well when new data is introduced.

Moreover, one of our most critical tasks is connecting our KNN and Markov models so that location predictions are taken into account when making dwell time predictions, and vice versa. From there, we will be able to fine-tune our models to ensure the highest possible accuracy and confidence when making all predictions. Once completed, we can then integrate our models into a Streamlit user interface, giving users the ability to see dwell time and location predictions for a specific vehicle or vehicles within a set date range.

Hondezvous is a team project by Charlie Duarte, Nikhil Dutt, Adam Huth, Ebru Odok, Xuerui Song, and Isabel Zavian.

--

--