Vehicle Dwell Time & Location Prediction Models: Early Steps

Ebru Odok
Published in Data X: Hondezvous
Mar 15, 2021

Written by Adam Huth, Isabel Zavian, and Ebru Odok for the Hondezvous team.

It’s a beautiful Tuesday afternoon — the sky is blue, the birds are singing, and unfortunately for you, you’re running errands. It’s 3:17PM — a full grocery list sits on your lap, and you’re supposed to be picking up the kids at four before dropping them off at soccer practice. Oh great; the cleats still haven’t come in yet. It will have to wait — groceries first! You frantically run into the store, reminding yourself to not forget eggs this time.

4:12PM, and you’re pulling into the soccer field’s parking lot. You hop out and pop the trunk, and notice an unopened light brown box with blue tape. They came in! You forgot that you signed up for Amazon Key, and while you were shopping, the Amazon driver placed your package in the trunk. Looks like the kids won’t have to wear Vans again on the field.

This is just one example of what a prediction system could make possible. Our vehicles (and our data), paired with quality prediction models, create a multitude of alluring new market opportunities, from on-the-go vehicle maintenance to city planning recommendations for EV charging stations. We’ve teamed up with 99P Labs to build these predictive models; specifically, we are looking to predict vehicle dwell time as well as location. Dwell time is defined as the time between two consecutive engine cycles; that is, the time from when a vehicle parks and its engine turns off until the engine restarts and the next drive begins.

We aren’t the only ones attempting to solve this problem. We weren’t able to find many articles on dwell time models for cars, but we found a few focused on buses and trains. Most of those projects used k-means clustering and DBSCAN to find overlaps and patterns in the data and build a dwell time model.

We originally planned to build three different models for both dwell time and location prediction: a Hidden Markov Model (HMM), an Autoregressive Integrated Moving Average (ARIMA) model, and a Long Short-Term Memory (LSTM) model. We would then choose whichever had the best test accuracy. However, after further investigation, and with insights from many research papers and from UC Berkeley professors and staff, we ultimately decided to create and train two separate models, one for each prediction. When we first planned to build our own HMMs and LSTMs, we were told that HMMs would be extremely complicated and that building and training an LSTM from scratch would take a tremendous amount of time. We therefore needed to reconsider our approach and focus on the safest model, one that would take less time to implement and that we expected to be more accurate. Using a pre-trained LSTM was an option, but when we looked deeper into ARIMA we concluded that it was the best fit for time series forecasting, which is what our project is generally centered around. Additionally, we decided against using a neural network because of its “black box” nature: interpretability is of value to both us and the 99P Labs team, and we felt that we would be unable to explain how a neural network predicted a dwell time or location from a set of seemingly unrelated features.
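For readers unfamiliar with ARIMA, the idea can be illustrated with a minimal sketch. This is not the team's model: it assumes the simplest useful case, ARIMA(1,1,0) with a constant, fitted by ordinary least squares on the first differences, and it runs on a made-up numeric series. The function names `fit_arima_110` and `forecast_arima_110` are hypothetical. A real project would more likely rely on a library implementation (statsmodels, for example, ships an ARIMA class) that estimates the parameters by maximum likelihood.

```python
def fit_arima_110(series):
    """Fit d[t] = c + phi * d[t-1] on the first differences d of the series.

    This is ARIMA(1,1,0) with a constant, estimated by least squares.
    """
    # "Integrated" part (d = 1): model the changes between consecutive values.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # "Autoregressive" part (p = 1): regress each change on the previous one.
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x if var_x else 0.0
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(series, c, phi, steps=1):
    """Roll the fitted model forward and undo the differencing."""
    last_value = series[-1]
    last_diff = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predicted next change
        last_value = last_value + last_diff  # add it back onto the level
        predictions.append(last_value)
    return predictions


# Made-up series whose changes shrink steadily: 4, 3, 2.5, 2.25, 2.125
series = [10.0, 14.0, 17.0, 19.5, 21.75, 23.875]
c, phi = fit_arima_110(series)
next_values = forecast_arima_110(series, c, phi, steps=2)
print(c, phi, next_values)
```

Because the fitted model is just two numbers (a constant and one coefficient on the previous change), it is easy to explain why a given forecast came out the way it did, which is the interpretability argument made above.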

Another challenge was deciding whether we could build a single model for both dwell time and location predictions, or whether we’d need a separate model for each and a way to tie them together towards the end. We decided on the latter, because we concluded that k-means clustering would be the most useful for the dwell time model and that an ARIMA model would be necessary to actually predict vehicle locations and to gauge how accurate those predictions are.
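As a rough illustration of what k-means does with dwell times, here is a minimal sketch under simplifying assumptions: it clusters on dwell minutes alone (one dimension), uses made-up values, and picks its starting centroids deterministically from the sorted data. Which features the real dwell time model clusters on is not settled here, and `kmeans_1d` is a hypothetical helper, not a library function.

```python
def kmeans_1d(values, k, max_iterations=50):
    """Plain Lloyd's algorithm on a list of numbers."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [float(ordered[(2 * i + 1) * len(ordered) // (2 * k)])
                 for i in range(k)]

    def nearest(value):
        # Index of the centroid closest to this value.
        return min(range(k), key=lambda i: abs(value - centroids[i]))

    for _ in range(max_iterations):
        # Assignment step: group every value with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for value in values:
            clusters[nearest(value)].append(value)
        # Update step: move each centroid to the mean of its group.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged
        centroids = updated

    labels = [nearest(value) for value in values]
    return centroids, labels


# Made-up dwell times in minutes: quick stops, errands, and workday parking.
dwell_minutes = [5, 8, 12, 35, 40, 45, 480, 510, 540]
centroids, labels = kmeans_1d(dwell_minutes, k=3)
print(centroids, labels)
```

On data like this, each centroid can be read as a "typical" dwell length for one kind of stop, which is what makes the clusters useful as an input to later predictions.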

Currently, the team is working on filtering the dataset based on dwell times. We have created a new table of dwell time observations, with variables including a trip’s start and end time, dwell location, and dwell time. By the end of the data manipulation process, we will effectively be left with a “timeline” of dwell times and locations for each car and trip. These should be all the variables necessary to build the first version of our dwell time model, and we hope to build upon this after we get our initial version up and running. Since predicting dwell time is our project’s main priority, we plan to spend the most energy on this model at the start, so that we have more time for training and tuning hyperparameters.
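The table described above can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical trip record of the form (vehicle id, engine-on time, engine-off time, end latitude, end longitude); the real dataset's column names and formats are not shown in this post, and the sample rows are made up. Each dwell is the gap between one trip's engine-off and the same vehicle's next engine-on.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

FMT = "%Y-%m-%d %H:%M"

# Made-up trips: (vehicle_id, engine_on, engine_off, end_lat, end_lon)
TRIPS = [
    ("car_1", "2021-03-02 15:05", "2021-03-02 15:17", 37.8716, -122.2727),
    ("car_1", "2021-03-02 17:30", "2021-03-02 17:45", 37.8716, -122.2727),
    ("car_1", "2021-03-02 15:52", "2021-03-02 16:12", 37.8650, -122.2580),
    ("car_2", "2021-03-02 08:00", "2021-03-02 08:30", 37.8044, -122.2712),
]


def build_dwell_timeline(trips):
    """Return one row per dwell, ordered by vehicle and then by time."""
    # Sort by vehicle, then by when each trip started.
    ordered = sorted(trips, key=lambda t: (t[0], datetime.strptime(t[1], FMT)))
    rows = []
    for vehicle_id, group in groupby(ordered, key=itemgetter(0)):
        vehicle_trips = list(group)
        # Pair each trip with the same vehicle's next trip.
        for previous, following in zip(vehicle_trips, vehicle_trips[1:]):
            parked_at = datetime.strptime(previous[2], FMT)
            restarted_at = datetime.strptime(following[1], FMT)
            rows.append({
                "vehicle_id": vehicle_id,
                "dwell_start": parked_at,
                "dwell_end": restarted_at,
                "dwell_lat": previous[3],   # where the previous trip ended
                "dwell_lon": previous[4],
                "dwell_minutes": (restarted_at - parked_at).total_seconds() / 60,
            })
    return rows


timeline = build_dwell_timeline(TRIPS)
for row in timeline:
    print(row["vehicle_id"], row["dwell_start"], row["dwell_minutes"])
```

A vehicle with only one recorded trip (like `car_2` here) contributes no dwell rows, since there is no following engine start to close the interval.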

Our next step is to create a function that outputs a location geofence for a given latitude and longitude pair. This will help with an issue we foresee when predicting location: a frequently visited place spans a range of latitude and longitude values (e.g. a grocery store’s parking lot). We assume a location cluster has associated characteristics, most importantly a characteristic dwell time. With this function, we should be able to start clustering locations based on coordinates and on the number of cars that remain parked there between trips. Essentially, we’ll be identifying high-priority dwell destinations (which we assume would be top locations for deliveries) and feeding that information to our ARIMA model to give it a bit of a boost, a popular technique for location prediction that we’ve seen in academic papers.
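One way such a geofence function could look is sketched below. It is only an assumption about the eventual design: each coordinate pair joins the first existing geofence whose centre lies within a fixed radius (150 m here, an arbitrary choice), and otherwise starts a new geofence. Distances use the haversine formula. The names `assign_geofence` and `haversine_m`, the radius, and the sample points are all hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def assign_geofence(lat, lon, centres, radius_m=150.0):
    """Return the id of the geofence containing (lat, lon).

    `centres` is a list of (lat, lon) geofence centres; it is extended
    in place when the point falls outside every existing geofence.
    """
    for fence_id, (centre_lat, centre_lon) in enumerate(centres):
        if haversine_m(lat, lon, centre_lat, centre_lon) <= radius_m:
            return fence_id
    centres.append((lat, lon))
    return len(centres) - 1


# Made-up dwell observations: (latitude, longitude, dwell minutes).
DWELLS = [
    (37.87160, -122.27270, 35),
    (37.87185, -122.27240, 42),  # same parking lot, a few spaces over
    (37.87140, -122.27300, 38),
    (37.86500, -122.25800, 78),  # a different destination
]

centres = []
fence_ids = [assign_geofence(lat, lon, centres) for lat, lon, _ in DWELLS]

# Summarise each geofence: visit count and a characteristic dwell time.
dwell_by_fence = {}
for fence_id, (_, _, minutes) in zip(fence_ids, DWELLS):
    dwell_by_fence.setdefault(fence_id, []).append(minutes)

summary = {
    fence_id: {"visits": len(minutes), "median_dwell_minutes": median(minutes)}
    for fence_id, minutes in dwell_by_fence.items()
}
print(fence_ids, summary)
```

The median is used here as the "characteristic" dwell time because it is less sensitive to the occasional overnight stay than the mean; that choice, like the fixed radius, is a design assumption rather than something taken from the project.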

Hondezvous is a team project by Charlie Duarte, Nikhil Dutt, Adam Huth, Ebru Odok, Xuerui Song, and Isabel Zavian.

Links to sources consulted:

https://towardsdatascience.com/lstm-by-example-using-tensorflow-feb0c1968537

https://www.researchgate.net/publication/303527611_Method_for_analysis_and_prediction_of_dwell_times_at_stops_in_local_bus_transportation
