Estimating Delivery Times: A Case Study In Practical Machine Learning

Rick Fulton
Postmates
Oct 23, 2015

Machine Learning is rapidly becoming a required and critical component of engineering organizations across the tech industry. From movie recommendation algorithms to self-driving cars, it is clearly an exciting and compelling field. Companies are hiring armies of Machine Learning researchers to solve difficult problems like voice and object recognition.

What does this all mean to the average software engineer? In many cases, extremely specialized knowledge is necessary to outperform existing state-of-the-art systems. It should not be expected that just anyone can easily build a seven-layer neural network to solve any old problem.

Fortunately, there are readily available libraries that have simple implementations of the more common algorithms like Linear and Logistic Regression, K-Means, Naive Bayes, etc. For the right problems, these algorithms can turn into powerful tools.

With the release of Postmates 3.0, I had the opportunity to apply these tools to one such problem. I would like to share some insights I gained while developing Estimated Delivery Time, and hopefully illustrate how powerful some simple and accessible Machine Learning techniques can be when applied to the right problem.

What Is Estimated Delivery Time?

The problem we will be focusing on is predicting estimated delivery time for particular merchants. These times appear in the product to help prospective customers get an idea of how long their orders might take. Before Postmates 3.0, we would only list the distance to each merchant. Ultimately, we felt that customers would care more about knowing when their orders would arrive. In particular, certain kinds of orders can be especially quick, and we would love to let our customers know just how fast our platform can be.

This problem is trickier than it first seems. Even though the length of the drop-off leg (delivery pickup location to drop-off location) is correlated with the length of the delivery, it does not tell the whole story. For example, if you order food from a pizza place that is only a couple of blocks away, the long cooking time for pizzas would result in a longer order time than the length of the drop-off leg would suggest.

Postmates is a service that makes your life easier, and properly predicting estimated delivery time helps toward that end. The bottom line is that without setting proper expectations up front, the average customer's experience suffers.

Understanding The Data

When deciding how to tackle problems like these, it pays to understand your data. Initially, it seems like we could gather a few pieces of information to create an accurate prediction model:

  1. Distance between the pickup location (the merchant) and the drop-off location (the customer)
  2. Distance between the Postmate at the start of the delivery and the pickup location
  3. How long it takes on average for the merchant to prepare an order
  4. The kind of vehicle the Postmate is using (bicycle, car, or on foot)

Immediately, we run into our first snag: because we are presenting a delivery time estimate to the customer before the customer has placed the order, there is no obvious way to know the vehicle type or initial location of the assigned Postmate. In other words, we have no idea whether the Postmate will be in a car, truck, scooter, bicycle, or on foot.

Additionally, there are a huge number of complications that may slow down the delivery. A few examples to illustrate the randomness of this problem:

  • Traffic may be more congested than expected
  • The merchant may be busier than expected, leading to longer than normal preparation times
  • The Postmate may have difficulty finding parking
  • Once the Postmate arrives at the drop-off location, the customer and the Postmate may have difficulty finding one another

and so on.

The noise in the data is evident in the graph below, where we plot delivery time vs. the distance between pickup and drop-off for deliveries throughout the day on a typical Sunday in Los Angeles:

The Baseline Model

In Machine Learning, it is common practice to begin with the most naive approach before putting more time and effort into developing a more complex model or set of features. This data is practically crying out for regression modeling. It’s tempting to brainstorm unusual and clever features or advanced models, but it’s always good to start with bread-and-butter techniques. In practice, teams have limited time and resources, so the simplest solution is often the best one. What if we just try simple Linear Regression with one feature?

This looks pretty reasonable. How good is it? To test its effectiveness, we'll train our model with ten-fold cross validation and evaluate it using MAE (mean absolute error), which is easy to understand because it represents the average error in minutes. In this case, our MAE was 9.79; in other words, our predictions are off by 9.79 minutes on average.
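For reference, here is a minimal sketch of this baseline, assuming the historical deliveries live in a pandas DataFrame with hypothetical columns such as dropoff_distance_km, delivery_minutes, and placed_at (the column names and file are ours, purely for illustration):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical schema: one row per completed delivery.
deliveries = pd.read_csv("deliveries.csv", parse_dates=["placed_at"])

X = deliveries[["dropoff_distance_km"]].values  # single feature: pickup -> drop-off distance
y = deliveries["delivery_minutes"].values       # target: total delivery time in minutes

# Ten-fold cross validation, scored with mean absolute error (MAE).
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=10, scoring="neg_mean_absolute_error",
)
print(f"baseline MAE: {-scores.mean():.2f} minutes")
```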

Here a decision must be made: should we try to improve on this result, or stick with what we have? Arguably we could just go with this model, but lingering doubts remain: is nearly 10 minutes of error acceptable? Are there patterns in the data we have not yet accounted for? Why does there appear to be more noise above the estimated delivery line than below?

We decided to improve on our initial attempt.

Data Cleansing

An important part of applying Machine Learning in practice is making sure the integrity of your data is sound. Soon after trying out the baseline model, we discovered our data was messier than it should have been. It turned out that many deliveries had improperly recorded start and end timestamps, which made the delivery data look more unpredictable than it really was. We realized that if we used the actual GPS coordinates of the Postmates rather than relying on the flawed start and end timestamps, we could remove much of the noise. This small investment in data cleansing brought the MAE down to 8.78, a fairly significant improvement that highlights the importance of really understanding the nature of your data.
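The exact fix depends on how delivery events are logged, but the idea can be sketched roughly as follows (our own illustration, not necessarily Postmates' actual pipeline): recover the drop-off time from the courier's GPS trace rather than from the logged timestamp, and drop deliveries whose trace never reaches the drop-off point.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def dropoff_time_from_gps(pings: pd.DataFrame, dropoff_lat, dropoff_lng, radius_km=0.1):
    """Infer the drop-off time for one delivery from its courier's GPS pings
    (hypothetical columns: lat, lng, recorded_at) instead of trusting the
    logged end timestamp. Returns None if the trace never gets close enough,
    in which case the delivery is excluded from the training set."""
    near = pings[haversine_km(pings["lat"], pings["lng"], dropoff_lat, dropoff_lng) < radius_km]
    return near["recorded_at"].min() if not near.empty else None
```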

Feature Selection

Are there any other pieces of information we could encode as features in our model?

We mentioned earlier the idea of average preparation time for merchants. Intuitively, it seems like this information could help our model. How can we calculate it? When orders are placed on the platform, we collect estimates from the merchants on how long they think orders will take to prepare. If we keep track of the average of these estimates for each merchant and each hour of the day, we can use those averages to predict preparation time given the merchant and the time of day.
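A rough sketch of how that lookup could be built, reusing the hypothetical deliveries frame from above and an assumed reported_prep_minutes column for the merchant's estimate:

```python
# Average reported preparation time per (merchant, hour-of-day) bucket.
# merchant_id, placed_at, and reported_prep_minutes are assumed column names.
deliveries["hour"] = deliveries["placed_at"].dt.hour
avg_prep = (
    deliveries.groupby(["merchant_id", "hour"])["reported_prep_minutes"]
              .mean()
              .rename("avg_prep_minutes")
              .reset_index()
)
# At prediction time, unseen (merchant, hour) pairs would need a fallback,
# e.g. the merchant's overall average or a global default.
```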

Let's see how this new feature helps our performance. In the graph below, note that we are no longer fitting a simple line to a scatter plot; we are fitting a linear function of two features, i.e. a plane (multivariate linear regression).

The plane represents the model’s predicted delivery time given preparation time and drop-off distance. Grey points represent jobs that had shorter than expected delivery times, where darker grey points were closer to the predicted time. Colored points represent jobs that had longer than expected delivery times, where blue points were closer to the predicted time. As expected, we can see that orders involving merchants with higher preparation times generally have higher total delivery times, so this appears to be a good feature. Our MAE is now 8.02.
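Mechanically, adding the feature just means joining it back onto each delivery and widening the feature matrix; continuing the hypothetical sketch from above:

```python
# Attach the per-(merchant, hour) average prep time to each delivery,
# then refit with two features instead of one.
deliveries = deliveries.merge(avg_prep, on=["merchant_id", "hour"], how="left")

X = deliveries[["dropoff_distance_km", "avg_prep_minutes"]].values
y = deliveries["delivery_minutes"].values
scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="neg_mean_absolute_error")
print(f"MAE with prep-time feature: {-scores.mean():.2f} minutes")
```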

Let’s try adding one more feature. Intuitively, we would expect that the amount of traffic present in a market given the time of day would influence the length of a delivery. A simple way to incorporate this information would be to take the average travel speed of all Postmates driving cars in a market given the time of day, and add this as a feature.
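A sketch of that calculation, again with assumed column names (market, vehicle, and drive_minutes are ours, and the straight-line drop-off distance is used as a crude proxy for distance driven):

```python
# Average observed driving speed for car deliveries, bucketed by market and hour of day.
cars = deliveries[deliveries["vehicle"] == "car"].copy()
cars["speed_kmh"] = cars["dropoff_distance_km"] / (cars["drive_minutes"] / 60.0)
avg_speed = (
    cars.groupby(["market", "hour"])["speed_kmh"]
        .mean()
        .rename("avg_speed_kmh")
        .reset_index()
)
# Join the market-level speed onto each delivery as a third feature.
deliveries = deliveries.merge(avg_speed, on=["market", "hour"], how="left")
```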

Note that fewer deliveries occur in the middle of the night, so our data is sparser there.

Because we have added a third feature to this feature vector, visualizing the results becomes more complicated. As such, we will just report the results we achieved with this additional feature: an MAE of 8.01. Not as much of an improvement as we would have hoped. We speculate that this feature is not that helpful because most of our deliveries occur between 8 AM and 9 PM, when the average driving speed is relatively flat. Also, we may already capture some of the value this feature provides in the drop-off leg vs. delivery time feature. In future iterations of this model, we might want to consider improving this feature by developing a more sophisticated traffic model, e.g. we might consider breaking markets down into a grid, and calculating the expected travel speed between each cell.

The Wait Time Mystery

After some simple data cleansing and some thoughtful features, we've gone from an MAE of 9.79 to 8.01. Again we should ask the question: is this good enough? Maybe. With this sort of problem, we can always spend more time identifying features, but eventually there are diminishing returns on the time we put in.

One nagging issue worth exploring remains: we have not really addressed the fact that there is still more noise above the predicted delivery time line/plane than below it. Why would this be? It seems at least worth understanding.

Here, taking a look at the statistic known as “Postmate wait time” could be helpful. Wait time is the amount of time the Postmate spends at the pickup location parking, retrieving an order, and leaving. Postmate couriers hate wait time, so we work to minimize it as much as possible. We find that, on average, a Postmate spends about ten minutes of the delivery “waiting.” We account for this extra time by sending our Postmates to the merchant about ten minutes before the delivery is ready, according to the preparation time estimate we are given. Could it be that some merchants have higher wait times on average than others? Let’s take a look:

This is the plotted probability mass function of average wait time per merchant: for a given average wait time w, the blue bar represents the fraction of merchants whose average wait time was roughly w. The red line is a Gaussian distribution fit to the data. In truth, this looks a lot more like a Poisson distribution, because there are no negative wait times (which makes sense). Presumably, if negative wait times were possible, the distribution would look much more Gaussian.
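For completeness, the per-merchant averages and the fitted curve can be reproduced along these lines, assuming a hypothetical wait_minutes column on the same deliveries frame:

```python
from scipy import stats

# Average Postmate wait time per merchant.
avg_wait = deliveries.groupby("merchant_id")["wait_minutes"].mean()

# The fitted red curve: a Gaussian over the per-merchant averages.
mu, sigma = stats.norm.fit(avg_wait)
print(f"mean wait: {mu:.1f} min, std dev: {sigma:.1f} min")
```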

What the graph appears to be telling us is that some merchants in particular have deliveries with higher wait times on average. Why would this be? Two possible reasons:

  • Parking could be particularly difficult around certain locations
  • Actual preparation times may be higher than the times reported by merchants, which would cause our Postmates to arrive too early

In any case, it's worth trying to incorporate wait time into our model, because none of the features we already have capture this information. With it, we achieve an MAE of 7.40. Intuitively, the wait time explains the odd phenomenon we saw earlier, where delivery time data was noisier above the predicted line of best fit and more compact below it.
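Concretely, and still using our made-up column names, the final feature matrix might look like this, joining the per-merchant average wait time from above as a fourth feature:

```python
# Add average merchant wait time as a fourth feature and refit.
deliveries = deliveries.merge(
    avg_wait.rename("avg_wait_minutes").reset_index(), on="merchant_id", how="left"
)

features = ["dropoff_distance_km", "avg_prep_minutes", "avg_speed_kmh", "avg_wait_minutes"]
X = deliveries[features].values
y = deliveries["delivery_minutes"].values
scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="neg_mean_absolute_error")
print(f"MAE, full model: {-scores.mean():.2f} minutes")
```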

How Did We Do?

Using simple Linear Regression, we built an initial prediction model with an already respectable MAE of 9.79, and through data cleansing and thoughtful feature selection we brought that down to 7.40. To put this in perspective, previously we had no way of telling customers how long their orders would take before they placed them, other than that it would be under an hour. Now, without even knowing the assigned Postmate's vehicle type, we can predict delivery times with an average error of 7 minutes and 24 seconds. Moreover, we accomplished this with minimal time investment.

Though many problems merit hardcore machine learning solutions, many more benefit greatly from simple models like Linear or Logistic Regression and can be tackled by anyone with even just a basic understanding of machine learning and data analysis principles.

A final note on implementation details: in developing the production version, we trained our model on a larger dataset.

Many thanks to Andrew Rider for his assistance in putting this post together.

Rick Fulton is an Engineering Lead at Postmates.
