Machines Can Now Tell You How Long Your Move Will Take (Part 1 of 3)

Abraar Ahmed
Nov 30, 2018 · 10 min read

Core contributors: Abraar Ahmed, Naveen Lekkalapudi
Editor: Ben Cake

Even the most tactile services can benefit from statistical analysis and machine learning.

By the summer of 2017, Bellhops had been around for six years and completed more than 100,000 moves. Along the way, we expanded the services we arrange from labor-only to labor with the option to include a truck. As the company grew, we noticed that one of the biggest contributors to customer satisfaction was our ability to complete moves in the time we estimated. In the initial stages, the company’s estimates relied solely on the size and type of property (as provided by the customer), but as we grew, we saw wide variance in move length among moves that shared these two factors. The rate of runovers (moves that exceed their estimate) kept increasing, as did the rate of missed estimates overall. This is when we began to study move-length estimation in earnest.

Move-length estimation plays a key role in setting the proper expectations for customers. The length of a move varies significantly based on the market, the type of service, and customer behavior. It is important for us to understand the effects of each of these factors before providing a customer with an estimate. In this post, we describe the process of transitioning from statistical analysis to machine learning in order to improve and personalize predictive performance.

Until now, you probably never thought a crew of data scientists would be just who you wanted to help you move. This investigation should help change your mind.

Estimation and customer joy

Figure 1: Customer appeasement and the net promoter score tracked over a range of move-accuracy percentages.

As data scientists, we seek insights that will inform the development of our product while also remaining cognizant of the effect of the product on the end user. In this case, the end user is the customer, and the metrics we use are industry standards such as net promoter score and the percentage of moves that require a cash appeasement. As reliability and satisfaction are core company goals, these customer metrics are used throughout the business. We explored data collected over the history of executed moves, with a focus on the executed move length, in order to better understand appeasement tolerance and customer satisfaction in relation to move-estimation accuracy. We learned that customers are happy if we complete their moves in as little as half the time that they expected and that they tolerate a move going over by up to 10 percent of the expected time, while appeasement payouts increase drastically for moves that exceed 110 percent of the estimate. Bearing this in mind, we defined the runover rate as the share of moves whose length is greater than 110 percent of the estimate and the hit rate as the share of moves whose length is between 50 percent and 110 percent of the estimate.
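To make the two metrics concrete, here is a minimal sketch in Python. It treats a runover as a move whose actual length exceeds 110 percent of its estimate and a hit as a move that lands inside a tolerance band. The function names and the band parameters are our own illustration, not production code; the default lower bound of 0.5 is an assumption drawn from the "half the expected time" finding and can be changed.

```python
def length_ratios(actual_hours, estimated_hours):
    """Ratio of actual move length to estimated move length, per order."""
    if len(actual_hours) != len(estimated_hours) or not actual_hours:
        raise ValueError("need two non-empty sequences of equal length")
    return [a / e for a, e in zip(actual_hours, estimated_hours)]


def runover_rate(actual_hours, estimated_hours, upper=1.10):
    """Share of moves whose actual length exceeded `upper` times the estimate."""
    ratios = length_ratios(actual_hours, estimated_hours)
    return sum(r > upper for r in ratios) / len(ratios)


def hit_rate(actual_hours, estimated_hours, lower=0.50, upper=1.10):
    """Share of moves whose actual length fell within [lower, upper] times the estimate.

    The default `lower` is an assumption for illustration.
    """
    ratios = length_ratios(actual_hours, estimated_hours)
    return sum(lower <= r <= upper for r in ratios) / len(ratios)
```

Keeping the band edges as parameters makes it easy to see how sensitive each rate is to the tolerance we choose.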

Figure 2: Comparison of runover rates for orders where the defaults were not used (worked) and orders where they were (default).

Strategy and feature additions

Figure 3: Single-market distribution of moves based on specific features.

We added markets and move types to the lookup table because both have a discernible effect on move length, and we saw quick benefits from doing so during the exploratory-analysis phase. These weren’t the only additional features we considered, but we went in knowing that at some point the Cartesian product of the options under each feature would make the lookup table so large that it would become inefficient to parse. In our initial analysis, we also noticed effects from the following features, but size constraints kept us from adding them:
● Booking lead time
● Booking day of week
● Booking time of day
● Day of week of move
● Time of day of move
● Number of stops
● Years spent by occupants at the property
● Square footage of the property
● Presence of elevators
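
To see how quickly a lookup table grows, here is a rough sketch that multiplies the number of options under each feature. The option counts are hypothetical placeholders chosen for illustration, not Bellhops' actual figures, and `table_size` is a helper we made up for this example.

```python
from math import prod

# Hypothetical number of options per feature (assumed values, for illustration only).
FEATURE_OPTIONS = {
    "property_size": 6,
    "property_type": 4,
    "market": 30,
    "move_type": 3,
    "booking_lead_time_bucket": 5,
    "move_day_of_week": 7,
    "move_time_of_day_bucket": 4,
    "number_of_stops": 3,
    "has_elevator": 2,
}


def table_size(features, options=FEATURE_OPTIONS):
    """Rows in a lookup table keyed on the Cartesian product of `features`."""
    return prod(options[name] for name in features)


original = ["property_size", "property_type"]
extended = original + ["market", "move_type"]

print(table_size(original))         # 24
print(table_size(extended))         # 2160
print(table_size(FEATURE_OPTIONS))  # 1814400
```

Under these assumed counts, adding just five more features takes the table from a couple of thousand rows to well over a million, and most of those cells would hold few or no historical moves.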

Figure 4: Distribution of man hours as defined by product of personnel and hours on the move, sorted by property size and move type.

Exploration

Product moment correlation coefficient

Figure 5: Correlation between all features after categorization.
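
For readers who want to reproduce this kind of study, below is a small, dependency-free sketch of the product moment (Pearson) correlation coefficient, along with one simple way to turn category labels into numbers first. The integer encoding is an assumption on our part: it imposes an arbitrary order on the labels, so it is only a reasonable choice for binary or naturally ordered categories. Both function names are illustrative.

```python
import math


def encode_categories(labels):
    """Map each distinct label to an integer code, assigned in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels]


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Applying `pearson_r` to every pair of encoded feature columns yields the kind of matrix a correlation heat map is drawn from.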

Generalized pair plots

Figure 6: Pairwise study of categorical and quantitative features. (Highlighted: the booking lead hours for people moving from properties with and without an elevator.)

Box and whisker plots

Figure 7: Box and whisker plots for length of moves by property type, measured in man hours spent on them, for A-to-B moves (as opposed to single-location, labor-only services).

Empirical cumulative distribution function (ECDF)

Figure 8: ECDF for length of move at each property type using our full-service product (labor with a truck) in one of our busiest markets.
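
An ECDF is simple to compute by hand: sort the observations and, for each one, record the fraction of observations at or below it. The sketch below shows one way to do that in plain Python. The function names are ours, and the sample values in the usage note are made up.

```python
def ecdf(values):
    """Return (sorted values, cumulative fraction of observations at or below each)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    return xs, [(i + 1) / n for i in range(n)]


def fraction_at_or_below(values, threshold):
    """Share of observations less than or equal to `threshold`."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(v <= threshold for v in values) / len(values)
```

For example, with made-up move lengths of 3, 1, 2, and 4 man hours, `ecdf` returns the sorted lengths alongside the fractions 0.25, 0.5, 0.75, and 1.0, and `fraction_at_or_below` with a threshold of 2 returns 0.5. Reading the curve this way shows directly what share of moves a given estimate would cover.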

Distribution bar charts

Figure 9: Distribution of drive times varies across different markets, with the right tail indicating the varied lengths for each.

The addition of new service options introduced new variables that affected the length of moves, such as the time required to drive from a customer’s first location through all the stops to the final location. We noticed that every additional 10 minutes of drive time pushed about 10 percent of orders to run over, and that many orders had a drive time of less than 30 minutes. In response, we developed a new order flow that allows us to add small increments of time (15 minutes) to a move estimate. We also worked through user-interface solutions with the customer-experience (CX) engineering team to provide a granular, additive layer on top of the earlier crew-hour increments in booking.
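
As a rough illustration of the additive layer, the sketch below converts a drive time into 15-minute increments and adds it to a labor estimate. Rounding up to the next increment is our assumption for the example, as are the function names; the actual order flow may apply a different rule.

```python
import math


def drive_time_increment(drive_minutes, step_minutes=15):
    """Round a drive time up to the next multiple of `step_minutes` (assumed rule)."""
    if drive_minutes < 0:
        raise ValueError("drive time cannot be negative")
    return math.ceil(drive_minutes / step_minutes) * step_minutes


def estimate_with_drive(labor_minutes, drive_minutes, step_minutes=15):
    """Labor estimate plus the drive time expressed in whole increments."""
    return labor_minutes + drive_time_increment(drive_minutes, step_minutes)


print(estimate_with_drive(180, 22))  # 210: a 22-minute drive adds two 15-minute increments
```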

How do we ensure we’re providing the right recommendations?

Back-testing

Figure 10: Using hit rate to show significant improvement in accuracy of estimates.

Looking at individual service options, our best-selling service is labor with a truck. Had the new default estimates been in use during the months before we reset the defaults, we could have prevented another 20 percent of these orders from running over while still maintaining the hit rate. The hit rate has consistently been the more difficult metric to meet under the lookup-table-based default-setting method.
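
A back-test of this kind can be sketched in a few lines: replay historical orders against each candidate set of defaults and compare the resulting runover and hit rates. Everything below is illustrative. The order categories, hours, default values, and tolerance band are invented for the example, and `backtest` is a hypothetical helper rather than our production code.

```python
def backtest(orders, defaults_by_name, lower=0.50, upper=1.10):
    """Score each set of defaults against historical orders.

    orders: list of (category, actual_hours) pairs.
    defaults_by_name: {name: {category: estimated_hours}}.
    The band edges `lower` and `upper` are assumed tolerances.
    """
    results = {}
    for name, defaults in defaults_by_name.items():
        ratios = [actual / defaults[category] for category, actual in orders]
        results[name] = {
            "runover_rate": sum(r > upper for r in ratios) / len(ratios),
            "hit_rate": sum(lower <= r <= upper for r in ratios) / len(ratios),
        }
    return results


# Made-up history and defaults, for illustration only.
history = [("1br", 5.0), ("1br", 7.0), ("3br", 12.0), ("3br", 9.0)]
candidates = {
    "old": {"1br": 4.0, "3br": 8.0},
    "new": {"1br": 6.0, "3br": 11.0},
}
for name, scores in backtest(history, candidates).items():
    print(name, scores)
```

Scoring both sets of defaults on the same historical orders keeps the comparison fair, since neither set benefits from an easier mix of moves.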

Figure 11: Illustrating the importance of booking to estimate.

In 2018, we focused on stability and reliability while also launching in a few metropolitan areas to test our ability to expand our markets. In a two-month testing period for those new markets (Washington, D.C., is one example), where we have much less data, our new estimates outperformed booked man hours by more than 20 percent. This is compelling evidence that we’re on the right track. For 2019, we have laid the groundwork for reliable, rapid expansion.

Figure 12: Performance in newer markets.

Initially, we aggregated the drive time between stops into the actual length of a move. The issue with this method was that it introduced noise to move-length targets that could not be accounted for consistently, such as traffic and weather conditions.

The back-testing process indicated that we could further reduce the runover rate, as well as increase accuracy, by using the actual drive time instead. In future iterations, drive time will be computed using Google’s API and then added to the estimate within the order flow.

User behavior, processes, and automating default settings

Clearly, creating defaults that meet expectations benefits both the customer and the company, which raises the question: how do we convince customers to trust our defaults?

Were we really using the defaults in practice?

The process to reduce user intervention and improve engineering response time

Figure 13: A new page in the order flow with expectation-setting text.

To make the case to the concierge team, we analyzed their performance estimating orders and then ran the numbers by them to illustrate the value of the recommended defaults. Additionally, we helped the CX engineering team design new pages in the order flow to set better expectations for the customer, and then assisted them with A/B tests to ascertain that conversion rate did not drop. In the process, we also prepared them to set up a backend process to consume new weekly defaults.

Figure 14: Effect of web-page changes that provide customers more context.

The process to improve data collection

Automating defaults

Figure 15: Setting up Airflow Directed Acyclic Graphs to automate weekly model update.

Motivating examples:

  1. In our Raleigh-Durham-Chapel Hill market, from January to April 2018, we completed 80 full-service, 1-bedroom apartment/condo orders, which averaged 5 total man hours, with an 80th percentile of 7 hours. From April to June, we had 128 orders in the same category, but the average rose to 6 total man hours, with an 80th percentile of 8 hours.
  2. In Atlanta, from January to April 2018, we completed 27 full-service orders for 3-bedroom homes, which averaged 5 total man hours, with an 80th percentile of 9 hours. From April to June, we had 61 orders in the same category, but the average rose to 15 total man hours, with an 80th percentile of 25 hours.
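
The summaries in these examples are straightforward to recompute for any category and period. The sketch below returns the mean and the 80th percentile of a list of man hours. It uses the nearest-rank percentile definition, which is an assumption on our part since several definitions exist, and the sample values in the usage note are made up rather than taken from the orders above.

```python
import math


def percentile_nearest_rank(values, p):
    """p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values or not 0 < p <= 100:
        raise ValueError("need observations and 0 < p <= 100")
    xs = sorted(values)
    rank = math.ceil(p * len(xs) / 100)  # 1-based rank into the sorted values
    return xs[rank - 1]


def summarize(man_hours, p=80):
    """Return (mean, p-th percentile) of man hours for one category and period."""
    return sum(man_hours) / len(man_hours), percentile_nearest_rank(man_hours, p)
```

For instance, with made-up man hours of 3, 4, 5, 5, 6, 7, 5, 4, 6, and 5, `summarize` returns a mean of 5.0 and an 80th percentile of 6. Running it on each period in turn makes a shift like the ones above easy to spot.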

This kind of disparity over short periods of time led us to seek a more proactive solution through automation, which I will describe in part two of this series.


Does solving problems like these appeal to you? Interested in joining us?
Apply here.

Bellhop Moving

Moving can get heavy. We keep it light.

Abraar Ahmed

Written by

Learning machines, unfinished books, incomplete essays, adventure dreams, adrenaline highs, half-baked experiments, and absorbing the human condition.
