The Science of Wait Time: How We Built Facility Insights

Piero Cinquegrana
Published in motive-eng, Dec 3, 2019

In August 2019, we launched KeepTruckin Facility Insights. A product that accurately predicts truck dwell times for every major shipping facility in the U.S. is a significant leap forward for the trucking industry. While the idea may seem simple, building it wasn’t easy. Pulling this off at scale required a large network of trucks generating location data, human annotation, ironclad safeguards for data privacy, and creative data engineering and machine learning.

Prime Directive: Protecting Carrier and Shipper Privacy

We started with a principal condition around data privacy and anonymity: we had to present dwell time information for a large number of facilities without exposing customer data. We thought long and hard about how to maintain the privacy of both our customers on the carrier side and facilities on the shipper side.

We put safeguards in place to ensure that users cannot reverse-engineer which carriers visited which facilities, how often facilities are visited, or how much traffic a specific carrier contributes to overall activity at a facility. We also made sure that facilities with exclusive carrier-shipper relationships weren’t surfaced in the broader tool.
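To make the idea concrete, here is a minimal sketch of one kind of safeguard: a facility is surfaced only if enough distinct carriers visit it and no single carrier dominates its traffic. The function name, the thresholds, and the rule itself are assumptions for illustration; the actual safeguards are not described in this article.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
MIN_CARRIERS = 5         # minimum number of distinct carriers per facility
MAX_CARRIER_SHARE = 0.5  # no single carrier may exceed this share of visits


def is_publishable(visit_carrier_ids, min_carriers=MIN_CARRIERS,
                   max_share=MAX_CARRIER_SHARE):
    """Return True if a facility's visits are diverse enough to surface.

    visit_carrier_ids holds one carrier id per recorded visit.
    """
    counts = Counter(visit_carrier_ids)
    # Too few distinct carriers (including none at all): keep the facility hidden.
    if len(counts) < min_carriers:
        return False
    total_visits = sum(counts.values())
    top_share = max(counts.values()) / total_visits
    # A dominant carrier could be identified from the aggregate, so hide it too.
    return top_share <= max_share
```

Under a rule like this, a facility served by a single carrier is never shown, which is one way an exclusive carrier-shipper relationship could be kept out of the tool.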

Step 1: Where Did the Truck Stop?

Detecting when a vehicle is stopped in order to load or unload (an event that we call a stoppage) is foundational to this product. Seems trivial, right? Well, not exactly. Drivers pull over to make phone calls, get stuck in traffic jams, or are involved in collisions. In the very granular view of ELD messages, we see only GPS coordinates (latitude, longitude) like these:

[Figure: GPS coordinates are insufficient to identify facility visits.]

Using GPS coordinates alone, a vehicle may seem to be stopped anytime it slows down in gridlock or idles for a minute or two. We needed to distinguish these instances from visits to a facility! By using additional data, such as whether the truck is loaded or unloaded, the gas tank is full, or the engine is turned on or off, we were able to build a more complete picture.
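A minimal sketch of this kind of filtering, under stated assumptions: each ping carries a speed and an engine flag, and a stationary run counts as a stoppage only if it lasts long enough and the engine was switched off at some point. The `Ping` fields, both thresholds, and the engine-off rule are hypothetical; real data would need more signals, since drivers also idle with the engine running during facility visits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ping:
    ts: float         # seconds since some reference time
    lat: float
    lon: float
    speed_mph: float  # hypothetical field, not named in the article
    engine_on: bool


# Hypothetical thresholds, for illustration only.
MIN_STOP_SECONDS = 15 * 60
MAX_STOP_SPEED_MPH = 1.0


def _is_stoppage(run: List[Ping], min_seconds: float) -> bool:
    """A stationary run qualifies if it is long and the engine was off at least once."""
    if not run:
        return False
    duration = run[-1].ts - run[0].ts
    return duration >= min_seconds and any(not p.engine_on for p in run)


def find_stoppages(pings: List[Ping],
                   min_seconds: float = MIN_STOP_SECONDS) -> List[Tuple[float, float]]:
    """Return (start_ts, end_ts) for each qualifying run of stationary pings."""
    stoppages = []
    run: List[Ping] = []
    for p in sorted(pings, key=lambda p: p.ts):
        if p.speed_mph <= MAX_STOP_SPEED_MPH:
            run.append(p)
            continue
        # The truck is moving again: close out the current stationary run.
        if _is_stoppage(run, min_seconds):
            stoppages.append((run[0].ts, run[-1].ts))
        run = []
    # Handle a run that is still open when the data ends.
    if _is_stoppage(run, min_seconds):
        stoppages.append((run[0].ts, run[-1].ts))
    return stoppages
```

With these rules, two minutes at a standstill in traffic with the engine running is ignored, while a twenty-minute stop with the engine off is kept.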

Step 2: Identifying Facilities with Meter-Level Precision

After we identified stoppages, the next questions were where trucks were stopping, and which of those stops corresponded to visits to logistics facilities. It became clear that we would need precise borders for each facility; after all, warehouses can be next to each other, or they can be near restaurants where drivers stop for long periods of time. And, of course, trucks sometimes pull over on the side of the highway to rest.

To get this right, we had to detect entries and exits down to meter-level precision. This is where humans and machines had to come together: machines identified facilities, and humans manually geofenced the areas with high precision. We invested heavily in this; over 100,000 hotspots of stoppages have been geofenced, and we continue the process at the time of publishing, adding thousands of new facilities each week.

[Figure: We geofenced facilities to identify stoppages.]
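Once a facility has a hand-drawn geofence, deciding whether a ping falls inside it is a point-in-polygon test. The sketch below uses the standard ray-casting method and treats latitude and longitude as planar coordinates, which is a reasonable approximation only for small, facility-sized polygons. The function names and the single-visit assumption in `dwell_seconds` are illustrative, not taken from the product.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test. polygon is a list of (lat, lon) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def dwell_seconds(track, polygon):
    """Seconds between the first and last ping inside the geofence.

    track is a list of (ts, lat, lon) tuples sorted by time and is assumed
    to cover a single visit. Returns None if no ping falls inside.
    """
    inside_ts = [ts for ts, lat, lon in track
                 if point_in_polygon(lat, lon, polygon)]
    if not inside_ts:
        return None
    return inside_ts[-1] - inside_ts[0]
```

Because dwell is measured between sampled pings, the accuracy of entry and exit times depends on both the precision of the fence and how often the device reports its location.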

Step 3: Modeling Dwell Time — Even for Facilities with Few Visits

A key feature of Facility Insights is the ability to view dwell time predictions for different times of day and days of the week. However, even with a robust set of historical data, some facilities had limited observations for specific hour-day combinations; for instance, one warehouse did not have much data for Tuesday at 8am.

We used machine learning algorithms to learn patterns of similarity and make predictions for facilities even when data was sparse.

For the Facility Insights tool to be useful, it must estimate dwell times for all the days of the week and times of day, and those predictions must be trustworthy, so that our users can rely on them to make important business decisions. We couldn’t show raw historical numbers, both for privacy reasons and because, for smaller facilities, we simply didn’t have enough data for all times of day and days of the week to produce a reliable estimate.

We tackled this issue by building a custom machine learning algorithm to model dwell time estimates based on warehouse-specific patterns, along with historical data of other warehouses that behave similarly to the one in question. This enabled us to predict dwell times for all facilities, typically to within about 20–30 minutes, while protecting the privacy of all facilities. As trucks in our network continue to visit facilities, we adjust the predicted dwell times to reflect recent data and improve our estimates.
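The custom algorithm is not described here, so the sketch below swaps in a much simpler, well-known technique to illustrate the idea of borrowing strength from similar facilities: a shrinkage estimate that blends a facility's own average for an hour-day slot with the average of a group of similar facilities. The function name, the `prior_strength` parameter, and the assumption that a similar-group average is already available are all hypothetical.

```python
# Hypothetical: how many visits the similar-group average is "worth".
PRIOR_STRENGTH = 10


def blended_dwell_minutes(own_durations, similar_group_mean,
                          prior_strength=PRIOR_STRENGTH):
    """Blend a facility's own mean dwell with a similar-group mean.

    own_durations: dwell times in minutes observed at this facility for one
    hour-day slot (may be empty).
    similar_group_mean: mean dwell in minutes for the same slot across
    facilities assumed to behave similarly.
    """
    n = len(own_durations)
    if n == 0:
        # No local data: fall back entirely on the similar facilities.
        return similar_group_mean
    own_mean = sum(own_durations) / n
    # The more local visits we have, the more we trust the facility's own mean.
    weight = n / (n + prior_strength)
    return weight * own_mean + (1 - weight) * similar_group_mean
```

With few visits the estimate stays close to the group average, and as visits accumulate it converges to the facility's own history, which mirrors the way predictions are adjusted as new data arrives.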

What Have We Learned So Far?

We’ve identified over 120,000 facilities and counting, and our estimates reflect over 6 million unique data points in only the last six months. Our customers are already using Facility Insights to choose their loads, plan their visits to facilities, and keep records of excessive detention time.

Analyzing our data, we see some interesting trends. To name a few:

  • Mornings are busier than afternoons: Arriving at 4pm instead of 10am can reduce dwell time by almost 15% (from 74 minutes to 63 minutes), although we realize that many drivers prefer morning visits.
  • Colder weather means more idling: When we studied idling time and fuel consumption in Minnesota, we saw that drivers typically idle about 20% longer per facility visit in January than in May, translating to almost 20% more fuel consumed.

These insights are just the tip of the iceberg. For the first time, trucking companies can make truly data-driven decisions about the shippers they work with to reduce dwell time and maximize their drivers’ hours of service (HOS).

Piero is a Data Engineer at Facebook Reality Labs. Piero held prior data science roles at KeepTruckin, Qubole, Neustar and MarketShare.