Better bus predictions (a lot better)

Sep 24, 2018

We’re announcing on other venues today that bus predictions at the T are about to get a whole lot better. We think it’s a big win for our riders. For us transit and government tech nerds, the back story of how we got to this point is interesting too.

We love it when a user story and process story come together.

Why do bus predictions matter?

The very existence of bus predictions is a lifeline. Buses run in mixed traffic, so they almost never arrive exactly according to the schedule. Waiting for the bus inside is better than waiting outside. Knowing you have 3 extra minutes to play with your kids or finish an email or sit and stare while sipping your coffee rather than rushing to a bus stop is still amazing, 8 years later.

Even better? When you get to the bus stop and the bus comes when we said it would. Less time wasted is a win for customers, which is a good enough reason for us.

In the long run, better predictions also mean more dependable service, which means more riders. And we can use predictions internally to improve how we dispatch buses¹, which feeds back into better predictions. It’s a tidy little virtuous circle.

What are the components of a good prediction?

Follows the Goldilocks principle (but focused on “not too hot”).
Buses have inherent variability. When they are close, the time it takes to arrive depends on things like the pattern of traffic lights. They could make a series of green lights in a row, or catch all reds. Further out, traffic and passenger demand are the main sources of uncertainty. Think of a bus’s arrival time as a draw from a distribution. We may have 99% certainty that the bus will arrive between 2 and 10 minutes from now, but that’s too big a range. When we make a point estimate for customers, we want to pick a value from that distribution that is early enough that the bus rarely arrives before the predicted time and leaves passengers behind (e.g. the 5th percentile). For dispatching, we want the most likely time the bus will arrive (e.g. the median of the distribution of predicted times).
Predicts well both in the short term and the medium term.
This is hard. It could take 1 minute or 3 minutes to get through the 2 traffic lights 3 blocks from my house. In the medium term, traffic, number of passengers, and weather all come into play. If the outcomes are widely distributed, because we are predicting at the 5th percentile, we tend to end up with predictions that are much earlier than when the bus actually shows up. That is better than the opposite, since riders won’t miss their bus, but it’s not a great experience because they have to wait for their bus longer than expected. It is not a problem unique to the MBTA, and projects like bus lanes, transit signal priority (TSP), and a better fare system that allows all-door boarding will mean better bus service and, with it, better predictions.
Predicts for all vehicles that are running.
Riddle me this: if we don’t tell you a vehicle is coming toward you, and you are relying on our predictions, is it really arriving?
Differentiates scheduled and real predictions.
Far in the future, we can’t make a better prediction than the schedule. If it is the first trip of the day for the driver, and they haven’t yet logged on to the bus to let us know they are on their way, we don’t have a better prediction than the schedule. If the GPS unit on the bus isn’t sending information, the same applies. A good prediction makes these circumstances clear.
Can handle interlining and other things that complicate predictions.
Our buses often go from one route to the next to make their schedules more efficient. This means that the bus you take may be on a totally different route when you’re looking to see when it’ll arrive at your stop. A good prediction can handle that complexity.
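To make the “draw from a distribution” idea concrete, here is a minimal sketch of how a rider-facing estimate and a dispatch estimate could differ for the same bus. The sample arrival times and the names `percentile`, `customer_eta`, and `dispatch_eta` are made up for illustration, and the nearest-rank percentile is just one simple choice; this is not how our vendor actually computes predictions.

```python
import math
import statistics

# Hypothetical minutes-until-arrival seen for buses in a similar situation
# (same stop, similar distance away). Illustrative numbers only.
arrival_minutes = [
    2.5, 3.0, 3.4, 3.8, 4.0, 4.2, 4.5, 4.7, 5.0, 5.2,
    5.5, 5.8, 6.0, 6.4, 6.8, 7.2, 7.7, 8.3, 9.0, 10.0,
]


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Rider-facing estimate: early in the distribution, so the bus rarely beats it.
customer_eta = percentile(arrival_minutes, 5)

# Dispatch-facing estimate: the middle of the distribution.
dispatch_eta = statistics.median(arrival_minutes)

# Share of sampled buses that would have arrived before the rider-facing estimate.
missed_share = sum(m < customer_eta for m in arrival_minutes) / len(arrival_minutes)

print(f"customer ETA: {customer_eta} min, dispatch ETA: {dispatch_eta} min")
print(f"share arriving before the customer ETA: {missed_share:.0%}")
```

The gap between the two numbers is the extra wait a rider accepts in exchange for not missing the bus, and it grows as the distribution gets wider.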

How do you measure prediction accuracy?

First some numbers. On an average day our buses make 450,000+ stops. For this exercise we record predictions every 15 seconds, so long as they have changed since the last prediction. That adds up to more than 30,000,000 predictions a day, or about 70 predictions for every stop a bus makes. That is a lot of data, and it gives us some confidence in the results that follow.
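For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch. The stop and prediction counts come from the paragraph above; the 30-minute horizon is an assumption borrowed from the longest time bucket we measure, not something the recording process is stated to enforce.

```python
STOPS_PER_DAY = 450_000            # from the text (a lower bound: "450,000+")
PREDICTIONS_PER_DAY = 30_000_000   # from the text ("more than")
SNAPSHOT_INTERVAL_S = 15           # from the text
HORIZON_MIN = 30                   # assumption: predictions tracked up to 30 minutes out

# Average number of recorded predictions for each stop a bus makes.
per_stop = PREDICTIONS_PER_DAY / STOPS_PER_DAY

# Upper bound if a new prediction were recorded at every 15-second snapshot.
ceiling_per_stop = HORIZON_MIN * 60 // SNAPSHOT_INTERVAL_S

# Rough share of snapshots where the prediction changed and was recorded.
recorded_share = per_stop / ceiling_per_stop

print(f"{per_stop:.1f} predictions per stop, out of at most {ceiling_per_stop}")
print(f"roughly {recorded_share:.0%} of snapshots recorded")
```

Under that assumption, a little over half of the 15-second snapshots carry a changed prediction, which is consistent with the “about 70 per stop” figure.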

When we measure predictions, we want to understand how accurate they are at different time scales. Each time scale has a different range for when we consider a prediction “accurate.”

Short term (0–3 minutes away): 1 minute early to 1 minute late
Medium-near term (3–6 minutes away): 1.5 minutes early to 2 minutes late
Medium-far term (6–12 minutes away): 2.5 minutes early to 3.5 minutes late
Long term (12–30 minutes away): 4 minutes early to 6 minutes late²
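The windows above can be sketched as a small lookup. A few things here are assumptions rather than our actual scoring code: the names (`Bucket`, `is_accurate`, `accuracy_by_bucket`), the half-open bucket boundaries, bucketing by predicted minutes away, and the sign convention that error is actual arrival minus predicted arrival, so a positive error means the bus came later than predicted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bucket:
    name: str
    min_away: float   # inclusive, minutes until predicted arrival
    max_away: float   # exclusive (assumed boundary handling)
    early_tol: float  # bus may arrive this many minutes before the prediction
    late_tol: float   # bus may arrive this many minutes after the prediction


BUCKETS = [
    Bucket("short", 0, 3, 1.0, 1.0),
    Bucket("medium-near", 3, 6, 1.5, 2.0),
    Bucket("medium-far", 6, 12, 2.5, 3.5),
    Bucket("long", 12, 30, 4.0, 6.0),
]


def bucket_for(minutes_away: float) -> Optional[Bucket]:
    """Return the bucket for a prediction, or None if it is outside 0-30 minutes."""
    for bucket in BUCKETS:
        if bucket.min_away <= minutes_away < bucket.max_away:
            return bucket
    return None


def is_accurate(minutes_away: float, error_min: float) -> Optional[bool]:
    """error_min = actual arrival minus predicted arrival, in minutes."""
    bucket = bucket_for(minutes_away)
    if bucket is None:
        return None
    return -bucket.early_tol <= error_min <= bucket.late_tol


def accuracy_by_bucket(samples):
    """samples: iterable of (minutes_away, error_min). Returns {bucket name: share accurate}."""
    totals, hits = {}, {}
    for minutes_away, error_min in samples:
        bucket = bucket_for(minutes_away)
        if bucket is None:
            continue  # outside the measured range, so not scored
        totals[bucket.name] = totals.get(bucket.name, 0) + 1
        if is_accurate(minutes_away, error_min):
            hits[bucket.name] = hits.get(bucket.name, 0) + 1
    return {name: hits.get(name, 0) / count for name, count in totals.items()}


# Made-up sample: (minutes away when predicted, minutes the bus was late).
samples = [(2.0, 0.5), (2.5, 1.5), (5.0, 1.5), (10.0, -3.0),
           (20.0, 5.0), (20.0, -5.0), (45.0, 0.0)]
result = accuracy_by_bucket(samples)
print(result)
```

Note how the same 1.5-minute miss counts as inaccurate when the bus is 2.5 minutes away but accurate when it is 5 minutes away: the tolerance widens as the prediction looks further ahead.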

In addition, we compare performance on weekdays vs. weekends, different types of routes, and so on. We do this for every stop on every trip on every route our buses make. We also examine the proportion of total stops our buses make that actually have predictions. After all, if predictions are 100% accurate, but they only show up half the time, what good is that? And we look for predictions made that don’t correspond to a recorded arrival — because if 100% of a route’s stops are predicted, but the predictions include dozens of trips that didn’t actually run, that’s no good either.
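Those last two checks, coverage and predictions with no matching arrival, come down to comparing two sets. Here is a minimal sketch, assuming each stop event can be identified by a (trip, stop) pair; the function name and the sample IDs are hypothetical.

```python
def coverage_and_unmatched(arrivals, predicted):
    """Compare recorded arrivals against stop events that received predictions.

    arrivals:  (trip_id, stop_id) pairs where a bus was recorded arriving
    predicted: (trip_id, stop_id) pairs that had at least one prediction
    Returns (share of arrivals that had a prediction,
             predicted pairs with no recorded arrival).
    """
    arrivals, predicted = set(arrivals), set(predicted)
    if not arrivals:
        raise ValueError("need at least one recorded arrival")
    coverage = len(arrivals & predicted) / len(arrivals)
    unmatched = predicted - arrivals
    return coverage, unmatched


# Made-up example: trip t2 never got a prediction at stop s2,
# and trip t9 was predicted but never recorded as running.
arrivals = {("t1", "s1"), ("t1", "s2"), ("t2", "s1"), ("t2", "s2")}
predicted = {("t1", "s1"), ("t1", "s2"), ("t2", "s1"), ("t9", "s1")}

coverage, unmatched = coverage_and_unmatched(arrivals, predicted)
print(f"coverage: {coverage:.0%}, unmatched predictions: {sorted(unmatched)}")
```

Both numbers matter together: high coverage with many unmatched predictions means riders are being shown buses that never come.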

How good are the predictions now?

They’ve served us well for the better part of a decade.

Right now, on an average day, our predictions are 75% accurate.

The spread across time periods is relatively consistent: 76% fall within 1 minute when the bus is 0–3 minutes away; 74% are accurate in the 3–6 minute bucket; and 78% are accurate in each of the 6–12 and 12–30 minute ranges.
These basic numbers hold within a few percentage points across all types of routes, across weekdays and weekends.
Approximately 99.7% of stops that our regular bus service makes have predictions.
They are also weighted toward predicting that buses will come after the predicted time rather than before it. For example, for buses 12–30 minutes away, 25% of predictions were up to 4 minutes early; 52% were up to 6 minutes late, and 23% were outside both of those windows.

They could be better.

How much better are the new predictions?

We’ll move to 84% accurate based on the data collected during the procurement.

What does that mean? If you take the bus every day, today you probably see an inaccurate prediction 2–3 times a week. From now on it’ll be 1–2. The details:

Weekend predictions, when buses are less frequent, were 86% accurate.
Predictions in the short term were 82% accurate, in the two medium-term buckets 83% and 85% accurate, and in the long term 82% accurate. The highest accuracy is for trips in the sweet spot of when you would be thinking about heading to your stop.
Predictions are weighted toward predicting buses will come earlier than they actually do, so you don’t miss your bus. For example, in the 12–30 minute bucket, 12% of weekday predictions were up to 4 minutes early, 71% were up to 6 minutes late, and 17% were outside both of those windows. That means fewer missed buses for our customers, across the board.
Predictions are also updated more frequently. The new system will make more than 50 million predictions a day. That means if something goes wrong, you’ll know about it sooner.
We maintained that 99.7% of stops have predictions.

AND, we’re rolling out more frequent bus location updates. As of this morning 151 buses have them, and the entire fleet will be wired over the next 2 months, which will further improve our accuracy — especially for short term predictions.

How do we know that our predictions are better?

We don’t make the predictions ourselves. We use companies that do this for lots of other customers, so we can get the benefit of their specialized expertise.

Contracting is hard. Sometimes because of inane rules. But often because the contracts are high stakes — we need to think about all the ways they could go right and wrong in advance. When you are switching a prediction system that affects hundreds of thousands of customers, you really don’t want to mess it up.

Most of the time, that means that our requirements for integration, and for tools to help us keep the system up to date, result in a selection process that prioritizes the wrong thing. It means that vendors tell you they have good predictions, rather than show you.

This time around we changed that. Yes, we have the other requirements. And we care about price. But for the first time in a real-time information system procurement (that we know about; let us know if we’re wrong!), the main basis for our contract award was the accuracy of the predictions. Here’s how we did it:

We provided our bidders with a historical feed, so they could calibrate their systems.
We then provided them with an actual real-time feed for 2 weeks.
We compared their predicted arrival times to our own record of actual arrival times to see which predictions were more accurate.

Our chosen vendor was better than their competitors — for our data, right now. Starting in October, Swiftly will be providing predictions on all MBTA bus routes.

We hope this helps set a benchmark for the industry. How can our vendors make better predictions for everyone, if they don’t know how they stand against the competition? We also hope this encourages more agencies to share their predictions and to adopt prediction quality as the #1 factor in deciding which vendor to use.

What’s next?

More frequent real-time updates. This means being able to watch the bus and make your own judgement. But it also means better predictions. If we can tell if a bus is stopped at a light or stuck in traffic, we can be more accurate, especially in the short term. We wrote about it a few months back, and WE STARTED ROLLING IT OUT THIS WEEK. 151 buses have it as of publishing, with more added each day.
Fewer phantom buses. They are the worst — when you get what look like real-time predictions for a bus, only for it to disappear from your app when it gets to your stop. That happens often because of scheduled predictions and interfaces that don’t do a good enough job differentiating. What we need is to have better data so that we know when the bus is running but we just don’t have real-time information, versus when the bus actually isn’t running. We’re working to combine various systems and update dispatch processes to deliver on this promise to our customers.

And there’s so much more. Want to help? Come join our amazing team.

¹It’s more complicated than this because of different needs for the predictions. Such is life. But this is a good way to think about it.

²Of note is that we’re not strictly using the 5th percentile to judge for accuracy here — the standard is slightly looser. In practice we’re willing to accept slightly more error on the early side in order to gain a tighter distribution for accuracy.