Can we make Montreal’s buses more predictable? No. But machines can.
How we’re using machine learning at Transit to improve real-time predictions for Montreal’s STM buses 🙂🚍👉🤩🚍
The bus. It’s late. It’s early. It’s five minutes away. Six minutes? Two minutes… Okay nowwww you see it. And here it comes. Nice.
We have one job at Transit: to give you accurate ETAs. When your ETAs are off? The guillotines get rolled out and the pitchforks assemble. It doesn’t matter what city you’re in: real-time predictions for transit (or Ubers, or your own car) never seem 100% perfect. You’ve seen it yourself: ETAs going up and down, like yo-yos strapped to pogo sticks. 📈🙃📉
Why? It’s not because your bus’s GPS is bad — or your app is janky — or because the satellites fell out of the sky. It’s because calculating real-time ETAs must account for SO MANY variables.
Snow. Rush hour. Hockey game traffic. Protests. Crashes. Construction. Even when we know exactly where your bus is, using that location to extrapolate ETAs one, two, or ten stops down the line requires seriously tricky math. Too tricky for a human mind. We had to call in reinforcements…
How we use machine learning to improve our STM real-time predictions by ~15% 🤖
Not so long ago, bus schedules were created by sitting in the back of the bus with a pencil and stopwatch. Today, you can use GPS. Transit agencies can easily take, say, a year’s worth of “departure time” data, drop it into a spreadsheet, and spin out a schedule for a transit line — using the average ETAs for every stop.
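To make the spreadsheet approach concrete, here is a minimal sketch in Python. The stop names and departure offsets are made up for illustration, and `average_schedule` is a hypothetical helper, not anything an agency actually runs.

```python
from collections import defaultdict
from statistics import mean

# Made-up observations: (stop_id, minutes after the start of the trip
# at which the bus actually departed that stop).
observations = [
    ("stop_a", 0.0), ("stop_a", 0.0),
    ("stop_b", 4.0), ("stop_b", 6.0),
    ("stop_c", 9.0), ("stop_c", 13.0),
]

def average_schedule(observations):
    """Group observed departure offsets by stop and average each group."""
    offsets_by_stop = defaultdict(list)
    for stop_id, offset_minutes in observations:
        offsets_by_stop[stop_id].append(offset_minutes)
    return {stop_id: mean(offsets) for stop_id, offsets in offsets_by_stop.items()}

schedule = average_schedule(observations)
print(schedule)
```

Every rider at `stop_c` gets the same averaged number, whether today's bus is crawling through snow or flying down an empty street.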
But while GPS makes it easier to collect transit data, if you’re only giving riders an “average” time to expect their bus… using GPS data just means a schedule with better average ETAs.
Riders don’t want average ETAs. They want ETAs that reflect their real-time conditions. Enter machine learning.
Machine learning is an increasingly popular way to model complex behaviour. Instead of telling a computer “this is the formula you should use to calculate bus ETAs” we ask the computer “what formula should we use to calculate bus ETAs???” We give the computer all the historical data we have (vehicle locations, travel times, the disparity between “scheduled” departure times and the actual ones…) then let the computer run millions of simulations to find the best ETA formula…
What’s cool about machine learning? It will spit out formulas a human would never consider. It’s one thing to find a formula that fits one or two variables. But what if you have dozens of variables? Suddenly you’re no longer dealing with that simple 2-D x & y graph. You’re dealing with 3-D, 4-D, 20-D graphs… and in the case of public transit, those 20 variables are changing in real time.
For a machine, that’s no problem. As long as you have (1) the data to feed your model, (2) the processing power to sort through it, and (3) domain experts to pick the right parameters and fine-tune the model, machine learning can produce a hyper-precise, multivariable equation to model any data set.
With our galaxy-brain of public transportation expertise (and our in-house machine learning wizards) we wanted to create a formula for transit ETAs that was never so bad you missed your bus because of it. We started with our hometown, Montreal. We used two sources of bus data to train our model:
- Data from the STM
- Data from Transit riders using GO
We have a surprising amount of data from the second source. We’re the most popular transit app in Montreal, and tons of our riders use our “GO” feature to generate real-time data for their line.
More than 175,000 STM trips are augmented with hyper-accurate GO data each month. It lets us update locations every second — instead of every 30 seconds.
Since GO’s vehicle locations are updated more frequently than the agency’s locations, they’re useful for riders down the line who are looking for their bus. But until now we had no way to integrate those better locations into the ETA predictions provided by the STM. No way… until Ayser and Juan.
Before building our black box of magic bus predictions, Ayser and Juan had to set a benchmark. How good were our predictions? First, we had to establish what a “good prediction” was!
As the bus gets closer to the stop, we tolerate a smaller margin of error. After all: the closer it gets to boarding time, the less opportunity you have to react. With a 10-minute heads up, you can make up a minute of time by walking faster. But with a 30-second heads up to make a bus that’s coming one minute sooner than expected?
You’d need to find alternative transportation…
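One way to picture a margin of error that shrinks as the bus approaches is sketched below. The numbers are assumptions chosen purely for illustration (a tolerance of 20% of the predicted wait, with a 30-second floor), and both function names are hypothetical. The actual thresholds Ayser and Juan used are not stated here.

```python
def allowed_error_seconds(predicted_eta_seconds):
    """Assumed tolerance: 20% of the predicted wait, never below 30 seconds."""
    return max(30.0, 0.2 * predicted_eta_seconds)

def is_acceptable(predicted_eta_seconds, actual_eta_seconds):
    """True when the prediction missed by no more than the allowed error."""
    error = abs(actual_eta_seconds - predicted_eta_seconds)
    return error <= allowed_error_seconds(predicted_eta_seconds)

# Predicted 10 minutes, bus showed up a minute early: you can walk faster.
far_away = is_acceptable(600, 540)

# Predicted 90 seconds, bus showed up a minute early: you missed it.
close_by = is_acceptable(90, 30)

print(far_away, close_by)
```

The same one-minute miss passes when the bus is far away and fails when it is nearly at the stop, which is the behaviour described above.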
Next up, we looked at all of the predictions generated by the STM, and all of the actual travel times. We found that ~75% of ETAs generated by the STM fell within the “acceptable prediction” window defined by Ayser and Juan. Pretty good. Could machine learning make that even better?
So we feed our machine learning model data across a few main parameters to predict travel times more accurately. Parameters include:
- Time of day (to judge rush hour vs. normal traffic)
- Day of week (Sunday ≠ Monday traffic)
- How old the GPS report is (fast GO update >> slower STM update)
- Schedule delay (when the bus actually comes vs. when it’s supposed to) and
- Location variables (which can tease out correlations between lines that run on the same street, and account for travel time disparity between, say, a highway and a quiet street.)
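For the curious, here is a rough sketch of how parameters like these could be turned into numbers a model can read. The encoding choices (time of day as a point on a circle, day of week as seven on/off flags, raw latitude and longitude for location) and the `build_features` function are assumptions for illustration, not a description of our production pipeline.

```python
import math
from datetime import datetime

def build_features(now, report_time, schedule_delay_seconds, latitude, longitude):
    """Turn one real-time snapshot into a flat list of numbers (hypothetical layout)."""
    # Time of day as sine/cosine so 23:59 and 00:01 end up close together.
    seconds_into_day = now.hour * 3600 + now.minute * 60 + now.second
    angle = 2 * math.pi * seconds_into_day / 86400
    # Day of week as seven flags; Monday is index 0 in Python's datetime.
    day_flags = [1.0 if now.weekday() == day else 0.0 for day in range(7)]
    # How old the GPS report is, in seconds.
    report_age_seconds = (now - report_time).total_seconds()
    return [
        math.sin(angle), math.cos(angle),
        *day_flags,
        report_age_seconds,
        float(schedule_delay_seconds),
        latitude, longitude,
    ]

# A Monday at 6:00 with a 30-second-old report and a bus running 2 minutes late.
features = build_features(
    now=datetime(2019, 3, 4, 6, 0, 0),
    report_time=datetime(2019, 3, 4, 5, 59, 30),
    schedule_delay_seconds=120,
    latitude=45.5017,
    longitude=-73.5673,
)
print(features)
```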
Once our machine learning model has historical data for actual ETAs (the “targets”) and historical data for all the different parameters (which affect those ETAs), we can start training it.
The model starts off by making random ETA predictions from the parameter data (totally random!). Then it looks at the actual ETA (or “ground truth”) to see how it did. The model will automatically update its prediction formula — tweaking the weight given to each of the different parameters — to fit the data better, in its next attempt. Rinse and repeat a million times, until you have the best possible formula. Machine-made. No brains required!
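The guess, check, and tweak loop can be sketched in a few lines of Python. This is a toy: it assumes a plain linear formula trained with gradient descent, which may not be the kind of model we run in production, and the two input parameters and travel times are invented.

```python
import random

def predict(weights, bias, row):
    """The 'formula': a weighted sum of the parameters plus a constant."""
    return bias + sum(w * x for w, x in zip(weights, row))

def train_linear_model(rows, targets, epochs=2000, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    # Start from random weights, so the first predictions are random too.
    weights = [rng.uniform(-1, 1) for _ in rows[0]]
    bias = rng.uniform(-1, 1)
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, target in zip(rows, targets):
            # Compare the prediction with the ground truth.
            error = predict(weights, bias, row) - target
            for i, x in enumerate(row):
                grad_w[i] += error * x
            grad_b += error
        # Nudge each weight in the direction that shrinks the average error.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Invented data: [schedule delay (scaled 0 to 1), rush hour flag] -> travel minutes.
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.0], [0.5, 1.0]]
targets = [2.0 + 3.0 * delay + 1.0 * rush for delay, rush in rows]

weights, bias = train_linear_model(rows, targets)
print(weights, bias, predict(weights, bias, [1.0, 1.0]))
```

On this toy data the loop recovers the hidden formula (roughly 2 + 3 × delay + 1 × rush hour). Real traffic is messier, which is why real models are more elaborate than a weighted sum.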
With our prediction formula established, we were ready for the big time. We put it in production, for all of our riders in Montreal. When riders look up an ETA, we poll the real-time parameters (time of day, day of week, location, etc.), feed them into the formula, then spit out an ETA. The result? Transit is now able to improve STM predictions by ~15%, with better location data (thanks, GO riders! 📡) and a better prediction formula (thanks, machines! 🤖)
That’s 15% less patience required at the bus stop. 15% fewer missed buses. 15% closer to an app with perfect reliability. And our prediction engine is only going to get better, over time…
While “man-made” prediction formulas can’t account for changes to historical traffic patterns over time (without being thrown out, and re-written completely 🤢) — our machine learning model can use real-time feedback from riders to improve automatically. Bit of an upgrade from pencils and stopwatches.
So we’re excited for you to try it out. To see the impact on rider behaviour. We hope you can start budgeting for less and less “built-in” ETA inaccuracy. But even though our machines are smart, they’re not perfect. Sometimes they make errors. Good thing we have a team of engineers sorting through those errors to make your ETAs even better.
Once we perfect the recipe, we’ll be bringing it to cities beyond Montreal. What’s the ETA for that? Maybe you should ask a machine… 🤞🤖
Never used Transit before? Come try for iOS 🍏 & Android 👾
Are you an agency? Give your riders better ETA predictions, smarter trip plans, better mobile tickets, and an in-app experience so smooth they’ll bellow your praises. See why the best transit agencies team up with Transit.