Transparency for the MTA: The Citizen Driven Public Data Project

Published in cylussec on May 21, 2019

The 54 Bus slowly navigating around an illegally parked USPS vehicle in the downtown bus network (Brian Seel)

On time performance data for the MTA bus system is now available in real time, provided by an independent citizen. And that’s only the tip of the data iceberg.

In June 2017, the MTA launched BaltimoreLink, which was the first major reboot of the Baltimore area bus system in nearly 50 years. However, we need metrics to show if the launch has been a success.

Over the years, the MTA has changed how it calculates metrics such as On Time Performance. Danielle Sweeney discussed the issue at length: the MTA was seeing numbers as high as 85% when it counted no-show buses as 100% on time. Sweeney later found that the MTA had possibly changed its definition of an ‘on time’ bus from 1 minute early/5 minutes late to 2 minutes early/7 minutes late.

These changes are frequently not reported, including in a 2018 report to the Maryland General Assembly. Meanwhile, the MTA has one of the more lenient ‘on time’ windows in the country. According to TransitCenter, 16 of the top 20 transit systems have an on time window of 6 minutes or better. Only DC’s Washington Metropolitan Area Transit Authority and the MTA use a much more generous 9 minute window.

The MTA also does not regularly release this data, which means that citizens only get an update when the MTA includes it in a presentation or an update to the Maryland General Assembly, or when someone files a Public Information Request and waits a few months (Sweeney only got September 2018 data in January 2019).

I decided that needed to change. After the MTA started using Swiftly to track its bus fleet and enabled the Transit App, it also made that data publicly available. A few of us built an application during the Bmore Hackathon that pulls in this data to allow for better tracking.

Methodology

For our purposes, that public data allowed us an unprecedented view into the inner workings of the MTA bus system.

Screenshot of the Transit App showing real time data for the westbound Navy line (Brian Seel)

Note: For some of the deeper technical details behind this project, see my previous writeup.

The MTA recently outfitted all of its buses with GPS units, and partnered with a company that enabled realtime bus tracking in Google Maps and the Transit App. The MTA also makes this data available through a public API, which gives the public an unprecedented view into the inner workings of the MTA bus system. Just from the data being used to generate the above screenshot, we can answer questions like these:

Where are the buses at any moment?
When do they get to a stop?
What route are they on?
What direction are they going?
How close together are they (headways)?
How fast are they going?
Are they on time?
What runs are cancelled?

There are slight differences between the way I collect data and the way the MTA calculates its data. I sat down with an analyst at the MTA to compare our numbers, and they differed for a few reasons:

They use the departure time, and I use the arrival time. Measuring ‘on time’ means selecting a specific moment to compare against the schedule. Because of the data I collect, I count the bus’s arrival time, while the MTA counts its departure time, which means that dwell time counts against their numbers.
Swiftly is able to go back and clean up data, while my data is real time. If a bus stops right before a stop, it could mean that it’s at a stop light, stuck in traffic, or picking up passengers. Swiftly is able to go back and figure out when the bus most likely picked up passengers, even if it was significantly before or after a bus stop. Sometimes the bus stop location is just plain wrong in the feed.
I only poll every 60 seconds. I am using an Amazon Web Services large instance. While it has some power behind it, an application like this requires an immense amount of horsepower. I don’t have the resources to poll more frequently.

Because of this, I saw a difference of about 2–3% between my on time performance numbers and theirs. On average, my recorded times were 15 seconds earlier than theirs, which makes sense if I am counting arrivals and they are counting departures.
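To see why the choice of time point matters, here is a minimal sketch with made-up numbers. It is not the project's code; the function name, the sample values, and the choice to treat the window limits as inclusive are all assumptions for illustration.

```python
def is_on_time(deviation_seconds, early_limit=60, late_limit=300):
    """Return True if a bus falls inside the on time window.

    deviation_seconds is observed time minus scheduled time, so a
    negative value means the bus was early. The defaults model the
    1 minute early / 5 minutes late window; treating the limits as
    inclusive is an assumption.
    """
    return -early_limit <= deviation_seconds <= late_limit


# Hypothetical bus: arrives 4 min 50 s late, then dwells for 20 s.
arrival_deviation = 290
dwell_seconds = 20
departure_deviation = arrival_deviation + dwell_seconds

on_time_by_arrival = is_on_time(arrival_deviation)
on_time_by_departure = is_on_time(departure_deviation)
print(on_time_by_arrival, on_time_by_departure)
```

The same bus counts as on time when measured at arrival and late when measured at departure, which is the kind of edge case that can produce a small gap between two otherwise matching datasets.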

The Data

The OTP data for the month of April is available here: http://bmoretransit.cylus.org/otp_dashboard_april.html

On the left is the On Time Performance using the metric of 1 minute early and 5 minutes late (pink is on time, red is early, and orange is late). On the right is the OTP using the metric of 2 minutes early and 7 minutes late (green is on time, blue is early and purple is late).

The graphs on the left show the on time performance if we use the metric of 1 minute early and 5 minutes late. On the right is the same data if we use 2 minutes early and 7 minutes late (which is the metric the MTA currently uses). Overall, changing the ‘on time’ window from 9 minutes to 6 minutes drops the overall ‘On Time’ number from 67.76% to 52.19%. Below that are the graphs for each of the high frequency routes the MTA serves.
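The comparison between the two windows can be sketched in a few lines. This is only an illustration: the deviations below are invented, the helper names are mine, and a bus exactly on a window limit is assumed to count as on time.

```python
from collections import Counter


def classify(deviation_seconds, early_limit, late_limit):
    """Label one observation as early, on time, or late (limits in seconds)."""
    if deviation_seconds < -early_limit:
        return "early"
    if deviation_seconds > late_limit:
        return "late"
    return "on time"


def otp_breakdown(deviations, early_minutes, late_minutes):
    """Return the percentage of observations in each category."""
    labels = ("early", "on time", "late")
    total = len(deviations)
    if total == 0:
        return {label: 0.0 for label in labels}
    counts = Counter(
        classify(d, early_minutes * 60, late_minutes * 60) for d in deviations
    )
    return {label: 100 * counts[label] / total for label in labels}


# Invented schedule deviations in seconds (negative means early).
sample = [-90, -30, 0, 120, 330, 400, 600, 45]

strict = otp_breakdown(sample, early_minutes=1, late_minutes=5)
lenient = otp_breakdown(sample, early_minutes=2, late_minutes=7)
print(strict)
print(lenient)
```

With this sample, three observations that are early or late under the 6 minute window become on time under the 9 minute window, even though the buses did nothing differently.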

We can slice and dice this data in an almost infinite number of ways. For instance, we can find out that the LocalLink buses actually have a higher On Time Percentage than the system as a whole.

The OTP if we remove the CityLink routes.

We can even look past OTP data. For instance, if we look at the CityLink Navy bus eastbound, we see some interesting bus bunching going on. I arbitrarily defined bus bunching as buses that are less than 2 minutes apart.
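Under that definition, the share of bunched buses at each stop can be estimated from arrival times alone. The sketch below assumes arrival times are already grouped by stop and expressed as seconds since some common reference; the stop names and values are hypothetical.

```python
BUNCHING_THRESHOLD_SECONDS = 120  # "less than 2 minutes apart"


def bunched_share(arrival_seconds):
    """Return the fraction of consecutive arrivals closer than the threshold."""
    ordered = sorted(arrival_seconds)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    bunched = sum(1 for gap in gaps if gap < BUNCHING_THRESHOLD_SECONDS)
    return bunched / len(gaps)


# Hypothetical arrivals, in seconds, for one route and direction.
arrivals_by_stop = {
    "Stop A": [0, 600, 1200, 1800],
    "Stop B": [30, 640, 700, 1850],
}

shares = {stop: bunched_share(times) for stop, times in arrivals_by_stop.items()}
print(shares)
```

Plotting this share stop by stop along a route is one way to see where bunching builds up.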

Why is there a bump over the section with the dedicated bus lane? Do buses just move more quickly through that area, which causes more frequent buses? I checked whether rush hour made a difference (M-F from 6–9a and 4–7p), but the bump was there on the weekend, during the middle of the day, and during rush hour. The best theory I have heard is that overloaded buses usually have lightly loaded buses behind them, and the bus lanes make it much easier for lightly loaded buses to catch up.
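Splitting observations into rush hour and everything else only needs a small predicate. This is a sketch of one way to do it, assuming timestamps are already in local time and treating the end of each period as exclusive.

```python
from datetime import datetime


def is_rush_hour(moment):
    """Return True for Monday to Friday, 6-9 a.m. or 4-7 p.m. local time."""
    if moment.weekday() >= 5:  # 5 is Saturday, 6 is Sunday
        return False
    return 6 <= moment.hour < 9 or 16 <= moment.hour < 19


# April 1, 2019 was a Monday; April 6, 2019 was a Saturday.
print(is_rush_hour(datetime(2019, 4, 1, 7, 30)))
print(is_rush_hour(datetime(2019, 4, 1, 12, 0)))
print(is_rush_hour(datetime(2019, 4, 6, 7, 30)))
```

Filtering the bunching data with a predicate like this and redrawing the graph for each group is how the weekday, midday, and weekend comparisons can be made.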

We can even look at data from specific runs, such as the instances where two Navy buses left Mondawmin within 60 seconds of each other.

Trip data from recent CityLink Navy runs where two buses started their route within 60 seconds of each other, and were immediately bunched.
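Finding those runs amounts to sorting trips by start time and comparing neighbors. The sketch below uses invented trip IDs and start times in seconds, and only compares each trip with the next one to start.

```python
def close_starts(trip_starts, max_gap_seconds=60):
    """Return pairs of trip IDs that started within max_gap_seconds.

    trip_starts is a list of (trip_id, start_seconds) tuples for one
    route and direction. Only consecutive starts are compared.
    """
    ordered = sorted(trip_starts, key=lambda trip: trip[1])
    return [
        (first[0], second[0])
        for first, second in zip(ordered, ordered[1:])
        if second[1] - first[1] <= max_gap_seconds
    ]


# Hypothetical trips leaving the first stop of a route.
sample_trips = [
    ("trip_1", 0),
    ("trip_2", 45),
    ("trip_3", 900),
    ("trip_4", 955),
    ("trip_5", 1800),
]

pairs = close_starts(sample_trips)
print(pairs)
```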

The amount of data, and the number of ways to slice it, are basically endless.

So many questions

There are so many answers in this dataset, and the dataset is growing every day. If you have suggestions for graphs, leave a comment on this story, or send me a message on Twitter. I will take suggestions for future entries. I will also start putting this data out on a monthly basis.

Also, if you would like to help with server costs for this, consider chipping in on Patreon. This is a heavy, data intensive application, and server costs are about $100 per month. Any little bit helps make this a sustainable project.

Written by Brian Seel