Rebuilding goodservice.io for 2.0

Our own arrival time predictions, granular trip details, travel times and more.

Sunny Ng
Good Service
6 min read · Apr 6, 2021

When I initially wrote goodservice.io back in mid-2018, I had one goal in mind: highlight gaps in New York City subway service using real-time GTFS data. As the project evolved, the code was extended to analyze delays and slow speeds, and it eventually took on the dual responsibility of also serving as the backend for the Weekendest, the real-time subway map I released in October 2019.

goodservice.io 2.0. Looks like the original version, but so much better.

I soon came to realize that the codebase rested on fundamentally flawed assumptions that would be difficult to fix or extend without a rewrite. So I went back to the drawing board and started from scratch to rebuild goodservice.io with what I now know about the dataset. I came up with a list of goals that I wanted to achieve with this new version of goodservice.io:

Goal #1: Show granular details on which trips are delayed and where exactly there are service gaps

From day one, the original version of goodservice.io relied on pairing train routes (e.g. 4 train, E train) with physical lines (e.g. Lexington Avenue Line, Queens Boulevard Line) to denote where train issues were happening. Unfortunately, not only was this difficult for the layperson to understand, it was also not very precise.

Human-friendly computer-generated service change notices on the Weekendest.

The effort to have information become more precise began last fall with the introduction of computer-generated human-readable service changes on goodservice.io and the Weekendest that describe what service changes are happening and which specific stops are affected.

In the new version of goodservice.io, you can see which trips are currently delayed (left) and which trips are causing service gaps (right), along with their estimated locations.

With the new version of goodservice.io, information about each trip is no longer abstracted away. Instead, we can now view which trips are delayed and where, along with which pairs of trips have larger-than-expected gaps between them, leading to long headways. This information is now used to drive service summaries, leading to a more accurate display of where train issues are happening.

Goal #2: Make our own arrival time predictions

Here’s a hot take: the way (almost) every transit app out there uses GTFS-realtime data has been fundamentally wrong. There, I said it. But why? Well let’s first think about why we use this data.

GTFS-realtime (or GTFS-RT) is the data that drives countdown clocks and live vehicle predictions on apps. The reason we use GTFS-realtime data across the apps is because we don’t all live in Japan, where transit vehicles are expected to closely adhere to published schedules. Since we can’t rely on schedules, GTFS-realtime was born as a way to provide real-time statuses of transit vehicles.

But how is GTFS-realtime data generated for the NYC Subway? Well, we know that the L and no. 7 trains are equipped with CBTC signaling systems, which can be used to drive this feed. On the rest of the numbered trains, the not-as-new ATS signaling system drives it. But on the rest of the lettered trains, Bluetooth beacons that have nothing to do with the signaling system are used. While it may not be the case for the rest of the routes (though I'm still pretty positive it is), my hypothesis is that for these predictions, the predicted arrival time at any station is a simple calculation: take the estimated location of the train, then extrapolate with the published schedule to cover the rest of its trip.

If a Queens-bound train is at 7 Av/53 St, the countdown clocks at 5 Av/53 St, Lexington Av/53 St, Court Sq–23 St and Queens Plaza would say 1 min, 3 min, 6 min and 7 min respectively for this train, because that’s what can be extrapolated from the schedule through a simple calculation.
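
As a minimal sketch of that hypothesized calculation (not the MTA's actual implementation), the countdowns can be reproduced by summing scheduled segment times from the train's current location. The per-segment minutes below are assumptions back-derived from the 1, 3, 6 and 7 minute countdowns in the example, and `extrapolate_from_schedule` is a hypothetical helper name.

```python
from itertools import accumulate

# Scheduled minutes between consecutive stops, starting from 7 Av/53 St.
# Assumed values: chosen so the running totals match the 1/3/6/7-minute
# countdowns in the example, not copied from the real timetable.
SCHEDULED_SEGMENT_MINUTES = [
    ("5 Av/53 St", 1),
    ("Lexington Av/53 St", 2),
    ("Court Sq–23 St", 3),
    ("Queens Plaza", 1),
]


def extrapolate_from_schedule(segments):
    """Return {station: minutes away} as a running sum of scheduled times."""
    stations = [station for station, _ in segments]
    running_totals = accumulate(minutes for _, minutes in segments)
    return dict(zip(stations, running_totals))


countdowns = extrapolate_from_schedule(SCHEDULED_SEGMENT_MINUTES)
for station, minutes in countdowns.items():
    print(f"{station}: {minutes} min")
```

Nothing in this calculation looks at how trains have actually been moving; it only knows where the train is and what the timetable says.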

So wait… If we can’t rely on the published schedule to tell us when a train is going to get to us, how can we rely on the same schedule to tell us how long it takes for a train to travel?

Comparison of predicted travel times between 7 Av/53 St and Queens Plaza across apps. Transit app and myMTA follow the schedule and claim it takes 7 minutes. Perhaps Google knows something we don’t, since it estimates that the journey takes 11 minutes instead.

Then what can we use instead of the schedule? I wish I could say AI, but what we’re doing is not that sophisticated. Currently, we use a rolling average of the time it took the last several trips to travel between each pair of stations in order to extrapolate, while omitting some anomalies (e.g. trains that were delayed or running early). It’s not by any means perfect, but it’s Good Enough. Perhaps in the future we should actually use AI…
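
To make the idea concrete, here is a rough sketch of a rolling average per station pair. It is an illustration, not the goodservice.io code: the class name `RollingTravelTimes`, the window of 5 trips, and the rule for spotting anomalies (dropping samples that differ from the median by more than 50%) are all assumptions made for this example.

```python
from collections import defaultdict, deque
from statistics import mean, median


class RollingTravelTimes:
    """Keeps the last few observed travel times for each station pair."""

    def __init__(self, window=5, tolerance=0.5):
        self.tolerance = tolerance  # assumed anomaly cutoff, as a fraction of the median
        # deque(maxlen=...) silently discards the oldest sample when full.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, origin, destination, seconds):
        self.samples[(origin, destination)].append(seconds)

    def estimate(self, origin, destination):
        """Average recent travel times, ignoring samples far from the median."""
        recent = self.samples.get((origin, destination))
        if not recent:
            return None  # no trips observed for this pair yet
        typical = median(recent)
        kept = [s for s in recent if abs(s - typical) <= self.tolerance * typical]
        # Fall back to the median if every sample was filtered out.
        return mean(kept) if kept else typical


times = RollingTravelTimes()
for seconds in [120, 125, 118, 300, 122]:  # made-up observations; 300 is a delayed trip
    times.record("7 Av/53 St", "5 Av/53 St", seconds)

print(times.estimate("7 Av/53 St", "5 Av/53 St"))
```

With these made-up samples, the 300-second trip is dropped and the estimate is the average of the remaining four.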

Side-by-side images showing the changes in predicted arrival times as an uptown C train progresses from 86 St to 168 St compared to our projected arrival times.
How it started… how it’s going.

Goal #3: Decrease reliance on arrival time predictions for other metrics

If we’re not relying on GTFS-realtime data for arrival time predictions, then we shouldn’t be using those predictions to measure other metrics, like headways, either. Instead of calculating the headway by finding the difference between two trips’ predicted arrival times at a shared station, let’s use the GTFS-realtime data to estimate where each train is, and calculate the time it takes a train to travel through the stations between them using rolling averages of recent trips, as previously described.
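
A simplified sketch of that headway calculation might look like the following. It assumes each train's location has already been snapped to a station, and the segment travel times are invented numbers standing in for rolling averages; `projected_headway` is a hypothetical function name.

```python
def projected_headway(stations, segment_seconds, behind_at, ahead_at):
    """Seconds for the trailing train to reach the leading train's location.

    stations:        stops in travel order
    segment_seconds: {(from_stop, to_stop): average travel time in seconds}
    behind_at:       estimated location of the trailing train
    ahead_at:        estimated location of the leading train
    """
    start = stations.index(behind_at)
    end = stations.index(ahead_at)
    if start > end:
        raise ValueError("trailing train must not be past the leading train")
    # Sum the travel time of every segment between the two trains.
    return sum(
        segment_seconds[(stations[i], stations[i + 1])]
        for i in range(start, end)
    )


# Uptown stops with invented average travel times (seconds).
stops = ["86 St", "96 St", "103 St", "Cathedral Pkwy (110 St)"]
averages = {
    ("86 St", "96 St"): 90,
    ("96 St", "103 St"): 80,
    ("103 St", "Cathedral Pkwy (110 St)"): 75,
}

headway = projected_headway(stops, averages, "86 St", "Cathedral Pkwy (110 St)")
print(f"Projected headway: {headway / 60:.1f} min")
```

The headway here depends only on where the two trains are and how long recent trips took to cover the stretch between them, not on either train's predicted arrival time.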

Status of the C train, showing the projected headway compared to the expected headway according to its regular schedule, along with its projected runtimes compared to its regularly scheduled runtimes.

Goal #4: Data should refresh every 30 seconds or less

When I first built goodservice.io, it was a scrappy project that was never intended to scale. It was supposed to provide a stateless snapshot of the current status of the subway system every minute, and for the most part everything ran synchronously. However, to analyze data like delays and travel times, it became more and more important to get more accurate data, which required storing the state of each trip, and this data updates every 30 seconds or less. For the rewrite, the project incorporated Sidekiq to allow data to be processed asynchronously and in parallel across horizontally scaled workers, and relied extensively on Redis instead of Postgres as its main persistence layer to further boost read/write performance. Postgres is still used to store the static GTFS published schedule data. A custom autoscaler was also written so that during times of the day when there are more active trips, more resources can be automatically scaled up to handle the additional load. The new version now refreshes data every 15 seconds instead of every minute.
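
The core decision an autoscaler like this has to make can be sketched in a few lines. This is purely illustrative and not the project's actual autoscaler: the function name `desired_workers`, the capacity of 50 trips per worker, and the bounds of 1 to 10 workers are all assumed values.

```python
import math


def desired_workers(active_trips, trips_per_worker=50, min_workers=1, max_workers=10):
    """Pick a worker count proportional to the number of active trips.

    All thresholds are hypothetical; a real autoscaler would tune them
    against observed processing times.
    """
    if trips_per_worker <= 0:
        raise ValueError("trips_per_worker must be positive")
    needed = math.ceil(active_trips / trips_per_worker)
    # Clamp so there is always some capacity and never a runaway scale-up.
    return max(min_workers, min(max_workers, needed))


for trips in (0, 120, 5000):
    print(trips, "active trips ->", desired_workers(trips), "workers")
```

The result would then be passed to whatever mechanism the hosting platform provides for changing the number of running worker processes.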

Additional features

This re-think of goodservice.io provided a good opportunity to port some features over from the Weekendest, including the ability to see predicted arrival times for each station and station accessibility updates. Now you can bookmark and view your station’s upcoming arrival times without opening up a map.

Upcoming trips and their departure times from 34 St–Herald Sq.

Our Slack app has now been updated to support displaying upcoming arrival times for any given subway station, all within a chat window.

goodservice.io’s Slack app has been updated to include estimated train arrival times.

Similarly, some of goodservice.io’s new features have been carried over to the Weekendest, such as being able to see whether a train is delayed when navigating to a station, and a trip’s schedule adherence when viewing that trip.

The Weekendest will now indicate if an upcoming trip is delayed and how a trip adheres to the schedule.

goodservice.io and the Weekendest are complementary open-source projects that provide New York City subway riders with detailed and up-to-date routing and statuses using public APIs. Contributions are welcome on GitHub. Feedback can be directed to @_blahblahblah or @goodservice_io on Twitter.
