What is Good Service? — Part II (The Technical Part)

Sunny Ng
Published in Good Service
Oct 3, 2018

My previous Medium post explained what goodservice.io does and why I built it. If you’re interested in learning some background about the website, please read that first. This post is focused on its technical aspects.

The idea to build goodservice.io came to me a couple of months ago. On Transit Twitter, I had been seeing New Yorkers tweet pictures of countdown clocks showing abnormal headways and tag @NYCTSubway, highlighting the fact that, despite the MTA's claims on its website, service was not actually good.

I joined in on the fun too, of course.

Then it hit me… what if we automate it?

When the MTA finally finished installing countdown clocks on all of its lines earlier this year, it also opened up the data that drives them to the public. And I'm a software engineer! I should be able to figure this out, right?

If the MTA isn't always going to tell us when trains are running abnormally, why don't we figure it out ourselves using the data it provides?

The Solution

Once I knew what the problem was, I needed to come up with a solution. I settled on one question: how does the actual headway compare to the scheduled headway? I wanted to compare actual and scheduled headways because service levels vary by line, time of day, day of the week, direction, and so on. Service that looks bad may still be running exactly at the level it was scheduled for.

A 12-minute wait may seem long, but on the C train it could simply mean service is running as expected.
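A minimal sketch of that comparison in Ruby (the language the app is written in) might classify each segment by the ratio of its longest actual wait to its longest scheduled wait. The `service_status` name, the status symbols, and the 1.5x and 2x thresholds are hypothetical choices for illustration, not goodservice.io's actual rules.

```ruby
# Hypothetical thresholds: how much longer than scheduled the actual
# maximum wait may get before we flag it. Illustrative values only.
SLOW_RATIO = 1.5
DELAY_RATIO = 2.0

# Classify a segment by comparing the longest actual headway (minutes)
# with the longest scheduled headway (minutes) for the same segment.
def service_status(actual_max_headway, scheduled_max_headway)
  # No scheduled service here (e.g. a reroute onto another line), so
  # there is nothing to compare against.
  return :not_scheduled if scheduled_max_headway.nil?

  ratio = actual_max_headway.to_f / scheduled_max_headway
  if ratio >= DELAY_RATIO
    :delays
  elsif ratio >= SLOW_RATIO
    :slow
  else
    :good_service
  end
end

# A 12-minute wait is fine when 12 minutes is the plan...
p service_status(12, 12) # => :good_service
# ...but not when trains are scheduled every 4 minutes.
p service_status(12, 4)  # => :delays
```

Using a ratio rather than a fixed number of minutes keeps the same rule meaningful for both a frequent rush-hour line and a sparse late-night one.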

I signed up for the MTA's developer program, which is free and open to the public. Once you're in, you're given an API key to access the real-time feeds (i.e. data showing where trains actually are), as well as the static feeds (i.e. data showing where trains are scheduled to be). The real-time feeds follow Google's GTFS-RT standard, with some extra MTA-specific data. They provide data on the trips assigned to trains, including the direction, the route, and the predicted stop time for each upcoming stop. The static feeds follow Google's earlier GTFS standard and come as a zip file of CSV files, which I imported into a PostgreSQL database to query later.

The MTA Developer Program provides access to its real-time feeds. You just need to register and obtain an API key.
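To make the shape of the real-time data concrete, here is a sketch that pulls the predicted arrivals at one stop out of a set of trip updates. It assumes the feed has already been decoded; the `TripUpdate` and `StopTimeUpdate` structs below are hypothetical, simplified stand-ins for the GTFS-RT protobuf messages, and the stop IDs are made up.

```ruby
# Hypothetical, simplified stand-ins for decoded GTFS-RT messages. A real
# client decodes the protobuf FeedMessage; only the fields used here are kept.
StopTimeUpdate = Struct.new(:stop_id, :arrival_time) # arrival_time: Unix seconds
TripUpdate = Struct.new(:trip_id, :route_id, :stop_time_updates)

# Predicted arrival times at one stop, grouped by route, soonest first.
# Predictions earlier than `now` are dropped.
def upcoming_arrivals(trip_updates, stop_id, now)
  arrivals = Hash.new { |hash, route| hash[route] = [] }
  trip_updates.each do |trip|
    trip.stop_time_updates.each do |update|
      next unless update.stop_id == stop_id && update.arrival_time >= now

      arrivals[trip.route_id] << update.arrival_time
    end
  end
  arrivals.transform_values(&:sort)
end

now = 1_538_550_000
trips = [
  TripUpdate.new("t1", "A", [StopTimeUpdate.new("X01S", now + 300)]),
  TripUpdate.new("t2", "C", [StopTimeUpdate.new("X01S", now + 120),
                             StopTimeUpdate.new("X02S", now + 240)]),
  TripUpdate.new("t3", "A", [StopTimeUpdate.new("X01S", now + 60)]),
  TripUpdate.new("t4", "C", [StopTimeUpdate.new("X01S", now - 30)]) # already gone
]
by_route = upcoming_arrivals(trips, "X01S", now)
# "A" maps to [now + 60, now + 300]; "C" maps to [now + 120]
```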

When I was designing the user experience for goodservice.io, I knew I wanted to alert users when a route is not running close to its scheduled headways. But that information needs to be more granular, because train routes can be long and a delay may affect only part of a route. A natural way to divide a route into segments in the New York City subway is by line: the system is unusual in that routes diverge from and converge with other routes, sharing tracks along parts of their journeys.

Last night, the A ran via the F downtown (via Delancey St) but ran normally uptown. The left side shows no “Scheduled Max Wait” for “via Delancey St” because the A does not normally stop there, so there are no scheduled stop times to compare against.

Another requirement for an app like this, and one specific to the New York City subway, is the ability to adapt to late-night and weekend service changes. Because the subway runs 24/7, routes are often rerouted or short-turned overnight and on weekends to allow maintenance work on the tracks. For the app to be useful at all times, it needs to adapt to changes like the A running local, the no. 2 train not running into Brooklyn, or the D swapping with the F in Brooklyn.
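One way to meet both requirements is to derive a trip's segments from the stops it actually reports rather than from a fixed route map. The sketch below assumes each line has been split into segments of ordered stop IDs and that every stop belongs to exactly one segment; the segment names, stop IDs, and `segments_for_trip` helper are hypothetical.

```ruby
# Hypothetical line segments: each is an ordered list of stop IDs, split at
# the stations where routes converge or diverge. In this sketch every stop
# belongs to exactly one segment.
SEGMENTS = {
  "Trunk" => %w[S1 S2 S3],
  "East branch" => %w[E1 E2],
  "West branch" => %w[W1 W2]
}.freeze

# Work out which segments a trip is using, in travel order, from the stops
# it reports, instead of assuming the route follows its usual path.
def segments_for_trip(stop_ids, segments)
  stop_to_segment = {}
  segments.each do |name, stops|
    stops.each { |stop| stop_to_segment[stop] = name }
  end
  # Stops outside every known segment map to nil and are skipped.
  stop_ids.filter_map { |stop| stop_to_segment[stop] }.uniq
end

p segments_for_trip(%w[S2 S3 E1 E2], SEGMENTS) # => ["Trunk", "East branch"]
# The same route, rerouted overnight, is picked up without code changes:
p segments_for_trip(%w[S3 W1 W2], SEGMENTS)    # => ["Trunk", "West branch"]
```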

The Implementation

In the end, the best way to figure out where a route can go is to build in flexibility and make no assumptions about how trains will run. After importing the static data into a database, I manually seeded data for the physical lines (e.g. the Lexington Avenue Line, Broadway Line, and Seventh Avenue Line), including which boroughs they're in and their last local and express stops in each direction. I split the lines at the stops where routes converge or diverge. For the last stops on each line, I took the real-time feeds' estimated arrival times, calculated the time difference between consecutive trains, and used that as the actual headway. Similarly, I queried the scheduled stop times at those same last stops from the static data and calculated the scheduled headway the same way.
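The headway step itself is simple arithmetic: sort the arrival times at a segment's last stop and take the gaps between consecutive trains. This sketch applies the same functions to predicted and scheduled times; the helper names and the sample times (in seconds) are illustrative assumptions.

```ruby
# Headways at one stop: the gaps, in seconds, between consecutive arrivals.
# Works for predicted times (Unix seconds) or scheduled times (seconds since
# the start of the service day), as long as one list uses one convention.
def headways(arrival_times)
  arrival_times.sort.each_cons(2).map { |earlier, later| later - earlier }
end

# The longest gap in minutes: the worst wait a rider could face right now.
# Returns nil when there are fewer than two trains to compare.
def max_wait_minutes(arrival_times)
  gaps = headways(arrival_times)
  return nil if gaps.empty?

  (gaps.max / 60.0).round(1)
end

actual = [0, 240, 1_020, 1_260]       # predicted arrivals at a segment's last stop
scheduled = [0, 300, 600, 900, 1_200] # scheduled arrivals at the same stop
p headways(actual)            # => [240, 780, 240]
p max_wait_minutes(actual)    # => 13.0
p max_wait_minutes(scheduled) # => 5.0
```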

Sometimes when trains on the same line go in the same general direction, it doesn’t really matter which one comes first.

The great thing about this method is that it also lets us display train service by line. When I first moved to New York, I lived in Queens and worked in Midtown. During the morning rush, I would take either the M or the R, whichever came first. It didn't matter which one I took, because both went in the same general direction: Midtown. The “By Line” section is useful for exactly that.
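For the “By Line” view, one approach is to merge the arrivals of every route serving the same line and direction before measuring the gaps, so a rider who takes whichever train comes first sees the combined wait. The `combined_max_wait_minutes` helper and the sample times (in seconds) are hypothetical.

```ruby
# Merge predicted arrivals from several routes serving the same line and
# direction (e.g. the M and R in Queens), then measure the combined gaps.
def combined_max_wait_minutes(arrivals_by_route, routes)
  merged = routes.flat_map { |route| arrivals_by_route.fetch(route, []) }.sort
  gaps = merged.each_cons(2).map { |earlier, later| later - earlier }
  gaps.empty? ? nil : (gaps.max / 60.0).round(1)
end

arrivals = { "M" => [0, 600, 1_200], "R" => [300, 900, 1_500] }
p combined_max_wait_minutes(arrivals, %w[M])   # => 10.0
p combined_max_wait_minutes(arrivals, %w[M R]) # => 5.0
```

Either route alone leaves a 10-minute gap here, but a rider happy to take either train never waits more than 5.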

I used Ruby on Rails 5.2 because not only is it the latest version, it also has good support for React apps through its Webpacker integration. Rails is also the framework I am most familiar with from my work. There are probably better ways to do this with other languages and frameworks, but I wanted something that would let me get on my feet quickly.

For the user interface, I chose Semantic UI React, because it is built entirely on React and does not rely on jQuery at all. It also provides many ready-made components, so I did not have to reinvent the wheel (e.g. modals, responsive layouts, buttons, grids).

The Challenges

I wanted to differentiate between express and local trains even when they share the same line, because trains on the same line do not necessarily share tracks. For instance, a delay on the express tracks doesn't necessarily affect the local trains. The challenge is that GTFS-RT doesn't tell us explicitly whether a train is running express or local. However, it does tell us each trip's upcoming stops, and those give us clues about its service pattern. If a trip's upcoming stops include a local stop, it is certainly running local. Both local and express trains stop at express stops, so an express stop in the list doesn't tell us much. To conclude that a train is running express, we need to observe it making no intermediate stops between two express stations.
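The inference above could be sketched like this. It assumes we know every station on the line in travel order and which of them are express stops; the station IDs and the `service_pattern` helper are hypothetical. A trip whose remaining stops on the line are all express stops, with no station skipped between them yet, is reported as `:unknown` rather than guessed.

```ruby
require "set"

# Infer a trip's service pattern on one line from its upcoming stops.
# line_stops: every station on the line, in travel order
# express_stops: the subset of those stations served by express trains
def service_pattern(upcoming_stops, line_stops, express_stops)
  on_line = upcoming_stops.select { |stop| line_stops.include?(stop) }

  # Any local-only stop means the train is running local.
  return :local if on_line.any? { |stop| !express_stops.include?(stop) }

  # Two consecutive upcoming stops that are not adjacent on the line mean
  # the train skips the stations in between, i.e. it is running express.
  on_line.each_cons(2) do |from, to|
    gap = (line_stops.index(to) - line_stops.index(from)).abs
    return :express if gap > 1
  end

  # Only express stops and nothing skipped (yet): not enough evidence.
  :unknown
end

LINE = %w[X1 L1 L2 X2 L3 X3]
EXPRESS = Set["X1", "X2", "X3"]
p service_pattern(%w[X1 L1 L2 X2], LINE, EXPRESS) # => :local
p service_pattern(%w[X1 X2 X3], LINE, EXPRESS)    # => :express
p service_pattern(%w[X3], LINE, EXPRESS)          # => :unknown
```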

Unlike the way most databases and programming languages represent time, scheduled times here are more or less cyclic. When you're looking for upcoming scheduled stop times at, say, 11:30pm, you want the scheduled stop times for the rest of the day, but also those for the first 30 minutes of the next day. To add to the confusion, some stop times in the static feed go beyond the 24 hours in a day here on Earth.

Wat.
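The GTFS specification explains the odd times: stop times are measured from the start of the service day, so a trip that begins on one service day and runs past midnight can list a time such as 25:10:00, meaning 1:10am the next morning. One way to handle both that and the wraparound at midnight is to try every scheduled time as belonging to yesterday's, today's, and tomorrow's service day, then keep the ones inside the look-ahead window. This sketch assumes the same timetable every day (the real schedule differs by day of the week) and ignores daylight saving time; `gtfs_seconds` and `upcoming_scheduled` are hypothetical names.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Parse a GTFS stop time ("HH:MM:SS"), which counts from the start of the
# service day and may legitimately exceed 24:00:00 (e.g. "25:10:00").
def gtfs_seconds(time_string)
  hours, minutes, seconds = time_string.split(":").map(&:to_i)
  hours * 3600 + minutes * 60 + seconds
end

# Scheduled times within `window` seconds after `now` (seconds since midnight
# today). Each time is shifted onto today's clock as if it belonged to
# yesterday's, today's, or tomorrow's service day.
# Simplification: assumes the same schedule applies every day.
def upcoming_scheduled(time_strings, now, window)
  candidates = time_strings.flat_map do |time_string|
    t = gtfs_seconds(time_string)
    [t - SECONDS_PER_DAY, t, t + SECONDS_PER_DAY]
  end
  candidates.select { |t| t >= now && t < now + window }.sort
end

schedule = %w[05:00:00 23:45:00 24:20:00 00:10:00]
# At 11:30pm, look one hour ahead: 23:45 today, 24:20 (a late trip from
# today's service day) and 00:10 from tomorrow's service day.
p upcoming_scheduled(schedule, gtfs_seconds("23:30:00"), 3600) # => [85500, 87000, 87600]
```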

Also, the current routing of the M train is a bit unusual and breaks the typical New York pattern of trains simply running northbound or southbound. On the M, uptown from Manhattan is toward Forest Hills and downtown is toward Metropolitan Avenue. By that logic, however, the uptown M actually shares tracks with the downtown J toward Broad Street, and the downtown M shares tracks with the uptown J toward Jamaica Center. So I made adjustments to group them properly.

Even the MTA's own app, SubwayTime, has a hard time grouping J/M/Z trains. Trains bound for Broad St and Forest Hills share the same platform, as do trains bound for Jamaica Center and Metropolitan Av.
A workaround was needed to group these trains properly.
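One way to express that workaround is a per-segment direction flip: on the segment the M shares with the J and Z, treat an uptown M as traveling with the downtown J and vice versa, then group trains by the resulting direction. The segment key, direction symbols, and helpers below are hypothetical, and goodservice.io's actual adjustment may be implemented differently.

```ruby
# Routes whose labeled direction is reversed relative to the other routes on
# a given segment. Here: the M, only where it shares tracks with the J/Z.
FLIPPED = { "M" => [:jamaica_line] }.freeze

def grouping_direction(route, direction, segment)
  return direction unless FLIPPED.fetch(route, []).include?(segment)

  direction == :uptown ? :downtown : :uptown
end

# Group trains on a segment by the direction they physically travel.
def group_by_track(trains, segment)
  trains.group_by { |t| grouping_direction(t[:route], t[:direction], segment) }
end

trains = [
  { route: "J", direction: :downtown }, # toward Broad St
  { route: "M", direction: :uptown },   # toward Forest Hills
  { route: "M", direction: :downtown }, # toward Metropolitan Av
  { route: "J", direction: :uptown }    # toward Jamaica Center
]
grouped = group_by_track(trains, :jamaica_line)
p grouped[:downtown].map { |t| t[:route] } # => ["J", "M"]
p grouped[:uptown].map { |t| t[:route] }   # => ["M", "J"]
```

Scoping the flip to one segment matters: elsewhere, an uptown M runs with other uptown trains and needs no adjustment.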

Thank you for reading my post! Please check out my website at goodservice.io, and feel free to send me feedback on Twitter at @_blahblahblah. You can view the source code on GitHub.