Using Protocol Buffers with the NYC Subway

This guide assumes you know what Objects are and have a basic understanding of JavaScript and HTML. I’m a pretty terrible programmer, but I haven’t seen any decent guide for reading these feeds with JS. Please feel free to correct me @subwayexpert

What is a Protocol Buffer?

You don’t care. No. No, stop. It doesn’t matter. A protocol buffer is basically a ZIP file for an internet feed. It packs structured data into a compact binary format, which looks ugly if you open it as text but keeps the file size small. That is all you (really) need to know.

What is GTFS-realtime?

GTFS-realtime is a standard for communicating train positions and destinations that the Metropolitan Transportation Authority (MTA) sort of follows.

The 1, 2, 3, 4, 5, 6 and S trains* are in one feed. The L train is in another feed. The Staten Island Railway is in a third. Don’t ask me why the lines don’t each get their own feed. I have no idea. The L train feed looks so tiny.

*S: 42nd St Shuttle between Grand Central & Times Square

How do you read a GTFS-realtime feed with JavaScript?

This complete and total genius has done all the heavy lifting: https://github.com/dcodeIO/ProtoBuf.js/

All you need to do is include three files:

<script src=Long.min.js></script>
<script src=ByteBufferAB.min.js></script>
<script src=ProtoBuf.min.js></script>

Get the .proto files from the MTA. There are two: nyct-subway.proto and gtfs-realtime.proto. Basically they just say how the feed is structured. Remove the .txt extension and put them in the same directory as your JavaScript.

Our sample JavaScript:

var ProtoBuf = dcodeIO.ProtoBuf;
//load the .proto definitions before requesting the feed
var builder = ProtoBuf.loadProtoFile("nyct-subway.proto").build("transit_realtime");
var xhr = new XMLHttpRequest();
xhr.open(
    /* method */ "GET",
    /* file */ "path/to/gtfs.pb",
    /* async */ true
);
//ask for raw bytes instead of text
xhr.responseType = "arraybuffer";
//the response only exists once the request has finished loading
xhr.onload = function () {
    //FeedMessage is the "container" object of the entire feed
    var msg = builder.FeedMessage.decode(xhr.response);
    var jsonMsg = JSON.stringify(msg, null, 4);
    //prints feed object to the console
    console.log(jsonMsg);
    //feed will be a plain JavaScript object that contains the feed
    //(attached to window so you can poke at it from the console)
    window.feed = JSON.parse(jsonMsg);
};
//nothing happens until the request is actually sent
xhr.send();

Making sense of the object

At the end of this (hopefully) you end up with a JavaScript object. Hit F12 and click Console. That’s your object. It should look like a big mess (of JSON). There’s a header and an array of “entities”, each with its own ID. Each “entity” is a train. Yes, just a train. No abstraction. The entity data says (among other things) where the train is going and how far away it is (in units of time, not distance) from a list of stops.

Let’s look at a few attributes of a train:

feed.entity[0].trip_update.stop_time_update

stop_time_update is an array of all the information about where the train is going. Each item in the array is a different stop. Let’s look at an attribute of the first item.

feed.entity[0].trip_update.stop_time_update[0].stop_id
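
If you want more than the first train, you can loop over the entities. Here is a minimal sketch, assuming feed has the shape described above. The listStopIds function and the hand-made sampleFeed are made up for illustration, and the check for a missing trip_update is just me being careful, not something the feed promises.

```javascript
// Collect the stop IDs for every entity that carries a trip_update.
// `feed` is assumed to look like the object built by the sample code.
function listStopIds(feed) {
    var result = [];
    (feed.entity || []).forEach(function (entity) {
        // Skip anything without a trip_update (a defensive assumption).
        if (!entity.trip_update) { return; }
        var stops = entity.trip_update.stop_time_update || [];
        result.push({
            id: entity.id,
            stopIds: stops.map(function (s) { return s.stop_id; })
        });
    });
    return result;
}

// Hypothetical hand-made feed, only for illustration.
var sampleFeed = {
    entity: [
        { id: "000001", trip_update: { stop_time_update: [{ stop_id: "601S" }, { stop_id: "602S" }] } },
        { id: "000002" }
    ]
};
console.log(listStopIds(sampleFeed));
```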

stop_id is interesting, because the thing it describes is an abstraction that takes a minute to get. What? Look, every station has at least one platform, but platforms are about riders. Riders stand on platforms and wait for trains. A station might have two platforms for two different lines, one middle platform for the same line, or two platforms across from each other.

A stop ID is not a platform, really, but the place where the train stops. For example, 601S. It’s where the Southbound trains stop on the lowest (01) station on the 6 line (the Pelham Bay Park station, if it matters). But from midnight to 6:00 A.M., 4 trains also stop there. So don’t take the stop ID too literally. It’s a signifier for a place where a train stops. We can find what the stop IDs really mean (what stations they’re in) in the MTA’s GTFS (static) zip file.
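
To make that concrete, here is a small sketch that splits a stop ID like 601S into its station part and its direction letter. It assumes IDs end in S for Southbound (as in the example) or N for Northbound. The N case and the splitStopId name are my assumptions, so check them against the static GTFS data.

```javascript
// Split a stop ID such as "601S" into station code and direction.
// Assumes a trailing "N" or "S"; anything else is returned with no direction.
function splitStopId(stopId) {
    var last = stopId.charAt(stopId.length - 1);
    if (last === "N" || last === "S") {
        return { station: stopId.slice(0, -1), direction: last };
    }
    return { station: stopId, direction: null };
}

console.log(splitStopId("601S"));
```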

But stop ID only tells us one stop the train is going to. Let’s look at another attribute in the first stop_time_update entry: arrival.time. It tells us when the train is expected to arrive at that stop. There’s a low and high attribute here, but high is usually left blank.

feed.entity[0].trip_update.stop_time_update[0].arrival.time.low

time.low does not look like any time I’ve ever seen. Here’s an example: 1419532970. Turns out it’s the number of seconds since 0:00 (12 midnight) UTC on January 1, 1970. JavaScript uses a similar measurement in its Date object, but it uses MILLIseconds (ms), not seconds. So if you want to use this with the Date object, multiply it by 1000.
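
Here is a minimal sketch of that conversion. The arrivalToDate name is made up, the sample value is hypothetical, and the code assumes the time shows up either as a plain number or as an object with a low attribute, like in the JSON above.

```javascript
// Convert a feed time (seconds since 1970) into a JavaScript Date.
// Accepts either a plain number or an object with a `low` attribute.
function arrivalToDate(time) {
    var seconds = (typeof time === "number") ? time : time.low;
    // Date counts milliseconds, the feed counts seconds.
    return new Date(seconds * 1000);
}

// Hypothetical value, only for illustration.
var when = arrivalToDate({ low: 1419532970, high: 0 });
console.log(when.toISOString());
```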

A common application of this data is to find out, like the countdown clocks in subway stations, how soon the train is coming. The feed has its own master time attribute (the timestamp in the feed header), so you have a reference time that you can subtract from the arrival time.
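
A rough sketch of that countdown, assuming the feed’s master time lives at feed.header.timestamp and has a low attribute just like arrival.time does. The minutesAway function and the sample numbers are made up for illustration.

```javascript
// Minutes until a train reaches one of its stops, measured against
// the feed's own timestamp rather than the clock on your computer.
// Assumes header.timestamp and arrival.time both expose `low` (seconds).
function minutesAway(feed, entityIndex, stopIndex) {
    var now = feed.header.timestamp.low;
    var stop = feed.entity[entityIndex].trip_update.stop_time_update[stopIndex];
    var arrival = stop.arrival.time.low;
    return Math.round((arrival - now) / 60);
}

// Hypothetical feed: the train arrives 120 seconds after the feed time.
var countdownFeed = {
    header: { timestamp: { low: 1419532850 } },
    entity: [
        { trip_update: { stop_time_update: [
            { stop_id: "601S", arrival: { time: { low: 1419532970 } } }
        ] } }
    ]
};
console.log(minutesAway(countdownFeed, 0, 0) + " min");
```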

Mind that there is also a departure.time attribute. It’s most important when considering terminus stations (the first stop on the route). Here, the arrival time will be null, because the train is already there and waiting to depart.
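
One way to deal with that is to fall back to the departure time whenever the arrival is missing. This is only a sketch: the stopTime name is made up, and it assumes a missing arrival shows up as null or undefined in the object.

```javascript
// Pick a usable time for a stop: arrival if present, otherwise departure.
// Returns seconds since 1970, or null if neither is available.
function stopTime(stopTimeUpdate) {
    var event = stopTimeUpdate.arrival || stopTimeUpdate.departure;
    if (!event || !event.time) { return null; }
    return event.time.low;
}

// Hypothetical first stop of a route: no arrival, only a departure.
var firstStop = { stop_id: "601S", arrival: null, departure: { time: { low: 1419532970 } } };
console.log(stopTime(firstStop));
```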

Loading the Realtime Buffer Feed

The first thing dcodeIO tells you to do is an AJAX call (XMLHttpRequest) to get the protocol buffer. Don’t bother pointing it straight at the MTA. AJAX will not work with the MTA’s server.

I wrote a PHP script to download the protocol buffer feed, upload it to my server, then make an AJAX call to load that file from my site. If you have a better idea, I really want to hear it.

To get the files, you need to sign up on the MTA’s developer website and get an API key. It only takes a couple minutes. When you’re finished, to access a feed, use this URL:

http://datamine.mta.info/mta_esi.php?key=YOUR_API_KEY&feed_id=1

Insert your API key where it says YOUR_API_KEY (duh). The feed_id number at the end corresponds to one of three GTFS-realtime feeds:

  • 1 is the 1, 2, 3, 4, 5, 6 and S trains.
  • 2 is the L train.
  • 11 is the Staten Island Railway.
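
If you are going to fetch more than one feed, a tiny helper keeps the URLs straight. This is just a sketch built from the URL pattern above. The feedUrl name is made up, and whatever downloads the URL still has to run on your server, not in the browser.

```javascript
// Build the URL for one of the MTA feeds from an API key and a feed_id.
function feedUrl(apiKey, feedId) {
    return "http://datamine.mta.info/mta_esi.php" +
        "?key=" + encodeURIComponent(apiKey) +
        "&feed_id=" + encodeURIComponent(feedId);
}

// The three feed IDs listed above.
[1, 2, 11].forEach(function (id) {
    console.log(feedUrl("YOUR_API_KEY", id));
});
```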

More?

The feed contains a lot more information, but this is a quick guide and honestly, I never bothered to learn what half the feed means. The complete reference is here.