Subways and Transit Updates in Real-Time

Tim Spann
Published in Cloudera
Feb 13, 2024

Apache NiFi, Apache Kafka, Apache Flink, JavaScript, Python, GTFS, PostgreSQL, SQL

- All Transit Systems: add a cache of the system list in PostgreSQL
- Using the Database Schema Registry in PostgreSQL
- Adding MTA Bus Systems to All Transit Systems
- Mobility Database Catalog

Source Code: https://github.com/tspannhw/FLaNK-Transit

The real-time data feeds for MTA Subways produce an interesting variant of GTFS data that I wasn't getting before: Trip Updates, Vehicle Positions, and Alerts all in one file. Well, that's a problem. So let's fix it with Python.

The way to fix this was to use the GTFS Python library to split the feed into the three separate files and then output just one, selected with a parameter. I also found out that the MTA requires a login passed as an HTTP header, so we had to set that.
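Below is a minimal sketch of that split, assuming the gtfs-realtime-bindings and requests packages; the feed URL and the header name (x-api-key) are placeholders, so swap in the values from the MTA developer documentation.

import sys

import requests
from google.transit import gtfs_realtime_pb2
from google.protobuf.json_format import MessageToJson

# Placeholders: set these to your MTA feed URL and API key.
FEED_URL = "https://example-mta-gtfs-feed"
API_KEY = "YOUR_MTA_API_KEY"


def fetch_feed(url: str, api_key: str) -> gtfs_realtime_pb2.FeedMessage:
    """Download and parse a GTFS-Realtime protobuf feed."""
    # The header name is an assumption; check the MTA docs for the exact one.
    response = requests.get(url, headers={"x-api-key": api_key}, timeout=30)
    response.raise_for_status()
    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(response.content)
    return feed


def split_feed(feed: gtfs_realtime_pb2.FeedMessage, which: str) -> list[str]:
    """Keep only one entity type: 'trip_update', 'vehicle', or 'alert'."""
    return [MessageToJson(entity) for entity in feed.entity if entity.HasField(which)]


if __name__ == "__main__":
    # Select which of the three embedded feeds to output with a parameter.
    which = sys.argv[1] if len(sys.argv) > 1 else "trip_update"
    for record in split_feed(fetch_feed(FEED_URL, API_KEY), which):
        print(record)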

After I handled the subway data, I found a number of other MTA feeds and another local one for me in Pennsylvania: SEPTA, the transit agency there, which publishes a lot of data as well.

There is a free list of subway stations; next I will build a generic processor that does a lookup, like the Halifax system.

https://data.ny.gov/resource/i9wp-a4ja.json

There is also a feed for the status of MTA Subway stations; I should look at that as well. The more feeds we can continue to add, whether relatively static, batch, or streaming, the better our analytics, predictions, ML models, and Generative AI become. Storing the tabular data in PostgreSQL and the text data in a vector database (and maybe a search engine like Apache Solr as well) to augment and enhance GenAI predictions seems useful. It is also data that we can distribute to all of our hybrid, multi-cloud, heterogeneous data platforms and systems. I will probably land a wide version of this data in Kudu, Ozone/S3/object storage, HBase, and/or Iceberg. I will experiment, perhaps with all of them, since storage pricing is cheap.

There are also dimensions to add from the TRANSCOM agency for road status, street and highway cameras, weather, haze, aircraft, news, government advisories, and alerts.

MTA Subway Data to HTML Viewer

We made the URL and key sensitive values to protect them; this is easy to handle in Python.
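One common way to keep those values out of the code, assuming they are supplied as environment variables (an assumption about the setup, not a record of how the flow stores them):

import os

# Read the feed URL and API key from the environment so they never land in source control.
# The variable names here are hypothetical.
FEED_URL = os.environ["MTA_FEED_URL"]
API_KEY = os.environ["MTA_API_KEY"]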

MTA Stations

Data Source:
https://data.ny.gov/resource/i9wp-a4ja.json

  1. InvokeHTTP: call the JSON REST endpoint
  2. SplitRecord: split the array into individual records
  3. EvaluateJsonPath: extract fields into attributes
  4. QueryRecord: drop the coordinates array
  5. UpdateRecord: add a UUID and a timestamp
  6. UpdateDatabaseTable: prepare the records for SQL and build the table if it doesn't exist
  7. PutDatabaseRecord: insert the records into PostgreSQL
  8. PublishKafkaRecord_2_6: send the records to a Kafka topic (mtastations) as JSON
  9. RetryFlowFile: try again on failure
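For illustration only, here is a rough Python sketch of a few of those steps (call the REST endpoint, drop the coordinate objects, add a UUID and a millisecond timestamp). The NiFi flow above does this with processors, so treat this as an approximation of the record shape rather than the actual implementation.

import json
import time
import uuid

import requests

STATIONS_URL = "https://data.ny.gov/resource/i9wp-a4ja.json"


def fetch_stations() -> list[dict]:
    """Fetch the station list and enrich each record roughly like the flow does."""
    rows = requests.get(STATIONS_URL, timeout=30).json()
    records = []
    for row in rows:
        # Drop nested georeference objects/arrays and uppercase the field names.
        record = {key.upper(): value for key, value in row.items()
                  if not isinstance(value, (dict, list))}
        record["UUID"] = str(uuid.uuid4())           # step 5: add UUID
        record["TS"] = str(int(time.time() * 1000))  # step 5: add timestamp
        records.append(record)
    return records


if __name__ == "__main__":
    print(json.dumps(fetch_stations()[0], indent=2))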

Example Record

{
  "DIVISION": "IRT",
  "LINE": "Broadway-7th Av",
  "BOROUGH": "M",
  "ENTRY": "YES",
  "VENDING": "YES",
  "STAFFING": "NONE",
  "STATIONLATITUDE": "40.840556",
  "ENTITY": "",
  "DAYTIMEROUTES": "A C 1",
  "ENTRANCEGEOREFERENCE": "[-73.940083,40.841024]",
  "NORTHSOUTHSTREET": "",
  "ENTRANCETYPE": "Stair",
  "ENTRANCELONGITUDE": "-73.940083",
  "STATIONNAME": "168th St",
  "STATIONGEOREFERENCE": "[-73.940133,40.840556]",
  "TS": "1707855985771",
  "CORNER": "",
  "EXITONLY": "NO",
  "EASTWESTSTREET": "",
  "ENTRANCELATITUDE": "40.841024",
  "UUID": "1c0fe294-1edb-4122-abcb-8c117ac396f3",
  "STATIONLONGITUDE": "-73.940133"
}
To check the loaded table:

select * from mtasubwaystations m
order by borough asc, division asc, line asc, stationname asc
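If you want to run that query outside of a SQL client, a small sketch with psycopg2 (connection settings are placeholders) could look like this:

import psycopg2

# Connection parameters are placeholders; point them at your PostgreSQL instance.
conn = psycopg2.connect(host="localhost", dbname="transit",
                        user="transit", password="transit")

with conn, conn.cursor() as cur:
    cur.execute("""
        select * from mtasubwaystations
        order by borough asc, division asc, line asc, stationname asc
    """)
    for row in cur.fetchmany(5):  # peek at the first few stations
        print(row)

conn.close()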

