GTFS Best Practices now available!

The General Transit Feed Specification (GTFS) has revolutionized multi-modal information and open transit data. TriMet in Portland, Oregon, and Google created GTFS in 2005 as a simple way to represent fixed-route transit systems; today, according to Transitland, GTFS data is shared publicly by over 1,500 transit agencies worldwide.

A GTFS dataset — the contents of the zip file and the text data in the stops.txt file

However, as is often the case with innovation, growth brought new problems. As more and more agencies started producing GTFS data and more apps started consuming it, it became apparent that agencies and app developers were interpreting, formatting, and managing the data in different ways. This fragmentation makes it difficult to build and maintain an app that uses data from a large number of agencies.

Below are just a few of the common challenges that transit app developers have encountered.

Feed management

  1. Changing GTFS dataset URLs — To manage a large number of agencies, app developers typically write software that automatically retrieves new GTFS datasets from each agency and integrates them into the app. So, if an agency publishes its GTFS zip file at http://acmeagency.org/gtfs.zip, developers will assume that new datasets will be uploaded to the same URL, and will repeatedly poll that URL to see whether the data has changed. If the agency instead posts a new dataset at a different URL (e.g., http://acmeagency.org/gtfs-March-2017.zip), the app will never know that the new data exists.
  2. Changing GTFS dataset IDs — Let’s say a transit app user wants to bookmark a stop. The app developer needs a way to track which stops the user has marked as favorites. A simple way to do this is to save the stop_id of that stop to a database. The next time the user wants to see arrival information for that stop, the app uses the saved stop_id to fetch real-time arrivals. This works well until the next GTFS update changes the stop_id for that stop. Now the user can’t retrieve arrival information for their favorite stop, because that stop_id no longer exists.
  3. Gaps in GTFS data coverage — Agencies typically update their GTFS data quarterly. Let’s say it’s March, and the agency has several schedule changes that take effect in April. The agency scrambles to pull the new dataset together and posts it just in time, at 11:59pm on March 31st. Everything’s great, right? Not really. Apps need to download the new dataset, validate it, add the new data to their databases, and potentially push an update to their users. This takes far longer than one minute (in many cases days or more, if there are problems in the data). As a result, when a user opens the app on April 1st, no transit service will be visible. Agencies can avoid this issue by using the GTFS Merge Tool to create a combined dataset covering both the current and the next schedule period, and by sharing or announcing this merged dataset to developers at least a week before the service change. This way, apps pick up the schedule change early and are ready when it actually takes effect.
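Each of these feed-management problems can be caught automatically. Below is a minimal Python sketch of checks an agency might run before publishing, or an app developer might run after downloading; it is an illustration, not something prescribed by the GTFS Best Practices. The function names are hypothetical. It assumes the dataset is available as zip bytes, that stops.txt has a stop_id column (as GTFS requires), and that service is defined in calendar.txt rather than only in calendar_dates.txt.

```python
import csv
import hashlib
import io
import zipfile
from datetime import date, datetime


def read_gtfs_table(zip_bytes: bytes, name: str) -> list[dict[str, str]]:
    """Return the rows of one GTFS text file (e.g. "stops.txt") from a zipped dataset."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # utf-8-sig also strips a byte-order mark if the producer wrote one.
        with io.TextIOWrapper(zf.open(name), encoding="utf-8-sig", newline="") as text:
            return list(csv.DictReader(text))


def dataset_fingerprint(zip_bytes: bytes) -> str:
    """SHA-256 of the downloaded zip. A consumer polling one stable URL can compare
    this with the value it stored last time to decide whether to re-import."""
    return hashlib.sha256(zip_bytes).hexdigest()


def removed_stop_ids(old_zip: bytes, new_zip: bytes) -> set[str]:
    """stop_ids present in the previous dataset but missing from the new one.
    Any stop_id returned here would break bookmarks that saved it."""
    old_ids = {row["stop_id"] for row in read_gtfs_table(old_zip, "stops.txt")}
    new_ids = {row["stop_id"] for row in read_gtfs_table(new_zip, "stops.txt")}
    return old_ids - new_ids


def days_of_service_remaining(zip_bytes: bytes, today: date) -> int:
    """Days from `today` until the latest end_date in calendar.txt (YYYYMMDD).
    Assumes calendar.txt exists; feeds using only calendar_dates.txt need a different check."""
    rows = read_gtfs_table(zip_bytes, "calendar.txt")
    last_day = max(datetime.strptime(row["end_date"], "%Y%m%d").date() for row in rows)
    return (last_day - today).days
```

For example, an agency could hold back a release while removed_stop_ids(previous, candidate) is non-empty, or start preparing a merged dataset once days_of_service_remaining(current, date.today()) drops below 7, matching the one-week lead time described above. Note that re-zipping unchanged files can still change the zip bytes (file timestamps are stored in the archive), so a new fingerprint means the data has possibly changed, not that it certainly has.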

Data content

  1. Case of text — Some agencies PUBLISH ALL THEIR STOP NAMES, ROUTE NAMES, AND HEADSIGNS IN ALL CAPS! This makes text hard to read in an app, and also makes it hard to distinguish abbreviations from words.
  2. Abbreviations — When text is abbreviated, it’s not always clear what it means. For example, does “Dr.” mean “Doctor” or “Drive”? Sometimes we can tell the difference visually from context, but this is particularly challenging for text-to-speech engines (e.g., those used for accessibility and by voice-driven personal assistants such as Amazon Alexa, Google Assistant, and Siri).
  3. Loop routes — If a route continuously runs in a circle, should the first/last stop be included in the trip twice, or just once? We found that most agencies list the same stop twice, once at the beginning of the trip and again at the end.
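Simple automated checks can also flag these data-content issues before a dataset is published. The following is again a hedged Python sketch with hypothetical names: AMBIGUOUS_ABBREVIATIONS is an illustrative and deliberately incomplete list, and the loop check assumes it receives the stop_times.txt rows (as dicts) for a single trip.

```python
import re

# Hypothetical, incomplete list of abbreviations that are ambiguous when read
# aloud (e.g. "Dr." = Doctor or Drive, "St." = Saint or Street).
AMBIGUOUS_ABBREVIATIONS = {"dr", "st", "ft", "mt"}


def is_all_caps(text: str) -> bool:
    """True for names like "MAIN ST & 5TH AVE": letters present, none lowercase."""
    return text.isupper()


def ambiguous_abbreviations(text: str) -> list[str]:
    """Words in `text` (with an optional trailing period) found in the ambiguous list."""
    tokens = re.findall(r"[A-Za-z]+\.?", text)
    return [t for t in tokens if t.rstrip(".").lower() in AMBIGUOUS_ABBREVIATIONS]


def loop_repeats_first_stop(stop_times: list[dict[str, str]]) -> bool:
    """Given the stop_times.txt rows of one trip, report whether the trip starts
    and ends at the same stop_id, i.e. the loop's first/last stop is listed twice."""
    ordered = sorted(stop_times, key=lambda row: int(row["stop_sequence"]))
    return len(ordered) > 1 and ordered[0]["stop_id"] == ordered[-1]["stop_id"]
```

For instance, ambiguous_abbreviations("Martin Luther King Jr. Dr.") flags only "Dr.", which a producer could then spell out as "Drive". The loop check only reports which representation a trip uses; whether to list the stop twice is a modeling decision for the agency.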

A solution — GTFS Best Practices

Recently, members of the GTFS community got together to discuss these and other challenges as part of an effort organized by the Rocky Mountain Institute. The output of this working group is a set of GTFS Best Practices that helps address some of the major sources of data fragmentation (including the items above). It provides guidance to transit agencies, to vendors and consultants who help produce and consume GTFS data, and to app developers who make the information available in their applications.

The new GTFS Best Practices are available at http://gtfs.org/best-practices/.

I hope that these GTFS Best Practices will make it easier for agencies to understand what developers need to build great apps, so that agencies in turn produce better data. I also hope they give developers better guidance on what to expect from GTFS feeds across a large number of agencies. Better GTFS data will also lay a solid foundation for real-time information: GTFS-realtime feeds refer to trips, routes, and stops by the IDs defined in the static GTFS data, so higher-quality GTFS data in turn improves the quality of GTFS-realtime feeds.

If you’re interested in more information on this topic, you might want to check out the following:

Thanks to the National Institute for Transportation and Communities (NITC), which supported our engagement with the GTFS community as part of a project developing a GTFS-realtime validation tool, and to the Rocky Mountain Institute for supporting and facilitating coordination with members of the GTFS community.