Visualizing Pittsburgh Transit

What do 6,500 trips look like?

Steve Cotter
Sep 14, 2016

The buses are slow. They’re always late and overcrowded. They don’t run often enough. The experience of Port Authority can leave you with complaints like these. But when you’re riding, or more likely waiting for, the bus or T, do you ever consider how large and complex our transit system is?

Port Authority makes over 6,500 trips every weekday, racking up 213,000 rides. That’s like moving nearly 20% of the entire county’s population every day. How can you fathom 6,500 trips per day? What does that look like? Let’s find out.

Many transit systems publish their schedule data in the General Transit Feed Specification (GTFS) format. These files describe all the system’s routes, stop locations, trips, and stop times. Mapping services like Google use these data to provide transit directions.

First, consider the routes. Port Authority has over 300. GTFS data include a shapes table of the latitude and longitude point sequence for each route. To plot these, I used PostGIS to convert them into lines and exported them to GeoJSON. Using Leaflet and D3.js, I mapped them.
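For readers who want to try the shapes-to-lines step without PostGIS, here is a minimal JavaScript sketch of the same idea. It is not the query used for this project. The field names come from the GTFS `shapes.txt` specification; the helper name `shapesToGeoJSON` and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: group GTFS shapes.txt rows by shape_id, order each
// group by shape_pt_sequence, and emit one GeoJSON LineString per shape.
function shapesToGeoJSON(rows) {
  const byShape = new Map();
  for (const row of rows) {
    if (!byShape.has(row.shape_id)) byShape.set(row.shape_id, []);
    byShape.get(row.shape_id).push(row);
  }

  const features = [];
  for (const [shapeId, points] of byShape) {
    // CSV parsers usually return strings, so convert before comparing.
    points.sort((a, b) => Number(a.shape_pt_sequence) - Number(b.shape_pt_sequence));
    features.push({
      type: "Feature",
      properties: { shape_id: shapeId },
      geometry: {
        type: "LineString",
        // GeoJSON coordinate order is [longitude, latitude].
        coordinates: points.map((p) => [Number(p.shape_pt_lon), Number(p.shape_pt_lat)]),
      },
    });
  }
  return { type: "FeatureCollection", features };
}

// Made-up sample rows, deliberately out of order.
const sampleRows = [
  { shape_id: "demo_a", shape_pt_lat: "40.445", shape_pt_lon: "-79.99", shape_pt_sequence: "2" },
  { shape_id: "demo_a", shape_pt_lat: "40.44", shape_pt_lon: "-80.0", shape_pt_sequence: "1" },
  { shape_id: "demo_b", shape_pt_lat: "40.43", shape_pt_lon: "-79.98", shape_pt_sequence: "1" },
  { shape_id: "demo_b", shape_pt_lat: "40.42", shape_pt_lon: "-79.97", shape_pt_sequence: "2" },
];
const routes = shapesToGeoJSON(sampleRows);
```

The resulting feature collection is the kind of object Leaflet's `L.geoJSON` layer accepts.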

Port Authority’s routes based on GTFS data. Busway, HOV, and light-rail routes are color-coded. Routes are slightly transparent, so darker lines are overlapping routes.

Mapping the routes conveys the extent of the system and highlights the busway and light-rail routes. Routes are color-coded. Those that begin with “P” and operate on the East Busway are shaded purple. “Y” routes are yellow for the South Busway. And so on.

The route map also illustrates Port Authority’s direct service model, which is fairly unusual. Notice how far out many of the busway routes go. As they enter the urban core, suburban express routes jump on the busways to bypass congestion and go all the way downtown. There’s no need to transfer.

Now that we have the routes, we need the buses, or trips in GTFS lingo. The data provide the stop coordinates and times for each trip. So at any moment when a bus is scheduled to be at a stop, you can easily tell where it is.

Now comes the tricky part. When a bus isn’t at a stop, it’s obviously somewhere in between. But where? I assumed that buses travel at a constant speed between stops. So, if a bus stops at Point A at 9:00 AM and Point B at 9:10 AM, I assume that at 9:05 it’s halfway between A and B along its route. This is simple linear interpolation.
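The time side of that interpolation can be sketched in a few lines of JavaScript. This is an illustration rather than code from the project, and the function names are hypothetical. It relies on GTFS stop times being written as `HH:MM:SS`, where the hour may be 24 or more for trips that run past midnight.

```javascript
// Convert a GTFS "HH:MM:SS" time to seconds after midnight.
// Hours of 24 or more are valid in GTFS for trips that cross midnight.
function gtfsTimeToSeconds(time) {
  const [h, m, s] = time.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Fraction (0 to 1) of the leg between two stops completed at time `now`,
// assuming constant speed between the stops.
function legFraction(prevDeparture, nextArrival, now) {
  const start = gtfsTimeToSeconds(prevDeparture);
  const end = gtfsTimeToSeconds(nextArrival);
  if (end <= start) return 0; // guard against zero-length or malformed legs
  const t = (gtfsTimeToSeconds(now) - start) / (end - start);
  return Math.min(1, Math.max(0, t)); // clamp to the leg
}

const halfway = legFraction("09:00:00", "09:10:00", "09:05:00"); // 0.5
```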

The SQL is not so simple. For buses that aren’t at a stop at the given moment, I find the previous and next stops. Then I find the points along the route that are closest to those stop coordinates and cut the route line between them. Next, I interpolate a point along that route substring based on the fraction of the time between the two stops that has elapsed. Rather than trying to do all of this at once, I wrote the query for a single time of day. With a simple Ruby script, I ran it for each minute of each hour of the day. Finally, I used Node.js to transform the output into a smaller data format.
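The last step, placing a point a given fraction of the way along a piece of the route, is easier to see outside SQL. PostGIS has functions for this kind of work, such as `ST_LineLocatePoint`, `ST_LineSubstring`, and `ST_LineInterpolatePoint`; whether the project's query uses exactly these is an assumption. The JavaScript below is a rough stand-in with a hypothetical function name. It treats coordinates as flat `[x, y]` pairs, which is only an approximation for longitude and latitude.

```javascript
// Return the point a given fraction (0 to 1) of the way along a polyline.
// Coordinates are treated as planar [x, y] pairs (an approximation for lon/lat).
function pointAlongLine(coords, fraction) {
  const f = Math.min(1, Math.max(0, fraction));

  // Measure each segment and the total length.
  const lengths = [];
  let total = 0;
  for (let i = 1; i < coords.length; i++) {
    const len = Math.hypot(coords[i][0] - coords[i - 1][0], coords[i][1] - coords[i - 1][1]);
    lengths.push(len);
    total += len;
  }
  if (total === 0) return coords[0].slice();

  // Walk the segments until the target distance falls inside one of them.
  let remaining = f * total;
  for (let i = 0; i < lengths.length; i++) {
    if (lengths[i] > 0 && remaining <= lengths[i]) {
      const t = remaining / lengths[i];
      const [x0, y0] = coords[i];
      const [x1, y1] = coords[i + 1];
      return [x0 + (x1 - x0) * t, y0 + (y1 - y0) * t];
    }
    remaining -= lengths[i];
  }
  return coords[coords.length - 1].slice();
}

// Made-up route piece between two stops: 4 units east, then 4 units north.
const routePiece = [[0, 0], [4, 0], [4, 4]];
const quarterWay = pointAlongLine(routePiece, 0.25); // [2, 0]
const threeQuarters = pointAlongLine(routePiece, 0.75); // [4, 2]
```

Because the distance is measured along the line, the bus follows the corner instead of cutting across it.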

With the locations of all the active buses at each minute, I added them to the map.

The scheduled location of Port Authority’s weekday trips at 9:00 AM based on GTFS data. Busway, HOV, and light-rail routes and trips are color-coded.

Now let’s animate the whole system and see what 6,500 trips look like. While D3.js seems like the obvious choice, it turns out that animating thousands of SVG elements on the page will crush your browser. The canvas element with requestAnimationFrame is a better choice.
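As a rough picture of the canvas approach, the sketch below clears the canvas and redraws every bus as a filled circle on each frame. It is a simplified illustration, not the project's drawing code. The bus objects (`{ x, y, color }` in pixel coordinates) and the `getBuses` callback are assumptions; the context methods and `requestAnimationFrame` are standard browser APIs.

```javascript
// Draw every bus as a filled circle on a 2D canvas context.
// `buses` is a hypothetical array of { x, y, color } in pixel coordinates.
function drawBuses(ctx, buses, radius = 3) {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);
  for (const bus of buses) {
    ctx.beginPath();
    ctx.arc(bus.x, bus.y, radius, 0, 2 * Math.PI);
    ctx.fillStyle = bus.color;
    ctx.fill();
  }
}

// Browser-only loop: redraw once per display refresh.
// `getBuses` is a hypothetical callback that returns the buses to show
// for the timestamp the browser passes to each frame.
function animate(ctx, getBuses) {
  function frame(timestamp) {
    drawBuses(ctx, getBuses(timestamp));
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}
```

Nothing is added to the DOM per bus, which is what keeps thousands of moving points cheap compared with thousands of SVG elements.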

Using the canvas introduces another complication: the animation is way too fast. Most browsers animate at around 60 frames per second. At one frame per minute of schedule data, the 1,440 minutes of a weekday would play back in just 24 seconds. But if we slow things down too much, the animation won’t be smooth. To address this, I added more interpolation in JavaScript. But this comes with a tradeoff.

We only know the locations of the buses on the routes at one-minute intervals. Interpolating between those intervals means our buses won’t necessarily stay on the routes. At low zoom levels, this won’t be very noticeable. But zoomed in, our buses will deviate from the route lines. The only way to fix this is to compute positions at a finer resolution than one minute. But the position data are already over 5 MB compressed. So small deviations are worth the tradeoff.
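The arithmetic and the tradeoff can both be shown with a small sketch. The names and the frames-per-minute values below are hypothetical, and 60 frames per second is an assumed refresh rate. The idea is to spread each schedule minute over several frames and blend each bus's position in a straight line between consecutive one-minute snapshots.

```javascript
const MINUTES_PER_DAY = 24 * 60; // 1,440 one-minute snapshots
const FPS = 60; // assumed browser refresh rate

// Seconds needed to play a full day when each schedule minute
// is spread over `framesPerMinute` animation frames.
function playbackSeconds(framesPerMinute) {
  return (MINUTES_PER_DAY * framesPerMinute) / FPS;
}

// Straight-line blend of one bus's position between two consecutive
// minute snapshots. `frame` runs from 0 to framesPerMinute.
function tween(from, to, frame, framesPerMinute) {
  const t = frame / framesPerMinute;
  return [from[0] + (to[0] - from[0]) * t, from[1] + (to[1] - from[1]) * t];
}

const oneFramePerMinute = playbackSeconds(1); // 24
const tenFramesPerMinute = playbackSeconds(10); // 240

// A bus rounding a corner at [4, 0]: at minute 1 it is at [2, 0],
// at minute 2 it is at [4, 2]. The blended midpoint cuts the corner.
const midFrame = tween([2, 0], [4, 2], 5, 10); // [3, 1], which is off the route
```

The midpoint `[3, 1]` lies on neither leg of the corner, which is the small deviation that shows up when zoomed in.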

The result of all this wrangling is a pretty smooth animation of all weekday trips. At peak times, there are over 430 active trips throughout the county. That’s a lot of buses. So the next time your bus is late, remember there are hundreds of other buses out there trying to get thousands of other people where they need to go. Designing and managing such a system can’t be easy.

All trips at around 6:00 AM.

Leaflet takes care of nearly all the details of zooming in to the animation. I only have to adjust the size of circles so they’re not huge at high zoom levels. I also added controls to allow pausing and adjusting the speed.
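One way to handle the circle sizing is a small function that maps the zoom level to a radius and caps it. The rule and every number below are made up for illustration; the actual sizing in this project may differ. In the browser, Leaflet's `map.getZoom()` and the `zoomend` event supply the zoom level.

```javascript
// Hypothetical sizing rule: grow the marker radius gently with zoom,
// but never beyond `max` pixels. All default values are illustrative.
function radiusForZoom(zoom, { base = 2, perLevel = 0.5, minZoom = 10, max = 6 } = {}) {
  const radius = base + Math.max(0, zoom - minZoom) * perLevel;
  return Math.min(max, radius);
}

const countyView = radiusForZoom(10); // 2
const cityView = radiusForZoom(14); // 4
const streetView = radiusForZoom(20); // 6 (capped)

// Browser usage with a Leaflet map (`map` is assumed to exist):
// map.on("zoomend", () => { currentRadius = radiusForZoom(map.getZoom()); });
```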

Getting closer makes it easier to see how much is going on downtown during rush hour. Many of those 6,500 trips start or end downtown. Notice how quickly buses get in and out on the busways compared to mixed traffic like Liberty Avenue or 5th Avenue.

Buses circulating downtown at around 6:00 AM on a weekday.

Rush hour in Oakland makes it clear why Port Authority is considering bus rapid transit.

Buses running through Oakland at around 4:30 PM on a weekday.

If you’re interested in the details of data manipulation and animation, all the code is available on GitHub. Refer to the README file for an overview of the process. The code could be modified for another transit system’s GTFS data.

To improve the animation, the zoom behavior could adjust the frame rate and interpolation. Real-time data could also be used to show actual rather than scheduled trips. But Port Authority’s API restrictions and rate limits make it difficult to gather all the data at sufficient intervals.

Have comments or other ideas for improvement? Leave a response or fork the repository!