Checking the Pulse of the City: Visualizing NYC Subway Ridership during COVID-19

Published in Two-N · 11 min read · Sep 10, 2020

A behind-the-scenes look at how we designed and developed our latest project, which examines NYC subway ridership and how it changed during the first few months of the Coronavirus pandemic. We are excited to share our process and project with you.

Project Motivation

“Most countries treat subway systems as national assets. They understand that their cities are their great wealth creators and equality enablers and that cities don’t work without subways.”
— Jonathan Mahler, The Case for the Subway

I have always been fascinated by the subway system in New York City. Yes, it may currently be falling apart and in dire financial trouble, but I genuinely think it is a wonder of the modern world. One of the older subway systems, it is also (during non-pandemic times) one of the most used, connecting the city and making it what it is.

Public transportation in a city like New York can serve as a significant equalizer — students, finance professionals, doctors, caretakers, tourists, and first responders all ride together to get where they need to go. In bringing together the population of all five boroughs, the subway allows the city to thrive and facilitates the vibrant density that draws so many of us to the city.

In ‘The Case for the Subway’, a New York Times Magazine piece from 2018, Jonathan Mahler beautifully writes “The subway may no longer be a technological marvel, but it continues to perform a daily magic trick: It brings people together, but it also spreads people out. It is this paradox — these constant expansions and contractions, like a beating heart — that keep the human capital flowing and the city growing.”

This metaphor of the subway as a beating heart feels especially salient as we begin to grapple with the realities of this Coronavirus pandemic. The subway is, in many respects, the city’s bloodline, pumping essential nutrients (read: workers) throughout the city, connecting vital organs with extremities. A natural question then follows: what happens when this body gets sick?

As we transitioned to working from home, my fascination with the subway took on a new tone. We stopped riding the subway and suddenly felt disconnected from the pulse of the city. That’s when the opportunity to pursue this project arose in our studio. To us, this project was a way to feel what was happening to the city while we were working from the cramped corners of our shoebox apartments. Visualization allowed us to reconnect to the city, to understand what was changing, even while we were sequestered at home.

In digging into the ridership data, we quickly realized something that should have been obvious — namely, that the ability to stay home and to avoid riding the subway was, in itself, a privilege that we had taken for granted.

We developed this project to share our observations with a broader audience, and to help us all begin to unravel what happened to the city in the first half of 2020.

Research and Early Design Iterations

We started by looking at MTA ridership datasets (turnstile and MetroCard swipes) and socioeconomic measures (from the American Community Survey) to try and get a better grasp of what story we wanted to tell.

Below you can find some of the many design iterations we went through as we tried to best understand the data. Most of these were developed using Observable Notebooks, our favorite tool for exploring quick prototypes.

As you can see, these sketches range widely, from the more analytical to the more visual and experimental. In the early stages of project development, we aim to try out as many visual representations and iterations as possible until we find the ones that most effectively convey our findings.

For this project, we wanted to find an intuitive representation that would be easily graspable for a viewer new to this data, but also one that allowed for deeper engagement for those who were interested in learning more. We also wanted to find a way to evoke the more visceral embodiment of a subway’s movement, to convey the dramatic scope and sudden change of this system.

Findings and Project Goals

The main takeaway from this project was this: changes in ridership were intricately connected with the deep racial and economic inequalities of our city.

During this process of exploration, we uncovered trends that were perhaps obvious yet nonetheless shocking. This wasn’t because they necessarily contradicted our expectations (many important articles have been written about the unequal toll of COVID-related casualties in this country). Rather, we were struck by how clearly things we knew implicitly about inequities in the city were laid bare when viewed visually.

Map showing relative ridership at the peak of the coronavirus pandemic. Each circle represents an MTA station. The more orange the station, the more people were riding the subway (relative to that station’s baseline ridership).

By simply viewing relative ridership on a map, we immediately noticed a very clear trend. Stations in Manhattan had significantly lower relative ridership rates than stations farther out on the periphery. Furthermore, there appeared to be very clear ‘pockets’ of high relative ridership (shown in orange in the image to the left).

Layering in census data allowed us to ask deeper questions about what these patterns meant and how different communities were affected during this period. By bringing in economic and demographic data, we were able to compare neighborhoods to each other.

For example, the images below reveal a stark contrast between two socioeconomically distinct neighborhoods. The first, a predominantly white neighborhood in Lower Manhattan, saw a ridership decrease of about 88% to 97% during the peak of the pandemic, while the second, the predominantly Black neighborhood of Brownsville, saw a decrease of only about 70%. This means that roughly 30% of riders in Brownsville were still riding the subway during the height of the pandemic.

On top of this, the per capita income in Brownsville (the neighborhood on the right) is one fifth of that in Lower Manhattan (on the left), while the average relative ridership there was nearly six times higher.

Two contrasting neighborhoods. The color of the neighborhood represents the average relative ridership during the peak of COVID-19 in early 2020. Annotations on the side highlight census data for that neighborhood.

This was a trend we saw across the city: the lower the neighborhood’s per capita income, the more people were still riding the subway. Looking closer at these neighborhoods, we also see that many contain higher percentages of individuals employed in ‘educational, health and social services’ (as categorized by the American Community Survey), presumably many of our frontline workers.

Scatterplot looking at MTA stations and how relative ridership varies as a function of per capita income.

We wanted to highlight these relationships between socioeconomics and pandemic-level ridership, so we moved to a view that shows all of the subway stations simultaneously across these two perspectives of ridership and demographics. Using a series of scatterplots, we were able to show changes in ridership on the x-axis, and critical social and economic metrics along the y-axis. On the left you can see one of the most striking graphs, focusing on per capita income. Here we can see clearly how wealthier communities were far more able to avoid riding the subway during this period.

Final Design Decisions

As visual communicators at Two-N, we spend much of our time thinking about the best way to convey insights so that viewers new to the subject domain are able to understand and engage with the data on their own. Below you can find some of the design decisions that we made for this project.

Finding the Best Dataset

Our goal in this project was to find data that could tell a story about subway ridership and demographics. Below we will discuss how we went about finding and deciding on the most effective datasets.

When we first started looking at subway ridership data, we used the publicly available MTA turnstile data. This provided data in roughly 4-hour intervals for every turnstile device in each subway station. Initially we were really excited by the level of granularity this dataset allowed, but we ultimately decided that it added too much noise and too many inconsistencies to tell our story accurately. While we loved the ability to see the intra-day ebbs and flows of commuter traffic (peaks in the morning/afternoon, drops at night/over weekends), we felt that these fluctuations might detract from the primary narrative, namely a dramatic and sudden shift from the system’s baseline ridership.

We are also grateful to have had the opportunity to consult with representatives from the NYC Department of City Planning who suggested that we instead utilize the MetroSwipe Fare data (reported weekly for each station) as a cleaner and more reliable data source.

Aggregated daily MTA turnstile data vs. weekly MetroSwipe fare data.
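To make the granularity trade-off concrete, here is a minimal sketch of rolling interval-level counts up into weekly totals per station. It is not the project's actual pipeline: the `IntervalCount` shape, the field names, and the choice of Monday-based UTC weeks are all assumptions made for illustration, and it assumes the counts are already per-interval entries rather than raw device counters.

```typescript
// Hypothetical record shape: one row per station per ~4-hour reporting interval.
// Assumes `entries` is already a per-interval count, not a raw cumulative counter.
interface IntervalCount {
  stationId: string;
  timestamp: string; // ISO 8601, e.g. "2020-03-02T08:00:00Z"
  entries: number;
}

interface WeeklyTotal {
  stationId: string;
  weekStart: string; // YYYY-MM-DD of the Monday (UTC) that starts the week
  entries: number;
}

// Return the ISO date of the Monday (UTC) on or before the given timestamp.
// Monday-based weeks are an assumption, not the MTA's reporting convention.
function weekStartOf(timestamp: string): string {
  const d = new Date(timestamp);
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // Sunday=0 -> 6, Monday=1 -> 0
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10);
}

// Sum interval counts into one total per (station, week) pair.
function aggregateWeekly(rows: IntervalCount[]): WeeklyTotal[] {
  const totals: Record<string, WeeklyTotal> = {};
  for (const row of rows) {
    const weekStart = weekStartOf(row.timestamp);
    const key = row.stationId + "|" + weekStart;
    const existing = totals[key];
    if (existing) {
      existing.entries += row.entries;
    } else {
      totals[key] = { stationId: row.stationId, weekStart, entries: row.entries };
    }
  }
  return Object.keys(totals).map((key) => totals[key]);
}
```

Summing to the week smooths out the intra-day and weekend fluctuations described above, at the cost of losing the commuter peaks.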

Another data granularity decision we had to make regarded the level at which we were matching subway stations to census data. The high-level idea was to look at the geographic region surrounding each MTA station, and then pull the census metrics for that region. Below you can see a chart illustrating the different levels at which census data is reported.

Hierarchy of geographic region types for which the census reports data. Screenshot from https://www2.census.gov/geo/pdfs/reference/geodiagram.pdf

Our initial thought was that we wanted to be as precise as possible in doing this and to find the smallest meaningful geographic area for each station. So we started by looking at the level of census tract.

However, we quickly realized that this was the wrong approach. By looking at these maps, we discovered that census tracts were too small and arbitrarily shaped to tie back to subway stations reliably. Furthermore, we recognized from our own experience that people will often walk beyond the boundaries of their census tract to reach the nearest subway station.

Ultimately we decided to match stations to the neighborhood in which they were located. New York City maintains a series of Neighborhood Tabulation Areas (NTAs) which, although not perfect, align more closely to our mental model of the city’s distinct regions. This allowed us to tie socioeconomic metrics more meaningfully to individual stations.

Illustration of how we matched stations (orange) to census regions (grey outlines). Census tracts are the smaller shapes on the left, neighborhood tabulation areas (NTAs) are on the right.
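Matching a station to the region that contains it comes down to a point-in-polygon test. The sketch below uses the standard ray-casting approach and is only an illustration, not the project's code: the `Neighborhood` shape and function names are hypothetical, coordinates are treated as planar (reasonable at city scale), and holes or multi-part boundaries are ignored.

```typescript
type Point = [number, number]; // [longitude, latitude], treated as planar x/y

// Hypothetical shape: a named region with a single outer boundary ring.
interface Neighborhood {
  name: string;
  ring: Point[];
}

// Ray casting: cast a horizontal ray from the point and toggle `inside`
// each time the ray crosses an edge of the ring.
function pointInRing(point: Point, ring: Point[]): boolean {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const straddles = (yi > y) !== (yj > y);
    if (straddles && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

// Return the name of the first neighborhood containing the station, if any.
function findNeighborhood(station: Point, neighborhoods: Neighborhood[]): string | undefined {
  for (const n of neighborhoods) {
    if (pointInRing(station, n.ring)) return n.name;
  }
  return undefined;
}
```

The same test works for census tracts or NTAs; only the boundary set changes, which is what makes the choice of region size a design decision rather than a technical one.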

Choosing a Primary Metric

In order to tell this story properly, we needed to find a metric that effectively answered our main question — how many people were still riding the subway at the peak of the pandemic.

Initially we looked at the absolute number of riders over time. This metric restricted our ability to compare stations in a meaningful way, as the number of riders varied greatly between stations (imagine a small single-line station in Brooklyn vs. the Times Square hub).

We decided instead to focus on percent change, looking at how a station’s ridership changed relative to its own ‘normal’, or baseline, ridership. To calculate this baseline, we averaged the station’s ridership over the period from the beginning of 2020 up until New York State declared a state of emergency (March 7th). Then, for each subsequent week, we compared that week’s ridership to the station’s baseline average.

Our first intuition was to frame this metric as a station’s ‘percent drop in ridership,’ but we slowly realized that what we were focusing on in our story wasn’t the ‘drop’ per se, but the inverse of it — those people who were still riding the subway.

This may seem like a small or insignificant change (going from a ‘70% drop’ to ‘30% still riding’), but it is an important one. By focusing on the positive rate (those ‘still riding’) we were able to put the individuals, the actual human beings, at the forefront of the story, instead of reasoning about an abstracted negation (with a ‘percent drop’).
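The calculation described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's code: the `WeeklyRidership` shape and function names are hypothetical, weeks are identified by their start date, and any week starting before March 7th is counted toward the baseline.

```typescript
// Hypothetical shape: one station's ridership total for one week.
interface WeeklyRidership {
  weekStart: string; // ISO date, YYYY-MM-DD
  riders: number;
}

// State of emergency date from the text; weeks starting before it form the baseline.
const BASELINE_END = "2020-03-07";

// Average weekly ridership over the baseline period (NaN if there are no baseline weeks).
function baselineAverage(weeks: WeeklyRidership[], baselineEnd: string = BASELINE_END): number {
  // ISO date strings sort chronologically, so plain string comparison is enough.
  const baselineWeeks = weeks.filter((w) => w.weekStart < baselineEnd);
  if (baselineWeeks.length === 0) return NaN;
  const total = baselineWeeks.reduce((sum, w) => sum + w.riders, 0);
  return total / baselineWeeks.length;
}

// For each week after the baseline period, express ridership as a percent of baseline.
function percentStillRiding(
  weeks: WeeklyRidership[],
  baselineEnd: string = BASELINE_END
): { weekStart: string; percent: number }[] {
  const baseline = baselineAverage(weeks, baselineEnd);
  return weeks
    .filter((w) => w.weekStart >= baselineEnd)
    .map((w) => ({ weekStart: w.weekStart, percent: (w.riders * 100) / baseline }));
}

// The 'percent drop' framing is simply the complement of 'percent still riding'.
function percentDrop(stillRiding: number): number {
  return 100 - stillRiding;
}
```

With a baseline of 1,000 riders per week, a week with 300 riders comes out as 30% still riding, which is the same fact as a 70% drop, framed around the people on the train.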

Image taken from the final project illustrating how we calculated the baseline for our primary metric: ‘percent still riding’. Here you can also see the decision to go with center-aligned bars instead of a bottom-aligned timeline/bar chart, as we felt that this visual resembled sound amplitude waves. The visual analogy of the ‘WOOSH’ of a passing train, as well as the likeness to the peaks and valleys of an electrocardiogram, strengthened our decision to go in this direction.

Object Constancy and Intentional Animations

This piece was designed to visually and verbally guide viewers through the narrative in a clear and intuitive fashion. An essential part of doing that is ensuring that the viewer is able to follow the story as it progresses. We can support this by maintaining the object constancy of our data elements throughout the project or, in other words, by ensuring that a data entity (a subway station in our case) can be tracked visually through an animation or transition.

Screen capture of the final project showing object constancy.

By maintaining a connection between the graphical element (the circle) and the data element (the subway station), we are able to create sophisticated (and even surprising) animations, such as moving from a geographic map to a scatterplot, all the while ensuring that our audience is able to follow what is happening.
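The core of object constancy is matching elements by a stable identity rather than by their position in an array. In D3 this is typically expressed by passing a key function to `selection.data(data, key)`; the library-free sketch below shows the same idea by interpolating each station's circle between two layouts, matched by id. The `StationLayout` shape and function name are hypothetical, and this is not the project's transition code.

```typescript
// Hypothetical shape: where one station's circle sits in a given view.
interface StationLayout {
  id: string; // stable station identifier, used to match circles across views
  x: number;
  y: number;
}

// Blend two layouts at progress t (0 = `from` view, 1 = `to` view).
// Stations are matched by id, so array order in the two views does not matter.
function interpolateLayouts(from: StationLayout[], to: StationLayout[], t: number): StationLayout[] {
  const targetById: Record<string, StationLayout> = {};
  for (const s of to) targetById[s.id] = s;

  const progress = Math.max(0, Math.min(1, t)); // clamp to [0, 1]
  const result: StationLayout[] = [];
  for (const start of from) {
    const end = targetById[start.id];
    if (!end) continue; // stations missing from the target view are skipped in this sketch
    result.push({
      id: start.id,
      x: start.x + (end.x - start.x) * progress,
      y: start.y + (end.y - start.y) * progress,
    });
  }
  return result;
}
```

Because every intermediate frame carries the same ids, a viewer can follow a single circle from its place on the map to its place in the scatterplot.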

The way I see it, the power of data visualization lies in its ability to help people reason about abstract concepts by making these concepts more immediate and interactive. By creating visual ‘objects of reasoning’ that we can manipulate and engage with (like our station circles), we are able to ground and embody our understanding of complex ideas. This visual continuity is critical for supporting our cognitive processes as we reason about abstracted representations.

Opportunities For Deeper Exploration

This project was built primarily as a ‘scrolly-telling’ piece, meaning that viewers are led through a data experience simply by scrolling through the page. Often, this can be a fairly passive form of engagement. To counter that, we added some controls for individuals who wanted to engage more deeply with the data.

Going through the project you’ll notice that once you get about halfway through, a little control bar floats up from the bottom. This panel allows users to interrogate the data as it is most relevant for them. For example, by clicking the play button, one can see how ridership shifts week by week. Additionally, readers can filter for their subway line or neighborhood to get a view that is more salient to them.

Screen capture showing possible user interactions for deeper engagement.
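As a rough sketch of the logic behind such controls (not the project's implementation), filtering by line or neighborhood and stepping through weeks might look like the following. The `Station` and `StationFilter` shapes are hypothetical, and wrapping back to the first week after the last one is an assumption about how a play button could behave.

```typescript
// Hypothetical shapes for a station and the reader's current filter choices.
interface Station {
  id: string;
  lines: string[]; // subway lines serving the station
  neighborhood: string;
}

interface StationFilter {
  line?: string;
  neighborhood?: string;
}

// Keep stations that match every filter the reader has set; unset filters match everything.
function filterStations(stations: Station[], filter: StationFilter): Station[] {
  return stations.filter((s) => {
    const matchesLine = filter.line === undefined || s.lines.indexOf(filter.line) !== -1;
    const matchesNeighborhood =
      filter.neighborhood === undefined || s.neighborhood === filter.neighborhood;
    return matchesLine && matchesNeighborhood;
  });
}

// Advance a 'play' control by one week, wrapping to the first week at the end.
function nextWeekIndex(current: number, weekCount: number): number {
  return weekCount === 0 ? 0 : (current + 1) % weekCount;
}
```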

Implementation Notes

This project was built using TypeScript and bundled with Webpack. For the data visualization elements and many of the transitions, we used D3.js. We also used scrollama to help with the scroll observer and triggers.

You can see the live project here and explore the code here.

Final Thoughts

At the time of writing this, subway ridership is beginning to pick up again, but the MTA system faces one of the worst financial crises it has ever seen. I’ll wrap up this post by reiterating what a critical infrastructural feat the New York City subway is. It is essential for creating and maintaining the vibrant and economically thriving city that we have come to expect. It is also currently at risk and very much worth saving!

We hope you enjoyed our process overview and would love to hear your thoughts, questions, suggestions, and experiences.