A Journey Through BART Ridership Trends
In my previous article, I took a deep dive into the origins of Bay Area Rapid Transit (BART), looked into its inaugural goals, and traced its route expansion in the years that followed. I wanted to set the stage for BART ridership analysis, which is the focus of this article.
As any data scientist knows, it is good practice to understand the data before delving into a research question. Accordingly, I began dissecting ridership data from 1998 to August 2023 along three dimensions: weekday versus weekend ridership, the most popular stations before and after the pandemic, and post-pandemic ridership recovery across counties. My goal with this exploration is to isolate patterns, trends, outliers, and unexpected results to get a sense of the stories hidden in the data. This exercise helps to formulate a suitable research question that accounts for the nuances and context of the data.
Before I dig into my analysis, I want to share a Medium publication I chanced upon while scouring the Internet for inspiration. Abhishek Roy and Advait have a series on decoding transportation data called Commute Chronicles: Decoding Transportation. You can’t imagine my surprise when I encountered the series: Abhishek is a friend from my middle school days! It was a good omen, and the added bonus was that I could lean on him for support. Anyway, you ought to check out his series! Some of my analysis in this article is in dialogue with his, while other parts are unique explorations. Hopefully, between the three of us, we can provide a holistic picture of the trends we observe.
Data
BART provides bountiful data on ridership dating back to 2001. Monthly Ridership files contain ridership counts in an “entry-exit” matrix format. Each Excel file encapsulates the average ridership for the month by weekday, Saturday, and Sunday. BART also consolidates exits by date and station from 1998 to date in the Daily Station Exits file.
The Monthly Ridership data is highly granular: there are 50 × 50 = 2,500 entry-exit station pairs per file, and 23 years × 12 months = 276 files to parse through. I decided to start with the Daily Station Exits, as I found it to be more robust and easier to work with using Pandas functions. Matrix formats can be trickier!
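To illustrate why a matrix layout takes extra work, here is a minimal sketch of reshaping a tiny entry-exit matrix into a long table that Pandas can group and filter. The station codes, the counts, and the orientation (rows as exit stations, columns as entry stations) are assumptions for illustration, not the actual layout of BART's files.

```python
import pandas as pd

# Hypothetical 3-station entry-exit matrix; codes and counts are made up.
# Assumed orientation: rows = exit station, columns = entry station.
matrix = pd.DataFrame(
    [[10, 200, 150], [180, 5, 90], [120, 80, 8]],
    index=["S1", "S2", "S3"],
    columns=["S1", "S2", "S3"],
)
matrix.index.name = "exit_station"

# Melt the matrix into one row per (exit, entry) pair.
long_format = matrix.reset_index().melt(
    id_vars="exit_station", var_name="entry_station", value_name="riders"
)

# Total exits per station are the row sums of the original matrix.
exits_by_station = long_format.groupby("exit_station")["riders"].sum()

# The scale quoted in the text: 50 stations and 23 years of monthly files.
pairs_per_file = 50 ** 2
monthly_files = 23 * 12
```

Once the data is in long format, a station's total exits become a one-line groupby, which is roughly what the Daily Station Exits file already provides.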
Exploratory Data Analysis
Ridership Data
Just as Abhishek did, I began by plotting average weekday and weekend station exits across all stations from 1998 to August 2023. Ridership levels trended upward until late 2015 and plateaued in 2016, after which both weekday and weekend ridership declined gradually. You can also see a seasonal component: ridership ticks up during the summer and settles down during the winter and the holiday season.
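A minimal sketch of how such a weekday versus weekend series could be built from daily exits is shown below. The column names and the numbers are hypothetical, and the day-type split (Saturday and Sunday as weekend) is an assumption about how to classify days.

```python
import pandas as pd

# Hypothetical system-wide daily exits; column names are assumptions.
daily = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-06-07", "2019-06-08", "2019-06-09",
        "2019-06-10", "2019-07-05", "2019-07-06",
    ]),
    "exits": [400, 150, 100, 420, 380, 160],
})

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
is_weekend = daily["date"].dt.dayofweek >= 5
daily["day_type"] = is_weekend.map({True: "weekend", False: "weekday"})
daily["month"] = daily["date"].dt.to_period("M")

# Average exits per month, with one column per day type.
monthly_avg = (
    daily.groupby(["month", "day_type"])["exits"].mean().unstack("day_type")
)
```

With matplotlib installed, calling `monthly_avg.plot()` on a table like this would draw one line per day type, which makes trends and seasonality easy to eyeball.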
Of course, the elephant in the room is the sharp decline in ridership in March 2020 with the implementation of lockdowns to mitigate the spread of COVID-19. Ridership levels have not reached pre-pandemic levels due to subsequent Work-From-Home (WFH) policies and declining population in the Bay Area, though ridership is trending upwards post-pandemic.
One of the main goals of the San Francisco Bay Area Rapid Transit Commission was to ensure that BART connects commuters to their workplace. To quantify this, I looked at the ridership distribution by weekday and weekend: weekday ridership accounts for 83.5% of total ridership, and you can see how this plays out over time in the graph above. Indeed, commuters are the main contributors to BART ridership and revenue. We do see a small shift towards weekend ridership post-pandemic, and this likely has to do with BART’s concerted efforts to connect riders to weekend events like basketball and baseball games and concerts.
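The share calculation itself is simple; the sketch below shows one way to compute the weekday and weekend split for two periods side by side. The numbers are invented for illustration and are not BART figures, and the "pre" and "post" labels are hypothetical groupings.

```python
import pandas as pd

# Made-up exit totals by period and day type (not real BART data).
exits = pd.DataFrame({
    "period": ["pre", "pre", "post", "post"],
    "day_type": ["weekday", "weekend", "weekday", "weekend"],
    "exits": [850, 150, 400, 100],
})

# One row per period, one column per day type.
table = exits.pivot(index="period", columns="day_type", values="exits")

# Divide each row by its total so the shares in a period sum to 1.
share = table.div(table.sum(axis=1), axis=0)
```

Comparing `share.loc["pre"]` with `share.loc["post"]` then shows whether the balance between weekday and weekend riders moved between the two periods.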
Station Popularity
Next, I wanted to understand the most popular or busiest stations on weekdays versus weekends. I decided to use data from 2019 onwards, as BART extended to Antioch and Pittsburg Center in 2018. There is an overlap of 8 stations between weekday and weekend exits, represented by the beige region in the Venn diagram above. These 8 stations span San Francisco city and Berkeley. The San Francisco region encompasses the central business district as well as the main attractions for a resident or tourist, so this is an expected result. Berkeley caught me by surprise, but after some deliberation, I figured that Berkeley students and faculty are a large portion of the Bay Area population. Other characteristics of this group, such as being younger, being less likely to own a car, and having a higher propensity to be environmentally conscious, can help explain the results.
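The Venn diagram logic boils down to comparing two top-N lists. Here is a minimal sketch using Python sets; the station names, the averages, and the cutoff of three are placeholders, since the actual ranking size and values come from the real data.

```python
import pandas as pd

# Placeholder stations with made-up average exits by day type.
avg_exits = pd.DataFrame({
    "station": ["Station A", "Station B", "Station C", "Station D", "Station E"],
    "weekday": [900, 800, 500, 200, 400],
    "weekend": [300, 350, 100, 250, 120],
})

top_n = 3  # hypothetical cutoff for "most popular"

# Take the top-N stations for each day type and turn them into sets.
top_weekday = set(avg_exits.nlargest(top_n, "weekday")["station"])
top_weekend = set(avg_exits.nlargest(top_n, "weekend")["station"])

both = top_weekday & top_weekend          # the overlap in the Venn diagram
weekday_only = top_weekday - top_weekend  # popular on weekdays only
weekend_only = top_weekend - top_weekday  # popular on weekends only
```

The same comparison works for any two rankings, for example the busiest stations before and after the pandemic.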
It may be valuable to normalize ridership levels by population. Berkeley is a busy station by virtue of a young, dense population. On the other end of the spectrum, Embarcadero is not a densely populated neighborhood. Accounting for population, it may still hold its position as a popular station, due to its proximity to the San Francisco financial district.
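As a rough sketch of what that normalization could look like, the snippet below divides average exits by the population around each station. Both the population figures and the choice of "exits per 1,000 residents" as the metric are assumptions for illustration.

```python
import pandas as pd

# Placeholder stations; exits and surrounding population are made up.
stations = pd.DataFrame({
    "station": ["Station A", "Station B"],
    "avg_exits": [12000, 9000],
    "population": [30000, 6000],
})

# Exits per 1,000 residents puts dense and sparse areas on the same scale.
stations["exits_per_1000"] = (
    stations["avg_exits"] / stations["population"] * 1000
)

ranked = stations.sort_values("exits_per_1000", ascending=False)
```

In this toy example Station B has fewer exits in absolute terms but ranks first once population is accounted for, which is the kind of reordering the normalization is meant to reveal.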
The stations that are popular exclusively on weekdays are 12th Street Oakland City Center, 19th Street Oakland, and Balboa Park. The first two lie in Downtown Oakland, so commuters likely pass through these stations to get to their offices. Commuters who work in downtown San Francisco may live near Balboa Park to save on rent and keep a distance from the hustle and bustle of downtown.
Interestingly, two popular exits on the weekends serve the airports: Coliseum Station, which connects to the Oakland Airport, and the San Francisco International Airport station. This suggests that when Bay Area residents are not frolicking in the city or Berkeley on the weekends, they are traveling!
I did the same exercise to identify the busiest stations before and after the pandemic. All the stations in San Francisco proper and Berkeley maintained their position. Balboa Park and 19th Street Oakland were replaced by Fruitvale and Daly City, as the former two were not able to recover their ridership levels post-pandemic.
Geographical Trends
I aggregated ridership data by county. You can see that San Francisco and Alameda counties recovered the most post-pandemic.
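One way to express recovery is the ratio of post-pandemic to pre-pandemic ridership per county. The sketch below assumes a station-to-county lookup and uses invented numbers; the ratio is one possible definition of recovery, not necessarily the one behind the chart.

```python
import pandas as pd

# Made-up average exits per station for two periods.
exits = pd.DataFrame({
    "station": ["A", "A", "B", "B", "C", "C"],
    "period": ["pre", "post", "pre", "post", "pre", "post"],
    "avg_exits": [600, 300, 400, 150, 400, 120],
})

# Hypothetical station-to-county lookup.
county_of = {"A": "County X", "B": "County X", "C": "County Y"}
exits["county"] = exits["station"].map(county_of)

# Sum exits by county and period, then spread the periods into columns.
by_county = (
    exits.groupby(["county", "period"])["avg_exits"].sum().unstack("period")
)

# Recovery ratio: post-pandemic exits as a fraction of pre-pandemic exits.
by_county["recovery"] = by_county["post"] / by_county["pre"]
```

A ratio of 1.0 would mean a full return to pre-pandemic levels, so sorting by `recovery` ranks counties from most to least recovered.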
I also ranked stations by average ridership levels and plotted the rankings on the BART route map, so we can view how station location interacts with station popularity. Downtown stations are the busiest and most popular. Suburbs close to downtown districts follow suit.
Stations on the outskirts see fewer riders. Specifically, stations past Daly City, Hayward, and Concord, which were not part of the original 1972 BART route, are less busy. Residents of these suburban regions likely prefer to travel by car, as it is faster and cheaper than “BARTing” to downtown districts. Once the four stations in Downtown San Jose are built in 2028, we will likely see an uptick in ridership along the southern stations of the Richmond and Berryessa lines.
Based on this preliminary analysis, I identified factors that may influence ridership levels. For the economists among my readers, I have gathered candidate independent variables for ridership activity, which is the dependent variable in this scenario. I have listed a few:
- Distance from downtown district
Downtown districts house offices as well as restaurants and activities to cater to the public. I will peruse current literature in public transit to evaluate whether having a separate independent variable accounting for office space is necessary over and above a variable for distance between BART station and San Francisco downtown district.
- Population density by county or zip code
- Ticket fare
BART follows a distance-based fare model, so ticket prices vary by miles travelled. I will be discussing the use of distance-based vs flat fare models for BART in forthcoming articles.
- Median income
It is a contentious point in economic research whether public transit is an inferior or normal good. I think median income of a county will be a proxy for car ownership.
- Parking space
Current economic literature also cites parking space as a driver for transit. BART operates parking at 36 stations, which can be represented by an indicator variable.
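To show how the variables listed above could fit together, here is a minimal sketch of a linear regression set up with NumPy. The linear form, the synthetic data, and the coefficient values are all assumptions for illustration; no model has been chosen or estimated on real BART data here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # hypothetical number of stations

# Synthetic independent variables (units and ranges are made up).
distance = rng.uniform(1, 40, n)     # distance from downtown district
density = rng.uniform(1, 20, n)      # population density
fare = rng.uniform(2, 15, n)         # ticket fare to downtown
income = rng.uniform(60, 160, n)     # median income
parking = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # indicator: station has parking

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), distance, density, fare, income, parking])

# Invented coefficients used to generate noise-free synthetic ridership.
true_beta = np.array([5000.0, -60.0, 120.0, -80.0, -10.0, 300.0])
ridership = X @ true_beta

# Ordinary least squares: recover the coefficients from X and ridership.
beta_hat, *_ = np.linalg.lstsq(X, ridership, rcond=None)
```

Because the synthetic ridership has no noise, `beta_hat` matches `true_beta` up to floating-point error; with real data, the estimates would come with uncertainty and would call for a proper statistical package.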
I hope this rudimentary analysis gives the readers an overview of BART ridership. In the ensuing articles, I will formulate a research question to investigate using BART ridership data.