Aaron Brezel, Aditi Hudli, Mythili Sankara, Tristan Orlofski
Our analysis of the pedestrian and vehicle data of the Brooklyn Bridge for Assignment 4 of this course inspired us to study the correlation, if any, between weather and transportation.
As we studied more datasets, our scope gradually expanded to understanding how weather conditions shape transportation choices in New York City.
Our initial idea was to look at the most popular mode of transportation, the MTA, and use its statistics to understand Citi Bike and cab movement across the city. The large swaths of unorganized and unruly MTA data posed sorting and computing challenges, and we believed our effort was better spent on a narrow yet well-documented set of transportation data. We therefore scaled down to for-hire vehicles and Citi Bikes operating around three nerve centers of the city (Grand Central, Penn Station and Times Square) and later focused on two of them, Grand Central and Times Square.
With regard to the data analysis, we initially spread ourselves thin by analyzing the poorly documented turnstile data at the three MTA stations. The idea was to note the number of people entering and leaving these stations every four hours and understand how, if at all, commuters exiting these stations influenced nearby Citi Bike and cab pickups. We later decided to focus purely on for-hire vehicles and Citi Bikes, and to study their behavior separately.
Our idea was to study these numbers and infer whether weather conditions influenced the trends. To do so, we picked March 2018, which included days of varied temperature. We initially picked only weekdays, so that few other factors such as holidays would influence transportation choices, but later included the month in its entirety.
We looked at three different stories for our idea, design and data visualization. We primarily looked at Todd Schneider’s visualization piece titled ‘A Tale of Twenty-Two Million Citi Bike Rides: Analyzing the NYC Bike Share System’, ‘Punishing Reach of Racism for Black Boys’ published in the New York Times on March 19, 2018, and ‘Why Cops Shoot’ published in the Tampa Bay Times in April 2017. While our methodology and data analysis were guided by Schneider’s piece, we looked to NYT and Tampa Bay Times for ideas for visualization. Our idea to represent commuters with moving dots was largely inspired by the latter two publications.
We collected our Citi Bike data set, which consisted of individual rides, from the Citi Bike website. The yellow cab and for-hire vehicle data sets, also of individual rides, came from the New York City Taxi and Limousine Commission.
Our initial plan to cover the entire city for an entire year quickly proved infeasible. The sheer size of the data sets did not allow it. Instead, we decided to focus on a single month, March 2018, and two specific locations, Grand Central Terminal and Times Square. We chose March 2018 because of its variable temperature and weather. Over the course of 31 days, daily high temperatures ranged from 39 to 62 degrees Fahrenheit. In addition, there were several instances of snow during the month.
Existing variables in the data allowed us to filter for the individual rides we wanted. The Taxi and Limousine Commission uses a set of geographic “zones” to divide New York City into sections for administrative purposes. Meanwhile, Citi Bike stations are named, and Motivate, the company that operates the service, maintains a handy map that lists the location of every station.
We could identify yellow cab and for-hire vehicle rides around Grand Central Terminal and Times Square by searching for trips that began within zones 100, 162, 170 or 230. Similarly, we could select the bike rides we wanted by filtering for entries that left from Pershing Square North, Pershing Square South or Broadway and West 41 St.
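The two filters described above can be sketched in a few lines. This is a minimal Python illustration, not our R code; the field names `PULocationID` and `start station name` and the sample rows are assumptions made for the example, and the station strings are written as they appear in this text, which may not match the exact spelling in the data files.

```python
# Pickup zones and start stations named in the text.
PICKUP_ZONES = {100, 162, 170, 230}
START_STATIONS = {
    "Pershing Square North",
    "Pershing Square South",
    "Broadway and West 41 St",
}


def filter_vehicle_trips(trips, zone_field="PULocationID"):
    """Keep trips whose pickup zone is one of the four target zones.

    The field name is an assumption; adjust it to the actual column header.
    """
    return [t for t in trips if t.get(zone_field) in PICKUP_ZONES]


def filter_bike_rides(rides, station_field="start station name"):
    """Keep rides that start at one of the three target stations."""
    return [r for r in rides if r.get(station_field) in START_STATIONS]


# Invented sample rows, for illustration only.
sample_trips = [
    {"PULocationID": 100},
    {"PULocationID": 48},
    {"PULocationID": 230},
]
sample_rides = [
    {"start station name": "Pershing Square North"},
    {"start station name": "Some Other Station"},
]

kept_trips = filter_vehicle_trips(sample_trips)
kept_rides = filter_bike_rides(sample_rides)
print(len(kept_trips), len(kept_rides))
```

If the zone IDs are read from a CSV as text, they would need to be converted to integers before the membership test.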
The final filter we added was to only include for-hire vehicle trips from rideshare apps like Uber and Lyft. This was accomplished by cross-referencing dispatch base information in each ride with a directory of base/rideshare app affiliation.
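The cross-referencing step amounts to a lookup of each trip's dispatch base in a directory. The Python sketch below shows the idea under stated assumptions: the base identifiers, the directory contents and the field name `dispatching_base` are all hypothetical placeholders, not values from the real data.

```python
# Hypothetical directory mapping a dispatch base to its rideshare app.
# None marks a base with no rideshare affiliation.
BASE_DIRECTORY = {
    "BASE-A": "Uber",
    "BASE-B": "Lyft",
    "BASE-C": None,
}


def keep_rideshare_trips(trips, directory, base_field="dispatching_base"):
    """Return only trips whose base maps to a rideshare app.

    Each kept trip is copied with an added "app" field; trips with an
    unknown or unaffiliated base are dropped.
    """
    kept = []
    for trip in trips:
        app = directory.get(trip.get(base_field))
        if app is not None:
            kept.append({**trip, "app": app})
    return kept


# Invented sample trips, for illustration only.
sample_fhv_trips = [
    {"dispatching_base": "BASE-A"},
    {"dispatching_base": "BASE-C"},
    {"dispatching_base": "BASE-Z"},  # not in the directory
    {"dispatching_base": "BASE-B"},
]

rideshare_trips = keep_rideshare_trips(sample_fhv_trips, BASE_DIRECTORY)
print([t["app"] for t in rideshare_trips])
```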
The heavy data lifting was performed in R using the dplyr package. Once the March data was loaded, we used dplyr to filter it. Limiting our scope to just the month of March and just Grand Central Terminal and Times Square significantly reduced our data load. For example, limiting the March yellow cab data to those four zones reduced the number of rows from 9,430,376 to 1,178,816, roughly one-eighth of the original.
When designing our interactive visualization, our main goal was to represent a large number of riders in as condensed and visually digestible a format as possible. To that end, we took inspiration from the New York Times piece on race and class and the Tampa Bay Times article on why cops shoot. Both represented people as individual shapes, and both aggregated and manipulated those shapes to strengthen the message of their story. We believe that letting dots stand in for people strengthens the effect of our visualization as well.
We would represent individual rides as dots and animate them in an author-driven format to illustrate how the weather affected ridership. Of course, limitations of data processing capabilities and scaling between different variables forced us to make some compromises.
First, as we experimented in D3, it became clear that animating nearly two million dots on the screen would both strain the web browser and potentially overstimulate the user visually. This prompted us to limit the number of dots on the screen and instead represent the individual rides as a proportion of those dots. In making this decision, however, we created another problem. The number of Citi Bike rides in March 2018 is several orders of magnitude smaller than the number of either for-hire vehicle or yellow cab rides. There was no way we could represent all three variables in the same group of dots without significantly distorting the data.
So, given our limited time, we resolved to separate bikes and cars into two columns on the web page and run them concurrently as the user scrolled. To preserve some visual clarity, bikes would continue to be represented as dots while cars would become squares. Each shape would also represent a different number of people.
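The scaling problem is easy to reproduce with a little arithmetic. The Python sketch below allocates a fixed budget of dots in proportion to ride counts; the counts and the budget of 500 dots are invented for illustration and are not our actual figures.

```python
def allocate_dots(counts, budget):
    """Split a fixed number of dots across categories by share of rides.

    Simple rounding is used, so the allocated dots may not sum exactly
    to the budget.
    """
    total = sum(counts.values())
    return {name: round(n / total * budget) for name, n in counts.items()}


# Invented counts, chosen only to show the effect of mismatched scales.
example_counts = {
    "yellow_cab": 1_200_000,
    "for_hire_vehicle": 700_000,
    "citi_bike": 1_500,
}

dots = allocate_dots(example_counts, budget=500)
print(dots)
```

With counts this far apart, the bike category's share of a 500-dot budget is less than half a dot, so it rounds to zero and disappears from a shared display.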
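One way to pick how many rides each shape stands for is to cap the number of shapes per column and derive the ratio from that cap. The Python sketch below is an assumption about how such a ratio could be chosen, not a record of the values we used; the counts and the cap of 100 shapes are invented.

```python
import math


def shape_scale(ride_count, max_shapes):
    """Return (rides_per_shape, shape_count) for one column.

    rides_per_shape is the smallest whole number that keeps the column
    at or under max_shapes.
    """
    rides_per_shape = max(1, math.ceil(ride_count / max_shapes))
    shape_count = math.ceil(ride_count / rides_per_shape)
    return rides_per_shape, shape_count


# Invented counts: one column for bikes (dots), one for cars (squares).
bike_scale = shape_scale(1_530, max_shapes=100)
car_scale = shape_scale(1_900_000, max_shapes=100)
print(bike_scale, car_scale)
```

Because each column gets its own ratio, both stay readable, at the cost that a dot and a square can no longer be compared one to one. A legend stating each ratio is needed for that reason.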
Our final visualization was assembled using Scrollama to handle DOM element positioning and D3 to produce the visualizations. The only major D3 hurdle we had to overcome was making sure the coordinates of each visualization lined up properly as the shapes transitioned from their initial box to their graphs.
Our main challenge was cleaning the data and managing its scope, which forced us to scale down the project. Going forward, we would like to explore ways to use all the data we managed to acquire and clean. Further, we would like to expand the project to include more areas within New York City and give users the option to choose the points between which they would like to study transportation behavior. Additionally, we hope to include variables beyond weather that influence the choice of transport mode, such as economic conditions in different regions, availability of parking spaces and the distance of financial hubs from these regions. Together, we believe, these data points and the resulting visualization would provide transportation, disaster management and city planning authorities with actionable data for making crucial decisions.