Published in The Startup

Spatial Data Analysis and Visualization With Chicago Ride-Hail Trips Dataset

Comparing Single and Shared Trips and A Mini Network Analysis

In the 2019 report titled Transportation Network Providers and Congestion in the City of Chicago, Transportation Network Providers (TNPs), i.e. Uber, Lyft and Via, are identified as the major cause of congestion in Chicago’s downtown area.

In response to the issue, in January 2020 the mayor’s office put forward a congestion pricing policy that targets TNP trips. The policy includes a penalty on single TNP trips, i.e. trips with only one passenger, and a surcharge on trips starting or ending in a “Downtown Zone” on weekdays between 6AM and 10PM. According to the City of Chicago website:

Effective January 6, 2020, Chicago’s landmark congestion policy will combat the plague of congestion, promote sustainable forms of transportation and support our essential public transit system, while making shared rides cheaper in the neighborhoods.

Inspired by the policy, this exploratory analysis attempts to uncover how shared and unshared trips differ. Since the COVID-19 outbreak significantly disrupted established travel patterns, the analysis is based on a subset of the trip records from February 10, 2020, a Monday before any pandemic measures were in place. This post continues a previous study, which investigated a different subset of the same dataset to show the spatial correlation between ridership and neighborhood socio-economic status.

Photo by Thought Catalog on Unsplash

This post is written for the course Urban Informatics taught by Professor Boyeong Hong at the Department of Urban Planning at Columbia Graduate School of Architecture, Planning and Preservation.

The body of this post has two parts:

  1. Descriptive Statistics
    This part uses basic statistics and spatial analysis to compare patterns across shared trips and all trips.
  2. Mini Network Analysis
    This simple analysis of popular origins and destinations is designed to help identify areas of congestion. Node degree and centrality are used as indicators of traffic patterns. To build a network from the trip records, the city area is translated into an approximate vertex grid of 0.6 mile by 0.6 mile: coordinates of origins and destinations are rounded to 2 decimal places to form vertices, and the number of trips travelling between each pair of vertices is used as the edge weight. A random point outside the city is chosen to represent all origins or destinations beyond the city boundary.
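
The grid construction described in Part 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for this post: the `OUTSIDE` placeholder, the function names, the sample coordinates, and the assumption that trips beyond the city boundary show up with missing coordinates are all hypothetical.

```python
from collections import Counter

# Hypothetical placeholder vertex for any trip end outside the city boundary.
OUTSIDE = (0.0, 0.0)

def to_vertex(lat, lon):
    """Snap a coordinate to the 0.01-degree grid; missing values map to OUTSIDE."""
    if lat is None or lon is None:
        return OUTSIDE
    return (round(lat, 2), round(lon, 2))

def build_edges(trips):
    """Count trips between each (origin vertex, destination vertex) pair."""
    edges = Counter()
    for o_lat, o_lon, d_lat, d_lon in trips:
        edges[(to_vertex(o_lat, o_lon), to_vertex(d_lat, d_lon))] += 1
    return edges

# Made-up sample trips: (pickup lat, pickup lon, dropoff lat, dropoff lon).
sample_trips = [
    (41.8781, -87.6298, 41.9742, -87.9073),
    (41.8803, -87.6312, 41.9741, -87.9069),
    (41.8781, -87.6298, None, None),  # assumed to end outside the city
]
edges = build_edges(sample_trips)
```

The first two sample trips fall into the same pair of grid cells, so they collapse into a single edge with weight 2, while the third becomes an edge to the outside placeholder.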

Part I: Descriptive Statistics

On Monday, February 10, 2020, Transportation Network Providers such as Uber and Lyft delivered 246,152 trips in Chicago. Among them, 42,937 trips, roughly 17.4%, were authorized to be shared with other riders. Only 24,804 trips, or 10.1%, were actually pooled.
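
As a quick sanity check, the two percentages follow directly from the reported counts. The snippet below reads the authorized count as 42,937 trips; the variable names are illustrative only.

```python
# Counts as reported for February 10, 2020.
total_trips = 246_152
authorized = 42_937   # trips authorized to be shared
pooled = 24_804       # trips actually pooled

authorized_share = authorized / total_trips
pooled_share = pooled / total_trips

print(f"{authorized_share:.1%} {pooled_share:.1%}")  # 17.4% 10.1%
```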

According to the median statistics, an “average” trip on this day traveled 4.45 miles and lasted for 14 minutes 48 seconds. For a shared trip, the median statistics are higher: 7.5 miles and 20 minutes 15 seconds.

Left: All trips statistics; Right: Shared trips statistics. Image provided by the author.

Shared trips usually last longer and travel farther. The majority of all trips took 5–15 minutes, whereas most shared trips took 10–20 minutes. In terms of distance, the majority of all trips travelled less than 5 miles, while a significant proportion of shared trips travelled between 5 and 10 miles.

Left: All trips statistics; Right: Shared trips statistics. Image provided by the author.

Below is the breakdown of the number of ride shares by hour:

During the day, both the total number of trips and the percentage of shared trips tracked weekday business hours. The willingness to share trips, however, climbed overall.

Beyond work hours, the number of trips dropped steadily until late night. Beginning at 8pm, actual shared trips increased together with the willingness to share trips.

Willingness to share rides was highest during the late-night period of 0–3am, when the fewest trips were made. At 2am, the percentage of trips authorized to be shared reached 23.7%, the highest of the day. During the same hour, however, only 2,096 trips were completed, the second smallest number of the day after the 1,907 trips at 3am. Despite this strong willingness, only 8.5% of the trips in this hour were actually shared. This gap might be explained by the scarcity of both drivers and riders, which makes it difficult to match riders for a shared trip.
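
The hourly comparison between willingness to share and actual sharing could be computed along the following lines. This is a sketch under assumptions: each record is taken to be a tuple of start time, an authorized flag and a pooled flag, and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime

def hourly_shares(trips):
    """Return {hour: (trip count, share authorized, share actually pooled)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # total, authorized, pooled
    for start, authorized, pooled in trips:
        row = counts[start.hour]
        row[0] += 1
        row[1] += int(authorized)
        row[2] += int(pooled)
    return {h: (n, a / n, p / n) for h, (n, a, p) in sorted(counts.items())}

# Made-up records: (trip start, authorized to share, actually pooled).
sample = [
    (datetime(2020, 2, 10, 2, 5), True, False),
    (datetime(2020, 2, 10, 2, 40), True, True),
    (datetime(2020, 2, 10, 2, 55), False, False),
    (datetime(2020, 2, 10, 8, 15), False, False),
]
result = hourly_shares(sample)
```

Keeping the authorized and pooled shares side by side for each hour makes the gap between the two easy to read off.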

Image provided by the author

Ridership also differs spatially. The choropleth small multiples show the number of pickups and drop-offs by city neighborhood and compare these numbers with the percentage of shared trips. All maps are colored by quintiles.

The comparison reveals that TNP services were most frequently requested in neighborhoods in the north, where the percentage of shared trips was the lowest. On the other hand, most shared trips occurred in neighborhoods in the south and west.

The spatial pattern corresponds to the distribution of neighborhood socioeconomic status. The aforementioned 2019 report produced similar findings, stating that shared trips “are more likely to be requested” where household income is below the area median than where household income is higher.
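
Quintile coloring means that each map class holds roughly one fifth of the neighborhoods. A minimal sketch of such a classification is shown below, using only the standard library. The interpolation method and the pickup counts are assumptions for illustration; mapping libraries may place the breaks slightly differently.

```python
from bisect import bisect_right
from statistics import quantiles

def quintile_classes(values):
    """Assign each value a class from 0 (lowest fifth) to 4 (highest fifth)."""
    cuts = quantiles(values, n=5, method="inclusive")  # four break points
    return [bisect_right(cuts, v) for v in values]

# Made-up pickup counts for ten neighborhoods.
pickups = [120, 45, 980, 310, 15, 2200, 640, 75, 410, 1500]
classes = quintile_classes(pickups)
```

Because the classes are rank-based, a few neighborhoods with extreme counts cannot wash out the color scale the way they would with equal-width bins.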

Part II: Mini Network Analysis

Here are the top 10 most travelled-to destinations (left) and travelled-from origins (right). While the ranks differ, the nodes included are exactly the same. Other than node 141, which is located in the O’Hare neighborhood, all nodes are in the downtown area.
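
Rankings like these can be derived from the weighted origin-destination pairs by summing trips into and out of each node. The sketch below assumes the edges are stored as a mapping from (origin, destination) to trip count; the node labels and counts are made up.

```python
from collections import Counter

def top_nodes(edges, k=10):
    """Return the k busiest destinations and origins as (node, trips) lists."""
    inbound, outbound = Counter(), Counter()
    for (origin, dest), n in edges.items():
        outbound[origin] += n  # trips leaving the origin
        inbound[dest] += n     # trips arriving at the destination
    return inbound.most_common(k), outbound.most_common(k)

# Made-up weighted edges: (origin, destination) -> number of trips.
edges = {("A", "B"): 5, ("B", "A"): 3, ("C", "A"): 4, ("A", "C"): 1}
top_dest, top_orig = top_nodes(edges, k=2)

print(top_dest)  # [('A', 7), ('B', 5)]
print(top_orig)  # [('A', 6), ('C', 4)]
```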

Left: Top 10 destinations; Right: Top 10 origins

Congestion in downtown districts is also confirmed by examining centrality measures. The charts below show the top 10 locations by degree centrality (left) and eigenvector centrality (right). The similarity between the two charts indicates that the locations with the most trips are not only well connected but also share a large volume of traffic with each other.
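
For readers unfamiliar with the two measures, here is a small pure-Python sketch. It is not the code behind the charts: it assumes an undirected, weighted graph with at least two nodes, stored as a dictionary of neighbor dictionaries, and it normalizes degree by the number of other nodes, which is one common convention. The toy graph is made up.

```python
from math import sqrt

def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def eigenvector_centrality(adj, iterations=200, tol=1e-9):
    """Power iteration on the weighted adjacency matrix of an undirected graph."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Adding the previous score shifts the matrix by the identity, which
        # keeps the iteration from oscillating on bipartite graphs.
        new = {v: score[v] + sum(w * score[u] for u, w in adj[v].items())
               for v in adj}
        norm = sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if sum(abs(new[v] - score[v]) for v in adj) < tol:
            return new
        score = new
    return score

# Made-up symmetric graph: adj[u][v] is the number of trips between u and v.
adj = {
    "A": {"B": 50, "C": 30, "D": 20},
    "B": {"A": 50, "C": 10},
    "C": {"A": 30, "B": 10},
    "D": {"A": 20},
}
deg = degree_centrality(adj)
eig = eigenvector_centrality(adj)
```

Degree centrality only counts how many neighbors a node has, while eigenvector centrality also rewards being linked, with heavy weights, to nodes that are themselves central. In the toy graph, node A scores highest on both.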

Left: Degree Centrality; Right: Eigenvector Centrality

A map of the downtown area better shows the concentration of high centrality score nodes.

Left: Top 10 locations with highest degree centrality; Right: Top 10 locations with highest eigenvector centrality

COVID-19 and the Future

In major cities around the world, ride-share companies have been hit hard by the pandemic. The pandemic is also changing people’s perception of shared space, for both drivers and riders. It might therefore be interesting to look at how people use TNPs now, which might contribute to the stories about how drivers are surviving the pandemic.




Shen Xin

M.S. Candidate in Urban Planning, Columbia Graduate School of Architecture, Planning and Preservation
