A story about rebalancing the New York City bike share system.
It is widely accepted that open data has the potential to revolutionize urban transport. Open data from systems such as Citi Bike provides a source of real spatiotemporal measurements that can be visualized in numerous ways. A good visualization can educate and persuade the public by telling a story that is both compelling and verifiable.
Data may contain layers of information that are not readily apparent, and uncovering them can give insight into problems currently affecting the city. One of the most intractable is redistributing, or rebalancing, bicycles among stations in New York so that stations are neither full nor empty and the system functions smoothly. The need for rebalancing arises from uneven bike flows: residential zones receive the most bikes in the evening, and commercial zones receive the most bikes at the start of the workday. Near the main transport hubs, high bike turnover means that stations must be constantly resupplied.
Sometimes uneven distribution is caused by topography, as in hilly Barcelona, where bikes tend to accumulate downtown, the lowest part of the city. Most of the time, though, asymmetrical distribution is a matter of traffic: bikes are subject to rush hour and move to certain areas all at once. As a result, rebalancing is a complex task of making sure not only that bikes are available, but also that there are enough empty docks for bikes to park in.
An unbalanced system means an unreliable form of transportation and unhappy customers. To rebalance bikes, Citi Bike employs a team using small three-bike trailers (during rush hour) as well as larger-capacity trucks that move bikes from fuller to emptier stations, or to stations where high demand is anticipated. By all accounts, Citi Bike aspires to become a mode of transportation on par with the bus and subway systems, which makes reliability a vital concern.
The object of our study was threefold: first, to visualize the times, origins, and destinations of rebalanced bicycles and compare them with the paths and times of bike trips; second, to cluster stations by average hourly availability; and third, to create a profile for each station. Only weekdays were considered, as this is arguably when bike share operations are really put to the test. Clicking a station on the map shows its unique profile of trip destinations, bike deliveries via rebalancing, and hourly availability.
This project is based on a combination of Citi Bike trip data and Open Bus availability data. Trip data is published monthly in CSV format on Citi Bike’s website. Because there are tens of millions of records, we designed an interactive tool for data exploration. To visualize this amount of data, we used the Mapbox GL JS API for the map and D3.js for the graphs.
There are two modes. The first mode shows routes of trips and rebalanced bicycles and the popularity of those routes.
First, you can view trip routes and compare them with rebalanced bicycle routes. These routes were approximated with the Open Source Routing Machine (OSRM), an open source routing engine for shortest paths in road networks that combines sophisticated routing algorithms with the free and open road network data of OpenStreetMap.
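To give a feel for how a route between two stations can be requested, here is a minimal sketch that only builds an OSRM v5 `/route` URL. It assumes the public OSRM demo server as the host, and the `profile` value is an assumption too (OSRM fixes the routing mode when the data is prepared); the coordinates are illustrative, not exact station locations. The source does not say how the routes were requested.

```python
from urllib.parse import urlencode

# Assumption: the public OSRM demo server; a self-hosted osrm-routed instance
# would use its own host and port instead.
OSRM_BASE = "https://router.project-osrm.org"


def osrm_route_url(origin, destination, profile="driving"):
    """Build an OSRM v5 /route request URL between two (lat, lon) points.

    OSRM expects each coordinate as "lon,lat", with pairs separated by ";".
    The profile segment is an assumption: the routing mode is fixed when the
    data is prepared with osrm-extract.
    """
    (lat1, lon1), (lat2, lon2) = origin, destination
    coords = f"{lon1},{lat1};{lon2},{lat2}"
    # overview=full + geometries=geojson returns the whole path as GeoJSON,
    # which a map library can draw directly.
    params = urlencode({"overview": "full", "geometries": "geojson"})
    return f"{OSRM_BASE}/route/v1/{profile}/{coords}?{params}"


# Illustrative coordinates only (roughly Midtown Manhattan).
print(osrm_route_url((40.7505, -73.9946), (40.7574, -73.9907)))
```

Fetching that URL returns JSON whose first route carries the path geometry; the shortest path, not the actual path a rider or truck took, is what gets drawn.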
How were rebalancing trips extracted from the trip data? First, the data was sorted by bikeid. Some trips started at a different station from the one where the bike's previous trip ended, which indicates that the bike was moved. We know that the move must have occurred within a window of time: between the end time of the previous trip and the start time of the next one.
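The extraction step can be sketched as follows. This is a minimal illustration, not the authors' code: the `Trip` and `Move` classes and their field names are hypothetical, loosely modeled on the trip CSV, and within each bike the trips are assumed to be ordered by start time.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from operator import attrgetter


@dataclass
class Trip:
    # Hypothetical fields, loosely modeled on the published trip CSV.
    bikeid: int
    start_station: int
    end_station: int
    start_time: datetime
    stop_time: datetime


@dataclass
class Move:
    bikeid: int
    from_station: int    # where the previous trip ended
    to_station: int      # where the next trip started
    dropped_off: datetime
    picked_up: datetime


def find_rebalancing_moves(trips):
    """Return a Move for each pair of consecutive trips of the same bike
    whose stations do not match (the bike must have been moved)."""
    moves = []
    # Sort by bike, then by start time, so each bike's trips are adjacent and in order.
    ordered = sorted(trips, key=attrgetter("bikeid", "start_time"))
    for bikeid, group in groupby(ordered, key=attrgetter("bikeid")):
        bike_trips = list(group)
        for prev, nxt in zip(bike_trips, bike_trips[1:]):
            if prev.end_station != nxt.start_station:
                moves.append(Move(bikeid, prev.end_station, nxt.start_station,
                                  prev.stop_time, nxt.start_time))
    return moves


# Made-up trips: bike 1 ends at 511 but next leaves from 519; bike 2 is continuous.
trips = [
    Trip(1, 519, 521, datetime(2015, 6, 1, 21, 0), datetime(2015, 6, 1, 21, 15)),
    Trip(1, 432, 511, datetime(2015, 6, 1, 19, 40), datetime(2015, 6, 1, 20, 0)),
    Trip(2, 521, 529, datetime(2015, 6, 1, 8, 0), datetime(2015, 6, 1, 8, 20)),
    Trip(2, 529, 432, datetime(2015, 6, 1, 9, 0), datetime(2015, 6, 1, 9, 25)),
]
moves = find_rebalancing_moves(trips)
```

Each `Move` keeps both ends of the window (`dropped_off`, `picked_up`), since the exact time of the transfer is unknown.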
The time at which a station is rebalanced is unknown to us because this data is not available, or at least has not been made public. But we do know several things for sure: the time that any given bike was dropped off and the time that it was picked up. We also know that bikes are moved in batches; however, we do not know how many are moved at one time.
The problem is that the window in which a bike was moved can be narrow or very wide, ranging from a few minutes to several days. We decided to estimate the time at which a bike was moved as the mid-time: the midpoint between the time the bike was dropped off and the time it was picked up again. Of course, there is a margin of error, but averaging over large amounts of data increases the probability that the average mid-time falls near the actual time the transfer occurred.
The example above illustrates the transfer of a single bike. The bike was dropped off at the red station at 8 pm and picked up at the green station at 9 pm. Because it was picked up from a different station than the one where it was dropped off, we know that it was moved by the Citi Bike operations team at some point between 8 and 9 pm, so its estimated mid-time is 8:30 pm.
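The mid-time estimate and an hourly tally of transfers can be sketched like this. The function names are hypothetical, the date is made up, and the optional `max_window` cap is our own illustrative refinement, not part of the method described above.

```python
from collections import Counter
from datetime import datetime, timedelta


def mid_time(dropped_off, picked_up):
    """Estimate when a bike was moved as the midpoint of its idle window."""
    return dropped_off + (picked_up - dropped_off) / 2


def transfers_by_hour(windows, max_window=None):
    """Count estimated transfers per hour of day (0-23).

    windows: iterable of (dropped_off, picked_up) datetimes.
    max_window: optional timedelta; wider windows are skipped. This cap is a
    hypothetical refinement, not part of the method described in the text.
    """
    counts = Counter()
    for dropped_off, picked_up in windows:
        if max_window is not None and picked_up - dropped_off > max_window:
            continue
        counts[mid_time(dropped_off, picked_up).hour] += 1
    return counts


# The single-bike example: dropped off at 8 pm, picked up at 9 pm.
print(mid_time(datetime(2015, 6, 1, 20, 0), datetime(2015, 6, 1, 21, 0)))
# 2015-06-01 20:30:00
```

A narrow window pins the transfer down tightly; a multi-day window contributes a much less certain mid-time, which is why aggregating many windows matters.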
Bike availability is part and parcel of the rebalancing issue. The least desirable states for a station are being completely empty or completely full; in other words, no bikes to take or no place to park. Availability data was obtained from the Open Bus project, which aggregates Citi Bike’s JSON feed, capturing data at every station at approximately 10-minute intervals.
To identify clusters of stations with similar availability behavior, the K-means algorithm was applied using the NbClust package in R. K-means is an unsupervised learning method; it was chosen because it is a simple algorithm that works well with large datasets. Each station provided 24 input variables, each representing the average availability factor (available bikes / total number of docks) for one hourly interval. K was set to 3 because three clusters were recommended by 14 of the 26 criteria.
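As a rough Python re-sketch of this step (the study itself used NbClust in R), the code below builds each station's 24 hourly availability factors from weekday snapshots and runs plain Lloyd's k-means with K fixed at 3. Assumptions: the snapshot tuple layout is hypothetical, stations missing any hour are dropped, and farthest-point initialization replaces R's random starts; NbClust's selection of K is not reproduced.

```python
from collections import defaultdict
from datetime import datetime


def hourly_profiles(snapshots):
    """Build a 24-value availability profile per station.

    snapshots: iterable of (station_id, timestamp, available_bikes, total_docks).
    Returns {station_id: [mean availability factor for hours 0..23]}.
    Stations missing any hour are dropped (an assumption of this sketch).
    """
    sums = defaultdict(lambda: [0.0] * 24)
    counts = defaultdict(lambda: [0] * 24)
    for station, ts, bikes, docks in snapshots:
        if ts.weekday() >= 5 or docks <= 0:
            continue  # weekdays only, as in the study; skip rows with no docks
        sums[station][ts.hour] += bikes / docks
        counts[station][ts.hour] += 1
    return {s: [sums[s][h] / counts[s][h] for h in range(24)]
            for s in sums if all(counts[s])}


def dist2(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    # Farthest-point initialization: deterministic, unlike random restarts.
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(dist2(p, c) for c in centroids))))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every station.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no station changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up profiles shaped like the three cluster profiles described in this article.
flat = [0.3] * 24                                            # Low-Low-Low
office = [0.5 if 9 <= h < 16 else 0.2 for h in range(24)]    # Low-High-Low
home = [0.35 if 8 <= h < 18 else 0.65 for h in range(24)]    # High-Low-High
labels, _ = kmeans([flat, flat, office, office, home, home], k=3)
print(labels)  # [0, 0, 2, 2, 1, 1]
```

Cluster numbers are arbitrary: only which stations share a label matters.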
Cluster 1: Profile: Low — Low — Low
These stations have relatively low availability and tend to be about 30% full on average throughout the day.
Cluster 2: Profile: Low — High — Low
These stations have lower availability in the early morning and at night than during the day, between 9 am and 4 pm, when they are on average 50% full.
Cluster 3: Profile: High — Low — High
These stations have higher availability in the early morning and at night, when they average more than 60% full. Between 8 am and 6 pm, they are typically less than 40% full.
See the different colors for the stations?
There are three different colors which roughly correspond to three different modes of behavior (clusters). Some stations consistently exhibit higher or lower availability at specific times of the day, reflecting the demand of that station and, by extension, the flow of bicyclists into or out of the surrounding area.
Why do stations exhibit different types of behavior? As mentioned above, bike flows resemble traffic patterns. Take the stations in the Lower East Side and East Village, neighborhoods with a large proportion of residential and mixed commercial/residential land use, known for nightlife and live music venues. Stations in these neighborhoods tend to be full at night, empty out between 7 and 9 am, and fill up again between 6 and 8 pm. The same pattern can be seen in the Upper West Side, which is primarily residential by land use. The opposite pattern is observed at stations in Lower Manhattan, SoHo, Tribeca, and around the Civic Center, neighborhoods with a significant proportion of office and commercial buildings according to the Department of City Planning. All stations in these neighborhoods exhibit a similar pattern: low availability at night and high availability during working hours, from 8 am to 5 pm. The third pattern is relatively low availability at all times. Stations on Broadway in Midtown South, stations near the eastern edge of Central Park, and a large number of peripheral stations in Brooklyn (Bedford and Greenpoint) demonstrate this behavior.
It may seem obvious that during the day the business district fills up, residential areas empty out, and stations near tourist destinations like Central Park are almost always low on bikes. But station behavior also matters because it gives Citi Bike subscribers a general idea of what to expect from a station. For Brooklyn residents who use stations near the entrance to the Manhattan Bridge, for example, it is helpful to know that the stations nearest to them generally have high availability in the afternoon and low availability in the morning and at night.
We found that 8 of the 10 highest-demand stations do not appear among the top ten stations that received bikes via rebalancing.
The stations that received the most bikes via rebalancing in 2015:
- 521 — 8 Ave & W 31 St (Chelsea) — 22,763 bikes
- 529 — W 42 St & 8 Ave (Clinton) — 18,892 bikes
- 511 — E 14 St & Avenue B (Stuyvesant Town) — 16,521 bikes
- 432 — E 7 St & Avenue A (East Village) — 13,312 bikes
- 517 — Pershing Square South (Murray Hill) — 12,059 bikes
- 519 — Pershing Square North (Murray Hill) — 11,257 bikes
- 356 — Bialystoker Pl & Delancey St (Lower East Side) — 7,648 bikes
- 520 — W 52 St & 5 Ave (Midtown) — 7,524 bikes
- 3230 — Penn Station Valet* (Midtown) — 7,089 bikes
- 445 — E 10 St & Avenue A (East Village) — 6,795 bikes
* Penn Station Valet is a bike depot; its storage capacity is much larger than its official number of docks (7).
The ten highest-demand stations, by number of bikes taken:
- 519 — Pershing Square North (Murray Hill) — 104,813 trips
- 521 — 8 Ave & W 31 St (Chelsea) — 100,796 trips
- 293 — Lafayette St & E 8 St (West Village) — 95,890 trips
- 435 — W 21 St & 6 Ave (Flatiron) — 87,149 trips
- 497 — E 17 St & Broadway (Union Square) — 86,108 trips
- 426 — West St & Chambers St (Lower Manhattan) — 79,061 trips
- 285 — Broadway & E 14 St (West Village) — 73,682 trips
- 151 — Cleveland Pl & Spring St (Nolita) — 70,374 trips
- 284 — Greenwich Ave & 8 Ave (Greenwich Village) — 67,950 trips
- 402 — Broadway & E 22 St (Flatiron) — 67,497 trips
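The overlap claim can be checked directly against the two lists above; the sets below simply copy their station IDs.

```python
# Station IDs copied from the two top-ten lists above.
top_rebalanced = {521, 529, 511, 432, 517, 519, 356, 520, 3230, 445}
top_demand = {519, 521, 293, 435, 497, 426, 285, 151, 284, 402}

overlap = top_demand & top_rebalanced   # high-demand stations that are also top receivers
missing = top_demand - top_rebalanced   # high-demand stations absent from the rebalancing list
print(sorted(overlap))  # [519, 521]
print(len(missing))     # 8
```

Only Pershing Square North (519) and 8 Ave & W 31 St (521) appear in both lists.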
The ten most frequently occurring pairs of stations for rebalancing bikes were:
- 432 E 7 St & Avenue A (East Village) > 511 E 14 St & Avenue B (East Village): 4,107 bikes
- 477 W 41 St & 8 Ave > 500 W 52 St & 5 Ave: 4,097 bikes
- 359 E 47 St & Park Ave > 519 Pershing Square N: 3,292 bikes
- 520 W 52 St & 5 Ave > 529 W 42 St & 8 Ave: 3,057 bikes
- 492 W 33 St & 7 Ave > 519 Pershing Square N: 2,651 bikes
- 511 E 14 St & Avenue B > 432 E 7 St & Avenue A: 2,628 bikes
- 329 Greenwich St & N Moore St > 363 West Thames St: 2,621 bikes
- 520 W 52 St & 5 Ave > 449 W 52 St & 9 Ave: 2,563 bikes
- 352 W 56 St & 6 Ave > 468 Broadway & W 55 St: 2,233 bikes
- 415 Pearl St & Hanover Square > 432 E 7 St & Avenue A: 1,942 bikes
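A ranking like this one can be produced by counting directed station pairs. A minimal sketch, assuming the input is a list with one (origin, destination) tuple per rebalanced bike; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def top_pairs(moves, n=10):
    """Rank directed (origin, destination) station pairs by number of rebalanced bikes."""
    return Counter((origin, dest) for origin, dest in moves).most_common(n)


# Made-up input: one (origin, destination) tuple per rebalanced bike.
moves = [(432, 511)] * 3 + [(511, 432)] * 2 + [(520, 529)]
print(top_pairs(moves, n=2))
# [((432, 511), 3), ((511, 432), 2)]
```

Pairs are directional, so 432 > 511 and 511 > 432 are counted separately, just as they appear as two entries in the list above.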
To explore this huge amount of data, we have published an interactive tool at urbica.co/citibike.
This work is based on the MSc thesis «A geospatial analysis of bike share redistribution in New York City» by Alexander Tedeschi. Alex is a geographer who specializes in Russia, Eastern Europe, and Central Asia, and recently graduated from the University of Lisbon with a degree in Geospatial Technologies. As an urbanist and avid cyclist, he would like to see cities become more liveable.
Urbica is a design and urban analytics firm based in Moscow. Urbica specializes in information design, user interfaces and data analysis. We are focused on human experience design around cities.
Thanks for reading. If you liked this project, please share it!