Exploring open data from Oslo City Bike
A city full of blue and white bikes is a sure sign that spring has arrived. Carefree hipsters and parents running late for work alike can be seen riding through the streets of Oslo.
I recently discovered that Oslo City Bike publishes open data about its bike rides, and I got really excited. During the Easter break I spent some hours exploring and visualizing some of the rides. Here are the results:
First off, I wanted to get some data onto a map in order to see the geographical extent and distribution of trips. CartoDB makes it easy to put points on a map and to animate them over time.
However, since the historical data only included the station ID and not the coordinates, I needed to add the station coordinates to each trip. I wrote a Python script that looped through all the trips and added the start and end coordinates. In order to keep the dataset manageable I limited the data to July 2016.
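A minimal sketch of that enrichment step follows. It assumes the trips are already parsed into records holding a start station ID, an end station ID and ISO 8601 timestamps, and that a separate lookup maps each station ID to a (latitude, longitude) pair. The `Trip` fields, the `add_coordinates` helper and its `month` filter are hypothetical; they are not the actual script or the exact layout of the Oslo City Bike export.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


# Hypothetical trip record; the real export's field names may differ.
@dataclass
class Trip:
    start_station: int
    end_station: int
    start_time: str  # ISO 8601, e.g. "2016-07-01T08:15:00"
    end_time: str


def add_coordinates(
    trips: Iterable[Trip],
    stations: Dict[int, Tuple[float, float]],
    month: str = "2016-07",
) -> List[dict]:
    """Attach start/end (lat, lon) to each trip that starts in `month`.

    Trips whose station IDs are missing from `stations` are skipped
    rather than guessed.
    """
    enriched = []
    for trip in trips:
        # Keep the dataset manageable: ISO timestamps let a prefix match pick a month.
        if not trip.start_time.startswith(month):
            continue
        start = stations.get(trip.start_station)
        end = stations.get(trip.end_station)
        if start is None or end is None:
            continue
        enriched.append({
            "start_station": trip.start_station,
            "end_station": trip.end_station,
            "start_time": trip.start_time,
            "end_time": trip.end_time,
            "start_lat": start[0],
            "start_lon": start[1],
            "end_lat": end[0],
            "end_lon": end[1],
        })
    return enriched
```

Skipping trips with unknown stations, instead of failing, is a design choice: a station that was moved or closed would otherwise stop the whole run.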
On this map you can see a blue dot for each new bike loan; more bike loans in one location result in a darker color:
We get a feeling for which stations are most used, and see the cyclic pulse of day-night-day. But the map doesn’t say anything about where people travel.
Where do people travel?
The data contains information on both start and end stations, but obviously not the actual path the traveller has taken. Still, by visualizing travels between stations as lines, I hoped to get some overview of how people travel across the city.
I made some adjustments to the Python script so that it exports each trip as a line feature in a GeoJSON file. I could then import this file into CartoDB and draw each line with low opacity, so that frequently used routes appear more intense. This is the result:
The map above includes rides from just a couple of days at the beginning of July, since including all trips from the whole month resulted in too many lines for the transparency effect to work. What’s missing here is the time dimension from the previous map. Unfortunately, CartoDB has no functionality for animating lines.
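The GeoJSON export could look roughly like the sketch below. It assumes trip records (dicts) that already carry `start_lat`, `start_lon`, `end_lat` and `end_lon` from the coordinate step described earlier; the helper names and the choice of properties are hypothetical. The one detail worth getting right is that GeoJSON positions are written as [longitude, latitude].

```python
import json
from typing import Iterable, List


def trips_to_geojson(trips: Iterable[dict]) -> dict:
    """Build a GeoJSON FeatureCollection with one LineString per trip."""
    features: List[dict] = []
    for t in trips:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                # GeoJSON positions are [longitude, latitude], not [lat, lon].
                "coordinates": [
                    [t["start_lon"], t["start_lat"]],
                    [t["end_lon"], t["end_lat"]],
                ],
            },
            # Keep a few attributes so the map tool can style or filter on them.
            "properties": {
                "start_station": t["start_station"],
                "end_station": t["end_station"],
                "start_time": t["start_time"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


def write_geojson(trips: Iterable[dict], path: str) -> None:
    """Serialize the trips to a .geojson file that a mapping tool can import."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(trips_to_geojson(trips), f)
```

Note that a trip returned to its own station becomes a zero-length line, so it will not be visible on a map like this.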
When do people travel between which stations?
QGIS is open source software for working with geographical data. It’s not as easy to use as CartoDB, but it has lots of advanced features. I found a QGIS plugin called TimeManager, which let me animate the trip data over time.
I rendered one image for every thirty seconds of a single day, for trips starting on Friday July 1. The snapshots start at 06:00, when the stations open, and end at 01:30, when most of the bikes have been returned. More lines result in a brighter color, and a bit of After Effects glow helps indicate the busiest areas.
In order to more easily compare different times of the day, I compiled snapshots from every 30 minutes of the day:
It is striking to see how the map lights up as people go to work between 8:00 and 9:00, and then head home for an early weekend at 13:30. Around 20:30 in the evening, there seems to be quite a bit of activity on Grünerløkka. Not so strange for a Friday night in July.
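TimeManager handles the frame stepping inside QGIS, so no code is needed for that. Purely to illustrate the idea, here is a sketch that steps through the day in fixed increments and picks out the trips in progress at each instant. It assumes trips as (start, end) `datetime` pairs and treats a trip as visible while it is underway; how TimeManager itself decides which features to show may differ, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Tuple

Trip = Tuple[datetime, datetime]  # (start, end)


def frame_times(start: datetime, end: datetime,
                step: timedelta = timedelta(seconds=30)) -> Iterator[datetime]:
    """Yield frame timestamps from start to end inclusive, `step` apart."""
    t = start
    while t <= end:
        yield t
        t += step


def active_trips(trips: List[Trip], at: datetime) -> List[Trip]:
    """Trips that have started but not yet ended at time `at`."""
    return [(s, e) for s, e in trips if s <= at < e]
```

With both ends included, 06:00 to 01:30 the next day gives 2,341 frames at 30-second steps, or 40 frames at the 30-minute spacing used for the compiled snapshots.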
Any differences between weekdays and weekends?
I suspected that there might be some differences between weekdays and weekends. This turned out to be correct when looking at what time of day people start biking. The graph below shows how many trips started at each time of day, for every day in July. The blue lines represent weekdays, and the pink lines represent weekends.
As you can see, the variation between days is considerable, but in general there are more trips on weekdays than on weekends.
However, if we look at the average duration of trips, it is clear that people spent more time with the bikes during the weekends than during weekdays in July 2016:
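Both weekday/weekend comparisons boil down to grouping trips by calendar day. The sketch below assumes trips with parsed `datetime` start and end times, counts starts in hourly bins (the original graph may use a finer resolution), and classifies each trip by the day it started, so a ride from Friday 23:50 counts as a weekday ride. All function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def is_weekend(day: date) -> bool:
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    return day.weekday() >= 5


def hourly_starts_per_day(start_times: Iterable[datetime]) -> Dict[date, List[int]]:
    """Count trip starts per hour (0-23), one 24-slot list per calendar day."""
    per_day: Dict[date, List[int]] = defaultdict(lambda: [0] * 24)
    for ts in start_times:
        per_day[ts.date()][ts.hour] += 1
    return dict(per_day)


def mean_duration_minutes(
    trips: Iterable[Tuple[datetime, datetime]],
) -> Dict[str, Optional[float]]:
    """Average trip length in minutes, split by whether the trip started on a weekend."""
    buckets: Dict[str, List[float]] = {"weekday": [], "weekend": []}
    for start, end in trips:
        key = "weekend" if is_weekend(start.date()) else "weekday"
        buckets[key].append((end - start).total_seconds() / 60)
    return {k: (mean(v) if v else None) for k, v in buckets.items()}
```

Plotting each day’s 24 counts as one line, colored by `is_weekend`, gives a chart of the kind described above.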
For how long do people keep the bike?
Looking at averages, as in the graph above, can be interesting, but it gets even more interesting when we look at the distribution of trips by duration. The graph below shows how many trips lasted 0–1 minutes, 1–2 minutes, 2–3 minutes and so on.
As you can see, the majority of trips last between 4 and 10 minutes. More surprising is how many trips lasted less than 1 minute. More on that later.
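The one-minute bins can be computed with a few lines of standard-library Python. This sketch assumes trip durations are already available in seconds; the 60-minute cutoff with an overflow bin and the handling of negative durations are assumptions of this sketch, not choices taken from the graph.

```python
from collections import Counter
from typing import Iterable, List


def duration_histogram(durations_s: Iterable[float], max_minutes: int = 60) -> List[int]:
    """Count trips per one-minute bin.

    Index 0 holds trips of 0-1 minutes, index 1 holds 1-2 minutes, and so on.
    Anything lasting max_minutes or longer is lumped into the final bin,
    and negative durations (bad records) are ignored.
    """
    counts: Counter = Counter()
    for s in durations_s:
        if s < 0:
            continue
        counts[min(int(s // 60), max_minutes)] += 1
    return [counts[i] for i in range(max_minutes + 1)]
```

A trip of exactly 60 seconds lands in the 1–2 minute bin, matching the "0–1, 1–2, 2–3" labelling.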
Flow between stations
One hypothesis I had was that some stations would be used mostly as start stations, while others would be used more frequently as end stations. In addition, I wanted to see the “flow” of bikes going between each station.
I discovered a tool for creating chord diagrams called Circos, along with an online version where I could simply upload my data file and download the diagram in vector format, which made it easy to make some visual adjustments.
Since there is a limit to how many stations I could include, I chose the 10 stations with the highest difference between started and ended trips: 5 where more trips started than ended, and 5 where more trips ended than started.
I saved different versions of the graph as images and put them in a Marvel prototype, so that you can click on different stations and easily see the incoming and outgoing flow of bikes. Note that bikes transported by Oslo City Bike’s cars are not included here.
For example, have a look at Kirkeristen, the station at the bottom of the graph. Many more trips end there (all the color bands coming in on the left side) than start there (the green color bands going out on the right side). If you follow the paths between Alexander Kiellands plass and Kirkeristen, you can see that there were lots of trips going to Kirkeristen (348 in total), but fewer than half as many (140 trips) went in the other direction.
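The station selection and the station-to-station counts behind a chord diagram like this can be sketched as follows, assuming each trip is reduced to a (start station, end station) pair. `pick_unbalanced_stations` and `flow_matrix` are hypothetical helpers, and the exact table format the online Circos tool expects is not reproduced here. Comparing `m[i][j]` with `m[j][i]` gives the kind of asymmetry seen between Alexander Kiellands plass and Kirkeristen.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]  # (start_station, end_station)


def pick_unbalanced_stations(pairs: Iterable[Pair], k: int = 5) -> Tuple[list, list]:
    """Return (sources, sinks): the k stations with the largest surplus of
    started trips and the k with the largest surplus of ended trips."""
    net: Counter = Counter()
    for start, end in pairs:
        net[start] += 1  # a trip leaving the station
        net[end] -= 1    # a trip arriving; round trips cancel out to 0
    ranked = sorted(net, key=lambda st: net[st])  # most net arrivals first
    sinks = ranked[:k]
    sources = ranked[::-1][:k]
    return sources, sinks


def flow_matrix(pairs: Iterable[Pair], stations: Sequence[Hashable]) -> List[List[int]]:
    """m[i][j] = number of trips from stations[i] to stations[j]."""
    index = {st: i for i, st in enumerate(stations)}
    m = [[0] * len(stations) for _ in stations]
    for start, end in pairs:
        # Trips touching a station outside the selection are left out.
        if start in index and end in index:
            m[index[start]][index[end]] += 1
    return m
```

The diagonal of the matrix holds trips that ended where they started.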
If you have studied the diagram for a bit, you might have noticed that a lot of trips end at the same station where they started. In this graph I have highlighted only those trips:
For example, almost half of the bikes lent out from Vaterlandsparken were returned to the same station (549 out of 1190). Why is this? Do people in Grønland simply want to go for a ride and return the bike to the same station?
Probably not. If we look at the duration for the trips that started and ended at Vaterlandsparken station, this is the distribution we get:
Almost half of the bikes were returned before 2 minutes had passed. A likely explanation is that the user wasn’t happy with the bike s/he got, and returned it in order to get a new one.
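The two numbers in this argument, the share of departures that come back to the same station and the share of those returned within two minutes, can be computed as in the sketch below. It assumes trips as (start station, end station, start time, end time) tuples; the function names and the two-minute default are hypothetical choices made to mirror the text.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Hashable, List, Optional, Tuple

Ride = Tuple[Hashable, Hashable, datetime, datetime]  # (start_st, end_st, start, end)


def round_trip_share(rides: List[Ride]) -> Dict[Hashable, float]:
    """Fraction of each station's departures that came back to the same station."""
    starts: Counter = Counter()
    round_trips: Counter = Counter()
    for start_st, end_st, _, _ in rides:
        starts[start_st] += 1
        if start_st == end_st:
            round_trips[start_st] += 1
    return {st: round_trips[st] / n for st, n in starts.items()}


def quick_return_share(rides: List[Ride], station: Hashable,
                       threshold: timedelta = timedelta(minutes=2)) -> Optional[float]:
    """Among round trips at `station`, the fraction returned within `threshold`."""
    durations = [end - start for s, e, start, end in rides if s == e == station]
    if not durations:
        return None
    return sum(d < threshold for d in durations) / len(durations)
```

Applied to the July data for Vaterlandsparken, `round_trip_share` should give about 0.46 (549 / 1190), provided the station IDs and trip records match the ones behind the diagram.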
More open data!
There is much more to be explored in the Oslo City Bike data. One obvious direction is to compare the patterns of bike rentals with weather reports. Another idea would be to look more closely at the distances, elevation differences and travel speeds between stations.
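As a possible starting point for the distance and speed idea, here is a sketch using the haversine formula for the great-circle distance between two stations’ coordinates. This is a swapped-in technique rather than anything used above, and it assumes a spherical Earth with a mean radius of 6,371 km. Straight-line distance underestimates the real route through the streets, so speeds derived from it are lower bounds, and it says nothing about elevation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; assumes a spherical Earth


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def straight_line_speed_kmh(distance_m: float, duration_s: float) -> float:
    """Average speed in km/h if the rider had gone in a straight line."""
    return (distance_m / 1000) / (duration_s / 3600)
```

For example, one degree of longitude along the equator comes out at roughly 111.2 km, and 1.5 km covered in 6 minutes is 15 km/h.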
After doing these experiments, I discovered an in-depth analysis of the NYC Bike Share System by Todd W. Schneider. It is interesting to see that some of the patterns are similar. He also combines the bike data with probable routes between stations, which could be interesting to do for Oslo as well.
Lastly, a big kudos to Oslo City Bike for providing this data in an easy-to-use format. Geeks like myself are eager to explore open data like these. I hope more organizations and companies see the benefits of making their data available to the public.