DC Bikeshare: Where do they all go?
Finding the compass bearing of 3.1 million bikeshare rides with R
I wrote about when people use DC Bikeshare a few weeks ago and was still curious about the data. The results from the heatmap were obvious: rides spiked during rush hour in the morning and afternoon. After finishing the post, I started to think about the direction of each trip, and how those directions break down by hour of the day. Warning: knowing your way around DC will help in understanding this post.
There are a million and one things you can do (and that have been done) with bikeshare data, but I had never seen it visualized quite the way I imagined it. In my mind, I saw a flow of bikes heading to the offices of the Dupont, Farragut Square and K Street areas in the morning and then out to the surrounding neighborhoods in the afternoon.
There are two questions I want to answer in this post:
- For every hour of the day, what is the average direction bikeshare users go?
- For a few select stops, which direction do bikeshare users generally go at each hour?
The dataset needs some massaging to get it into shape. First, I removed rides that started and ended at the same station. That dropped 113,167 observations, which I didn’t consider true rides. Next, I needed to add GPS coordinates for the start and end of each ride. Finally, I needed to calculate the bearing from the starting station to the ending station of each ride.
The latitudes and longitudes of each bikeshare location came courtesy of the District Department of Transportation. I then summarized the large dataset to get the mean starting and ending point for each hour. As always, the dplyr library was a lifesaver here. To get the visualization I wanted in the end, I needed both datasets (ride by ride and hour by hour), as explained later in this post.
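As a rough illustration of the first two cleanup steps, here is a minimal sketch. The data frames, column names and coordinates are all made up for the example, and it uses base R only to stay dependency-free (the actual work for this post was done with dplyr).

```r
# Hypothetical ride log; the column names are assumptions for this sketch
rides <- data.frame(
  start_station = c("A", "B", "A", "C"),
  end_station   = c("B", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical station lookup with made-up coordinates
stations <- data.frame(
  station = c("A", "B", "C"),
  lat     = c(38.90, 38.91, 38.93),
  lon     = c(-77.04, -77.03, -77.04),
  stringsAsFactors = FALSE
)

# Step 1: drop rides that start and end at the same station
trips <- rides[rides$start_station != rides$end_station, ]

# Step 2: attach the start coordinates, then the end coordinates
start_xy <- setNames(stations, c("start_station", "start_lat", "start_lon"))
end_xy   <- setNames(stations, c("end_station", "end_lat", "end_lon"))
trips <- merge(trips, start_xy, by = "start_station")
trips <- merge(trips, end_xy, by = "end_station")

print(trips)
```

Each remaining row now carries both a start and an end coordinate pair, which is what the bearing calculation needs.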
With the starting and ending GPS coordinates in both datasets, calculating the bearing of each observation was simple via a function I lifted from the fossil package. I’m no expert on geospatial analysis so this saved me some major headaches.
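For anyone curious what a bearing calculation looks like, here is a minimal sketch of the standard initial great-circle bearing formula in base R. The function name `initial_bearing` and the coordinates are my own for illustration; this is not the fossil function itself, just the general idea.

```r
# Hypothetical helper: initial great-circle bearing from point 1 to point 2,
# in degrees clockwise from north, in the range [0, 360)
initial_bearing <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dlon <- (lon2 - lon1) * to_rad

  y <- sin(dlon) * cos(phi2)
  x <- cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlon)

  # atan2 returns radians in (-pi, pi]; convert and wrap into [0, 360)
  (atan2(y, x) * 180 / pi + 360) %% 360
}

# Made-up coordinates roughly in the DC area
north_trip <- initial_bearing(-77.04, 38.90, -77.04, 38.93)  # heading north
south_trip <- initial_bearing(-77.04, 38.93, -77.04, 38.90)  # heading south

# Identical start and end points fall back to 0 in this sketch
same_point <- initial_bearing(-77.04, 38.90, -77.04, 38.90)

print(c(north_trip, south_trip, same_point))
```

The function is vectorized, so it can be applied to whole columns of start and end coordinates at once.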
However, to find the average bearing for all of these rides, I couldn’t simply take the mean of the bearing of each observation. A compass bearing runs from 0 up to (but not including) 360 degrees. Imagine a compass with North at 0 degrees, East at 90, South at 180, and West at 270. For example, say most of the rides are generally heading North. The bearings will be a mix of 350 to 359.9 degrees and 0 to 10 degrees. The mean of these rides would then be around 180 degrees (South) and give us the exact wrong answer (Math version: the mean of a ride at 350 degrees and a ride at 10 degrees would be (350 + 10) / 2 = 180). Therefore, I had to find the mean starting coordinates and the mean ending coordinates for each hour and then calculate the bearing of that one route.
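To make the wraparound problem concrete, here is a small sketch with two made-up rides. It assumes a flat local grid where both rides start at the same point and travel the same distance, which is a simplification of the real latitude and longitude data, but it shows why averaging the end points and taking one bearing behaves better than averaging the bearings directly.

```r
# Two hypothetical rides that both head roughly north
bearings <- c(350, 10)

# Naive arithmetic mean of the bearings points due south
naive_mean <- mean(bearings)  # 180

# Flat-grid simplification (an assumption of this sketch): both rides start
# at (0, 0) and travel one unit, so each end point is
# (east, north) = (sin(bearing), cos(bearing))
rad <- bearings * pi / 180
end_east  <- sin(rad)
end_north <- cos(rad)

# Average the end points, then take the bearing of that single mean route
mean_east  <- mean(end_east)
mean_north <- mean(end_north)
route_bearing <- (atan2(mean_east, mean_north) * 180 / pi + 360) %% 360

print(naive_mean)
print(route_bearing)  # very close to 0, i.e. north
```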
For every hour, the mean bearing of the average bikeshare ride in DC is boringly presented in a table here. To the good stuff…
Visualizing Bike Direction
Time to plot it! To make the visualization I imagined, I needed to use both datasets, the full 3.1 million observation dataset and the hour by hour dataset. The first dataset would show a spray of directions from all the rides individually and the second would show the bearing of the average ride in that hour chunk.
In my head, I see a compass with all of the rides transparent in the background and an arrow swinging left to right depending on the hour and the average direction. The ggplot2 package has a coord_polar function to turn a histogram into a circular plot. After tinkering with the theme, I was able to make it look like a compass!
I created 24 of these graphs, one for each hour of the day, then put them into a GIF making machine, and voila!
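Here is a minimal sketch of the compass idea, assuming ggplot2 is installed. The bearings are randomly generated and the mean bearing is a made-up number, so this only shows the mechanics (a histogram wrapped with coord_polar, plus a radial line for the average direction), not the styling of the final graphs.

```r
library(ggplot2)

set.seed(1)

# Hypothetical data: 500 made-up ride bearings for a single hour
rides <- data.frame(bearing = runif(500, min = 0, max = 360))

# Hypothetical average direction for that hour
mean_bearing <- 200

compass_labels <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")

p <- ggplot(rides, aes(x = bearing)) +
  # 10-degree bins aligned to 0 so the bars stay inside the 0-360 range
  geom_histogram(binwidth = 10, boundary = 0, fill = "grey70") +
  # In polar coordinates a vertical line becomes a radial line, like a needle
  geom_vline(xintercept = mean_bearing, colour = "red") +
  # Wrap the x axis into a circle with 0 degrees (north) at the top
  coord_polar(start = 0) +
  scale_x_continuous(
    limits = c(0, 360),
    breaks = seq(0, 315, by = 45),
    labels = compass_labels
  ) +
  theme_minimal()

# print(p) would draw the plot
```

Repeating this once per hour, for example with a loop that filters the rides by hour and saves each plot to a file, gives the frames for an animation.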
Some notes on the GIFs:
- The gray bars represent the number of rides taken in that hour in that direction. The longest bar marks the direction with the most rides for that hour, and the bars are scaled within each hour. Therefore, you cannot compare the height of the bars from one hour to the next. I inserted the number of rides taken in that hour to provide some context.
- This GIF does not represent the distance of each ride in any way. Just the direction.
I decided to break down the GIF above to see where rides go station by station. I chose the stations based on where they are in DC (and the one in Mt. Pleasant because I used to live there).
As you can see, the direction most rides generally take can be drastically different depending on where that station is in DC. Again, this makes sense intuitively. If you start on the eastern side of DC, you will likely head west, as there is more activity in that direction and there are more stations to dock your bike at.
I also noticed that there are a lot of rides heading directly West, East or South. As DC is a grid city, it seems many bike rides start and end on the same street. This makes sense. It also makes sense that certain stations, like Union Station, feed into rides heading mostly west (toward K Street, Dupont, etc.).
I have a couple of explanations for why directly North is not as common a bearing. First, I removed all rides that started and ended at the same station. Those rides would have produced a bearing of 0 degrees, or directly North. However, there should still remain rides that travel directly North, say up 14th St. NW. My theory is that the farther North you go, the more uphill the rides become. Because the bikes are heavy, uphill rides are not popular.
Conclusions and Next Steps
Looking at the map of bikeshare stations (and of DC in general), the majority of the stations and people are in NW. To commute to work from NW, you would generally head South. To go home, you would generally head North.
Looking deeper, and as the genesis of another post, I want to see the distance of each ride from each station in addition to the bearing. That will take me to the ggmap package, which I have only begun to tinker with.
As always, I would love to hear what you think. If you think I missed a great station to look at, give me a shout and I’d love to put one together. Or if you want to see your local stop, I can help with that too!