Recommending New Locations for Vélib’ Stations in Paris

Lara Ramirez
Feb 19, 2019

This is a simulated open project for the IBM Data Science course on Coursera, although all the data presented is real.

Introduction

Vélib’ is a large-scale bicycle sharing system in Paris.

Launched in July 2007, it was wildly successful and set an example for the rest of the world to follow. It is now the largest bike-sharing system outside China, with around 20,000 bikes spread over 1,200 stations and a daily ridership of about 86,000 people.

The name Vélib’ is a portmanteau of the French words vélo (“bicycle”) and liberté (“freedom”). The service aims to develop new forms of travel across the region that work alongside existing transport options and, in particular, offer an alternative to the more polluting ones.

Beyond running a good service, the most important aspect of the operation is choosing where to place the stations at which people hire and return bikes, so as to maximise the chance that they choose this mode of transport.

The Mayor of Paris has received a budget for 10 new stations to be placed in the city in 2019. The Mayor’s aim is to reduce the number of commercially dense areas that have no nearby station, and a recommendation of which areas to target has been requested.

Data

Methodology

1 — Mapping existing stations

I cleaned up the Vélib’ stations database to keep only the coordinates of each station:
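The clean-up step can be sketched with the standard library alone. The text does not give the raw file's schema, so the `lat`/`lon` column names, the `;` delimiter and the sample rows below are all assumptions (the rows are made up); rows with missing or malformed coordinates are dropped.

```python
import csv
import io

# Made-up sample in an ASSUMED layout; the real Vélib' export may use other column names.
RAW_CSV = """station_code;name;capacity;lat;lon
1001;Station A;35;48.8566;2.3522
1002;Station B;20;48.8530;2.3499
1003;Station C (no coordinates);25;;
"""

def keep_coordinates(raw_text, lat_col="lat", lon_col="lon", delimiter=";"):
    """Return a list of (lat, lon) float tuples, skipping rows without usable coordinates."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    coords = []
    for row in reader:
        try:
            coords.append((float(row[lat_col]), float(row[lon_col])))
        except (TypeError, ValueError):
            # Empty or malformed coordinate: drop the row.
            continue
    return coords

stations = keep_coordinates(RAW_CSV)
print(stations)
```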

Using this and the Folium library, I drew a 100-meter radius circle around each existing station to visualize the areas that can be considered sufficiently covered:
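In Folium, `folium.Circle` takes its `radius` in metres (whereas `folium.CircleMarker` uses pixels), so it is the one that matches a 100 m coverage zone. The circles are only drawn for the eye, but the same "sufficiently covered" criterion can be expressed as a distance test. The sketch below is one way to do that, assuming a spherical Earth and the haversine formula; it is not code from the original project.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def is_covered(point, stations, radius_m=100):
    """True if `point` lies within `radius_m` metres of at least one station."""
    lat, lon = point
    return any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m for s_lat, s_lon in stations)

stations = [(48.8566, 2.3522)]
print(is_covered((48.8570, 2.3522), stations))  # True (about 44 m north of the station)
print(is_covered((48.8600, 2.3522), stations))  # False (about 378 m north)
```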

Since the data also included stations outside the Paris city limits, I used the Google Geocoding API to add each station’s postcode, and hence its borough, to the stations data:
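Reverse geocoding with the Google Geocoding API takes a `latlng` parameter, and the postcode appears in `address_components` as the entry whose `types` include `postal_code`. The sketch below only builds the request URL and parses a response, so it runs without a network connection or key; `YOUR_API_KEY` is a placeholder and the sample response is hand-written and abbreviated, not real API output.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def reverse_geocode_url(lat, lon, api_key):
    """Build a reverse-geocoding request URL for one station."""
    return GEOCODE_URL + "?" + urlencode({"latlng": f"{lat},{lon}", "key": api_key})

def extract_postcode(response):
    """Return the first postal_code found in a Geocoding API JSON response, or None."""
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            if "postal_code" in component.get("types", []):
                return component["long_name"]
    return None

# Hand-written response following the documented shape (not real API output).
sample_response = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "Paris", "short_name": "Paris", "types": ["locality", "political"]},
            {"long_name": "75004", "short_name": "75004", "types": ["postal_code"]},
        ]
    }],
}

print(reverse_geocode_url(48.8566, 2.3522, "YOUR_API_KEY"))
print(extract_postcode(sample_response))  # 75004
```

In practice the URL would be fetched once per station (with `urllib.request` or `requests`) and the parsed postcode stored as a new column.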

I then used the borough as a filter to keep only Parisian locations (postcodes starting with 75). This gave me a new visualisation showing Parisian stations only:
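The filter itself is a prefix check on the postcode. As a small extension, the sketch also maps each postcode to its arrondissement number: 75001 to 75020 map directly, and 75116 is the second postcode of the 16th arrondissement. Anything else starting with 75 (for example a CEDEX code) is returned as `None`. The station records are made up for illustration.

```python
def is_paris_postcode(postcode):
    """Keep only postcodes starting with '75', as in the text; None means geocoding failed."""
    return postcode is not None and postcode.startswith("75")

def arrondissement(postcode):
    """Map a Paris postcode to its arrondissement number, or None if it is not 75001-75020/75116."""
    if postcode == "75116":  # second postcode of the 16th arrondissement
        return 16
    number = int(postcode[-3:])
    return number if 1 <= number <= 20 else None

# Made-up station records after geocoding.
stations = [
    {"lat": 48.8566, "lon": 2.3522, "postcode": "75004"},
    {"lat": 48.8600, "lon": 2.2700, "postcode": "75116"},
    {"lat": 48.8900, "lon": 2.2400, "postcode": "92200"},  # outside Paris
    {"lat": 48.8700, "lon": 2.3000, "postcode": None},     # geocoding returned nothing
]
paris_stations = [s for s in stations if is_paris_postcode(s["postcode"])]
print([arrondissement(s["postcode"]) for s in paris_stations])  # [4, 16]
```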

Using the Paris boroughs database, I added a neutral choropleth layer to show the boundaries of the boroughs and of Paris itself, and to double-check that no corrupt data remained:
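The choropleth makes the check visual. A programmatic complement is a point-in-polygon test that flags any station falling inside no borough boundary. The sketch below uses the classic ray-casting algorithm and assumes each boundary is a simple polygon ring given in GeoJSON `(lon, lat)` order; the square "borough" is hypothetical, not real Paris geometry. A library such as Shapely (`Polygon.contains`) would be the usual choice for real data.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: polygon is a list of (lon, lat) vertices (GeoJSON order)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips inside/outside
    return inside

# Hypothetical square standing in for one borough boundary.
borough = [(2.34, 48.85), (2.36, 48.85), (2.36, 48.86), (2.34, 48.86)]
print(point_in_polygon(2.35, 48.855, borough))  # True
print(point_in_polygon(2.30, 48.855, borough))  # False
```

A station for which this returns `False` against every borough polygon is a candidate outlier worth inspecting on the map.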

I then manually removed the one remaining outlier, in the north-west, to obtain a clean final map:

2 — Mapping venue concentration

I cleaned up the Paris boroughs database to keep only the postcode, perimeter and surface area of each borough:

Using the Google Geocoding API, I added the coordinates corresponding to each postcode:
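Forward geocoding of a postcode can use the Geocoding API's component filtering (`components=postal_code:...|country:FR`), and the coordinates come back under `geometry.location`. As before, this sketch only builds the URL and parses a hand-written response; the key is a placeholder and the sample coordinates are illustrative, not real API output.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def postcode_geocode_url(postcode, api_key):
    """Forward-geocoding URL restricted to a French postcode via component filtering."""
    params = {"components": f"postal_code:{postcode}|country:FR", "key": api_key}
    return GEOCODE_URL + "?" + urlencode(params)

def extract_location(response):
    """Return (lat, lng) of the first result, or None if nothing was found."""
    results = response.get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Hand-written response in the documented shape; coordinates are illustrative only.
sample_response = {"status": "OK",
                   "results": [{"geometry": {"location": {"lat": 48.859, "lng": 2.379}}}]}

print(postcode_geocode_url("75011", "YOUR_API_KEY"))
print(extract_location(sample_response))  # (48.859, 2.379)
```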

Using these coordinates and the Foursquare Places API, I built a separate database of venues near the centre of each borough. I used each borough’s perimeter as the search radius and requested the maximum number of results allowed, 50 per request, giving me 1,000 results in total (20 boroughs × 50). For each result, I recorded the venue name, its coordinates and its category:
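A sketch of one request per borough, assuming the legacy Foursquare v2 `venues/explore` endpoint that was current around 2019 (Foursquare has since moved to a newer Places API, so treat the URL, the `v` version date and the response shape as assumptions). Credentials are placeholders and the sample response is hand-written. The last line reproduces the arithmetic behind the 1,000-result total.

```python
from urllib.parse import urlencode

# ASSUMED legacy v2 endpoint; check current Foursquare docs before use.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"
MAX_RESULTS_PER_REQUEST = 50  # per-request limit quoted in the text
N_BOROUGHS = 20

def explore_url(lat, lng, radius_m, client_id, client_secret, version="20190219"):
    """Build one request centred on a borough, using its perimeter as the search radius."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,  # v2 expected a YYYYMMDD version date; this value is arbitrary
        "ll": f"{lat},{lng}",
        "radius": int(radius_m),
        "limit": MAX_RESULTS_PER_REQUEST,
    }
    return EXPLORE_URL + "?" + urlencode(params)

def parse_venues(response, postcode):
    """Flatten an explore response into (postcode, name, lat, lng, category) rows."""
    rows = []
    for item in response["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": "Unknown"}]
        rows.append((postcode, venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], categories[0]["name"]))
    return rows

# Hand-written sample in the assumed v2 shape.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 48.8590, "lng": 2.3790},
               "categories": [{"name": "Bistro"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 48.8601, "lng": 2.3802},
               "categories": []}},
]}]}}

print(parse_venues(sample, "75011"))
print(N_BOROUGHS * MAX_RESULTS_PER_REQUEST)  # 1000, the total quoted in the text
```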

My resulting database contained 89 unique categories, some of which could be regrouped. Through some manual feature engineering, I reduced these to 14 main categories:
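Manual regrouping amounts to a lookup table from fine-grained categories to broad groups, with a fallback for anything unmapped. The mapping below is a hypothetical excerpt for illustration; it is not the author's actual set of 14 categories.

```python
from collections import Counter

# Hypothetical excerpt of a manual mapping (fine Foursquare category -> broad group).
CATEGORY_GROUPS = {
    "French Restaurant": "Restaurant",
    "Italian Restaurant": "Restaurant",
    "Bistro": "Restaurant",
    "Café": "Café",
    "Coffee Shop": "Café",
    "Wine Bar": "Bar",
    "Cocktail Bar": "Bar",
    "Clothing Store": "Shop",
    "Bookstore": "Shop",
    "Art Museum": "Museum",
    "History Museum": "Museum",
}

def regroup(category, mapping=CATEGORY_GROUPS, default="Other"):
    """Return the broad group for a category, or `default` if it is not in the mapping."""
    return mapping.get(category, default)

venues = ["Bistro", "Coffee Shop", "Cocktail Bar", "French Restaurant", "Plaza"]
counts = Counter(regroup(c) for c in venues)
print(counts)
```

Keeping the mapping as plain data makes it easy to review and to count how many fine categories end up in each group.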

After mapping all venues as dots, I created a heatmap layer to visualize areas of concentration. Each dot can be clicked to reveal the venue’s category for exploration.
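In Folium the heatmap layer is typically `folium.plugins.HeatMap`, which takes a list of `[lat, lng]` points. Because a heatmap is a smoothed picture of point density, a rough numeric counterpart is to count venues per grid cell. This grid approach is a substitution for illustration, not the method used in the text, and the 0.005° cell size (roughly 560 m by 370 m at Paris's latitude) is an assumption.

```python
import math
from collections import Counter

CELL_DEG = 0.005  # assumed cell size in degrees

def density_grid(points, cell_deg=CELL_DEG):
    """Count venues per grid cell; keys are integer (row, col) cell indices."""
    counts = Counter()
    for lat, lng in points:
        counts[(math.floor(lat / cell_deg), math.floor(lng / cell_deg))] += 1
    return counts

venues = [(48.8566, 2.3522), (48.8567, 2.3523), (48.8712, 2.3012)]
grid = density_grid(venues)
print(grid.most_common(1))  # the densest cell and its venue count
```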

Overlaying the station coverage circles from the first step on this heatmap gave me a useful visual tool for spotting gaps between covered areas and commercially dense zones:

Results

By zooming in on the reddest areas of the final map, I identified the 10 most commercially dense zones that were not sufficiently covered by Vélib’ stations:
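The zones above were picked by eye. A hedged way to automate the same idea is to keep only venues lying outside every station's 100 m radius, bin them into grid cells, and return the 10 densest cells. The grid size and the venue-outside-every-circle criterion are assumptions, and the coordinates below are made up.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000
CELL_DEG = 0.005  # assumed grid cell size in degrees

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def cell_of(lat, lng):
    return math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG)

def cell_centre(cell):
    row, col = cell
    return (row + 0.5) * CELL_DEG, (col + 0.5) * CELL_DEG

def uncovered_hotspots(venues, stations, radius_m=100, top_n=10):
    """Rank grid cells by the number of venues lying outside every station's radius."""
    uncovered = [
        (lat, lng) for lat, lng in venues
        if not any(haversine_m(lat, lng, s_lat, s_lng) <= radius_m for s_lat, s_lng in stations)
    ]
    counts = Counter(cell_of(lat, lng) for lat, lng in uncovered)
    return [(cell_centre(cell), n) for cell, n in counts.most_common(top_n)]

stations = [(48.8566, 2.3522)]
venues = [
    (48.8567, 2.3522),                                         # ~11 m from the station: covered
    (48.8712, 2.3012), (48.8713, 2.3013), (48.8714, 2.3011),  # uncovered cluster
    (48.8412, 2.3212),                                         # isolated, uncovered
]
for centre, n in uncovered_hotspots(venues, stations):
    print(centre, n)
```

Cell centres are only rough candidate locations; each would still need checking on the map against streets and existing infrastructure.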

Discussion

Manual analysis of the mapped data revealed an issue with the quality of the Foursquare data. Some important venues were missing, and the limit of 50 results per request, small compared to the number of venues in Paris, greatly skewed this analysis. The fact that Foursquare data is crowd-sourced also calls its reliability into question. Using a different API, such as Google’s Places API, would likely yield a more reliable analysis.

Conclusion

This visual analysis offers a useful overview of which zones fall within a 100-meter radius of a Vélib’ station, but the quality and limits of the Foursquare data greatly reduce the heatmap’s accuracy as a measure of commercial density. As commercial density is also an important factor in the Mayor’s decision, further analysis with a different data source is recommended.


Science and creativity lover pursuing stimulating things. Neuroscience / Quantum Physics / Philosophy