Bike sharing & data science in practice

Jan 16, 2020

This article is the first in a series where we will see how data science can practically help us build smart cities. We will assess a running public bike sharing system and see how it can be improved. I originally posted these analyses in Swedish back in 2016 on the Knowit Decision blog. Since then, many have asked me for an update: how did the system evolve? Did the operators learn anything and improve? Now the system is being closed down (January 2020) and replaced with a new one. To me, this is an excellent opportunity to share this rework in English with you, together with an epilogue on what happened afterwards. Perhaps the City of Gothenburg could finally become a bit smarter?

Skip the intro and jump straight to the analysis and insights?

Optimizing utility in a Smart City
What makes a neighbourhood popular for bike sharing? (to be published)
How to optimize logistics for bike sharing (to be published)

Bike sharing stirs up feelings

In Gothenburg, Sweden, bike riding has long been a hot topic. The city constantly invests in new bike lanes, and there is a bike sharing system, Styr & Ställ, which allows users to rent a bike from one station, ride for free for up to 30 minutes and return the bike to another station (or pay for longer rides). The bike sharing is operated by an advertising company (JC Decaux) in return for valuable advertising space worth millions. During 2016, there was a discussion on how to extend or renew the contract, and this sparked a debate on how functional and how good value for money the system really is.

Debate without facts

What surprised me was that the debate around a fully digital bike sharing system barely included any hard facts. Instead, some claimed that the system was “free for the city” while others thought it was “the world’s most expensive bike sharing system”. The operator replied that “the bikes do not cost a single crown” [Swedish currency, SEK] and that “97 % of the users would recommend the system”. I’m not certain what alternatives the users had in mind (this was long before electric scooters came to town), but with more than 100 days of rain a year, I can warmly recommend that visitors to Gothenburg take the tram or bus for much more than 3 % of their rides. In any case, where are the facts? Hasn’t anyone analysed the system? How does it perform?

Diesel trucks enable the bikes ..?

You don’t need to be a data scientist to understand the business model: by delivering bike sharing, the operator gets free advertising space from the city, and the city provides the public as well as tourists with next-to-free bikes while the actual “costs” are efficiently hidden. (Forgone income from advertising space is also a cost.) But in practice, delivering a system for shared mobility isn’t always that easy. What if users prefer rolling downhill (e.g. from Chalmers University into town) but not necessarily riding the bikes up again? Already in 2012, a TNS Sifo poll noted that “some stations are often empty when bikes are requested or full when a user comes to return a bike”. The operator tries to handle this by redistribution, where diesel trucks move the bikes between stations. But how well does this work? What is the availability of the system? Looking at the system as a common cost for the city, wouldn’t we expect the redistribution to be carried out as efficiently as possible?

So, how well does Styr & Ställ really work?

A bike sharing system is an excellent topic for data science! Thousands of bikes generate data as they are checked out and returned every day. User behaviour is affected by multiple factors such as station placement (geography), bike redistribution (logistics), events in the city (calendar), weather etc. And last but not least, data is publicly available. Therefore, we shall now demonstrate the power of data science and use it to scrutinize the bike sharing system. This series is divided into the following sections:

Optimizing utility in a Smart City: are there bikes and free stands when requested?
What makes a neighbourhood popular for bike sharing? Where should we have more stations? (to be published)
How to optimize logistics for bike sharing: are bikes moved around to generate the highest utility? (to be published)

Collecting Data Scientist Abilities

Along the way we will also highlight some abilities that can be relevant for becoming a data scientist. Let’s start with the first one.

How can we get hold of data?

Unfortunately, the City of Gothenburg does not provide a data repository with historical bike rides from Styr & Ställ to connect to. It appears that the only information available is a map showing the stations and their current status, as in the operator’s app AllBikesNow (shown in this animated picture).

Data drought is the nightmare of an aspiring data scientist. So what now? It turns out that the Gothenburg Open Data platform has an API that returns the current situation for all stations as JSON/XML updates. I put together a quick-and-dirty scraper with Scrapy that continuously logs the current status. This way, we also capture transitions at the stations, which show us the dynamics of the system. (I have had this scraper running for over three years now, collecting a few updates …)

Data Science Ability #1 — data collection (and web scraping)
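
To make the idea of logging snapshots and deriving transitions concrete, here is a minimal sketch in plain Python. It is not the Scrapy scraper described above, only an illustration under stated assumptions: the field names `StationId` and `AvailableBikes` are hypothetical (the real feed's schema may differ), and `fetch` is a placeholder for whatever HTTP call returns the JSON payload. In the demo at the bottom it is replaced by canned data so the example runs without network access.

```python
import json
import time
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical field names; the real feed's schema may differ.
ID_FIELD = "StationId"
BIKES_FIELD = "AvailableBikes"


def parse_snapshot(payload: str) -> Dict[str, int]:
    """Map station id -> number of available bikes for one JSON payload."""
    stations = json.loads(payload)
    return {str(s[ID_FIELD]): int(s[BIKES_FIELD]) for s in stations}


def transitions(previous: Dict[str, int],
                current: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (station id, change in bikes) for stations whose count changed."""
    changes = []
    for station_id, bikes in sorted(current.items()):
        if station_id in previous and bikes != previous[station_id]:
            changes.append((station_id, bikes - previous[station_id]))
    return changes


def poll(fetch: Callable[[], str], interval_s: float,
         rounds: int) -> Iterator[List[Tuple[str, int]]]:
    """Call fetch() repeatedly and yield transitions between consecutive snapshots."""
    previous = parse_snapshot(fetch())
    for _ in range(rounds):
        time.sleep(interval_s)
        current = parse_snapshot(fetch())
        yield transitions(previous, current)
        previous = current


# Demo with canned payloads standing in for the real API response.
payloads = iter([
    '[{"StationId": 1, "AvailableBikes": 5}, {"StationId": 2, "AvailableBikes": 0}]',
    '[{"StationId": 1, "AvailableBikes": 4}, {"StationId": 2, "AvailableBikes": 1}]',
])
log = list(poll(lambda: next(payloads), interval_s=0, rounds=1))
print(log)
```

A negative change means bikes left the station and a positive change means bikes arrived. Note that a snapshot diff alone cannot tell a user's ride from a redistribution truck, and that several events between two polls collapse into one net change, so a shorter polling interval gives a finer picture.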

Disclaimer: data (science) incompleteness

We may have a hard time answering the question of whether users are satisfied from these data alone. Satisfaction will of course depend on price, but also on how easy the stations are to operate, the city infrastructure, weather, daily mood and many other factors. But the least we can require is that the system has the best prerequisites and is being continuously monitored, reviewed and improved. Who’s paying isn’t our primary focus. The analysis question is: how much utility could the investments generate?

Special thanks to Håkan Alsén and the expert team at Knowit Decision in Gothenburg for enabling and supporting these analyses!