How to make a smart city more useful

Fredrik Moeschlin
7 min read · Jan 16, 2020

This is the second article in our series on bike sharing analysis. Here we show how data can help increase utility in a smart city. For instance, the data suggests that with perfect availability the system would deliver 15 % more rides on an average business day. That means money and environment to be saved!

Let’s start with some basic facts about the bike sharing system, then go deeper and look at what makes the system useful, how users’ needs or demand can be estimated, and finally develop a performance indicator for utility. The main takeaway is to work with analytics and data science to improve continuously.

Overview of the system

As highlighted in the introduction, we get data from the current overview of Styr & Ställ. This data source may appear limited, but by combining and analysing consecutive readings we can extract plenty of information, such as the number of departures and arrivals at a station as well as trends. So what can we learn from the data?

  • 66 stations were online in June 2016 (72 stations in 2019)
  • 1741 bike docks in total June 2016 (1972 docks in 2019)
  • Station sizes vary from 15 to 60 docks per station
  • At night most bikes are parked in the docks (few are being ridden), and from this we learn that there are around 900 bikes in the system (52 % of the number of docks, year 2016)
  • By tracking when the number of available bikes on a station drops or increases, we can estimate when bikes are rented or returned. There appear to be between 3000 and 4000 rides per work day carried out with Styr & Ställ (spring 2016).
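
The last point can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the analysis: the function name and the sample readings are made up, and it assumes we have a chronological list of available-bike counts for a single station. Note that a rental and a return happening between two readings cancel each other out, so the counts are lower bounds.

```python
from typing import List, Tuple


def count_departures_arrivals(bike_counts: List[int]) -> Tuple[int, int]:
    """Estimate rentals and returns from consecutive bike-count readings.

    bike_counts is a chronological list of "available bikes" readings
    for one station (hypothetical input format).
    """
    departures = 0
    arrivals = 0
    # Compare each reading with the one before it.
    for previous, current in zip(bike_counts, bike_counts[1:]):
        change = current - previous
        if change < 0:
            departures += -change  # fewer bikes than before: bikes were rented
        elif change > 0:
            arrivals += change  # more bikes than before: bikes were returned
    return departures, arrivals


# Made-up readings for illustration only.
readings = [10, 9, 9, 7, 8, 8, 10]
departures, arrivals = count_departures_arrivals(readings)
print(departures, arrivals)
```

Summing the departures over all stations for one day gives a rough estimate of the number of rides.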

Deriving information from snapshot data in this way may appear cumbersome, but working with large sets of “simple data” can open up many new possibilities.

Data science skill #2: data wrangling

How empty is an empty station?

The perfect bike sharing system would always have bikes available to hire as well as free docks to return bikes to. Reality is somewhat different, and stations often become full or empty. The problem, when we look at the data, is that we do not know for sure if a station is entirely empty or full. For instance, a broken bike often remains in an “empty” station (perhaps marked by a turned saddle). Or the last free dock is actually out of service.

To cope with this we can simply define “full” as a station having at most one free dock, and “empty” as a station having at most one bike available. The implications of this rough metric can be discussed, but at least we can now estimate how often stations are (nearly) empty or full.
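
As a rough sketch, this definition could look as follows in Python. The function names, the snapshot format and the sample numbers are assumptions made for illustration; only the thresholds (at most one bike, at most one free dock) come from the definition above.

```python
from typing import List, Tuple


def classify_station(bikes_available: int, free_docks: int) -> str:
    """Label one snapshot of a station as "empty", "full" or "ok"."""
    if bikes_available <= 1:
        return "empty"  # at most one bike, possibly a broken one
    if free_docks <= 1:
        return "full"  # at most one free dock, possibly out of service
    return "ok"


def share_of_snapshots(snapshots: List[Tuple[int, int]], state: str) -> float:
    """Fraction of snapshots in which the station is in the given state."""
    states = [classify_station(bikes, docks) for bikes, docks in snapshots]
    return states.count(state) / len(states)


# Made-up (bikes_available, free_docks) snapshots for a 20-dock station.
snapshots = [(0, 20), (1, 19), (8, 12), (19, 1), (20, 0)]
print(share_of_snapshots(snapshots, "empty"))
print(share_of_snapshots(snapshots, "full"))
```

Checking for “empty” first is an arbitrary choice; it only matters for a station that has both very few bikes and very few free docks at the same time.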

In the plot above we see an overview of some selected stations showing how often they are empty (x axis) and full (y axis). The size indicates the number of bike docks. (Click for an interactive Swedish version, “tom” = empty.) From this we learn that some stations are both full and empty at different times of the day. An example is Drottningtorget, next to the central station. Below we see an average daily chart for this station (average work day during 2016). The morning rush often empties the station, while people returning in the afternoon often fill it up.

Frequency of full/empty for the station Drottningtorget, over time of day (average work day 2016)

A combined overview of full and empty stations is given in the plots below. It gives some insights into how redistribution of bikes is carried out. We will come back to this topic in the next article.

Two alternative visualizations of the daily variation in full and empty stations. Which makes more sense?

Data science skill #3: data visualization

Demand — when do we want to ride?

Naturally, empty or full stations are not a good thing. But who cares at night? In rush hours a full station can be really frustrating if several people want to drop off their bikes. Thus, we need to relate availability to demand. It is hard to tell from data if someone walked up to an empty station and got disappointed. However, we can use information from similar days where the station was not empty to estimate demand. For instance, assume that people on average rent 10 bikes from a certain station on Saturdays between 10:00 and 11:00 in the morning. Then, on a Saturday morning where the station is empty, we can assume that demand was still 10 bikes for this hour.

To simplify, we assume that demand for bikes or free docks depends on the hour of day and whether it’s a weekend or business day. We then estimate the demand per station using historical data where the station was not (nearly) full or empty. By adding up the statistics from the individual stations we get the system total demand and availability, as in the plot below. Dark blue shows the number of bikes rented and light blue the total demand, that is, the rentals we would expect if all stations had bikes available. With perfect availability the system could deliver around 15 % more rides on an average business day. And many would be spared the pain of arriving at a full station.
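
The demand estimate for rentals can be sketched like this. It is a simplified illustration under stated assumptions: the observation format, the function name and all numbers are made up, and demand is taken as the plain mean of rentals over comparable hours where the station was not (nearly) empty. The same idea applies to returns and full stations.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# One observation: (day_type, hour, rentals, was_empty) for a single station.
Observation = Tuple[str, int, int, bool]


def estimate_demand(observations: List[Observation]) -> Dict[Tuple[str, int], float]:
    """Average rentals per (day type, hour), using only hours with bikes available."""
    samples = defaultdict(list)
    for day_type, hour, rentals, was_empty in observations:
        if not was_empty:  # skip hours where an empty station capped the rentals
            samples[(day_type, hour)].append(rentals)
    return {key: mean(values) for key, values in samples.items()}


# Made-up history for one station, Saturdays 10:00-11:00.
history = [
    ("weekend", 10, 9, False),
    ("weekend", 10, 11, False),
    ("weekend", 10, 10, False),
    ("weekend", 10, 2, True),  # station was (nearly) empty this hour
]
demand = estimate_demand(history)

# Unmet demand for the hour where the station ran empty.
observed_rentals = 2
unmet = max(0, demand[("weekend", 10)] - observed_rentals)
print(demand[("weekend", 10)], unmet)
```

Adding up the estimated demand and the observed rentals over all stations gives the two curves compared in the plot.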

Data science skill #4: analytical reasoning

How about system utility?

To better understand the availability we can study what utility the system provides. With a utility function we can describe how happy a user becomes when about to rent or return a bike. Say that there would always be available bikes and free docks, then the utility can be defined as 1 (or 100 %). But what is worse, empty or full stations? An empty station is frustrating and leads to fewer rides (there are alternatives in taxis, walking or public transport). A full station, however, causes another problem. A user who arrives with a bike at a full station is probably close to the destination but has a bike that cannot be returned. To get rid of the bike, the user has to find another station with free docks, ride there, and then walk or travel back to the actual destination. Therefore, I believe it makes sense to set the utility to 0 for a bike demanded from an empty station and to -1 if a station is full when one wants to return a bike.

Improving the system with a utility function

To estimate the total system utility we need to aggregate single events. From the data we derive the number of successful rentals and returns (+1 in utility). The demand previously estimated helps us to calculate the disappointments at empty stations (0 utility) and full stations (-1 utility). We aggregate the average utility per hour to get the system metric as in the plot below. The purpose of this performance indicator is not to give an absolute metric of the bike sharing system. Rather, we get a relative number to study over time, relate to and use to raise questions. It is a sort of indicator of how smart the system is and how it develops. Why are some days better than others? What affects the utility? How can we improve the bike sharing system?
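
One possible reading of this aggregation, sketched in Python: the average utility for an hour is the sum of the event utilities divided by the number of events. The function name, the constants' names and the example counts are assumptions for illustration; only the utility values (+1, 0, -1) come from the text.

```python
from typing import Optional

UTILITY_SUCCESS = 1  # successful rental or return
UTILITY_EMPTY = 0  # bike demanded from an empty station
UTILITY_FULL = -1  # return attempted at a full station


def average_utility(
    successful: int, missed_rentals: int, missed_returns: int
) -> Optional[float]:
    """Average utility per event for one hour, or None if nothing happened."""
    total_events = successful + missed_rentals + missed_returns
    if total_events == 0:
        return None
    total_utility = (
        UTILITY_SUCCESS * successful
        + UTILITY_EMPTY * missed_rentals
        + UTILITY_FULL * missed_returns
    )
    return total_utility / total_events


# Made-up counts for a single hour, summed over all stations.
print(average_utility(successful=90, missed_rentals=6, missed_returns=4))
```

Because a missed return counts as -1 rather than 0, full stations pull the indicator down twice as fast as empty ones.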

Given an average system utility of 70 % it is clear that there’s potential for improvement. But there is no positive trend. From the demand analysis we saw that 15 % more rides would be made possible if no stations were empty. Now we add to that the frustration from full stations, and if this is of the same magnitude, a total deficit of 30 % seems reasonable. The number of rentals is typically high on business days, and at certain peaks we can see how the overall utility drops. The redistribution of bikes with trucks is focused on business days, which could explain the lower utility on weekends. The drop on June 24 is most likely a direct consequence of happy Swedes celebrating Midsummer, causing chaos in the bike sharing system as well.

Time to make bike sharing more data driven?

These analyses could be made a lot more precise given direct access to historical bike rides from the operator. However, we still see that the system has a large potential for improvement. These analyses were carried out as an example of how a complex system can be addressed and understood with data science. By means of data collection, derivation, visualization and modelling, we have assessed new aspects of the system. By learning from data you can continuously improve and develop. Let us hope that the system starts to improve so that more utility and bike riding is enabled for the people of Gothenburg!

Data science skill #5: continuous improvement

Fredrik Moeschlin

Innovating on data wherever available — for new insights and a better society.