Predicting bike availability at bike share stations in Toronto

Catherine Yeh
Published in AI4SM
15 min read · Oct 26, 2023

By Andrea Haw, Catherine Yeh and Angela Zhuang as part of the course project for ECE1724H: Bio-inspired Algorithms for Smart Mobility. Dr. Alaa Khamis, University of Toronto, 2023.

Abstract

With traffic congestion plaguing Toronto and a growing emphasis on sustainable transportation, bike sharing has emerged as a popular mode of transit. Toronto’s bike sharing program witnessed a record 4.6 million trips in 2022, indicating a significant shift towards eco-friendly mobility. To further optimize this system and address challenges such as bike availability prediction, we propose a robust prediction model leveraging machine learning algorithms and graph networks. By utilizing Bike Share Toronto Ridership Data sourced from the City of Toronto’s Open Data Portal, we aim to predict bike availability across stations. Our approach involves modeling the bikeshare system as a graph, employing a Graph Attention Network (GAT) with multi-headed attention, followed by LSTM layers to incorporate temporal features. The GNN solution demonstrates superior scalability and the ability to encode information about various stations within the bikeshare network, offering a comprehensive and effective method for analyzing and predicting trends across the entire bikeshare system. These findings have significant implications for improving trip planning, preventing shortages, and enhancing the overall efficiency of sustainable transportation in Toronto.

Introduction

Why drive through traffic downtown when you could bike instead? With traffic congestion being one of the biggest pain points for people living in Toronto and sustainable transportation methods being encouraged to protect the environment, more and more people are turning to bike sharing as a way to get around. Toronto’s bike sharing program had a record 4.6 million trips in 2022 and the number of memberships sold continues to increase as more users take up the bike lanes [1]. Not only has the system flourished for sustainability and wellness reasons, but recent technology has made the process of locating, unlocking, riding, and returning a bike much easier.

To further enhance the system and mobility solutions in Toronto in general, we are interested in predicting the bike availability across stations in Toronto. If done well, there are several use cases that could follow. Such predictions could help facilitate the trip planning for bike riders or prevent shortages from occurring at stations by informing strategies to balance out the number of bikes held per station. Ride sharing companies or apps like Google Maps could also benefit from more accurate forecasting of bike availability as they often provide multiple methods of transportation as options to reach an end destination.

However, this task comes with challenges, particularly in the domain of contextualization. Considering the dynamic nature of factors like time of day, traffic patterns, and nearby station activity, it may be difficult to achieve consistent and reliable results across various real-life scenarios. Thus, our goal is to develop a robust prediction model that can adapt to these changing contextual factors. We will experiment with machine learning algorithms and graph networks to solve the problem at hand.

Literature Review

Researchers have explored different methods to tackle the problem of predicting the number of available bikes at each station or performing forecasting tasks. We can group these approaches into two main categories, namely statistical analysis models and machine learning approaches.

1. Statistical Analysis Models

Time series analysis has been leveraged in the past. [5] used an Autoregressive Moving Average (ARMA) model, while [6] adopted a modified Autoregressive Integrated Moving Average (ARIMA) model by incorporating spatial and temporal information to predict the number of available bikes at each station. The advantages of using statistical models are that they are easy to understand as they are able to provide insight into the relationships between variables with mathematical equations. They are also simple and only have a few hyperparameters to tune. Yet, the simplicity of the model leads to a few drawbacks. Models such as ARMA and ARIMA are linear. This limits their ability to capture non-linear relationships. Additionally, statistical models require many assumptions to identify the distributions and relationships, which could lead to inaccurate results [7].

2. Machine Learning Approaches

Ashqar et al. compared random forest and least-squares boosting for modelling the number of available bikes at each station; random forest outperformed least-squares boosting with a lower mean absolute error at each station [8]. Instead of predicting the exact number of bikes at each station, Dias et al. framed the task as a classification problem: they binned the number of bikes at each station into 5 classes (full, almost full, slots and bikes available, almost empty, or empty) and used random forest to solve it [9].

Other researchers have applied machine learning to forecasting problems that are not specific to bike share systems, and we can draw inspiration from their approaches. Specifically, Woodward [4] predicts traffic using a graph neural network (GNN) model composed of a graph attention network (GAT) and a recurrent neural network followed by a fully connected layer, with both spatial and temporal features included during data preprocessing. More broadly, GNNs have emerged as powerful tools for analyzing spatiotemporal data, with notable use in transportation systems. In one instance, researchers constructed a knowledge graph using GNNs to model a metro system [10]. This was achieved by leveraging historical Origin-Destination (OD) matrices, providing insights into spatiotemporal correlations within the transportation network. Despite these achievements, the use of GNNs for spatiotemporal data in transportation is still evolving, and ongoing advances continue to refine and expand its applications.

The advantages of machine learning-based methods are that they can scale to large datasets and high-dimensional features and can generalize well to unseen data, which can yield more accurate predictions. Their main drawback is that they often have many hyperparameters, which can make tuning and experimentation complex.

Problem Formulation and Modelling

To predict bike availability in Toronto's bike sharing system, we adopt a spatiotemporal modelling approach that leverages the spatial correlations and temporal dependencies in the system to capture bike availability patterns.

The bike sharing system is represented as a spatiotemporal graph, denoted as G = (V, E, A), where V represents the graph nodes, which correspond to bike stations in Toronto. E represents the edges between station pairs, capturing their spatial relationships, and A represents the adjacency matrix.

Spatial and temporal data are fused to create a feature matrix across all N nodes (bike stations) and all T time points. This feature matrix captures the spatiotemporal characteristics of the system: the number of bikes available at each station during a time interval [t, t + Δt]. For example, X(t) with n = 4 over the interval [t, t + Δt], where n is the number of features, is illustrated in Figure 1 below. In this case, t5 = t + Δt.

Figure 1. Feature matrix X(t) representing historic data of bike availability during a time interval [t,t +Δt]

In considering the modelling approach for our bike availability prediction problem, we explore the adaptability of the ST-GAT model developed by [2]. ST-GAT is a spatiotemporal graph neural network that consists of a Graph Attention Network (GAT) and a Long Short-Term Memory (LSTM) recurrent neural network [2].

Graph Attention Network (GAT) can be implemented to model the spatial correlations among bike stations. The GAT learns hidden features with attention mechanisms (Figure 2). For a given node i at time t, the GAT computes an attention-weighted sum of features from neighbouring nodes j at the same time.
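To make the attention-weighted sum concrete, here is a minimal single-head sketch in plain Python. It follows the standard GAT scoring rule (a LeakyReLU over a learned vector applied to the concatenated, transformed features), which we assume matches Figure 2; the function and parameter names are hypothetical, the final non-linearity is omitted, and a real model would use multiple heads and a deep learning library.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_layer(H, neighbours, W, a):
    """Single-head graph attention for one time step (illustrative only).

    H          : list of feature vectors, one per station
    neighbours : neighbours[i] lists the node indices adjacent to i
                 (include i itself to add a self-loop)
    W          : shared weight matrix, shape out_dim x in_dim
    a          : attention vector of length 2 * out_dim
    """
    Wh = [matvec(W, h) for h in H]
    out = []
    for i, nbrs in enumerate(neighbours):
        # Unnormalised scores e_ij = LeakyReLU(a . [W h_i || W h_j])
        scores = [
            leaky_relu(sum(x * y for x, y in zip(a, Wh[i] + Wh[j])))
            for j in nbrs
        ]
        # Softmax over the neighbourhood gives the weights alpha_ij
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the transformed neighbour features
        out.append([
            sum(alpha * Wh[j][k] for alpha, j in zip(alphas, nbrs))
            for k in range(len(W))
        ])
    return out
```

With a zero attention vector every neighbour receives the same weight, so the layer reduces to a plain average over the neighbourhood, which is a handy sanity check.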

Figure 2. An attention mechanism is applied to compute a new matrix of hidden features. N_i^t is the set of neighbouring nodes of node i at time t, α_ij is the attention weight between node i and neighbour j, and W represents the learned weights [2].

By integrating GAT with a Long Short-Term Memory (LSTM) network, the model is able to capture the temporal dependencies in the data. The LSTM is particularly effective at handling sequential data to model the bike availability at each station as it evolves over time.

As an alternative to the LSTM, a Gated Recurrent Unit (GRU) could be used to speed up model training [3].

By combining the GAT’s spatial modelling capabilities with the LSTM or GRU’s temporal modelling, this model aims to capture the spatiotemporal patterns of bike demand in Toronto.

Figure 3. ST-GAT Model [2].

The mathematical formulation of the bike availability prediction is illustrated below (Figure 4).

Figure 4. [V_{t−F+1}, …, V_t] represents the bike availability for the set of nodes (bike stations) over the F most recent time steps, up to and including time t. [V′_{t+1}, …, V′_{t+H}] represents the predicted bike availability for the same nodes from time t+1 to t+H.
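In practice, this formulation turns a series of snapshots into (history, target) pairs using a sliding window. The sketch below assumes the data is a list of hourly snapshots, each holding one value per station; the function name and argument names are ours, not from the notebook.

```python
def make_windows(series, n_hist, n_pred):
    """Build (history, target) pairs from a list of hourly snapshots.

    series : list where series[t] is the list of bike counts for all
             stations at hour t
    n_hist : F, the number of past steps fed to the model
    n_pred : H, the number of future steps to predict
    """
    samples = []
    for t in range(n_hist, len(series) - n_pred + 1):
        history = series[t - n_hist:t]   # the F most recent snapshots
        target = series[t:t + n_pred]    # the next H snapshots
        samples.append((history, target))
    return samples

# Example: 6 hourly snapshots of a single station, F = 4, H = 1
toy_series = [[0], [1], [2], [3], [4], [5]]
toy_samples = make_windows(toy_series, n_hist=4, n_pred=1)
```

Each sample shifts the window forward by one hour, so consecutive samples overlap in all but one snapshot.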

Problem Dataset

In this project, we leverage two datasets, a) Bike Share Toronto Ridership Data and b) Bikeshare Toronto, to capture the bike availability at each station throughout the day. Both are hosted on the City of Toronto’s Open Data Portal.

Dataset a) includes the details of each trip; the fields Trip Id, Trip Duration, Start Station Id, Start Time, Start Station Name, End Station Id, End Time, and End Station Name are relevant to this project.

Dataset b) consists of station information; the fields station_id, name, lat, lon, and address are relevant to this project.

While this information is publicly available and ready for download, the data is not yet in the shape that is needed to solve our problem. The station dataset only contains the most recent attributes per station. In order to have time series data available for prediction, we used the ridership dataset to generate hourly snapshots of bike station availability throughout the month of March 2023.
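One way to derive such snapshots, shown here only as a rough sketch and not necessarily the method used in our notebook, is to count departures and arrivals per station per hour and accumulate them from an assumed starting count. The dictionary keys, the initial counts, and the function names are all hypothetical, and the sketch ignores dock capacity and rebalancing by the operator.

```python
from collections import defaultdict
from datetime import datetime

def hourly_net_flow(trips):
    """Return {(station_id, hour): arrivals - departures} for each hour."""
    flow = defaultdict(int)
    for trip in trips:
        start_hour = trip["start_time"].replace(minute=0, second=0, microsecond=0)
        end_hour = trip["end_time"].replace(minute=0, second=0, microsecond=0)
        flow[(trip["start_station_id"], start_hour)] -= 1  # bike leaves
        flow[(trip["end_station_id"], end_hour)] += 1      # bike arrives
    return dict(flow)

def hourly_snapshots(initial_bikes, flow, hours):
    """Accumulate net flow hour by hour from an assumed initial count."""
    counts = dict(initial_bikes)
    snapshots = {}
    for hour in hours:
        for station in counts:
            counts[station] += flow.get((station, hour), 0)
        snapshots[hour] = dict(counts)  # count at the end of that hour
    return snapshots

# Toy example: one trip from station 7000 to station 7001
toy_trips = [{
    "start_station_id": 7000, "start_time": datetime(2023, 3, 1, 8, 10),
    "end_station_id": 7001, "end_time": datetime(2023, 3, 1, 8, 25),
}]
toy_hours = [datetime(2023, 3, 1, 8), datetime(2023, 3, 1, 9)]
toy_snap = hourly_snapshots({7000: 5, 7001: 5}, hourly_net_flow(toy_trips), toy_hours)
```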

Section 1 of the Google Colab notebook below describes our method to generate the dataset.

Exploratory Spatial Data Analysis (ESDA)

Section 2 of the Google Colab notebook includes our analyses for this section.

Station Popularity

To start, we analyzed the ridership data to identify the busiest stations within the city. We considered starting and ending stations separately, based on whether a station was listed as the start or end point of each bike trip. We counted the trips for each station over one month of rides. The top 5 stations where bike share users picked up bikes are shown below.

Figure 5. Top 5 bike share stations for starting a trip

Similarly, we looked at the top 5 stations by drop-off count. Most of these stations are close to commuter transportation lines, for example Union Station (connections to GO Transit and the TTC) and intersections where subway stations are located (College and Dundas stations). This suggests that many bikeshare users may be using bikes to reach their next transportation connection.

Figure 6. Top 5 bike share stations for dropping off a bike at the end of a trip
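The counts behind rankings like those in Figures 5 and 6 boil down to a frequency count over the start or end station of each trip. A minimal sketch, assuming each trip is a dictionary with hypothetical keys for the start and end station names:

```python
from collections import Counter

def top_stations(trips, key, n=5):
    """Return the n most frequent stations for the given trip field."""
    return Counter(trip[key] for trip in trips).most_common(n)

# Toy example with made-up trips
toy_rides = [
    {"start_station": "Union Station", "end_station": "College St"},
    {"start_station": "Union Station", "end_station": "Dundas St"},
    {"start_station": "College St", "end_station": "Union Station"},
]
toy_top_starts = top_stations(toy_rides, "start_station", n=2)
```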

Indeed, breaking down by time of day provides more insight into how the system is being used. Union Station seems to be the top station for pick up in the morning, as well as drop off in the morning and afternoon. This aligns with the thinking that bikeshares are often used by commuters to get to work, for example.

Figure 7. Top 5 starting bike stations in March by time of day

In terms of drop-off station popularity by time of day, the top stations in the morning tend to be near major intersections or subway stations, while in the evening or at night they are likely closer to where most riders live.

Figure 8. Top 5 ending bike stations in March by time of day

Visualizing over a map

The scatter map allows us to visualize the spatial distribution of bike availability at different stations across the city of Toronto. We can identify high-demand areas with low bike availability and low-demand areas with high bike availability. Plotly’s Scattermapbox function can be used to implement such visualizations.

The following map shows the average bike availability at each bike share station throughout March 2023. It is clear from the map that most stations are clustered around the downtown core; this is also apparent by observation if you have ever wandered around downtown Toronto.

Figure 9. Average bike availability at each station for the month of March 2023

Taking the average snapshot in time a step further, we can see how the number of available bikes fluctuates throughout the day, which helps to identify peak or busy hours. As shown by the changes in colour as time goes by, there are certain points within the day where ridership peaks. By analyzing historical data and taking temporal data into account, we hope to be able to make predictions about future behaviour that more closely resemble what happens in real life.

Figure 10. Video of time-series bike availability map by hour for the month of March 2023

Proposed Solution

Our proposed solution is to use Graph Neural Networks to leverage graph representations of spatiotemporal data to make predictions. The objective is to predict the next hour’s bike availability from the preceding 4 hours for all bikeshare stations. We also experiment with different prediction windows to see how they affect model performance. We model the bikeshare system as a graph by first obtaining a distance matrix of all bikeshare stations, then constructing an adjacency matrix from it as an input to the graph attention network (GAT).
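As an illustration of the graph construction step, the sketch below computes great-circle (haversine) distances between station coordinates and connects stations that lie within a distance threshold. Both the haversine metric and the 1.5 km threshold are assumptions made for this example; other choices, such as a Gaussian kernel over distances, are equally plausible.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))

def distance_matrix(coords):
    """coords: list of (lat, lon) tuples, one per station."""
    return [[haversine_km(*a, *b) for b in coords] for a in coords]

def adjacency_matrix(dist, threshold_km=1.5):
    """Connect stations closer than the (assumed) threshold.

    The diagonal is 1 because each station is at distance 0 from itself,
    which gives every node a self-loop.
    """
    return [[1 if d <= threshold_km else 0 for d in row] for row in dist]

# Toy example: two nearby downtown points and one further north
toy_coords = [(43.6452, -79.3806), (43.6544, -79.3807), (43.7000, -79.4000)]
toy_adj = adjacency_matrix(distance_matrix(toy_coords))
```

The threshold controls how dense the graph is: a larger value lets each station attend to more neighbours at the cost of more computation.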

The model was trained on all 590 stations at once, using multi-headed attention to learn the features that matter for bike availability. The output of the GAT is fed into two LSTM layers to incorporate temporal features into the predictions. With a batch size of 32, the LSTM expects an input tensor from the GAT of dimensionality [4, 32, 590]. Lastly, a fully connected layer generates predictions of dimensionality [1, 32, 590], representing the model’s forecasted bike availability for the upcoming hour across the stations within the batch.

The dataset was split in a way that included discrete days of data in each subset. A roughly 80–10–10 split between training, validation, and test set was chosen, translating to 25 days for training, and 3 days each for validation and test within the month of March.
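A split by whole days can be sketched as follows. We assume here that the days are assigned chronologically (earliest days for training, latest for testing), which the text above does not state explicitly; the function name and the (day, sample) layout are hypothetical.

```python
def split_by_day(samples, train_days=25, val_days=3):
    """Split (day, sample) pairs so that no day spans two subsets.

    Days are ordered chronologically (an assumption): the first
    train_days go to training, the next val_days to validation,
    and the remainder to the test set.
    """
    days = sorted({day for day, _ in samples})
    train_set = set(days[:train_days])
    val_set = set(days[train_days:train_days + val_days])
    train, val, test = [], [], []
    for day, sample in samples:
        if day in train_set:
            train.append(sample)
        elif day in val_set:
            val.append(sample)
        else:
            test.append(sample)
    return train, val, test

# Toy example: 31 days of March with 24 hourly samples each
toy_month = [(day, hour) for day in range(1, 32) for hour in range(24)]
toy_train, toy_val, toy_test = split_by_day(toy_month)
```

Keeping each day intact avoids leaking hours from the same day into both training and evaluation.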

Baseline Model

To benchmark our model, we introduce an exponential smoothing model, damped local trend (DLT), as our baseline. We use the Python package Orbit, published by Uber, to implement and train our baseline model [11].

The baseline is modelled as the following equations [11]

With the update process [11]

D(t) is the deterministic trend process, for which the package provides different options (linear, log-linear, flat, logistic). Log-linear performed the best among all options. The log-linear equation of D(t) is presented below.

The downside of using this package is that a separate model needs to be fitted for each station. Obtaining models for all stations therefore requires a substantial amount of time and resources.

Performance Evaluation

As we are using the models for a forecasting task, the evaluation metrics used to compare our algorithms and experiments were Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). They are calculated as follows, where e represents the error, defined as the predicted value minus the true availability, ŷ represents the predicted value, and y represents the ground truth value.
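For reference, the three metrics can be written in a few lines of plain Python using their standard definitions, with e = ŷ − y. Skipping zero ground-truth values in MAPE is our assumption for this sketch, since the percentage error is undefined when a station is empty.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt(mean(e^2)) with e = y_pred - y_true."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(y_true, y_pred):
    """Mean Absolute Error: mean(|e|)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error: 100 * mean(|e| / |y|).

    Entries with y = 0 are skipped (an assumption for this sketch);
    the function raises ZeroDivisionError if every y is 0.
    """
    terms = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(terms) / len(terms)

# Toy example: true counts of 2 and 4 bikes, predictions of 3 and 2
toy_true = [2, 4]
toy_pred = [3, 2]
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to how many bikes were actually at the station.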

Baseline Model Results

Below are the results from the baseline model for station 7001. The prediction window uses the previous four hours to predict the next hour. This resulted in the following MAE, RMSE, and MAPE values.

Table 1. Baseline evaluation using the last 4 hours to predict the next hour.

We can visualize the difference between the prediction and the true bike availability in the figure below:

Figure 11. Bike availability prediction over time for the baseline.

Experimenting with an LSTM Only Model

An interesting pattern becomes apparent when considering the performance of using the same LSTM model across different bike stations. For instance, when the LSTM model is applied to station 7001, it demonstrates an ability to predict hourly bike availability throughout the day, as evidenced by a plotted comparison against the ground truth.

Figure 12. Bike availability prediction vs ground truth for station 7001.

However, this success is not consistently replicated across all stations. In another illustrative example, marked by significant fluctuations, the model struggles to capture temporal dependencies effectively, resulting in less accurate predictions. This discrepancy underscores the limitations of relying solely on temporal features and emphasizes the need for a more nuanced approach in capturing diverse patterns across various stations.

Figure 13. Bike availability prediction vs ground truth for stations with significant fluctuations in bike availability.

Quantitative Results for GNN

When evaluated on a held-out test set, the GNN produced the following MAE, RMSE, and MAPE values for the various iterations:

Table 2. MAE, RMSE, and MAPE for different prediction windows.

The training and validation errors for the above prediction windows are shown below:

Figure 14. Training and validation curves for the prediction window last 4 hours predicting next 3 hours
Figure 15. Training and validation curves for the prediction window last 4 hours predicting next 1 hour
Figure 16. Training and validation curves for the prediction window last 2 hours predicting next 1 hour

Further hyperparameter tuning was applied by experimenting with different combinations of batch size, epochs, and prediction window.

The model parameters that achieved the best results without overfitting, while generalizing to more stations overall, were:

  • Batch size: 32
  • Epochs: 300
  • Weight decay: 5e-5
  • Initial learning rate: 1e-3
  • Previous hour lookback: 2
  • Number of hours to predict: 1

We plot availability for individual stations to get an idea of how well the model performs. Notice that when predicting the next 3 hours, the output appears smoother and captures the general trend of bike traffic throughout the day, whereas predicting only the next hour yields predictions that follow real-time changes more closely. Depending on the application, both could be useful. When providing real-time feedback to bikeshare users, for example, the finer granularity would help maintain accuracy so that a rider does not arrive at a station with no bikes. If the use case is more focused on patterns or route recommendations, the smoother prediction may be sufficient.

Figure 17. Comparisons between prediction windows: left is previous 4 hours predicting next hour; right is previous 4 hours predicting next 3 hours.

Conclusions and Recommendations

Overall, the GNN was the strongest solution for the dataset as a whole. In comparison to the baseline, the GNN demonstrates superior scalability, allowing it to handle a broader scope of information. Unlike the baseline, which is limited to examining one station at a time, the GNN encodes information about many stations within the bikeshare network at once. Although its MAE, RMSE, and MAPE values may be higher than the baseline's, it is the preferred solution when considering the entire bikeshare network. The enhanced scalability and the ability to encapsulate knowledge about multiple stations make the GNN a more comprehensive and effective approach for analyzing and predicting trends across the entire bikeshare system.

To refine the model, we would be interested in exploring different methods and strategies in our experiments. On the data preprocessing front, we could include additional relevant information, such as the weather, neighborhood population, and the number of individuals in the neighborhood holding Toronto bike share memberships, to improve our model’s performance. We would also like to explore the impact of training the GNN with more granular data, i.e., using data with a 10-minute frequency instead of an hourly frequency. The stations could also be segmented by popularity or location to train multiple models that would be more applicable to certain areas or use cases.

The links below are our project folder and implementations for GNN, LSTM, and Baseline:

References

[1] “A look back on 2022: Bike Share Toronto,” Bike Share Toronto, https://bikesharetoronto.com/news/a-look-back-on-2022/ (accessed Oct. 24, 2023).

[2] C. Zhang, J. J. Q. Yu and Y. Liu, “Spatial-Temporal Graph Attention Networks: A Deep Learning Approach for Traffic Forecasting,” in IEEE Access, vol. 7, pp. 166246–166256, 2019, doi: 10.1109/ACCESS.2019.2953888.

[3] R. Guo et al., “BikeNet: Accurate Bike Demand Prediction Using Graph Neural Networks for Station Rebalancing,” 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Leicester, UK, 2019, pp. 686–693, doi: 10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00153.

[4] A. Woodward, “Predicting Los Angeles Traffic with Graph Neural Networks,” https://medium.com/stanford-cs224w/predicting-los-angeles-traffic-with-graph-neural-networks-52652bc643b1.

[5] A. Kaltenbrunner, R. Meza, J. Grivolla, J. Codina, and R. Banchs, “Urban cycles and mobility patterns: Exploring and predicting trends in a bicycle-based public transport system,” Pervasive and Mobile Computing, vol. 6, no. 4, pp. 455–466, 2010. doi:10.1016/j.pmcj.2010.07.002

[6] J. W. Yoon, F. Pinelli, and F. Calabrese, “Cityride: A predictive bike sharing journey advisor,” 2012 IEEE 13th International Conference on Mobile Data Management, 2012. doi:10.1109/mdm.2012.16

[7] Ashqar et al., Citation2017; Ruffieux et al., Citation 2018; Yang et al., Citation 2020 https://www.tandfonline.com/doi/full/10.1080/154

[8] H. I. Ashqar et al., “Modeling bike availability in a bike-sharing system using machine learning,” 2017 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), 2017. doi:10.1109/mtits.2017.8005700

[9] G. M. Dias, B. Bellalta, and S. Oechsner, “Predicting occupancy trends in Barcelona’s bicycle service stations using open data,” 2015 SAI Intelligent Systems Conference (IntelliSys), 2015. doi:10.1109/intellisys.2015.7361177

[10] Y. Li et al., “Graph Neural Network for spatiotemporal data: Methods and applications,” arXiv.org, https://arxiv.org/abs/2306.00012 (accessed Dec. 17, 2023).

[11] E. Ng, Z. Wang, H. Chen, S. Yang, and S. Smyl, “Orbit: Probabilistic forecast with exponential smoothing,” arXiv.org, https://arxiv.org/abs/2004.08492 (accessed Dec. 17, 2023).
