GNN-based Traffic Prediction in Calgary

Haolong Yang
Published in AI4SM · 16 min read · Oct 26, 2023

By Eric Liu, Haochen Zheng, and Haolong Yang as part of the course project of ECE1724H: Bio-inspired Algorithms for Smart Mobility. Dr. Alaa Khamis, University of Toronto, 2023.

Abstract

As the population and the number of drivers grow across the world, traffic is becoming a much more serious problem that commuting drivers face every day. In this article, we aim to help improve the situation by predicting future traffic flow. We first summarize why traffic prediction matters, define the problem in terms of traffic data provided by real-world traffic cameras, and list the challenges of predicting traffic from geographical information. We then propose a solution that uses a Graph Convolutional Network (GCN) to learn traffic behaviour and generate future data for a selected dataset. The predicted data are evaluated quantitatively and qualitatively to demonstrate the performance and accuracy of our result. Finally, we conclude with some thoughts on the overall problem scope and a potential plan for future improvements.

Introduction

In 2022, Toronto drivers spent an average of roughly 199 hours in rush-hour traffic, the most of any Canadian city [1]. Vancouver and Montreal follow, where drivers lost 197 and 180 hours respectively [1]. As large as these numbers are, they are not the worst. According to the same report, commuting drivers in London, the worst-ranked city globally, spent an astonishing 325 hours stuck in traffic last year [2]. This is a significant waste of time, money, and, most importantly, people’s productivity. Converted into spending, road congestion costs the U.S. economy a total of $179 billion annually [3]. These figures show how heavily road congestion weighs on people’s lives and on the world’s economy. This motivates us to explore ways of reducing the time people spend in traffic, which would cut a considerable amount of unnecessary economic cost and free up productivity. In this article, we apply Graph Neural Networks (GNNs) to predict traffic flow, so that better route suggestions can be offered to people driving across the city during rush hours.

Image Source: TomTom
Image Source: TomTom

The problem we are trying to solve can be stated as follows. We are given a map that contains all roads and streets in the city, together with a list of traffic cameras that includes their locations and the volume of traffic (in number of cars) that each camera recorded per day. Using this information, we would like to predict how traffic grows and diminishes over the next couple of days. With the predicted traffic data, navigation applications can make better route suggestions to drivers in advance, so that congested areas are avoided and the load is distributed more evenly across streets, which reduces average road congestion during rush hours.

The challenges of predicting future traffic fall mostly into three areas:

  1. Traffic cameras may not cover the entire road network. Cameras are generally denser in the downtown or central area of a city, and sparser on the more distant roads and highways outside the city center. Traffic data are therefore available for almost every intersection in the city center but for far fewer locations elsewhere. This makes it challenging to train a GNN model that predicts well despite the uneven amount of data across different areas of the map.
  2. The reliability and accuracy of the predicted data. The predictions will only be useful to drivers, and only help reduce traffic, if they are reliable and accurate. Since we are implementing our own algorithm, it is both critical and challenging to guarantee the quality of the predicted data and to maintain the same level of precision for every road in and around the city.
  3. The length of the future period that can be accurately predicted. Applications that use our predictions are likely to need a certain horizon of future data in order to advise time-saving routes for commuters who travel different distances to different destinations. We therefore need to target a minimum horizon (for example, 7 days) and design the algorithm so that predictions are reliably available for that period, which is another challenge.

Literature Review

Numerous studies have been conducted in the area of traffic forecasting with the aim of enhancing day-to-day planning. For mid- and long-term traffic prediction, they can be divided into two categories: the model-driven approach (also known as dynamical modelling) and the data-driven approach.

Representative methods for the model-driven approach include the queuing theory model [4], the cell transmission model [5], the traffic velocity model [6], and the microscopic fundamental diagram model [7]. Dynamical modelling uses mathematical tools and real-world knowledge to formulate traffic problems through simulation. With correct conditions and assumptions, it can achieve high accuracy when applied in various Intelligent Transport Systems (ITS). However, its shortcoming is clear: traffic data is influenced by many factors, so an accurate traffic model is difficult to obtain. It also demands sophisticated system programming and massive computational power [8]. With the rapid growth of traffic data size, dynamical modelling has become less popular.

Data-driven methods include statistical data analysis and machine learning models. Classic statistical methods go back as far as 1979, when Ahmed and Cook proposed algorithms based on the Autoregressive Integrated Moving Average (ARIMA) model for freeway traffic time-series data [9]. In 1995, Hamed et al. [10] used the ARIMA model to predict traffic volume in urban road networks. Several variants were later produced to improve prediction precision, including Kohonen ARIMA [11], subset ARIMA [12], and seasonal ARIMA [13]. However, these models depend on the assumption of stationarity: they can neither reflect the nonlinearity and uncertainty of traffic data nor cope with abrupt random events such as traffic accidents. Because they neglect spatio-temporal relationships and represent nonlinear traffic flow poorly, they are a secondary option to machine learning and neural networks.

In recent years, machine learning has become a major approach to traffic flow prediction, with a particular focus on graph neural networks (GNNs). Relevant architectures include the Deep Convolutional Neural Network (DCNN), the Recurrent Neural Network (RNN), the Temporal Graph Convolutional Network (T-GCN) [14], and Spatial-Temporal Graph Attention Networks (ST-GAN) [15]. The last two influenced this article the most. T-GCN emphasizes the importance of neighbours in the framework: it requires the connections between nodes to be part of the prediction [14]. The distinguishing feature of ST-GAN is that it uses an LSTM network to extract temporal features from the time-series information [15].

We therefore conclude that a machine learning method for time-series geospatial data has two key parts: the connections between nodes and the carry-over of information through time. For the former, we selected a Graph Convolutional Network. As an extension of traditional CNNs to graphs, it emphasizes the connections between nodes, which are the edges in the geospatial data. For the latter, we add long short-term memory (LSTM) layers to handle the time-series data.

Problem Formulation

This problem is formulated based on our dataset, which originates from an extensive collection of traffic camera images captured by the Data Exploration and Integration Lab (DEIL) at CSBP, Statistics Canada [16]. With over 2,500 cameras in its repository, the dataset records daily traffic counts spanning from February 2, 2022, to August 31, 2023. Each row in the dataset represents the traffic information recorded by one camera, including the camera ID, city name, camera location, and the number of cars captured by the camera each day. The primary area of interest in this project centers around the cameras situated in and around the city of Calgary. Here is what the dataset looks like:

It is essential to note that our problem model is derived from the above data. Therefore, for the entire problem space, the raw data can be represented in the following form:

where N is the total number of cameras and T is the total number of days of previous data available from our dataset.

For each x_t, it has the form:

where m is between 1 and N, and

denotes the number of cars on a certain day t, as monitored by traffic camera m.

For the y value, it would be

Meanwhile, the connection between different cameras is translated to the variable edge_index. The formulation of edge_index is

where e_1 and e_2 hold the first and second end node of each edge, respectively.
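One way to write the quantities described above is sketched below. The matrix shapes and the symbol |E| for the number of edges are assumptions made for illustration; the index names N, T, m, t, e_1, and e_2 follow the text.

```latex
X = [\, x_1, x_2, \dots, x_T \,] \in \mathbb{R}^{N \times T},
\qquad
x_t = [\, x_t^{1}, x_t^{2}, \dots, x_t^{N} \,]^{\top},
\qquad
\text{edge\_index} =
\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}
\in \mathbb{N}^{2 \times |E|}
```

Here x_t^m is the count recorded by camera m on day t, and column k of edge_index gives the two cameras joined by the k-th edge.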

Problem Model: Multivariate Spatial-Temporal Graph Convolutional Network (MST-GCN)

In this article, we perform traffic forecasting in Calgary, focusing on several important traffic locations monitored by cameras. The dataset contains key information including camera locations and daily traffic counts from 2022–2–2 to 2023–8–31.

Considering the time-series nature of the data and the connectedness between different cameras, we decided to use the MST-GCN model as the core of the study, which can be explained in three parts:

  1. Joint reconstruction-based (DAGMM) and prediction-based (LSTM, GCNs) methods to deal with the time-series data.
  2. A GNN-based model to deal with the graph-structured data, that is, the camera locations and road connections. The road information is converted into an unweighted graph G = (V, E) that carries the spatial information of the cameras, where V is the set of cameras and E is the set of connections between cameras [14].
  3. The overall model has two parts: the first extracts spatial features using GCN layers, and the second extracts temporal features using Long Short-Term Memory (LSTM).
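To make the spatial part concrete, the sketch below applies one graph-convolution step to a toy unweighted graph, using the standard GCN propagation rule H' = D^(-1/2) (A + I) D^(-1/2) H W. The three-camera graph, the counts, and the scalar weight are invented for illustration and do not come from the dataset.

```python
import math

def gcn_propagate(num_nodes, edges, features, weight=1.0):
    """One GCN-style aggregation step with self-loops and symmetric normalization.

    edges: list of undirected (u, v) pairs; features: one float per node.
    """
    # Adjacency matrix with self-loops (A + I)
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0

    # Degree of each node in A + I
    deg = [sum(row) for row in adj]

    # H' = D^{-1/2} (A + I) D^{-1/2} H W, with a scalar W for simplicity
    out = []
    for i in range(num_nodes):
        total = 0.0
        for j in range(num_nodes):
            if adj[i][j]:
                total += adj[i][j] / math.sqrt(deg[i] * deg[j]) * features[j]
        out.append(total * weight)
    return out

# Three cameras along one road: 0 - 1 - 2 (toy daily counts)
smoothed = gcn_propagate(3, [(0, 1), (1, 2)], [100.0, 400.0, 100.0])
```

Each camera's new value mixes its own count with those of its direct neighbours, which is how information about connected roads reaches a node.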

Exploratory Spatial Data Analysis (ESDA)

Scatter Map

The map displays the camera locations that we have access to and offers an overview of their distribution across Calgary. A significant concentration of cameras is observed in the downtown area, while the other cameras are positioned near major city streets. Based on this location information, we can track traffic flow and make the corresponding predictions.

import plotly.graph_objects as obj

# coords (a list of (lon, lat) pairs) and text (hover labels) are built from the dataset beforehand
# Convert into a dictionary for plotly
traffic_dict = dict(type='scattermapbox',
                    lat=[x[1] for x in coords],
                    lon=[x[0] for x in coords],
                    mode='markers',
                    text=text,
                    marker=dict(size=8, color='blue'),
                    hoverinfo='text',
                    showlegend=False)

fig = obj.Figure(obj.Scattermapbox(traffic_dict))

# defining plot layout
center_lat = 51.0447  # Latitude of Calgary
center_lon = -114.0719  # Longitude of Calgary
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0},
                  mapbox={'center': {'lat': center_lat, 'lon': center_lon}, 'zoom': 11})
fig.show()

Heat Map

Using a heat map to represent traffic data provides an alternative perspective. Instead of focusing on individual camera spots, it offers a holistic view of traffic conditions in specific regions by assigning colors. In this map, shades of red indicate the intensity of traffic flow. The map aggregates data from nearby cameras to create colored blocks, effectively illustrating areas with significantly heavier daily traffic, and allowing for a broader understanding of traffic patterns.

import folium
from folium.plugins import HeatMap

dm = folium.Map(location=[51.0447, -114.0719], zoom_start=12, scrollWheelZoom=True, dragging=True)

# Build [lat, lon, weight] triples, weighting each camera by its mean daily traffic
heat_data = []
for i in range(len(coords)):
    heat_data.append([coords[i][1], coords[i][0], mean_traffic[i]])

HeatMap(heat_data).add_to(dm)

dm.save("traffic_density_map.html")  # Save the map to an HTML file
dm  # Display the map when run in a notebook

Proposed Solution

Data Processing

In our dataset, it is not uncommon for a camera to lack traffic counts on some days. Days without a record for a specific camera show up as empty values. We therefore need to deal with the missing data in the data source, which includes removing empty records along both the day and the camera dimensions.

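# Count missing daily values per camera (one row per camera)
na_counts = cdf.isna().sum(axis=1)
# Cameras that are missing more than 200 days of data
rows_with_na = cdf[na_counts > 200]
# Days (columns) on which no camera recorded anything
columns_with_all_na = cdf.columns[cdf.isna().all()]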

This inspection shows that some days have no data collected at all and that some cameras are missing more than 200 days of information.

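# Drop the days on which no camera recorded data
cdf = cdf.dropna(axis=1, how='all')
# Keep the first 144 cameras
cdf = cdf.head(144)
# Per-camera mean over the daily count columns (names starting with 'x202')
mean_traffic = cdf.filter(regex='^x202').mean(axis=1).round().astype(int)

# Fill missing values in columns starting with 'x202' using each camera's mean
for col in cdf.filter(regex='^x202').columns:
    cdf[col] = cdf[col].fillna(mean_traffic)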
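The effect of the mean fill can be seen on a toy example. The sketch below is a plain-Python stand-in for the pandas code above, assuming one row per camera and `None` for a missing day; the function name and the numbers are illustrative only.

```python
def fill_with_camera_mean(counts):
    """Replace None entries in each camera's row with that camera's rounded mean."""
    filled = []
    for row in counts:
        observed = [v for v in row if v is not None]
        if not observed:
            # No data at all for this camera: leave the row untouched
            filled.append(list(row))
            continue
        mean = round(sum(observed) / len(observed))
        filled.append([mean if v is None else v for v in row])
    return filled

toy = [
    [120, None, 130, 110],  # camera 0: one missing day
    [None, 15, None, 25],   # camera 1: two missing days
]
filled = fill_with_camera_mean(toy)
```

Filling with the mean keeps every camera in the graph but flattens day-to-day variation on the filled days, so a camera with many gaps contributes little temporal signal.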

Edge Formation

It is important to construct the connections between cameras. The process relies mainly on two dataset attributes: camera_road, and WKT, which encodes the longitude and latitude of the camera. camera_road is used to find the cameras on the same road, and WKT is used to decide which of those cameras are closest to each other.
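The edge-construction code calls a helper named getLatLon that is not shown in this article. A hypothetical version is sketched below, assuming each WKT value is a point of the form `POINT (lon lat)`; WKT lists x (longitude) before y (latitude). The function name `get_lat_lon` and the regular expression are assumptions, not the project's actual helper.

```python
import re

# Matches e.g. "POINT (-114.0719 51.0447)"
_POINT_RE = re.compile(
    r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)",
    re.IGNORECASE,
)

def get_lat_lon(wkt):
    """Parse a WKT point 'POINT (lon lat)' and return (lat, lon) as floats."""
    match = _POINT_RE.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon

lat, lon = get_lat_lon("POINT (-114.0719 51.0447)")
```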

Since the edge_index defined in the problem formulation consists of two lists holding the two end nodes of each edge, we use two arrays to carry the values of those lists.

cdf['edge'] = None

# Iterate through the rows of the DataFrame
left_arr = []
right_arr = []
for index, row in cdf.iterrows():
    selected_object = row['traffic_camera']
    sel_pos = getLatLon(row["WKT"])
    selected_x = row['a']
    selected_y = row['b']
    similar_object_names = []
    # Cameras sharing the same 'a' value, ordered by longitude
    similar_objects = cdf[(cdf['a'] == selected_x)].sort_values("lon").reset_index()
    if similar_objects.size > 0:
        selected_ind = similar_objects[similar_objects["traffic_camera"] == selected_object].index[0]
        # Previous and next camera in this ordering, clamped at both ends
        similar_object_names.append(similar_objects.loc[max(selected_ind-1, 0), 'traffic_camera'])
        similar_object_names.append(similar_objects.loc[min(selected_ind+1, similar_objects.shape[0]-1), 'traffic_camera'])
    # Cameras sharing the same 'b' value, ordered by latitude
    similar_objects = cdf[(cdf['b'] == selected_y)].sort_values("lat").reset_index()
    if similar_objects.size > 0:
        selected_ind = similar_objects[similar_objects["traffic_camera"] == selected_object].index[0]
        similar_object_names.append(similar_objects.loc[max(selected_ind-1, 0), 'traffic_camera'])
        similar_object_names.append(similar_objects.loc[min(selected_ind+1, similar_objects.shape[0]-1), 'traffic_camera'])
    # Remove the selected object from the list, if it's included
    similar_object_names = [x for x in similar_object_names if x != selected_object]
    # Store the list of neighbouring cameras in the 'edge' column
    if len(similar_object_names) > 0:
        cdf.at[index, 'edge'] = similar_object_names
        # left_arr receives the row index, right_arr the neighbour's 'traffic_camera' value
        for i in similar_object_names:
            left_arr.append(index)
            right_arr.append(i)
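Before the two arrays can serve as edge_index, both must hold integer node positions. The sketch below shows one way to do that conversion, assuming neighbours are recorded by camera name and that a camera's node id is its row position; `build_edge_index` and the toy names are hypothetical.

```python
def build_edge_index(camera_names, neighbours):
    """Return [sources, targets] with integer node ids, both directions, no duplicates.

    camera_names: camera names in row order (node id = position in this list).
    neighbours: dict mapping a camera name to a list of neighbouring camera names.
    """
    node_id = {name: i for i, name in enumerate(camera_names)}
    seen = set()
    for name, others in neighbours.items():
        for other in others:
            u, v = node_id[name], node_id[other]
            if u == v:
                continue  # skip self-loops; PyTorch Geometric's GCNConv adds them by default
            seen.add((u, v))
            seen.add((v, u))  # treat road connections as undirected
    pairs = sorted(seen)
    return [[u for u, _ in pairs], [v for _, v in pairs]]

names = ["cam_a", "cam_b", "cam_c"]
edge_index = build_edge_index(names, {"cam_a": ["cam_b"], "cam_b": ["cam_a", "cam_c"]})
```

The resulting two-row list can be turned into the tensor that GCNConv expects with `torch.tensor(edge_index, dtype=torch.long)`.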

MST-GCN model

The MST-GCN model centers on two key components: GCNConv and LSTM. Input batches pass through the graph convolution layer, which expands them to a specified hidden size to propagate hidden features. The data then goes through the LSTM layers and, after compression by a linear layer, yields a final prediction for each camera.

import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class GCNLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(GCNLSTM, self).__init__()
        # GCN layer for spatial information
        self.gcn = GCNConv(input_size, hidden_size)
        # LSTM layer for handling temporal dependencies
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        # Fully connected layer for final prediction
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, edge_index):
        # Apply GCN layer to incorporate spatial information
        x = self.gcn(x, edge_index)
        # LSTM layer for handling temporal dependencies
        out, _ = self.lstm(x)
        # Fully connected layer for final prediction
        out = self.fc(out)
        return torch.squeeze(out)

Performance Evaluation

The spatial graph in the previous part shows the difference in the mean traffic flow of each spot within 2022. Most downtown areas exhibit intensive traffic, with corresponding impacts on connected roads. The prediction loss is computed with the L2 loss, that is, the Mean Squared Error (MSE), which quantifies the disparity between the target values and the generated output. The loss function serves as an indicator of prediction accuracy.

MSE involves squaring the differences, which magnifies larger errors and increases sensitivity to outliers. Lower MSE values signify superior model performance, with zero being the optimal value indicating perfect predictions.
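For n targets y_i and predictions ŷ_i, MSE = (1/n) Σ (y_i − ŷ_i)². The toy numbers below (not taken from the dataset) illustrate the sensitivity to outliers: both prediction sets have the same total absolute error of 40, yet the one that concentrates the error in a single sample has a four times larger MSE.

```python
import math

def mse(targets, predictions):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

targets = [100, 200, 300, 400]
close = [110, 190, 310, 390]        # every prediction off by 10
one_outlier = [100, 200, 300, 440]  # three exact, one off by 40

mse_close = mse(targets, close)          # 100.0
mse_outlier = mse(targets, one_outlier)  # 400.0
rmse_close = math.sqrt(mse_close)        # 10.0, back in the units of the counts
```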

Here is the graph showing the loss across epochs. The loss converged after approximately 100 epochs, stabilizing at around 75. An MSE of about 75 corresponds to a root-mean-square error below 10 (√75 ≈ 8.7), which we consider acceptable. However, the training loss fluctuates strongly, which may be caused by extreme values from the most crowded areas heavily affecting the result.

To further increase the accuracy of prediction, the effect of the large difference between traffic counts needs to be diminished. MinMaxScaler is added for this purpose. Here is the basic equation that we use to normalize data.

This normalization ensures that traffic counts at each location are scaled between 0 and 1, minimizing the influence of large numerical values. Consequently, the scale of the loss differs when considering the normalized numbers.
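Min-max normalization maps a value x to x' = (x − x_min) / (x_max − x_min), which is what scikit-learn's MinMaxScaler computes for each feature with its default range of 0 to 1. The sketch below is a plain-Python version on invented counts; whether scaling is applied per camera or over the whole dataset is a design choice that this example leaves open.

```python
def min_max_scale(values):
    """Scale values linearly to [0, 1]; returns (scaled, lo, hi)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: map everything to 0 to avoid dividing by zero
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original range."""
    return [s * (hi - lo) + lo for s in scaled]

counts = [10, 2005, 4000, 8000]
scaled, lo, hi = min_max_scale(counts)
restored = min_max_inverse(scaled, lo, hi)
```

Keeping `lo` and `hi` is what allows normalized predictions to be converted back into car counts afterwards.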

Here is a sample input of traffic count in each camera for 2022–02–09:

After normalization:

The subsequent prediction is thereby generated based on the normalized data:

This example serves to demonstrate the efficacy of traffic flow prediction based on the preceding 8-day records. Based on 8-day records from 144 cameras, our system was able to predict the distribution of traffic within an acceptable range.
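Predicting from the preceding 8 days implies slicing each camera's series into fixed-length windows. The sketch below shows one common way to build such samples, assuming the target is the single day that follows each window; the article does not state the exact horizon, and `make_windows` is a hypothetical helper.

```python
def make_windows(series, window=8):
    """Split one camera's daily counts into (past `window` days, next day) pairs."""
    samples = []
    for start in range(len(series) - window):
        history = series[start:start + window]
        target = series[start + window]
        samples.append((history, target))
    return samples

daily = list(range(1, 12))  # 11 days of toy counts: 1..11
samples = make_windows(daily, window=8)
```

With 11 days and a window of 8, this yields three samples; a series shorter than or equal to the window yields none.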

Conclusions & Recommendations

The project is designed around the dataset collected from Statistics Canada [16]. Due to unexpected reasons, some of the records are not consecutive: some days were not recorded by the traffic cameras, creating gaps in the timeline. These breaks in the time series can heavily affect prediction accuracy. During data processing, we had to drop the days with no data recorded, which split the continuous data into 4 pieces for prediction. We also dropped cameras with more than 20% of their data missing, for consistency across cameras. The remaining missing values were filled with the average traffic count. One noticeable limitation is that we were not able to evaluate whether this filling has a negative impact on the results.

Another difficulty we encountered was the error introduced by the large differences between traffic counts. Across the entire dataset, the traffic count varies from fewer than 10 cars per day to more than 8000. Without any adjustment, the loss can be extremely high and the neural network was unable to capture information, so it produced predictions that all fell within a similar range. To resolve this issue, we tried two ways of scaling the data down: multiplying by a fixed ratio to reduce the differences, and applying MinMaxScaler. MinMaxScaler performed far better and gave us reasonable predictions.

Additionally, significant discrepancies between observed values and expectations in certain samples may be attributed to limitations in the information gathered from Statistics Canada [16]. In our project, the data collected from Statistics Canada cannot directly describe the connections between cameras, yet a GCN depends heavily on edge information to analyze traffic flows. Further improvements to this project are therefore likely to come from better edge indexing and edge weighting. Both would provide more information about the connections between cameras, leading to more convincing results.

References

[1] TOMTOM TRAFFIC INDEX Ranking 2022, from https://www.tomtom.com/traffic-index/ranking/?country=CA

[2] Data shows how much time Canadians spent in rush hour traffic in 2022. Posted: Feb 22, 2023 4:00 AM EST, from https://www.cbc.ca/news/canada/hamilton/traffic-time-2022-1.6755102

[3] How traffic jams cost the US economy billions of dollars a year. Published Mon, Dec 23 2019, 10:44 AM EST, from https://www.cnbc.com/2019/12/24/traffic-jams-how-they-form-and-end-up-costing-the-us-economy-billions.html

[4] X. Y. Xu, J. Liu, H. Y. Li, and J. Q. Hu, “Analysis of subway station capacity with the use of queueing theory,” Transportation Research Part C Emerging Technologies, vol. 38, no. 1, pp. 28–43, Jan. 2014.

[5] P. Wei, Y. Cao, and D. Sun, “Total unimodularity and decomposition method for large-scale air traffic cell transmission model,” Transportation Research Part B, vol. 53, no. 3, pp. 1–16, Jul. 2013.

[6] W. Qi, L. I. Li, H. U. Jianming, and B. Zou, “Traffic velocity distributions for different spacings,” Journal of Tsinghua University, vol. 51, no. 3, pp. 309–312, Mar. 2011.

[7] F. F. Xu, Z. C. He, and Z. R. Sha, “Impacts of traffic management measures on urban network microscopic fundamental diagram,” Journal of Transportation Systems Engineering and Information Technology, vol. 13, no. 2, pp. 185–190, Apr. 2013.

[8] E. I. Vlahogianni, “Computational Intelligence and Optimization for Transportation Big Data: Challenges and Opportunities,” Springer International Publishing, pp. 107–128, May 2015.

[9] Ahmed, Mohammed Shahgir and Allen Rusty Cook. “Analysis of Freeway Traffic Time-Series Data by using Box-Jenkins Techniques,” Transportation Research Record (1979): n. Pag.

[10] M. M. Hamed, H. R. Al-Masaeid, and Z. M. B. Said, “Short-term prediction of traffic volume in urban arterials,” Journal of Transportation Engineering, vol. 121, no. 3, pp. 249–254, 1995.

[11] M. V. D. Voort, M. Dougherty, and S. Watson, “Combining kohonen maps with arima time series models to forecast traffic flow,” Transportation Research Part C Emerging Technologies, vol. 4, no. 5, pp. 307–318, Oct. 1996.

[12] S. Lee and D. Fambro, “Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting,” Transportation Research Record Journal of the Transportation Research Board, vol. 1678, no. 1, pp. 179–188, 1999.

[13] B. M. Williams and L. A. Hoel, “Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results,” Journal of Transportation Engineering, vol. 129, no. 6, pp. 664–672, 2003.

[14] Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, and Haifeng Li, “T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction,” https://doi.org/10.48550/arXiv.1811.05320

[15] C. Zhang, J. J. Q. Yu and Y. Liu, “Spatial-Temporal Graph Attention Networks: A Deep Learning Approach for Traffic Forecasting,” in IEEE Access, vol. 7, pp. 166246–166256, 2019, doi: 10.1109/ACCESS.2019.2953888.

[16] Data Exploration and Integration Lab (DEIL), CSBP, Statistics Canada (2023) Traffic Flow Dashboard. https://www150.statcan.gc.ca/n1/pub/71-607-x/71-607-x2022018-eng.html
