Predicting Evolution of Dynamic Graphs

An Application of Spatio-Temporal Graph Neural Networks to Traffic Prediction

Tassos Sapalidis
Stanford CS224W GraphML Tutorials
Jan 16, 2022

--

By Tassos Sapalidis and Andrea Vallebueno

To get the most out of this post, follow along in our supplementary Google Colab!

Road systems can naturally be represented as graphs, with the roads themselves being edges connecting points of interest. Traffic, however, is a more complicated concept; in addition to the spatial structure of the roads, there is a temporal element that must be considered. In this article, we will demonstrate how to apply deep learning techniques to dynamic graphs in PyG and PyTorch for the purpose of predicting traffic flow.

Visualization of DC traffic from 1:30pm–4:00pm [9]

What are dynamic graphs?

Simply put, dynamic graphs are graphs that change over time. This could involve the addition or removal of nodes or edges, or a change in node or edge features. We can think of dynamic graphs as a series of “snapshots” of a changing graph. In our case, the structure of our graph (i.e. the nodes and edges) does not change, but the node features do. We think of making predictions on these graphs as a type of time series forecasting problem.

At time t, we can think of making predictions on dynamic graphs as predicting the state of the graph H time steps in the future given M previous time steps. Source: Andrea Vallebueno, adapted from [11]
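To make this concrete, here is a minimal sketch (with made-up numbers, not our actual dataset) of how such a sequence of snapshots can be represented in PyTorch: the edge structure is a single fixed edge_index tensor, while the node features are a list of matrices, one per time step.

import torch

# Toy dynamic graph: the structure (edge_index) is shared by every snapshot,
# while the node features change from one time step to the next.
num_nodes, num_timesteps = 4, 6
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])

# One feature matrix per time step, each of shape [num_nodes, num_features]
snapshots = [torch.randn(num_nodes, 1) for _ in range(num_timesteps)]

# Forecasting task: given the M most recent snapshots, predict the node
# features H steps ahead, e.g. inputs snapshots[t - M + 1 : t + 1] and
# target snapshots[t + H].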

How do we formulate the traffic prediction problem?

For this project, we will focus on traffic prediction in the Los Angeles metro area using the PeMS District 7 dataset. Traffic data are collected by monitoring stations along major roads and highways every 30 seconds. These stations will be the nodes in our graph.

Matrix inputs to the model. Source: Andrea Vallebueno

Data Preprocessing

Due to the large size of the graph, we randomly select 50 stations out of the 2,789 monitoring stations in District 7. Part 1 of the Colab pre-processes the PeMS dataset and generates the inputs to the model, and can easily be modified to build the inputs for a larger number of stations. You can run Part 1 to create your own custom dataset, or skip straight to Part 2 to fit the model to our dataset of 50 stations! More details can be found in the Code section below.

Ideally, we’d like the edges to be the roads, or related to the road structure, but this data is unavailable. Instead, we connect all stations, weight the edges by distance, then “sparsify” by removing edges with a weight below a certain threshold. The result is the weighted adjacency matrix W visualized below.
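As a rough sketch of this step (the Colab contains the full version), one common choice, also used by Yu et al. [11], is to turn pairwise station distances into weights with a Gaussian kernel and then zero out weights below a threshold; the function name and parameter values below are purely illustrative.

import numpy as np

def build_weighted_adjacency(dist, sigma2, epsilon):
    """Turn a pairwise distance matrix into a sparse weighted adjacency matrix W.

    Weights decay with distance via a Gaussian kernel, so closer stations get
    larger weights; edges with weight below epsilon are removed to sparsify.
    """
    W = np.exp(-dist ** 2 / sigma2)
    np.fill_diagonal(W, 0.0)      # no self-loops
    W[W < epsilon] = 0.0          # "sparsify": drop weak (i.e. long) edges
    return W

# Illustrative example with 50 random station coordinates
coords = np.random.rand(50, 2)
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
W = build_weighted_adjacency(dist, sigma2=0.1, epsilon=0.5)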

PeMS reports its traffic data at the station level, already aggregated into 5-minute intervals. In our pre-processing, we ignore the direction of traffic and average the speeds observed in each direction. This gives us a dataset of 1-dimensional features (traffic speed) for each node (station) at each 5-minute time step, which we denote V.
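A small pandas sketch of this aggregation step is shown below; the column names are hypothetical stand-ins for the actual PeMS fields used in the Colab.

import pandas as pd

# Hypothetical rows of a PeMS export: one row per station, direction and 5-minute bin
raw = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-06-14 06:00"] * 4 + ["2021-06-14 06:05"] * 4),
    "station":   [101, 101, 102, 102] * 2,
    "direction": ["N", "S", "N", "S"] * 2,
    "speed":     [64.2, 61.8, 58.0, 55.4, 63.5, 60.9, 57.1, 54.8],
})

# Average the two directions, then pivot so rows are 5-minute time steps and
# columns are stations: this matrix of speeds is V.
V = (raw.groupby(["timestamp", "station"])["speed"]
        .mean()
        .unstack("station"))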

The result is the following graph, which we’ve visualized on top of a map of the Los Angeles metro area.

Visualization of graph used for traffic prediction. Black circles indicate traffic monitoring stations (nodes) and purple lines indicate graph edges. Darker lines indicate higher edge weight (shorter distance). Source: Tassos Sapalidis

Model Architecture

For our traffic prediction task we use the Spatio-Temporal Graph Convolutional Network (STGCN) developed by Yu et al. [11]

Architecture of Spatio-Temporal Graph Convolutional Networks [11]

The architecture consists of two spatio-temporal convolution (ST-Conv) blocks, followed by an output layer composed of a final temporal convolution and a fully-connected layer. Each ST-Conv block consists of a spatial graph convolution sandwiched between two temporal convolutions. Next we’ll explain how each of these convolution types works, starting with a high-level description followed by a mathematical one.

Temporal Convolutions: At a high level

The temporal convolution layers here work like time series prediction models. Remember that for each node, we have a sequence of node features indexed by time step. A temporal convolution takes a window of these time steps, and applies a function to the values in this window.

Visual representation of a 1D temporal convolution (GLU not included) [3]

The output of this function is then passed to a GLU, which determines which features are relevant for predicting future time steps.

A few things to note:

1) These temporal convolutions are causal, meaning the output at a given time step depends only on the current and earlier time steps; no information leaks from the future.

2) The parameters of each temporal convolution layer are shared across all nodes in the graph. This sharing reduces the number of parameters and allows the model to generalize to other graphs.

Temporal Convolutions: Mathematically

We define the temporal convolution kernel Γ, of size Kₜ × Cᵢ × 2Cₒ, and the input Y, of size M × Cᵢ, where

  • Kₜ is the window size, or the number of prior time steps to consider, and
  • Cᵢ and Cₒ are the number of input and output channels, respectively
  • M is the total number of time steps

The convolution operation Γ * Y produces a matrix [P Q] of size (M − Kₜ + 1) × 2Cₒ, where P and Q each contain Cₒ channels.

The GLU is then defined as Z = P ⊙ σ(Q), where

  • ⊙ is an element-wise product,
  • Z is the output, of size (M − Kₜ + 1) × Cₒ, and
  • σ(Q) is a sigmoid gate controlling which elements of P are relevant for the time series prediction
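For intuition, here is a minimal PyTorch sketch of this gated temporal convolution. It is not the PyG-Temporal implementation used later, just the operation described above: one convolution over the time axis produces both P and Q, and the GLU gates P with σ(Q).

import torch
import torch.nn as nn

class GatedTemporalConv(nn.Module):
    """Temporal convolution over the time axis followed by a GLU.

    Input  x: [batch, C_in, num_nodes, M]  (M historical time steps)
    Output:   [batch, C_out, num_nodes, M - Kt + 1]
    """
    def __init__(self, c_in, c_out, kt):
        super().__init__()
        # A single convolution produces both P and Q (hence 2 * c_out channels);
        # the kernel spans only the time axis and its weights are shared by all nodes.
        self.conv = nn.Conv2d(c_in, 2 * c_out, kernel_size=(1, kt))

    def forward(self, x):
        p, q = self.conv(x).chunk(2, dim=1)   # split channels into P and Q
        return p * torch.sigmoid(q)           # GLU: P ⊙ σ(Q)

# Example: 50 nodes, 12 historical steps, Kt = 3 -> 12 - 3 + 1 = 10 output steps
x = torch.randn(8, 1, 50, 12)
out = GatedTemporalConv(c_in=1, c_out=64, kt=3)(x)
print(out.shape)  # torch.Size([8, 64, 50, 10])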

Spatial Convolutions: At a high level

The spatial graph convolutions essentially “spread” the node features across the graph. Using our traffic model as an example, we expect traffic information at one station to influence the traffic information at nearby stations. The spatial convolution allows us to capture this effect, using the (weighted) adjacency matrix of the graph. It works much like a traditional image CNN, but generalized to handle a graph structure rather than an organized grid of pixels.

Visual representation of a spatial graph convolution. Source: Andrea Vallebueno

At a high-level, the concept is straightforward. The math behind the implementation, however, can be tricky to grasp.

Spatial Convolutions: Mathematically

I recommend reading up on spectral graph convolutions before continuing; I found this article [6] helpful.

In the context of our traffic prediction problem, we have M frames of the road graph, each of which represents the graph at a consecutive 5-minute interval. In each frame vᵗ, we have a signal X of size n × Cᵢ, where n is the number of nodes (stations).

In our case the node feature is one-dimensional: the average traffic speed observed at the station during frame vᵗ. Thus, the number of input channels for each node, Cᵢ, is equal to 1.

The spatial graph convolution *G applies a kernel Θ to the graph G through a multiplication with the signal X:

Θ *G X = Θ(L) X = Θ(U Λ Uᵀ) X = U Θ(Λ) Uᵀ X

where

  • L = Iₙ − D^(-1/2) W D^(-1/2) is the normalized graph Laplacian (Iₙ is the identity matrix)
  • D is the diagonal matrix of node degrees
  • W is the matrix of edge weights
  • UΛUᵀ is the eigendecomposition of the Laplacian L, with Λ the diagonal matrix containing the eigenvalues of L and U the matrix of its eigenvectors

The computation of this graph convolution can become quite expensive. Following Yu et al. [11], we therefore approximate it with Chebyshev polynomials: the kernel Θ is restricted to a K-term polynomial of the matrix of eigenvalues Λ, with θ comprising the coefficients of this polynomial, and the graph convolution is rewritten as

Θ *G X ≈ Σₖ θₖ Tₖ(L̃) X,  k = 0, …, K − 1

where Tₖ is the Chebyshev polynomial of order k and L̃ = 2L/λₘₐₓ − Iₙ is the Laplacian rescaled by its maximum eigenvalue λₘₐₓ. [11]
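As a small illustration of this approximation (a sketch, not the implementation used in PyG-Temporal), the polynomial terms Tₖ(L̃)X can be computed with the Chebyshev recurrence, avoiding any eigendecomposition:

import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    """K-term Chebyshev approximation of the spectral graph convolution.

    x:       [num_nodes, C_in] node signal
    L_tilde: [num_nodes, num_nodes] scaled Laplacian, L~ = 2L / lambda_max - I
    """
    def __init__(self, c_in, c_out, K):
        super().__init__()
        self.theta = nn.Parameter(0.01 * torch.randn(K, c_in, c_out))

    def forward(self, x, L_tilde):
        K = self.theta.shape[0]
        Tx = [x]                                      # T_0(L~) x = x
        if K > 1:
            Tx.append(L_tilde @ x)                    # T_1(L~) x = L~ x
        for _ in range(2, K):
            Tx.append(2 * L_tilde @ Tx[-1] - Tx[-2])  # T_k = 2 L~ T_{k-1} - T_{k-2}
        # Theta *G x  ≈  sum over k of  T_k(L~) x theta_k
        return sum(t @ th for t, th in zip(Tx, self.theta))

# Example: 50 nodes, 1 input channel, 16 output channels, K = 3
x = torch.randn(50, 1)
L_tilde = torch.eye(50)   # placeholder; use the true scaled Laplacian in practice
out = ChebGraphConv(1, 16, K=3)(x, L_tilde)   # shape [50, 16]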

The Code

All code can be found in the Google Colab linked at the top of this post.

We will implement this model using PyTorch Geometric (PyG). The ST-Conv block is conveniently implemented in an extension of PyG, PyG-Temporal.

Installation

You can pip install PyG-Temporal and its dependencies using the instructions in the PyG-Temporal repo. Note that we are using PyTorch 1.10.0 for this implementation.

Data Preprocessing

The Colab will walk you through the data preprocessing steps laid out above. If you’d rather skip this step and use our preprocessed LA metro data for May-June 2021, skip to Part 2 of the Colab.

Model Construction and Training

The Colab will walk you through constructing and training the model. The ST-Conv blocks are easily initialized using the STConv layer in PyG-Temporal.

stconv_block = STConv(num_nodes, input_size, hidden_size,
                      output_size, temporal_kernel_size, spatial_kernel_size,
                      normalization='sym', bias=True)

After combining the stconv_block with the output layer described above, we’ve successfully defined a TrafficModel class to use as our model.
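A rough sketch of what such a class can look like is below. It folds the final temporal convolution and fully-connected layer into a single linear output layer for brevity, so the layer sizes and the output block differ from the Colab version; it also assumes STConv is importable from torch_geometric_temporal.nn.attention, as in the PyG-Temporal documentation.

import torch.nn as nn
from torch_geometric_temporal.nn.attention import STConv

class TrafficModel(nn.Module):
    """Simplified sketch: two ST-Conv blocks followed by a linear output layer."""
    def __init__(self, num_nodes, in_channels, hidden_size, out_channels,
                 temporal_kernel_size, spatial_kernel_size, num_hist):
        super().__init__()
        self.block1 = STConv(num_nodes, in_channels, hidden_size, out_channels,
                             temporal_kernel_size, spatial_kernel_size)
        self.block2 = STConv(num_nodes, out_channels, hidden_size, out_channels,
                             temporal_kernel_size, spatial_kernel_size)
        # Each ST-Conv block shrinks the time axis by 2 * (Kt - 1); the output
        # layer maps whatever remains to one predicted speed per node.
        remaining_steps = num_hist - 4 * (temporal_kernel_size - 1)
        self.fc = nn.Linear(remaining_steps * out_channels, 1)

    def forward(self, x, edge_index, edge_weight):
        # x: [batch, num_hist, num_nodes, in_channels]
        h = self.block1(x, edge_index, edge_weight)
        h = self.block2(h, edge_index, edge_weight)
        h = h.permute(0, 2, 1, 3).flatten(start_dim=2)   # [batch, nodes, time * channels]
        return self.fc(h).squeeze(-1)                    # [batch, nodes]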

We train using the following pseudocode.

import numpy as np
import torch

min_val_loss = np.inf
for epoch in range(1, num_epochs + 1):
    l_sum, n = 0.0, 0
    model.train()
    for x, y in train_data:
        y_pred = model(x, edge_index, edge_weight)
        l = loss(y_pred, y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
        l_sum += l.item() * y.shape[0]   # accumulate loss weighted by batch size
        n += y.shape[0]
    # Check validation loss once per epoch and keep the best checkpoint
    val_loss = evaluate_model(model, loss, val_data, edge_index, edge_weight, device)
    if val_loss < min_val_loss:
        min_val_loss = val_loss
        torch.save(model.state_dict(), model_save_path)
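The evaluate_model helper referenced above is not part of PyG-Temporal; a minimal version consistent with the training loop (again, a sketch rather than the exact Colab code) could look like this:

import torch

@torch.no_grad()
def evaluate_model(model, loss, data_iter, edge_index, edge_weight, device):
    """Average the loss over a held-out split (no gradient updates)."""
    model.eval()
    l_sum, n = 0.0, 0
    for x, y in data_iter:
        y_pred = model(x.to(device), edge_index, edge_weight)
        l = loss(y_pred, y.to(device))
        l_sum += l.item() * y.shape[0]
        n += y.shape[0]
    return l_sum / n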

Due to the number of time steps included, this dataset is massive. We’ve included a small toy example in the Colab to demonstrate the training, but to train on the full dataset, we used an NVIDIA Tesla V100 GPU on GCP. Using the GPU, each epoch took a little over one hour. We trained for 7 epochs; you can see the training logs on this Weights and Biases dashboard.

Predictions and Performance

The table below presents a summary of the performance of the STGCN on the dataset used by Yu et al. [11] and on our PeMS dataset of 50 stations. Performance is evaluated using three metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE).

Comparison of performance of our implementation of STGCN vs. that of Yu et al. [11] Source: Andrea Vallebueno
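For reference, these three metrics can be computed as in the short sketch below, applied to flattened arrays of predicted and observed speeds:

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # reported as a percentage; assumes no true speeds of exactly zero
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([62.0, 55.0, 48.0])   # observed speeds (mph)
y_pred = np.array([60.5, 57.0, 50.0])   # model predictions
print(mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred))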

It is important to note that our performance metrics are not directly comparable to those of the original authors. Yu et al. build a medium-scale dataset comprising 228 stations in District 7, compared to the 50 stations used in this blog from the same geographic region. Moreover, we use different window sizes compared to those of the original authors, who use 60 minutes (12 steps of 5-minute intervals) of historical data to predict 15-, 30- and 45-minute horizons (3, 6 and 9 time steps, respectively). We note, however, that our performance metrics on the test set are in line with those reported by Yu et al., particularly for their 15-minute predictions.

Visually, the model’s predictions match intuition. During traditionally low-traffic times, like early morning and late evening/night, our model predicts low traffic. The model predicts high traffic during rush-hour periods.

Visualization of LA traffic predictions using our ST-Conv model for June 14, 2021 from 6:00am-9:00pm. Source: Tassos Sapalidis

We encourage you to visit this Observable notebook for an interactive visualization of the predictions! You’ll be able to directly compare one day of predictions from our model to actual recorded data!

Resources

[1] Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 8024–8035. URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.

[2] California Department of Transportation. Caltrans Performance Measurement System (PeMS). URL: https://pems.dot.ca.gov/ [last accessed December 7, 2021]

[3] Chao, M., Kulkarni, C., Goebel, K., Fink, O. Fusing Physics-Based and Deep Learning Models for Prognostics. ResearchGate (March 2020).

[4] Charles R. Harris et al. “Array programming with NumPy”. In: Nature 585.7825 (Sept. 2020), pp. 357–362. DOI: 10.1038/s41586-020-2649-2. URL: https://doi.org/10.1038/s41586-020-2649-2.

[5] Fey, M., Lenssen, J. Fast Graph Representation Learning with PyTorch Geometric. (2019) arXiv:1903.02428.

[6] Knyazev, B. (2019, August 15). Spectral Graph Convolution Explained and Implemented Step by Step. Medium. Retrieved December 9, 2021, from https://towardsdatascience.com/spectral-graph-convolution-explained-and-implemented-step-by-step-2e495b57f801.

[7] The pandas development team. pandas-dev/pandas: Pandas. Version latest. Feb. 2020. DOI: 10.5281/zenodo.3509134. URL: https://doi.org/10.5281/zenodo.3509134.

[8] Rozemberczki, B., Scherer, P., He, Y., Panagopoulos, G., Riedel, A., Astefanoaei, M., Kiss, O., Beres, F., Lopez, G., Collignon, N., and Sarkar, R. PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (2021), pp. 4564–4573.

[9] Vivek, S. (2021, November 25). Visualizing Real-Time Traffic Patterns Using HERE Traffic API. Medium. Retrieved December 9, 2021, from https://towardsdatascience.com/visualizing-real-time-traffic-patterns-using-here-traffic-api-5f61528d563.

[10] Wei, H. (2021, September 1). Build your first Graph Neural Network model to predict traffic speed in 20 minutes. Medium. Retrieved December 9, 2021, from https://towardsdatascience.com/tagged/traffic-prediction

[11] Yu, B., Yin, H., and Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. arXiv e-prints (Sept. 2017), arXiv:1709.04875.

[12] Yu, B., Yin, H., and Zhu, Z. (April 28, 2019). Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting [electronic resource: Python source code (GitHub repository)]. URL: https://github.com/VeritasYin/STGCN_IJCAI-18
