Spatio-Temporal Forecasting using Temporal Graph Neural Networks

Published in

Data Reply IT | DataTech

8 min readJun 5, 2023

Introduction

Spatio-Temporal Forecasting is a complex task that drives both research and industrial innovation in several application domains.
Graph Neural Networks demostrated to be a promising approach to account the task since they allow to better simulate the relationship between the nodes on a spatio-temporal data graph.

In industrial scenarios, spatio-temporal forecasting can be applied to monitor industrial control systems (ICS), i.e. to predict the demand for certain products and optimize the supply chain accordingly.
Another example of application can be to detect traffic congestion, that can be linked to crashes, in order to take immediate action.

In this article, we will introduce how to approach the spatio-temporal forecasting task using most popular graph neural network architectures and present how to accomplish the task of traffic prediction.

What are Dynamic Graphs?

With the term Dynamic Graph we refer to a discrete sequence of static graphs G = (G₀, G₁, …, Gₜ) where each Gᵢ = (Vᵢ, Eᵢ), namely Vᵢ the set of nodes and Eᵢ the set of edges at instant i, and t is the total length of the sequence. Dynamic Graphs can be used both to represent graph-shaped data that varies over time (i.e. traffic) or graph structures that vary over time (i.e. social networks).

Spatio-Temporal Graph Convolutional Network

The Spatio-Temporal Graph Convolutional Network architecture (STGCN) [1], denotes the first attempt to use graph convolutions in spatio-temporal series forecasting.

Previous studies concerning RNN completely neglected spatial attributes of traffic networks. Even CNN with 2-D convolutions on grids, it can only capture the spatial locality roughly since data is split into multiple segments or grids due to compromises in data modeling.

The framework of STGCN consists of two spatio-temporal convolusional blocks and a fully-connected output layer at the end. Then, each ST-Conv Block contains two Temporal Gated Convolution layers and one Spatial Graph Convolution layer in the middle. The Spatial Graph Convolution is the common convolution on graphs which takes into account the spatial dependencies between nodes in the graph (message propagation + aggregation). The Temporal Gated Convolution instead aims to extract information from temporal dependencies of subsequent graph representations. Indeed, the Temportal Gated Convolution layer contains a 1-D convolution with a width-Kₜ kernel followed by a Gated Linear Unit defined as P ⊙ σ(Q) where P and Q are two matrices produced by the 1-D Conv and sigma is the sigmoid function and control which input P of the current states are relevant for discovering dynamic variances in time series.

Graph WaveNet

Graph WaveNet architecture [2] proposes a slightly different approach with respect to the STGCN described earlier, which provides better results with respect to the previous one and is still competitive compared to the most fanciest SOTA methods at the time this article is written (like ST-LGSL [3] or D2STGNN [4]).

The framework of Graph WaveNet consists of a stack of K Spatio-Temporal layers each one containing a Gated Temporal Convolution layer, described by two parallel Temporal Convolution layers (TCN-a and TCN-b), and a Spatial Convolution layer (GCN).

Graph WaveNet applies dilated causal convolution, which means that it is able to capture longer sequences with fewer layers. Thus, at each of the K layers, GCN computes spatial dependencies at different temporal levels. For example, at lower layers, it receives short-term temporal information, while at higher layers,it tackles long-term ones.

For instance, unlike the previous architecture, Graph WaveNet outputs the whole prediction sequence of subsequent T steps rather than generating each timestamp recursively. This translates to longer training time but shorter inference time, which can be set as a trade-off to choose the appropriate architecture:

Predict traffic congestions

As anticipated at the beginning, traffic prediction is the most popular application for spatio-temporal forecasting. In particular, datasets such as METR-LA and PEMS-BAY represent the baseline for performance evaluations for previously described architectures; for instance, they are already integrated in libraries such as Pytorch Geometric Temporal.

In these datasets, the nodes represent traffic sensors, which are used to collect traffic data, such as speed or volume. Each sensor is assigned a unique identifier, which is used to distinguish it from other sensors in the dataset. The edges in these datasets are not explicitly represented, as the sensors are not connected to each other in a physical sense. Instead, the edges are inferred based on the geographic proximity of the sensors.

In this article, for the sake of differentiating from other publications, I would like to describe how to deal with traffic data recorded with GPS. The goal is to predict the usual traffic flow in principal metropolitan areas, so that it can be possible to detect anomalies that can be linked to congestions or crashes.

Data enrichment with OpenStreetMap

Let’s assume we have at disposal recordings of trips as a set of coordinates (latitude, longitude) communicated with a certain periodicity. It is possible to enrich such data using OpenStreetMap APIs, to obtain the level of detail at the single street arc, called way.
To filter the ways related to a certain area of interest it is possible to query the OSM service through i.e. OSMnx library to get the boundary of the area.
Let’s see an example of query concerning the area of Milan:

import osmnx as ox
import plotly.express as px

polygon = ox.geocode_to_gdf("Milan, Lombardy, Italy", buffer_dist=None)

x, y = polygon.geometry.iloc[0].boundary.coords.xy
data = {'lon': x, 'lat': y}
bb_df = pd.DataFrame(data)
fig = px.scatter_mapbox(bb_df, lat="lat", lon="lon")

The first line of code will extract a shapley Polygon object which delimits the chosen region, it is possible to specify also a buffer distance in meters to expand the area. Next lines are useful to extract coordinates of the boundary to visualize it on a map.

Once extracted the area of interest, it is possile to retrieve all the ways inside it. We can do it again using OSMnx through the graph_from_polygon method, which allows also to choose the network type among i.e. drive, bike, walk or all. Then, using graph_to_gdfs method, it is possible to extract all the ways information. It can be useful for instance to encapsulate those information in a dictionary with the OSM way id as key:

import osmnx as ox

network_graph = ox.graph_from_polygon(polygon.geometry.iloc[0], simplify=False, network_type='all')
ways = ox.graph_to_gdfs(network_graph, nodes=False, edges=True)

ways_dict = {}
for index, row in ways.iterrows():
  osmid = str(row['osmid'])
  
  way_info = {
      'address': row['name'],
      'highway': row['highway'],
      'speedlimit': row['maxspeed'],
      'oneway': row['oneway'],
      'nodes': list([tuple(reversed(coord)) for coord in row['geometry'].coords]),
      'tot_length': row['length']
  }
  
  ways_dict[osmid] = way_info

As one can notice, each way itself is constituted of nodes, linking subsequent segments of each street; it can be useful to extract coordinates of these points to represent ways on a map.

Once all the streets in the area of interest have been extracted, which will constitute the nodes of the graph, it is time to extract the connections between them, or better, the edges of the graph.
It is possible to do that exploiting the network graph previously extracted as follows:

connections = {}

for u, v, key, data in network_graph.edges(keys=True, data=True):
    # Check if the current way has a connection to another way and extract its osmid
    if G.has_edge(v, u):
        k = str(data['osmid'])
        v = str(network_graph.get_edge_data(v, u, key)['osmid'])
        if k in connections: connections[k].append(v)
        else: connections[k] = [v]

This will build a dictionary containing the set of connections for each way.

The final step is to extract the corresponding way for the set of coordinates at disposal. Again, it is possible to do that by querying OSM through APIs. An example using the OSMnx library and the network graph previously extracted can be:

import osmnx as ox
nearest_way = ox.distance.nearest_edges(network_graph, lon, lat, return_dist=False)[0]

Based on what data we have associated with those coordinate points, it is possible to make different predictions: in case you have a set of coordinates, it is possible to record the number of transits in each way; if you also have other information, such as the recorded velocity in that point, it is possible to predict the traffic using also that indicator.

For example in Figure a) it is provided a visualization of the number of transits recorded at each hour of in the city of Milan, with the higher number of transits in yellow.

Figure a) Evolution of the daily traffic recorded in Milan

Experiments setup

Once we have associated our data with corresponding ways and retrieved the connections between them, it is possible to input them into the forecasting model. The following experiments have been performed using the open-source ST-GAT model implementation available at this github repo.

The dataset has been drafted using the top 500 ways with a higher number of hourly recordings over a period of 28 days, then it has been splitted in 20 days for training, 4 for validation and 4 for test. We use the previous 2 hours recordings to predict the traffic flow and the previous 3 hours for predict the average velocity on subsequent hours.

I left the configuration of the model as default, using MSE for the loss function, MAE for the evaluation metrics and 100 training epochs.

The results achieved are quite good as shown in Figure b.1) and Figure b.2). In fact, the model is able to forecast the traffic peaks and average velocity of streets in future days.
Moreover, it is possible to obtain even better predictions using more historical data in the training phase, taking care to use strategies such as early stopping to prevent overfitting.

Figure b.1) Traffic forecasting on a random way

Figure b.2) Velocity forecasting on a random way

Conclusions

Spatio-Temporal Graph Convolutional models have proven capable to forecast the traffic flow in metropolitan areas and therefore can be successfully employed to optimize traffic management and reduce congestion on roads. Moreover, it is possible to identify anomalies in traffic flow by analyzing significant deviations with respect to the forecast.

References:

[1] Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting

[2] Graph WaveNet for Deep Spatial-Temporal Graph Modeling

[3] Spatio-Temporal Latent Graph Structure Learning for Traffic Forecasting

[4] Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting

[5] Spatial-Temporal Graph Attention Networks: A Deep Learning Approach for Traffic Forecasting