Traffic Volume Forecasting with T-GCN: Importance of Feature Engineering when Solving Spatio-temporal Problems

Amy Kim
Aug 7, 2023


Source: Unsplash

Outline

The Introduction presents the problem of traffic forecasting and mentions conventional forecasting methods. Dataset describes the traffic volume dataset chosen for the problem. Feature Engineering explains how important features are selected. Methodology explains the T-GCN model and shows the code snippets. Results compares the forecasting results across different feature combinations and models using RMSE, and plots the forecasts for easier visual comparison of forecasting performance. Conclusion shares some insights from this project. References lists all the references used for the project.

Introduction

Traffic forecasting has been an important problem in the context of urban traffic planning, traffic management, and traffic control [1]. Real-time, accurate traffic forecasting requires modeling both spatial dependence, which comes from the structure of the urban road network, and temporal dependence, which comes from dynamic changes over time. Modeling both at once remains a challenge.

Historically, spatial regression models such as the spatial Durbin model have been used to model and predict traffic volume, using spatial dependency information among the observations in different locations. However, such spatial dependency had to be pre-defined by domain experts, which is difficult and expensive [2]. Another approach was to use conventional time series models such as Autoregressive Integrated Moving Average (ARIMA). However, time series models only utilize temporal dependence and miss what could be learned from spatial dependence. Recently, graph convolutional network-based models have been suggested as an alternative that captures both spatial and temporal information simultaneously. Combining graph convolutional networks (GCNs) and gated recurrent units (GRUs), for example, enables learning complex topological relationships within traffic data for forecasting [1].

In this project, one such model, the Temporal Graph Convolutional Network (T-GCN), is used for short-term traffic flow forecasting with both spatial and temporal information. Given the connectivity of sensor locations, traffic volume time series, and other features in neighboring locations, the goal is to predict the traffic volume for the next 15 minutes at all sensor locations. Since the prediction target is numeric, Root Mean Square Error (RMSE) was used to evaluate the forecasting performance of T-GCN.
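As a reminder of what the metric measures, here is a minimal NumPy sketch of RMSE. The function name and the toy arrays are made up for illustration and are not taken from the project code.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error over all elements of the two arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy example: 2 time steps x 3 sensors (made-up numbers).
y_true = np.array([[0.10, 0.20, 0.30], [0.40, 0.50, 0.60]])
y_pred = y_true + 0.1  # every prediction is off by 0.1

print(rmse(y_true, y_pred))  # approximately 0.1
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones.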

Dataset

The traffic volume dataset [3] prepared by Zhao et al. (2019) was used for this project. The traffic volume is measured every 15 minutes at 36 sensor locations along two major highways in the Northern Virginia/Washington, D.C., capital region. The 47 features include: (1) the historical sequence of traffic volume sensed during the 10 most recent sample points (10 features), (2) weekday (7 features), (3) hour of the day (24 features), (4) road direction (4 features), (5) number of lanes (1 feature), and (6) name of the road (1 feature).
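To keep track of the six feature groups, it can help to map each one to a range of column indices. The sketch below assumes the columns are stored in the same order as the groups are listed above. That ordering and the group names are assumptions for illustration, so check them against the actual dataset before relying on them.

```python
# Feature groups and their widths, as listed in the text.
# ASSUMPTION: the columns appear in this order in the feature matrix.
FEATURE_GROUPS = [
    ("volume_history", 10),
    ("weekday", 7),
    ("hour_of_day", 24),
    ("road_direction", 4),
    ("num_lanes", 1),
    ("road_name", 1),
]


def build_slices(groups):
    """Return ({group name: column slice}, total number of columns)."""
    slices, start = {}, 0
    for name, width in groups:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


slices, total = build_slices(FEATURE_GROUPS)
print(total)  # 47
print(slices["volume_history"])  # slice(0, 10, None)
```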

For T-GCN, two inputs were used: the adjacency matrix, which describes the connectivity of the 36 sensor locations, and the feature matrix, which has 2101 observations of 47 features for each of the 36 sensor locations. The original dataset was in MAT format, so it was converted to NumPy arrays of shape (len_dataset, num_feats, num_nodes), which were then wrapped in a PyTorch Dataset. The feature matrix was divided into a train set and a test set, with 60% and 40% of the dataset, respectively. The train set was then further divided into a train set and a validation set, with 80% and 20% of the initial train set, respectively. The dataset did not contain any null values.

Since the dataset contained both numeric features (e.g. traffic volume measurements in time) and one-hot-encoded categorical features (e.g. hour of the day, day of the week, etc.), it was hard to determine how to pre-process the data. In the beginning, three different versions of the data were prepared: 1) the full feature dataset without pre-processing, 2) a normalized full feature dataset (divided by the maximum numeric feature value), and 3) a numeric dataset that contains only the previous traffic volume time series and ignores the categorical features. However, the normalized dataset did not improve performance and was eventually dropped.
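The 60/40 and 80/20 splits can be sketched as follows. This is only an illustration: it assumes a chronological split without shuffling, which the text does not state, and it uses a zero-filled placeholder array in place of the real data. The function name is hypothetical.

```python
import numpy as np


def chronological_split(data, train_frac=0.6, val_frac=0.2):
    """Split along the first axis into train / validation / test.

    ASSUMPTION: observations are ordered in time and are not shuffled.
    val_frac is a fraction of the initial train portion, not of the whole.
    """
    n = len(data)
    n_train_full = int(n * train_frac)    # initial train portion (60%)
    n_val = int(n_train_full * val_frac)  # 20% of the initial train portion
    n_train = n_train_full - n_val
    train = data[:n_train]
    val = data[n_train:n_train_full]
    test = data[n_train_full:]
    return train, val, test


# Placeholder with the shape described above: (len_dataset, num_feats, num_nodes).
features = np.zeros((2101, 47, 36), dtype=np.float32)
train, val, test = chronological_split(features)
print(train.shape, val.shape, test.shape)
```

Keeping the validation set as the tail of the train portion means the model is always validated on observations that come after the ones it was trained on.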

Feature Engineering

To select important features while jointly using temporal and spatial features of different scales, the Maximal Information Coefficient (MIC) was used. The choice was inspired by a paper on a similar problem, short-term passenger flow forecasting [5]. MIC was chosen because it can capture a wide range of functional and non-functional relationships between variables.

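MIC is built on the mutual information between random variables X and Y: the data are partitioned with a grid, the mutual information on that grid is normalized by the logarithm of the smaller number of bins along the two axes, and the maximum over grids is taken, which gives a score between 0 and 1. MIC between a set of features and a target (e.g., the 47 features and traffic volume in our example) can be easily calculated using the minepy Python package [6].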

After MIC was calculated between each feature and the target values, only the features with MIC higher than 0.7 were selected. These turned out to be the latest 8 traffic volume observations.
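A minimal sketch of this selection step is shown below. `mic_scores` uses minepy's `MINE` estimator with its default-style parameters (`alpha=0.6`, `c=15`), and `select_features` applies the 0.7 threshold. Both helper names are hypothetical, and the layout of `X` as a 2-D array of samples by features is an assumption, since the text does not say how the sensor dimension was handled. The example scores are made up.

```python
import numpy as np


def mic_scores(X, y, alpha=0.6, c=15):
    """MIC between each column of X (samples x features) and the target y."""
    # Imported lazily so the thresholding part runs without minepy installed.
    from minepy import MINE

    mine = MINE(alpha=alpha, c=c)
    scores = []
    for j in range(X.shape[1]):
        mine.compute_score(X[:, j], y)
        scores.append(mine.mic())
    return np.array(scores)


def select_features(scores, threshold=0.7):
    """Indices of features whose score is strictly above the threshold."""
    return np.flatnonzero(np.asarray(scores) > threshold)


# Made-up scores for five features, only to show the thresholding.
example_scores = np.array([0.91, 0.85, 0.42, 0.73, 0.10])
selected = select_features(example_scores)
print(selected)  # [0 1 3]
```

With real data, the scores would come from something like `mic_scores(X, y)`, and the returned indices would be used to slice the feature axis.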

Methodology

Given the spatial information of road networks and the temporal information of traffic volume time series, the traffic volume for the next 15 minutes was forecasted with T-GCN.

T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction [1]

T-GCN takes the normalized Laplacian matrix calculated from the adjacency matrix A, the feature matrix X, and the zero-initialized hidden state as inputs, and calculates the updated hidden states using both GCN and GRU in the process.
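The normalization step can be sketched in NumPy. This assumes the symmetric normalization with self-loops used in the T-GCN paper [1], D^(-1/2) (A + I) D^(-1/2), where D is the degree matrix of A + I. The function name and the toy graph are illustrative only.

```python
import numpy as np


def normalized_adjacency_with_self_loops(adj):
    """Compute D^(-1/2) (A + I) D^(-1/2) for a non-negative adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    a_tilde = adj + np.eye(adj.shape[0])  # add self-loops
    degree = a_tilde.sum(axis=1)          # at least 1 thanks to the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Scale row i by d_i^(-1/2) and column j by d_j^(-1/2).
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


# Toy path graph with 3 nodes: 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
a_hat = normalized_adjacency_with_self_loops(adj)
print(np.round(a_hat, 3))
```

The self-loops let each sensor keep its own signal when features are averaged over neighbors, and the symmetric scaling keeps highly connected sensors from dominating.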

T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction [1]

The feature matrix X_t and the previous hidden state h_(t-1), which is zero-initialized at the first step, were concatenated and used as inputs to GCN to calculate the reset state r_t and the update state u_t. Then X_t and the element-wise product of the reset state and the previous hidden state, r_t * h_(t-1), were used as inputs to GCN to calculate the cell state c_t. The next hidden state h_t was calculated via the definition of GRU, h_t = u_t * h_(t-1) + (1 - u_t) * c_t [1]. The final hidden state h_t was then used as an input to the fully-connected linear layer, which calculates the final output, the traffic volume for the next time step. The hidden dimension of T-GCN was tuned with a grid search over the values [32, 64, 128]. The hidden dimension value that gave the smallest training loss was 64.
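The update described above can be written as a small NumPy forward pass. This is a sketch under stated assumptions, not the project's code or the reference implementation [4]: the parameter names, the random weights, the identity matrix standing in for the normalized adjacency, and the input shape of (num_nodes, num_feats) per step are all chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def graph_conv(a_hat, x, h, weight, bias):
    """Graph convolution on the concatenation [x, h].

    a_hat: (N, N), x: (N, F), h: (N, H), weight: (F + H, out), bias: (out,)
    """
    xh = np.concatenate([x, h], axis=1)  # (N, F + H)
    return a_hat @ xh @ weight + bias    # (N, out)


def tgcn_cell(a_hat, x_t, h_prev, params):
    """One T-GCN step: GCN-based gates followed by the GRU state update."""
    hidden = h_prev.shape[1]
    gates = sigmoid(graph_conv(a_hat, x_t, h_prev,
                               params["w_gates"], params["b_gates"]))
    r_t, u_t = gates[:, :hidden], gates[:, hidden:]  # reset and update states
    c_t = np.tanh(graph_conv(a_hat, x_t, r_t * h_prev,
                             params["w_cand"], params["b_cand"]))
    return u_t * h_prev + (1.0 - u_t) * c_t          # next hidden state h_t


rng = np.random.default_rng(0)
num_nodes, num_feats, hidden, num_steps = 36, 10, 64, 4
a_hat = np.eye(num_nodes)  # placeholder for the normalized adjacency matrix
params = {
    "w_gates": rng.normal(scale=0.1, size=(num_feats + hidden, 2 * hidden)),
    "b_gates": np.zeros(2 * hidden),
    "w_cand": rng.normal(scale=0.1, size=(num_feats + hidden, hidden)),
    "b_cand": np.zeros(hidden),
}

h = np.zeros((num_nodes, hidden))  # zero-initialized hidden state
for _ in range(num_steps):
    x_t = rng.normal(size=(num_nodes, num_feats))  # made-up input features
    h = tgcn_cell(a_hat, x_t, h, params)

w_out = rng.normal(scale=0.1, size=(hidden, 1))  # stand-in for the linear layer
y_hat = h @ w_out                                 # one prediction per sensor
print(h.shape, y_hat.shape)
```

A single graph convolution produces both gates at once and is then split in two, which keeps the reset and update states on the same neighborhood aggregation.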

Results

The RMSE value of 0.0991 from T-GCN with all the time series features was the lowest, and it was similar to the RMSE value of 0.0992 from T-GCN with the selected time series features. When all 47 features were used, the RMSE value was 0.11. So in terms of RMSE, using only the more relevant features performed better than using all the available features.

For an easier visual comparison of performance with the full feature set and with the time series feature set, the forecasting results from the best epoch, the one with the smallest validation loss, were plotted for each case. It can be observed that T-GCN captures variations in the ground truth better when only the relevant time series features are included.

Forecasting results with T-GCN with full features for sensor location 0
Forecasting results with T-GCN with time series features for sensor location 0

Conclusion*

Joint usage of temporal and spatial features of different scales could still be challenging, even with the GCN models that can take both pieces of information. Feature engineering is important in this context so that you don’t use irrelevant information for your problem and hurt the performance of the model. It would be interesting and worthwhile to try out other time series analysis models like ARIMA or spatial regression models for this type of problem.

*Disclaimer: Conclusions in this blog post are in no way derived out of thorough research and are subjective, temporary findings from one side project for learning purposes.

References

[1] T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction https://arxiv.org/pdf/1811.05320.pdf

[2] Spatial Auto-regressive Dependency Interpretable Learning Based on Spatial Topological Constraints https://dl.acm.org/doi/pdf/10.1145/3339823

[3] Traffic Flow Forecasting Data Set https://archive.ics.uci.edu/dataset/608/traffic+flow+forecasting

[4] T-GCN code repository https://github.com/lehaifeng/T-GCN

[5] Short-Term Passenger Flow Forecast of Rail Transit Station Based on MIC Feature Selection and ST-LightGBM considering Transfer Passenger Flow https://www.hindawi.com/journals/sp/2020/3180628/

[6] Minepy https://minepy.readthedocs.io/en/latest/python.html
