Optimizing Traffic Sensor Placement in the City of Toronto

Yichen Liao
AI4SM
Oct 26, 2023

By Yichen Liao, Xinyi Gong and Amir Hossein Mobasheri as part of course project of ECE1724H: Bio-inspired Algorithms for Smart Mobility. Dr. Alaa Khamis, University of Toronto, 2023.


Problem Characterization

Optimizing traffic sensor placement in the City of Toronto is a multidimensional challenge in the era of smart mobility. It involves strategically determining locations for traffic sensors throughout the city to minimize capital costs while meeting specific performance criteria for traffic monitoring and control. This project can enhance smart mobility in Toronto by helping prevent congestion, improving the accuracy of traffic data, and strengthening the overall functionality of the city’s transportation infrastructure.

Many studies on sensor optimization focus on the accuracy of the prediction models, but in real-world scenarios resources are always limited. Strategically selecting the types and locations of sensors can therefore improve both prediction coverage and the accuracy of traffic data.

Purposes

Empowering Smart Mobility

Traffic control plays a vital role in smart mobility. In order to build cities with more efficient public transportation systems, more accurate navigation apps, and smarter autonomous vehicles, it is essential to have a complete picture of traffic patterns. Placing traffic sensors at optimal locations across a city makes gathering useful and accurate traffic information possible.

Congestion Prevention

In order to plan and design strategies to prevent congestion, we need to have exact information about the traffic levels and traffic flows throughout the city. Thus, we need to be able to determine the best choice of locations for traffic sensors.

Safety and Security

Using traffic data, we can develop new long-term city planning strategies or better route suggestions in a mobile app, aiming to minimize the chance of incidents and traffic accidents.

Related Works

R. Semaan demonstrated in [1] that strategically choosing sensor locations and types can greatly improve prediction accuracy in computational fluid simulation models. Semaan’s idea for selecting sensor locations is to find the inputs most relevant to the model, which in turn yields more accurate results: sensor locations are randomized, and prediction accuracy is measured as a feedback signal for the machine learning models. This project follows Semaan’s idea of using machine learning to optimize sensor placement in traffic systems, a chaotic system similar to a dynamic fluid system.

In [2], the authors show how a heuristic algorithm can maximize the efficiency of traffic sensor placement. Their paper notes that including movement vectors in the algorithm can reduce the number of sensors required to reach a target accuracy.

Our traffic prediction training method, machine learning modeling, and architecture are heavily influenced by Amelia Woodward’s article [3]: Predicting Los Angeles Traffic with Graph Neural Networks. For design details and insight into building such a prediction model, please refer to their Medium article.

Problem Datasets

Overview

Our project uses a detailed collection of traffic statistics data for the city of Toronto, which comes from Statistics Canada [4]. This dataset contains the following elements:

  1. Traffic Recorder Counts:

This shows us the total traffic volume captured by each camera, including all vehicle types and their movement directions. This data helps us understand the flow of traffic for specific road segments.

  2. Camera Locations Information:

This consists of the exact geographic coordinates where the traffic is being measured, given as latitude and longitude, as well as the names of intersections. It helps pinpoint the precise location of each traffic sensor.

  3. Interval Traffic Data Counts:

The raw traffic volumes are recorded in intervals of 15 minutes, providing a detailed flow of traffic. This data spans from February 2, 2022, to September 30, 2023, offering an extensive timeline to analyze traffic trends and changes over a significant period.

Data Processing

This project uses data spanning February 2, 2022 to September 30, 2023 as its dataset. The data is processed before being passed to the training model; the main steps are:

  1. Save raw data into the database:
  • Load the raw CSV file into Pandas DataFrames.
  • Extract latitude and longitude from the ‘WKT’ column and add them to the DataFrame as new columns.
  • Insert the transformed data into the ‘TRAFFIC_CAM_COUNT’ table in batches (to optimize performance and memory usage).
  • The insertion query takes the camera ID, latitude, longitude, camera road, count date, and count number from the DataFrame and writes them into the database.
  2. Patch the data:
  • Load the database from disk into memory first for faster data manipulation.
  • Find rows with missing data.
  • Find replacement values for the missing data.
  • Rules for patching the data:
  1. Rule 1 — Same Date Next Year: The function first tries to find the count number from the same date in the next year. If a valid (non-null) count number is found for this date, it is returned as the replacement.
  2. Rule 2 — Same Date Last Year: If Rule 1 fails, the function looks for a count number from the same date in the previous year. A valid count number from this date is returned if found.
  3. Rules 3 & 4 — Next/Last Week, Same Weekday: If both previous rules fail, the function searches for a count number from the same weekday one week before or after the given date, by adding or subtracting 7 days.
  4. Further Search: If none of the above rules yield a valid replacement value, the function extends its search to the days immediately surrounding the given date (1 to 5 days before and after). This is a broader attempt to find a suitable replacement value.
  • Save the database back onto the disk.
  • Save the database back into the CSV file.
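The patching rules above can be sketched as a single lookup function. This is a minimal sketch, assuming the in-memory data is a Pandas DataFrame with `camera_id`, `count_date`, and `count` columns (the column names here are illustrative, not the exact schema of our database):

```python
from datetime import date, timedelta

import pandas as pd


def shift_year(d, years):
    """Move a date by whole years, clamping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def find_replacement(df, camera_id, missing_date):
    """Try the patching rules in order; return None if all of them fail."""
    def lookup(d):
        row = df[(df["camera_id"] == camera_id) & (df["count_date"] == d)]
        if not row.empty and pd.notna(row.iloc[0]["count"]):
            return row.iloc[0]["count"]
        return None

    candidates = [
        shift_year(missing_date, 1),        # Rule 1: same date next year
        shift_year(missing_date, -1),       # Rule 2: same date last year
        missing_date + timedelta(days=7),   # Rule 3: same weekday next week
        missing_date - timedelta(days=7),   # Rule 4: same weekday last week
    ]
    for offset in range(1, 6):              # Further search: 1-5 days around
        candidates.append(missing_date + timedelta(days=offset))
        candidates.append(missing_date - timedelta(days=offset))

    for candidate in candidates:
        value = lookup(candidate)
        if value is not None:
            return value
    return None
```

Rows for which `find_replacement` returns `None` are the ones that remain unpatched and are handled in the results below.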

Results of the data processing:

  1. After importing the CSV file into our database, we discovered that all 308 traffic sensor locations had some instances of missing data. A detailed analysis revealed that while a few locations exhibited high rates of missing data (around 80% of the dates), the majority of sensor locations had missing data for less than 10% of the dates.
  2. As we progressed with data processing, we found that 32 out of the 308 locations continued to have substantial missing data even after the patching process described above. To ensure the reliability and accuracy of our prediction model, we decided to exclude these 32 locations from our training dataset. Consequently, our final dataset for training the prediction model comprises data from 276 sensor locations, having undergone thorough data cleaning and processing.

Code for the data processing can be found in the GitHub repo.

To construct our dataset, we pair input-output tuples (x, y), where ‘x’ represents a sequence of traffic data for 276 nodes spanning 50 consecutive days, and ‘y’ corresponds to the traffic data for the subsequent 2 days, serving as the prediction target. This structure is designed to enable the model to learn from historical traffic patterns over a 50-day period to predict traffic for the following 2 days.

The dataset is formed by sliding this 50-day window one day forward at a time, ensuring a continuous and overlapping sequence of data. When we sample a batch of size 32 from this dataset, we effectively select 32 distinct 50-day sequences from varying starting points across the timeline for all 276 nodes. Consequently, each batch is a three-dimensional array with the shape [32, 276, 50], where 32 is the number of different time windows in the batch, 276 is the number of nodes, and 50 represents the number of days in each window.
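The windowing described above can be sketched as follows, assuming a `counts` array of shape [num_days, num_nodes] of daily traffic counts (the function name and array layout are illustrative):

```python
import numpy as np


def make_windows(counts, window=50, horizon=2):
    """Slide a `window`-day input over the timeline, one day at a time.

    counts: array of shape [num_days, num_nodes] of daily traffic counts.
    Returns x of shape [num_samples, num_nodes, window] and
    y of shape [num_samples, num_nodes, horizon].
    """
    num_days, num_nodes = counts.shape
    xs, ys = [], []
    for start in range(num_days - window - horizon + 1):
        # Input: `window` consecutive days; target: the following `horizon` days
        xs.append(counts[start:start + window].T)
        ys.append(counts[start + window:start + window + horizon].T)
    return np.stack(xs), np.stack(ys)
```

With 276 nodes, sampling 32 windows from the resulting `x` gives exactly the [32, 276, 50] batch shape described above.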

This methodical approach in dataset construction allows for a comprehensive representation of the temporal dynamics in the traffic data, crucial for training models to accurately predict future traffic conditions.

Exploratory Spatial Data Analysis (ESDA)

In this section, we explore the dataset from Statistics Canada [4]. The two most important insights gained during the analysis are: the density of available traffic camera locations is much higher in downtown Toronto, and most traffic sensors are co-located with traffic lights.

To visualize the traffic sensor data from the City of Toronto, we first obtain and parse the data in CSV format. Below is a section of the data available on the city’s data portal:

Figure.1 Raw data from Traffic Volumes

Locations of available traffic cameras

import folium

# 'data' is the DataFrame of camera records, with a 'coords' (lat, lon) column
m = folium.Map(location=[43.70, -79.42], zoom_start=11)
color = 'red'
for index, row in data.iterrows():
    folium.Circle(
        location=row['coords'],
        radius=10,
        color=color,
        fill=True,
        fill_color=color
    ).add_to(m)
m
Figure.2: Locations of traffic cameras

Counts of available traffic cameras

import folium
from folium.plugins import MarkerCluster
import pandas as pd
import re


# parse_point is a function to parse POINT strings
def parse_point(point_str):
    match = re.match(r'POINT \(([^ ]+) ([^ ]+)\)', point_str)
    if match:
        # WKT stores (lon lat); folium expects (lat, lon)
        return float(match.group(2)), float(match.group(1))
    return None


# Applying the parse_point function to get coordinates
data['coords'] = data['WKT'].apply(parse_point)
# Drop rows with null coordinates
data = data.dropna(subset=['coords'])

# Create a map centered around Toronto
m = folium.Map(location=[43.70, -79.42], zoom_start=11)

# Initialize MarkerCluster
marker_cluster = MarkerCluster().add_to(m)

# Add markers to the MarkerCluster
for index, row in data.iterrows():
    folium.Marker(
        location=row['coords'],
        # Use a default value if 'location' does not exist
        tooltip=row.get('location', 'No Description')
    ).add_to(marker_cluster)

# Display the map
m
Figure.3: Traffic Cameras Count in City of Toronto

Density of available traffic cameras

from folium.plugins import HeatMap

# Approximate bounding box for the City of Toronto: (south, west, north, east)
bbox = (43.58, -79.64, 43.86, -79.12)
toronto_center = [43.70, -79.42]


def within_bbox(coords, bbox):
    """Check whether a (lat, lon) pair falls inside the bounding box."""
    lat, lon = coords
    return bbox[0] <= lat <= bbox[2] and bbox[1] <= lon <= bbox[3]


toronto_data = data[data['coords'].apply(lambda x: within_bbox(x, bbox))]

# Prepare data for the heatmap (only latitude and longitude are needed)
toronto_heatmap_data = [coords for coords in toronto_data['coords']]

# Create a map centered around Toronto
m = folium.Map(location=toronto_center, zoom_start=12)

heatmap_gradient = {
    0.0: 'yellow',  # Start with yellow
    0.5: 'orange',  # Transition quickly to orange
    1.0: 'red'      # And predominantly red
}
# Add the heatmap; adjust the radius and blur to get the desired visual effect
HeatMap(toronto_heatmap_data, radius=15, blur=25, gradient=heatmap_gradient).add_to(m)

# Display the map in Jupyter Notebook or Google Colab
m
Figure.4: Density of traffic cameras in city of Toronto

Problem Formulation and Modeling

In our endeavor to optimize traffic sensor placement, we employed a strategy centered on identifying the most strategically valuable locations, specifically those with the potential to exert a substantial influence on traffic flow prediction. This approach was predicated on the utilization of the Traffic Flow Dataset from Statistics Canada [4], which encompasses traffic count data from multiple cities, including Toronto. The inherent spatial and temporal dynamics of traffic patterns necessitated the adoption of a modeling approach that could adequately capture these dimensions.

The spatial component of traffic activity acknowledges the premise that traffic patterns in proximate areas exhibit a degree of correlation, with the similarity tending to be more pronounced the closer the two points are. This spatial correlation is aptly captured by graph neural networks (GNNs), making them a fitting choice for modeling such spatial dependencies. Concurrently, the temporal aspect of traffic activity cannot be overlooked, as present and past traffic conditions are often indicative of future patterns. Therefore, a model that comprehensively considers both spatial and temporal elements is imperative.

Spatio-temporal graph attention networks (STGATs) [5] emerge as a potent solution in this context. These networks amalgamate the strengths of GNNs with attention mechanisms, thus facilitating a more focused analysis of pertinent data points within the extensive dataset. This methodology’s effectiveness in traffic prediction has been previously substantiated, as evidenced by its application in a study focused on Los Angeles (referenced as [3] in our work).

Our initial modeling attempt incorporated transformer encoders to extract salient temporal features from the traffic data, followed by a spatial modeling phase utilizing a graph attention layer. However, this initial model exhibited suboptimal performance, with prediction errors of approximately 50%.

In pursuit of enhanced accuracy, we adapted an implementation from existing literature [3], specifically tailored to suit our dataset and the unique challenges of our problem space. This involved a recalibration of certain hyperparameters, including the adoption of 16 attention heads within the GAT framework. This refinement yielded a marked improvement in model performance, evidenced by a substantial reduction in validation loss compared to the initial model (0.16 versus 1.0, respectively).

To identify the most influential nodes within the network, which is crucial for effective sensor placement, we capitalized on the attention mechanism inherent in GATs. In GATs, the attention weights assigned to each node’s neighbors serve as indicators of their relative importance or influence. These weights guide the network in prioritizing information from specific neighbors deemed more relevant for the task at hand.

Our methodology involved processing an array of time windows through the model, each time extracting and averaging the attention weights across all attention heads (a total of 16 heads, as mentioned). It is pertinent to note that each attention weight corresponds to an edge within the graph. To ascertain the most influential nodes, we aggregated the attention weights of the edges connected to each node. These aggregated sums provided a metric for evaluating the influence exerted by individual nodes within the network.
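The aggregation step above can be sketched as follows. This is a minimal sketch, assuming the graph's edges are given as a [2, num_edges] index array and the GAT layer exposes per-edge attention weights of shape [num_edges, num_heads] (as PyTorch Geometric's `GATConv` does with `return_attention_weights=True`); the function name is illustrative:

```python
import numpy as np


def node_influence(edge_index, attention, num_nodes):
    """Score each node by the attention mass on its incident edges.

    edge_index: [2, num_edges] array of (source, target) node indices.
    attention: [num_edges, num_heads] attention weights from the GAT layer.
    Returns a [num_nodes] array; higher means more influential.
    """
    # Average across the attention heads first (16 heads in our model)
    edge_weight = attention.mean(axis=1)
    scores = np.zeros(num_nodes)
    # Sum each edge's weight into both of its endpoints
    np.add.at(scores, edge_index[0], edge_weight)
    np.add.at(scores, edge_index[1], edge_weight)
    return scores


# Top 20 candidate sensor locations, most influential first
# top20 = np.argsort(node_influence(edge_index, att, 276))[::-1][:20]
```

In our pipeline these scores are additionally averaged over many time windows before ranking, so that the selection reflects persistent rather than transient influence.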

Thus, by examining these aggregated attention weights, we were able to draw insights into the nodes that the model deemed most informative for its predictions, thereby guiding our decisions regarding the optimal locations for traffic sensor deployment. This methodology, rooted in the principles of graph attention networks, underscores the importance of both spatial and temporal factors in traffic prediction and sensor placement optimization.

The 20 most promising locations for traffic sensors are shown in the figure below.

Result of top 20 most optimal places for traffic sensors

Training of the traffic prediction model:

In our model’s training process, we meticulously tuned several hyperparameters to optimize performance, balancing the computational efficiency and predictive accuracy. We settled on a batch size of 32, which provided a good compromise between the speed of training and the stability of the gradient updates. Our choice of a window size of 50 allowed the model to capture the necessary temporal context, essential for the sequential nature of our data. The prediction size was set to 2, tailored to our model’s output specification. We employed 16 attention heads in our transformer-based architecture, enabling the model to simultaneously focus on different segments of the input data, thereby enhancing its ability to learn complex patterns. To mitigate overfitting, we introduced a dropout rate of 0.3, which proved effective during validation. The two temporal parameters — 32 and 128 — were fine-tuned to optimize the model’s handling of temporal information. These hyperparameters were carefully chosen through an iterative process of grid and random search, underpinned by empirical evaluation. Our efforts culminated in achieving a minimum validation loss of 0.1626.
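For reference, the hyperparameters described above can be gathered into a single configuration; the key names below are illustrative, and the interpretation of the two temporal parameters as hidden sizes is our reading of the setup:

```python
# Hyperparameters from the tuning process described above.
config = {
    "batch_size": 32,         # compromise between speed and gradient stability
    "window_size": 50,        # days of history fed to the model
    "prediction_size": 2,     # days predicted ahead
    "attention_heads": 16,    # parallel attention heads
    "dropout": 0.3,           # mitigates overfitting
    "temporal_param_1": 32,   # first temporal parameter
    "temporal_param_2": 128,  # second temporal parameter
}
```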

Validation Loss with Different Parameters

A comparison with a real-world baseline

By now, we have successfully identified the 20 most heavily weighted locations in our model. To benchmark our methodology, we need real-world data to compare against.

Dataset: Traffic Volumes at Intersections for All Modes from city of Toronto

The dataset from The City of Toronto’s Transportation Services Division collects traffic volume data across the city. In 2022, an average of 20.05 locations were counted each day. We aggregated this data and identified the top 100 locations where the most traffic data were collected by the division. We assume that the greater volume of statistical data from certain locations indicates their increased value to the city’s traffic analysis and planning efforts.
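The aggregation can be sketched as a simple group-and-count, assuming one row per count record with a `location` column (the column and function names are illustrative):

```python
import pandas as pd


def top_locations(counts, n=100):
    """Rank locations by how often the city collected counts there.

    counts: DataFrame with one row per count record and a 'location' column.
    Returns the n most frequently counted locations, most counted first.
    """
    return (
        counts.groupby("location")
        .size()
        .sort_values(ascending=False)
        .head(n)
        .rename("num_counts")
        .reset_index()
    )
```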

In the figure below, blue dots mark the top 20 locations our model identified as most important, and red dots mark the top 100 locations most important to the Transportation Services Division. 5 of our 20 predicted locations overlap with the city’s top list, while most of the remaining locations lie close to the city’s choices (except for 2 locations in the southwest of the map, near Pearson Airport).

Benchmark: Blue dots are from our model, Red dots are from City of Toronto

Reference

  1. R. Semaan, Optimal sensor placement using machine learning, Computers & Fluids, Volume 159, 2017, Pages 167–176, ISSN 0045–7930
  2. N. Mehr and R. Horowitz, “A Submodular Approach for Optimal Sensor Placement in Traffic Networks,” 2018 Annual American Control Conference (ACC), Milwaukee, WI, USA, 2018, pp. 6353–6358, doi: 10.23919/ACC.2018.8431678.
  3. Predicting Los Angeles Traffic with Graph Neural Networks: https://medium.com/stanford-cs224w/predicting-los-angeles-traffic-with-graph-neural-networks-52652bc643b1
  4. Traffic Flow Dataset from Statistics Canada: https://www150.statcan.gc.ca/n1/pub/71-607-x/71-607-x2022018-eng.htm
  5. Y. Huang, H. Bi, Z. Li, T. Mao, and Z. Wang, “STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 4, pp. 1406–1414, Apr. 2021, doi: 10.1109/TPAMI.2020.3007032.
  6. M. Rodriguez Vega, Optimal Sensor Placement And Density Estimation In Large-Scale Traffic Networks, PhD thesis, Université Grenoble Alpes, 2021.
  7. S. Contreras, P. Kachroo and S. Agarwal, “Observability and Sensor Placement Problem on Highway Segments: A Traffic Dynamics-Based Approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 3, pp. 848–858, March 2016, doi: 10.1109/TITS.2015.2491282.
  8. A. Olia, H. Abdelgawad, B. Abdulhai and S. N. Razavi, “Optimizing the number and locations of freeway roadside equipment units for travel time estimation in a connected vehicle environment,” pp. 296–309, 2017.
  9. A. Mirhoseini, H. Pham, Q. V. Le, B. Steiner, R. Larsen, Y. Zhou, N. Kumar, M. Norouzi, S. Bengio and J. Dean, “Device Placement Optimization with Reinforcement Learning,” PMLR 70:2430–2439, 2017. Available: https://proceedings.mlr.press/v70/mirhoseini17a.html
