Synthetic Populations & The Future of Transport Modeling

Published in Urban AI · Mar 21, 2023 · 6 min read

By Sarah Popelka

Future of Urban AI #5 — Synthetic Populations & The Future of Transport Modeling with Arthur Getman

As with most urban systems, transportation networks can be elaborate and vast. Despite this complexity, urban planners strive to keep transport and mobility systems serving people efficiently, modifying them continuously as needs change. To do so, they have sought ways of boiling down the multifaceted nature of transportation systems in order to better understand the underlying processes and dynamics. Transportation modeling (i.e., abstracting away the particularities of a transportation system to build representations of its key components) serves this purpose. Such tools allow planners to answer questions about how, where, and why people travel, so that they can better understand the existing system and plan for future changes. Traditional techniques (specifically a subset of models called “trip-based models”) are limited both by their design and by their input datasets: they often produce results that are out of date and that provide only a fairly generalized view of mobility dynamics. However, an increase in the availability, volume, and coverage of novel urban data sources has allowed a new form of transportation modeling to arise: the “activity-based model” (ABM).

As he describes in the fifth episode of “The Future of Urban AI,” Arthur Getman, a Senior Solutions Engineer at Replica, is using big data and artificial intelligence to drive the future of transportation modeling in the United States. Replica seeks to improve the field in two primary ways: by enhancing analyses with more current data, and by allowing planners to disaggregate data and follow the journeys of specific groups of people. American transportation models have typically made heavy use of US Census Bureau data and related derived products, as well as commuter surveys, all of which are updated infrequently. Replica, on the other hand, uses proprietary data sources covering mobile phone location data, connected vehicle data, point of interest data, consumer and resident data, economic activity data, and traffic count data, most of which can be obtained in near real time. Replica cleans these data sources and combines them with publicly available data, such as OpenStreetMap (OSM) data and GTFS feeds, to round out information about the built environment. Using these sources, Replica can perform highly granular and up-to-date analyses.

Replica has designed a number of artificial intelligence algorithms to improve the utility of the data it obtains. First, it synthesizes populations from demographic data, using techniques such as generative models to estimate household relationships and persona trainers to define typical work, home, and school locations for each household. To simulate the built environment, Replica uses OSM data to inform a multimodal routing engine. With the modeled population and environmental data as inputs to the ABM, Replica uses three types of machine learning models to predict the synthetic population’s travel through the constructed space: an activity sequence model to simulate each agent’s choice of activities, a location choice model to simulate which places each agent might travel to for each activity, and a mode choice model to represent which transportation mode they might select for each journey. In contrast to this high level of specificity, traditional models usually lack the data to estimate trip information beyond rough zonal origin-destination linkages and generalized demographic information. The specificity that Replica’s approach provides has led to a number of use cases in the United States, particularly with regard to incorporating an equity lens in transportation planning.
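To make the three-stage structure described above more concrete, here is a minimal Python sketch of how an activity sequence choice, a location choice, and a mode choice could be chained for one synthetic agent. It is not Replica's implementation: the `Agent` fields, the activity sequences and their probabilities, the candidate places, and every coefficient are hypothetical values chosen only for illustration, and simple weighted random draws stand in for the trained machine learning models.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Agent:
    agent_id: int
    home: tuple[float, float]  # (x, y) in km; hypothetical coordinates
    has_car: bool


# Hypothetical daily activity sequences with made-up probabilities.
ACTIVITY_SEQUENCES = [
    (("work",), 0.5),
    (("work", "shop"), 0.3),
    (("school",), 0.2),
]

# Hypothetical candidate destinations for each activity type.
PLACES = {
    "work": [(5.0, 1.0), (2.0, 2.0)],
    "shop": [(1.0, 0.5), (6.0, 6.0)],
    "school": [(0.5, 1.5), (4.0, 4.0)],
}


def distance(a, b):
    """Straight-line distance in km (a real model would use routed travel)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def choose_activities(rng):
    """Stage 1: sample an activity sequence for the day."""
    sequences = [seq for seq, _ in ACTIVITY_SEQUENCES]
    weights = [w for _, w in ACTIVITY_SEQUENCES]
    return rng.choices(sequences, weights=weights, k=1)[0]


def choose_location(rng, origin, activity):
    """Stage 2: sample a destination, favoring closer candidates."""
    candidates = PLACES[activity]
    weights = [math.exp(-0.3 * distance(origin, c)) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def choose_mode(rng, agent, dist_km):
    """Stage 3: sample a mode from logit-style weights (assumed utilities)."""
    utilities = {"walk": -1.0 * dist_km, "transit": -0.5 - 0.3 * dist_km}
    if agent.has_car:
        utilities["car"] = -1.0 - 0.2 * dist_km
    modes = list(utilities)
    weights = [math.exp(utilities[m]) for m in modes]
    return rng.choices(modes, weights=weights, k=1)[0]


def simulate_day(agent, rng):
    """Chain the three stages into a list of trips starting from home."""
    trips = []
    origin = agent.home
    for activity in choose_activities(rng):
        destination = choose_location(rng, origin, activity)
        mode = choose_mode(rng, agent, distance(origin, destination))
        trips.append({
            "agent": agent.agent_id,
            "activity": activity,
            "origin": origin,
            "destination": destination,
            "mode": mode,
        })
        origin = destination  # the next trip starts where this one ended
    return trips


# Usage: simulate one day for one hypothetical agent without a car.
example_trips = simulate_day(Agent(1, (0.0, 0.0), has_car=False), random.Random(0))
```

Because each trip carries the agent's identifier, the output can be grouped by any agent attribute, which is the kind of disaggregation that zonal origin-destination tables do not support.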

Getman and his team have used this activity-based modeling approach to achieve a number of positive outcomes. Culver City, California, wanted to embark on a Vision Zero program to reduce traffic fatalities but had limited data and staff resources. Typically, cities use traffic crash data to identify High Injury Networks, but in smaller cities such data is more difficult to come by. In the absence of abundant crash data, Replica performed an analysis to find the highest concentrations of vulnerable road users, including pedestrians and bicyclists of specific ages. This data was paired with another data set: driver behavior data from devices inside vehicles. Corridors with a high concentration of both driver behavior events and vulnerable mode activity on a per-mile basis were identified as the city’s High Conflict Network. Culver City managers can now use this tool to find the highest concentrations of potential conflicts, prioritize interventions there, and communicate the urgency of those interventions to the communities that stand to benefit from them.
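The per-mile screening described above can be sketched in a few lines of Python. The article does not say how the two measures were combined or where the cutoffs were set, so the rule below (flag a corridor when both of its per-mile rates exceed the median across corridors, then rank by the product of the two rates) is an assumption made for illustration. The `Corridor` fields and all of the numbers are hypothetical as well.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Corridor:
    name: str
    length_miles: float
    driver_events: int      # e.g. counts of recorded driver behavior events
    vulnerable_trips: int   # modeled pedestrian and bicyclist trips


def high_conflict_network(corridors):
    """Return names of corridors whose per-mile rates are both above the median.

    The median cutoff and the product ranking are assumptions, not the
    method used in the Culver City analysis.
    """
    event_rate = {c.name: c.driver_events / c.length_miles for c in corridors}
    trip_rate = {c.name: c.vulnerable_trips / c.length_miles for c in corridors}
    event_cutoff = median(event_rate.values())
    trip_cutoff = median(trip_rate.values())
    flagged = [
        c.name
        for c in corridors
        if event_rate[c.name] > event_cutoff and trip_rate[c.name] > trip_cutoff
    ]
    # Rank flagged corridors so the most intense overlap comes first.
    return sorted(flagged, key=lambda n: event_rate[n] * trip_rate[n], reverse=True)


# Hypothetical corridors; none of these figures come from real data.
example_corridors = [
    Corridor("Corridor A", 2.0, 80, 1200),
    Corridor("Corridor B", 1.0, 30, 100),
    Corridor("Corridor C", 4.0, 20, 2000),
    Corridor("Corridor D", 0.5, 25, 400),
    Corridor("Corridor E", 1.0, 10, 200),
]

ranked = high_conflict_network(example_corridors)
```

Normalizing by length matters here: Corridor C has the most vulnerable-user trips in absolute terms, but spread over four miles its rates fall below the cutoffs, while the short Corridor D rises to the top.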

Replica is also helping transit agencies adapt much faster to the ridership landscape that has been changing rapidly since the onset of the COVID-19 pandemic. In New York City, the MTA used Replica data to understand and prioritize the most important trips: those of essential workers. During the peak of COVID, the agency scheduled service closures for cleaning train cars so that they would not disrupt essential workers’ commutes. The agency continues to use Replica data, most recently in its Extending Transit’s Reach report, an action plan to promote active mode and micromobility connections to transit facilities. The report relies on Replica data to pair the demand for bicycle facilities around train stations with the priority of investment at those facilities, based on an equity index. The granular, mode-specific trip data that Replica provides illuminates where residents and tourists are most dependent on, and in need of, facilities like bicycle lanes and bicycle parking in conjunction with their train trips. It can also be used to see where those in need of facilities have also been historically disenfranchised.

With the novelty introduced by urban big data sources, activity-based modeling represents a future of urban artificial intelligence not only in its innovative approach, but also in its augmented capabilities. Since the models simulate population characteristics and preferences, as well as environmental features, one can modify any of those input values to predict corresponding outcomes. As such, ABM lends itself to scenario modeling applications, such as estimating citywide shifts in transportation behavior that might arise from demographic change or altered employment dynamics. ABM can also allow planners to test modifications to policies or the built environment, giving them a reasonable impression of how those changes might affect traffic patterns within the city. ABM can thus help shape the on-the-ground future of transportation, in addition to itself representing a technological future. Getman does emphasize, however, the value of combining synthesized data with traditional data collection methods, which serve as sources of ground-truth information. Given the relatively high fidelity and precision of commuter surveys, they represent a source of accuracy that can be used to calibrate modeled data. Still, AI methods like those that Getman presents have great utility in filling gaps in sampled data, and have indeed led to more tailored transportation system interventions.
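Both ideas in the paragraph above, calibrating modeled output against survey data and then changing an input to test a scenario, can be illustrated with a small multinomial logit mode choice sketch in Python. This is a textbook-style simplification rather than Replica's model: the time coefficient, the travel times, and the survey shares are all hypothetical, and a single aggregate trip stands in for a full synthetic population.

```python
import math

TIME_COEFF = -0.08  # hypothetical utility per minute of travel time


def mode_shares(travel_minutes, constants):
    """Multinomial logit shares from travel times and mode-specific constants."""
    utilities = {m: constants[m] + TIME_COEFF * t for m, t in travel_minutes.items()}
    denom = sum(math.exp(u) for u in utilities.values())
    return {m: math.exp(u) / denom for m, u in utilities.items()}


def calibrate_constants(travel_minutes, constants, survey_shares):
    """Shift each mode constant by log(observed share / predicted share).

    For this single aggregate trip one adjustment reproduces the survey
    shares exactly; with many heterogeneous trips the step would normally
    be repeated until the shares stop changing.
    """
    predicted = mode_shares(travel_minutes, constants)
    return {
        m: constants[m] + math.log(survey_shares[m] / predicted[m])
        for m in constants
    }


# Hypothetical inputs: travel times in minutes and survey-observed shares.
baseline_minutes = {"car": 20.0, "transit": 35.0, "bike": 30.0}
uncalibrated = {"car": 0.0, "transit": 0.0, "bike": 0.0}
survey_shares = {"car": 0.6, "transit": 0.3, "bike": 0.1}

# Step 1: calibrate the model so the baseline matches the survey.
calibrated = calibrate_constants(baseline_minutes, uncalibrated, survey_shares)
baseline = mode_shares(baseline_minutes, calibrated)

# Step 2: scenario test, e.g. a service change cuts transit time by 10 minutes.
scenario_minutes = dict(baseline_minutes, transit=25.0)
scenario = mode_shares(scenario_minutes, calibrated)

for mode in baseline:
    print(f"{mode}: baseline {baseline[mode]:.3f}, scenario {scenario[mode]:.3f}")
```

The survey anchors the baseline, and the model then supplies an estimate for a situation the survey never observed. That division of labor is the complementary relationship between ground-truth data and synthesized data that Getman describes.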

About the Author

Sarah Popelka is the Head of Educational Programs at Urban AI and co-authored the Urban AI Guide. She loves teaching and has taught programming and data visualization courses at UCLA and George Washington University.
