PopulationSim & the Synthetic Commons

Lisa Zorn
Published in zephyrfoundation
Nov 15, 2021 · 5 min read

Modern travel demand models operate on what we in the analytical urban planning space call “synthetic populations”. Merriam-Webster’s fourth definition of “synthetic” (one of Dave’s favorites) is as follows: “devised, arranged, or fabricated for special situations to imitate or replace usual realities”. In our case, we are “fabricating” a population for the “special situation” of “replacing” the “usual reality” of a city without, say, a network of bicycle lanes with a city that has one. More specifically, population synthesizers create a list of households, and of the persons in each household, that represents the actual people who live (or are expected to live in the future) in a city, region, or state. Travel models then act on the resulting lists to simulate travel-related behaviors and outcomes.
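As a minimal sketch of what such a list looks like, the Python below copies a handful of surveyed "seed" households into a zone according to integer weights. The class names, field names, and the `expand_seed` helper are hypothetical and are not PopulationSim's output schema. The weights are simply given here; choosing them so that the result matches known totals is the actual job of a population synthesizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    person_id: int
    household_id: int
    age: int


@dataclass
class Household:
    household_id: int
    zone: int
    persons: List[Person]


def expand_seed(seed, weights, zone):
    """Copy each seed household `weight` times into one zone.

    seed:    list of person-age lists, one per surveyed household
    weights: integer number of copies to make of each seed household
    zone:    hypothetical zone identifier assigned to every copy
    """
    households = []
    next_hh_id, next_person_id = 1, 1
    for ages, weight in zip(seed, weights):
        for _ in range(weight):
            persons = []
            for age in ages:
                persons.append(Person(next_person_id, next_hh_id, age))
                next_person_id += 1
            households.append(Household(next_hh_id, zone, persons))
            next_hh_id += 1
    return households


# Two surveyed households: a couple with a child, and a single adult.
seed = [[34, 36, 4], [71]]
synthetic = expand_seed(seed, weights=[2, 3], zone=101)
n_persons = sum(len(h.persons) for h in synthetic)
print(len(synthetic), "households,", n_persons, "persons")
```

A travel model would then read the household and person lists and simulate behavior for each record.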

Population synthesizers have a rich and interesting history. The Federal Highway Administration’s TRANSIMS project kickstarted interest in these techniques for the kinds of activity-based travel models used by most large regions today (see Beckman’s paper on the topic). As the interest in and need for population synthesizers increased, academic research and functional software began to emerge. Recognizing the need for a mature population synthesizer, the Oregon Department of Transportation sponsored the development of PopulationSim and then transferred ownership to the ActivitySim consortium of agencies. Key features of PopulationSim include:

  • Ongoing financial support from a collection of public agencies coming together to fund open source modeling tools;
  • Many of the technical advances developed by the travel model community over the past two decades;
  • The same software architecture as ActivitySim; and,
  • A core infrastructure of modern Python libraries, including pandas.

Badging Details

Joel Freedman of RSG submitted the PopulationSim product for a Zephyr Software Badge. The balance of this post discusses the badging details and outcomes. The Software Badging process requires that Zephyr answer the following three questions about a software package:

  1. Is the software useful to the Zephyr Community?
  2. Does the software contribute to a common problem space or benchmark in a manner that encourages community progress?
  3. Is the software easy to use?

Useful

As noted above, nearly every large region in the United States uses an activity-based travel model, and activity-based travel models take synthetic populations as input. There is therefore a strong need for a well-documented, easy-to-use population synthesizer that runs quickly and generates valid outcomes. In the following two sections, we’ll dive more deeply into these two aspects of PopulationSim.

Common Problem Space

The opportunity to review and badge PopulationSim motivated the Zephyr Software Badgers to create a common set of population synthesizer files that can be used by the community. As my colleagues and I at MTC count ourselves amongst the community with a need for a population synthesizer, we contributed our own input files to this end. These files live in Zephyr’s GitHub repository dedicated to software badging benchmarks.

Moving forward, we hope that others developing and testing population synthesizers will use these common files to demonstrate the key features of their tools. And we hope the common files do not remain static: our goal is to add, over time, other files that population synthesizers can use.

Easy to Use

For users of ActivitySim, the PopulationSim package is familiar and can be picked up quickly. For those more accustomed to object-oriented Python packages, more time will be needed with the documentation — which is excellent — prior to diving in.

To make PopulationSim work with the Zephyr San Francisco population synthesizer benchmark files, we created a few short support scripts in R. I documented the entire process in the following brief (😉) markdown, which includes all the commands required to set up PopulationSim for this dataset, run it, and create a Tableau-based visualization of the validation.

Everything needed to set up, run, and validate Zephyr’s population synthesis benchmark in PopulationSim

The resulting interactive validation shows reasonable results. In developing this example, we did find a few interesting things to note:

  1. In our example, we specified controls at the county and travel analysis zone (TAZ) levels. However, PopulationSim outputs validation data only for the TAZ-level controls, not for the county-level controls, because the county is a “meta geography” rather than a “seed geography”. The process outlined above therefore summarizes the results for the county controls itself, a workaround that is needed until this issue is resolved.
  2. The importance attribute in the control constraints isn’t relative: its absolute magnitude matters. If you run PopulationSim with a single control constraint for the total number of households with importance=1, it will validate poorly. If you make importance=1000, it will validate perfectly. We were confused about this aspect, and about how to choose importance weights more generally, so we contacted Binny M Paul, one of the authors of PopulationSim and the corresponding author for the 2018 TRB paper about the software. He responded with a detailed explanation of the importance weights. It turns out that they are hyperparameters in the optimization problem that PopulationSim solves, in which it seeks to maximize entropy (keeping the weights as uniform as possible) while matching the marginal controls. The discussion was illuminating, and would make a great addition to the documentation! The TL;DR recommendation for setting the importance weights is:

The general guidance is to set it to a minimum value (e.g., 1000 has typically worked well for multiple regions) to ensure decent performance across all controls. Once a reasonable level of performance has been achieved, the importance weights of select controls can be increased/decreased depending on which controls are more important for the application. The importance weight on the total number of households control is set very high to ensure that the right number of households are produced.
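To see why the absolute size of an importance weight can matter even with a single control, consider the toy problem below. It is only a sketch under stated assumptions and not PopulationSim's actual formulation: every seed household gets the same scale factor `s`, the entropy-style term `seed_total * (s*ln(s) - s + 1)` penalizes moving weights away from their starting values, and a quadratic penalty `importance * (seed_total*s - target_total)**2 / (2*target_total)` stands in for the control mismatch. The function names and the numbers are hypothetical.

```python
import math


def solve_scale(seed_total, target_total, importance):
    """Minimize the toy objective over the common scale factor s.

    Setting the derivative of the objective to zero (and dividing by
    seed_total) gives an increasing function of s, so bisection finds
    its single root.
    """
    def gradient(s):
        mismatch = (seed_total * s - target_total) / target_total
        return math.log(s) + importance * mismatch

    # gradient is negative near zero and non-negative at the upper bound.
    lo, hi = 1e-12, max(1.0, target_total / seed_total)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gradient(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def synthesized_total(seed_total, target_total, importance):
    """Total households produced once the seed weights are scaled."""
    return seed_total * solve_scale(seed_total, target_total, importance)


# Seed weights sum to 100 households; the control asks for 200.
for importance in (1, 1000):
    total = synthesized_total(100.0, 200.0, importance)
    print(f"importance={importance}: {total:.1f} households vs. 200 target")
```

In this toy setup the entropy-style term always competes with the control penalty, so a small importance leaves the total well short of the target, while a large one pulls it very close. That is consistent with the behavior described above, though the real optimization involves many households and controls at once.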

The PopulationSim package easily passes all three Zephyr requirements, and we are pleased to award it an official Zephyr software badge. If you have used PopulationSim in your work and you have thoughts to share on this method of evaluating the software, please let us know. Do you have another open source population synthesizer that you love? Submit it and we’ll take a look!
