A screenshot of the Doppelgänger tool applied to a census tract in California.

A first step toward creating a digital planning laboratory is populating it

Introducing Doppelgänger, Model Lab’s open-source population synthesizer.

David Ory
May 25, 2017 · 8 min read

Introducing Doppelgänger

A planner’s first stop in describing the existing conditions of a community is usually the Census Bureau. To protect the privacy of respondents, Census data is delivered at different geographies and across different periods of time. For example, the best estimate of the number of households in a community may be available for each Census block from the Decennial Census (last conducted in 2010), and the best estimate of household income may be the five-year rolling data product from the American Community Survey for each Census tract. Combining these disparate data sets to create a coherent and complete representation of what is happening in a community at any point in time is difficult. It’s a bit like trying to completely understand a subject from photos that are taken from different angles, at different points in time, from different distances. Further complicating the problem, urban planners like to use non-Census data sets, such as school quality ratings, that may introduce yet another set of geographies (e.g., school districts).

Convex optimization

The first technical aspect of Doppelgänger that excites us is convex optimization. To introduce the idea, consider an urban planner tasked with describing the existing conditions of a community. The planner starts with the following data from the Census Bureau:

  • Age and income distributions, for which the best source is the 2011 to 2015 American Community Survey (ACS) and the data is available at tract geographies; and
  • Household structures (i.e., relationships between parents and children), for which the best source is the 2011 to 2015 ACS and the data is available at Public Use Microdata Area (PUMA) geographies.
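The post does not spell out the optimization itself. One common way to reconcile controls from different tables is to reweight a sample of households until their weighted totals match every control at once. The minimal sketch below assumes categorical controls in which each household falls into exactly one category per dimension, and it uses iterative proportional fitting (raking). Raking is swapped in here for a general-purpose convex solver: when the controls are mutually consistent, it converges to the weights closest to the starting weights in the cross-entropy sense, which is itself a convex optimization problem. This is not Doppelgänger's code; `rake_weights`, `weighted_totals`, the four sample households, and the control totals are hypothetical, and the PUMA-level household-type shares are assumed to have already been scaled to a tract of 100 households.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Household = Dict[str, Hashable]


def weighted_totals(
    households: Sequence[Household], weights: Sequence[float], dim: str
) -> Dict[Hashable, float]:
    """Sum the household weights within each category of one control dimension."""
    totals: Dict[Hashable, float] = {}
    for hh, w in zip(households, weights):
        totals[hh[dim]] = totals.get(hh[dim], 0.0) + w
    return totals


def rake_weights(
    households: Sequence[Household],
    controls: Dict[str, Dict[Hashable, float]],
    initial: Optional[Sequence[float]] = None,
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> List[float]:
    """Hypothetical helper: rescale weights until weighted totals match all controls.

    Each sweep fits one control dimension at a time (iterative proportional
    fitting). Assumes every household category appears in the controls; if the
    controls are infeasible, the loop simply stops after max_iter sweeps.
    """
    weights = [float(w) for w in initial] if initial is not None else [1.0] * len(households)
    for _ in range(max_iter):
        for dim, targets in controls.items():
            totals = weighted_totals(households, weights, dim)
            for i, hh in enumerate(households):
                current = totals.get(hh[dim], 0.0)
                if current > 0:
                    # Scale so this household's category hits its control total.
                    weights[i] *= targets[hh[dim]] / current
        # Stop once every control in every dimension is within tolerance.
        gap = 0.0
        for dim, targets in controls.items():
            totals = weighted_totals(households, weights, dim)
            for cat, target in targets.items():
                gap = max(gap, abs(totals.get(cat, 0.0) - target))
        if gap < tol:
            break
    return weights


# Hypothetical sample: income band from a tract-level table, household type
# from a PUMA-level table already scaled to this tract's 100 households.
sample = [
    {"income": "low", "hh_type": "family"},
    {"income": "low", "hh_type": "single"},
    {"income": "high", "hh_type": "family"},
    {"income": "high", "hh_type": "single"},
]
controls = {
    "income": {"low": 60.0, "high": 40.0},
    "hh_type": {"family": 70.0, "single": 30.0},
}
weights = rake_weights(sample, controls)
print([round(w, 6) for w in weights])  # [42.0, 18.0, 28.0, 12.0]
```

With uniform starting weights the result is simply the product of the two marginal distributions (for example, 60 × 70 / 100 = 42 low-income family households). Passing survey weights through `initial` preserves more of the sample's original structure. If a control category has no matching sample households, the controls cannot all be met and the loop stops at `max_iter`.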

Bayesian Networks

The second technical feature of Doppelgänger that excites us is Bayesian Networks (often called Bayes Nets). Bayes Nets can be used in population synthesizers in two important ways. First, Bayes Nets extract useful relationships from one data set that can then be applied to other data sets. For example, consider a data set that contains, for a relatively small sample of households, each household’s number of people, income, and number of vehicles. We can train a Bayes Net on this data to learn the relationships among these three variables. The outcome can be illustrated as a directed graph, in which an arrow from one variable to another indicates that the second depends directly on the first.
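A minimal sketch of the training step, assuming a fixed graph in which household size influences income and both influence vehicle ownership. That structure is an assumption for illustration; a Bayes Net's structure can also be learned from data. With the structure fixed, training reduces to estimating a conditional probability table for each variable given its parents, here by counting with additive smoothing. Applying the learned relationships to another data set is illustrated by `conditional`, which looks up the vehicle distribution for a household whose size and income come from elsewhere; `sample_households` draws complete synthetic households by ancestral sampling. The survey records and all function names are hypothetical, and this is not Doppelgänger's API.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical graph: each variable maps to its parents. Listed in topological
# order (parents before children), which sample_households relies on.
STRUCTURE: Dict[str, Tuple[str, ...]] = {
    "num_people": (),
    "income": ("num_people",),
    "num_vehicles": ("num_people", "income"),
}


def fit_cpts(records: Sequence[dict], structure: Dict[str, Tuple[str, ...]], alpha: float = 1.0) -> dict:
    """Estimate P(variable | parents) by counting, with additive smoothing alpha."""
    cpts = {}
    for var, parents in structure.items():
        values = sorted({r[var] for r in records})
        counts: Dict[tuple, Counter] = defaultdict(Counter)
        for r in records:
            counts[tuple(r[p] for p in parents)][r[var]] += 1
        table = {}
        for key, ctr in counts.items():
            denom = sum(ctr.values()) + alpha * len(values)
            table[key] = {v: (ctr[v] + alpha) / denom for v in values}
        cpts[var] = (values, table)
    return cpts


def conditional(cpts: dict, var: str, parent_values: Sequence) -> Dict:
    """Look up P(var | parents); unseen parent combinations fall back to uniform."""
    values, table = cpts[var]
    uniform = {v: 1.0 / len(values) for v in values}
    return table.get(tuple(parent_values), uniform)


def sample_households(cpts: dict, structure: Dict[str, Tuple[str, ...]], n: int, seed: int = 0) -> List[dict]:
    """Ancestral sampling: draw each variable after its parents have been drawn."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        hh: dict = {}
        for var, parents in structure.items():
            dist = conditional(cpts, var, [hh[p] for p in parents])
            values = list(dist)
            hh[var] = rng.choices(values, weights=[dist[v] for v in values])[0]
        synthetic.append(hh)
    return synthetic


# Toy survey records, invented for illustration.
survey = [
    {"num_people": 1, "income": "low", "num_vehicles": 0},
    {"num_people": 1, "income": "high", "num_vehicles": 1},
    {"num_people": 2, "income": "low", "num_vehicles": 1},
    {"num_people": 2, "income": "high", "num_vehicles": 2},
    {"num_people": 3, "income": "high", "num_vehicles": 2},
    {"num_people": 3, "income": "low", "num_vehicles": 1},
]
cpts = fit_cpts(survey, STRUCTURE)
# Apply the learned relationship to a household from another data set
# where only household size and income are known.
print(conditional(cpts, "num_vehicles", [2, "high"]))  # {0: 0.25, 1: 0.25, 2: 0.5}
synthetic = sample_households(cpts, STRUCTURE, n=5, seed=42)
```

Parent combinations never seen in training fall back to a uniform distribution, a deliberately simple choice; a production synthesizer would need a more careful treatment of sparse cells.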

Moving forward

Before we can understand how transportation services, policies, or infrastructure impact a community, we must understand who lives in the community today, tomorrow, and 20 years from now. Once we understand that, we can begin to think about how and why community members move around to carry out activities in service to themselves and their families over a typical day or week — a next step we are working on now. At Model Lab, we strive to translate our understanding into simulations that help urban planners experiment and learn. Doppelgänger is a foundational step in our journey and we hope it’s the beginning of a strong relationship with the urban modeling community.

Sidewalk Talk

Where technologists and urbanists discuss the future of cities.

Written by

David Ory

WSP consultant; formerly Sidewalk Labs; formerly Metro. Trans. Commission