Deriving hospital travel times with population-weighted sampling

April 21, 2020 by Eric Buth
To explore more data on COVID-19, please go to

One of the early stories to emerge about COVID-19 was the vulnerability of certain U.S. communities that lack access to the critical medical care typically provided in hospitals. Today, Navajo County, Arizona, home to three Native American reservations (the Navajo Nation, the Hopi Indian Reservation, and the Fort Apache Indian Reservation), has one of the country’s highest per capita COVID-19 case rates (435 cases per 100K people). What makes the county’s high rate especially alarming is the scarcity of medical facilities able to treat serious COVID-19 cases. To understand vulnerability through the lens of access to critical medical care, we generated a new feature, “median distance to the nearest hospital,” for all counties in the U.S. Here we detail our methodology for generating this feature at the county level.

Navajo County COVID-19 Cases per 100k people with Median Distance to Nearby Hospitals (in seconds) April 21, 2020

Working effectively with geographic data from multiple sources often requires a strategy for translating between geographic units. This translation is a non-trivial technological challenge, but in the case of COVID-19 it can prove important in answering simple but critical questions, such as: are there enough medical resources available to serve the number of COVID-19 patients in a given area?

At Topos, we maintain a large amount of categorized information about the locations of businesses, public institutions, and similar places, sometimes referred to as “points of interest” or “POI.” Because the geographic resolution of POI data is effectively infinite (they are points in space), the core challenge is how to aggregate these points so they can be used alongside features that are only available for larger geographic units such as counties and states.

The most straightforward way to approach this aggregation is to simply count the number of points that fall within a larger geography, a strategy we take with other relevant POI such as housing units. However, there are cases where these numbers don’t fully capture the relationship that people have with the resources being counted.

The nearest hospital to Spray, Oregon, is in neighboring Morrow County, over an hour away.

What if the nearest hospital to a large part of a county’s population is actually in a neighboring county or several counties away? What if most of the grocery stores in a county are located far away from where the residents of the county live? What if a county represents the outer suburbs of a major city or is abutted by national park land? Simply counting points doesn’t sufficiently capture their accessibility, which is particularly important for critical resources like hospitals, urgent care centers, pharmacies, grocery stores or schools.

Rather than simply counting points within a region, we may want to have a sense of how easily those points can be accessed. To this end, we often look not only at geographic distance to these points but also calculate how long it takes to reach them via common modes of transportation (driving, walking, etc.).

Subway time to selected points in NYC visualized on level 16 S2 cells.

In this project, we begin with the time it might take an ambulance to reach the nearest hospital that has in-patient services, that is, a hospital with a bed count greater than zero.

To decide which of the thousands of hospital locations is nearest to a given address, we use an S2-based geospatial index, which allows us to quickly search a radius around each address to build a candidate list. This step is important because of the time and resources it would otherwise take to evaluate the travel time to every hospital in the country. Once we have the reduced list of locations that are close as the crow flies, we then need to determine which one is actually the quickest to drive to on available roads.
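The two-stage search described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used for this project: a brute-force haversine scan stands in for the S2 index lookup, and `drive_seconds` is a hypothetical callable representing a request to a routing service. The `Hospital` fields and the search radius are also assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
LatLng = Tuple[float, float]


@dataclass
class Hospital:
    # Hypothetical record layout, chosen for this sketch only.
    name: str
    lat: float
    lng: float
    beds: int


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle ("as the crow flies") distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidates(origin: LatLng, hospitals: List[Hospital], radius_m: float) -> List[Hospital]:
    """In-patient hospitals (beds > 0) within radius_m of origin.

    Assumption: a linear scan replaces the spatial index; the result is the
    same, only slower on large inputs.
    """
    lat, lng = origin
    return [
        h for h in hospitals
        if h.beds > 0 and haversine_m(lat, lng, h.lat, h.lng) <= radius_m
    ]


def quickest(
    origin: LatLng,
    hospitals: List[Hospital],
    radius_m: float,
    drive_seconds: Callable[[LatLng, LatLng], float],
) -> Optional[Tuple[Hospital, float]]:
    """Return (hospital, seconds) for the shortest drive among nearby candidates.

    drive_seconds is a hypothetical stand-in for a routing API call.
    Returns None when no candidate lies within the radius.
    """
    best: Optional[Tuple[Hospital, float]] = None
    for h in candidates(origin, hospitals, radius_m):
        secs = drive_seconds(origin, (h.lat, h.lng))
        if best is None or secs < best[1]:
            best = (h, secs)
    return best
```

Only the short candidate list is passed to the (expensive) routing step, which is the point of filtering first. Note that the hospital with the shortest straight-line distance is not necessarily the one returned, since the final choice depends on drive time alone.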

For the same reasons we needed hospital candidate lists, we now need a method for limiting the number of addresses we use as origins, that is, the point A from which to compute the path to point B using a routing API (Google Maps, Mapbox, etc.). To accomplish this we construct a sample: a meaningful subset of addresses that roughly represents the entire county.

One way to build this sample would be to pick at random within a given county’s geographic boundaries. However, sampling in this manner risks significantly over-representing less densely populated areas. For example, in Deschutes County, Oregon, this approach results in selecting as many points from national forest land as from the city of Bend.
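For reference, this naive approach might look something like the sketch below, which uses rejection sampling: draw points uniformly from the polygon's bounding box and keep those that fall inside. It assumes the county boundary is a single simple ring of `(x, y)` vertices and treats coordinates as planar, which ignores the way longitude degrees shrink with latitude. The function names are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lng, lat) treated as planar


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray-casting test for a simple polygon given as a list of vertices."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be
        # crossed by a ray cast to the right of pt.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(ring: List[Point], n: int, rng: random.Random) -> List[Point]:
    """Rejection sampling: uniform draws from the bounding box, kept if inside."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    points: List[Point] = []
    while len(points) < n:
        pt = (rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
        if point_in_polygon(pt, ring):
            points.append(pt)
    return points
```

Because every unit of area is equally likely to be chosen, a large uninhabited region receives proportionally more sample points than a small, dense city.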

Random point sampling in Deschutes County, Oregon

The resulting travel times to nearby hospitals appear to be evenly distributed. Intuitively, this seems wrong: human geographic organization tends to concentrate both resources and population, and Deschutes County has at least three cities that should push the distribution away from this apparent randomness. If county residents are significantly more likely to live in a city with a nearby hospital, we’d expect that to be reflected by a concentration of values around a lower median travel time.

County, tract, and block group boundaries of Deschutes County, Oregon

We adjust our sampling strategy to account for this issue by using the population counts of census block groups, which are significantly smaller than counties. We’re not producing final metrics at such a low level, but we can use the more granular population counts to weight our random sample of starting points. Imagine that for every person in a county we put one marble, labeled with the block group where that person lives, in a bucket. To get a population-weighted sample, we repeatedly pick a marble from the combined bucket — replacing it each time — and note the label.

The effect is that every person has an equal chance of being selected, even though the resulting block group counts are not themselves equal. Once we have constructed this list of block groups, we then pick a random address within the geographic bounds of each one.
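The marble analogy corresponds to weighted sampling with replacement, which can be sketched in a few lines. This is an illustrative example only: the function name and the shape of the `populations` mapping (block group ID to population count) are assumptions, and the step of picking an address inside each drawn block group is left out.

```python
import random
from typing import Dict, List


def sample_block_groups(populations: Dict[str, int], k: int, rng: random.Random) -> List[str]:
    """Draw k block group IDs with replacement, weighted by population.

    Equivalent to the marble bucket: each resident contributes one marble
    labeled with their block group, and every draw is put back.
    """
    ids = list(populations)
    weights = [populations[i] for i in ids]
    return rng.choices(ids, weights=weights, k=k)
```

A block group holding 90% of a county's residents will, on average, account for about 90% of the draws, and a block group with zero population is never drawn. The same ID can appear many times in the result, and each occurrence would get its own randomly chosen address.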

Population weighted point sampling in Deschutes County, Oregon

This sampling strategy now shows points clustered around three cities within Deschutes County, the population centers of Bend, Redmond, and Sisters, with some outliers along Highway 97. The hospital travel time values now form something closer to a normal distribution, with a median around 13 minutes, a stark difference from the likely misleading 50 minutes of the previous example.
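Once each sampled origin has a travel time to its quickest hospital, the county-level value is simply the median of those times. A minimal sketch, assuming the per-county times have already been collected into a dictionary of lists (a hypothetical structure chosen for this example):

```python
import statistics
from typing import Dict, List


def county_median_seconds(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Median travel time in seconds per county.

    Counties with no sampled travel times are skipped rather than
    assigned a placeholder value.
    """
    return {
        county: statistics.median(times)
        for county, times in samples.items()
        if times
    }
```

The median is less sensitive than the mean to a handful of very long trips, such as the outlying points far from any city.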

With our Median Distance to Nearby Hospitals metric in hand, we can now examine it in relation to the rapidly unfolding crisis of COVID-19. The visualization below highlights which counties have high per-capita COVID-19 infections and the lowest access to hospitals (as of April 21, 2020).

To explore more data on COVID-19, please go to



