How to Create County Boundary Maps Only of Populated Areas

Use BigQuery GIS to trim county boundaries to include only pixels with population density > 10 people per sq km and use these boundaries in your plotting programs.

Lak Lakshmanan
Jul 2, 2020 · 5 min read

Recently, as pointed out by Abhishek Nagaraj on Twitter, the New York Times has started publishing dasymetric maps where they show Covid data but color each county based only the pixels within the county that are actually populated:

Compare that against their older maps:

And you can see how much better this is. The key thing to notice is that both of these are county-level data. But counties in the dasymetric map are colored only by the pixels within them that have > 10 people per square kilometer — sparsely populated parts of the county, such as forests, are ignored and this makes the resulting visualization more representative of things that affect people (and not land). In both maps, every county is represented.

It would be so much easier if there were a readily accessible dataset that gives you county boundaries that include only the populated areas. In this article, I will show you to create such a dataset in BigQuery and provide instructions on how to extract that data into a CSV file with GeoJSON or WKT to plot it in other packages.

BigQuery has a generous free tier (1 TB/month of querying) and each of the queries in this article processes 2 to 5 GB. They take 15 to 45 seconds to run and even without the free tier, would cost you about 1c each (cloud computing is cheap!). Given that, I suggest that you take these queries and adapt them for your needs rather than simply download my result.

Population Density Raster Data

WITH populated_areas AS (
SELECT bounds
FROM `ai-analytics-solutions`.publicdata.popdensity_sedac_rectangles
WHERE
year = 2020 AND
tile = 'gpw_v4_population_density_rev11_2020_30_sec_1.asc' AND
population_density > 10 -- persons per square kilometer
)
SELECT
state_fips_code, county_fips_code,
ANY_VALUE(county_geom) AS tract_geom,
ST_UNION(ARRAY_AGG(bounds)) AS populated_tract_geom
FROM
`bigquery-public-data`.geo_us_boundaries.counties,
populated_areas
WHERE
state_fips_code = '53' AND
ST_CONTAINS(county_geom, bounds)
GROUP BY 1, 2

Notice that I am passing in a tile. You want to look at the tile boundaries for details, but tile 1 covers the Western US (west, approximately of Ohio) and tile 2 covers the Eastern US. To query the entire country or query states in the transition area, specify tile 1 (_sec_1.asc) and tile 2 (_sec_2.asc).

The resulting table looks like this:

The key things to notice:

  • You get 39 results, one for each county in Washington
  • While the original tract_geom is a POLYGON, the populated_tract_geom is a MULTI_POLYGON since there are multiple pixels in each county

The result, visualized in BigQuery GeoViz:

Let’s look at an individual county (Asotin county in the Southeast corner of the state). Here’s the original boundary:

And here’s the populated areas — if we are drawing Asotin County, these are the only pixels we’d color:

Full dataset (for whole US)

CREATE OR REPLACE TABLE publicdata.us_tracts AS
WITH populated_areas AS (
SELECT bounds
FROM `ai-analytics-solutions`.publicdata.popdensity_sedac_rectangles
WHERE
year = 2020 AND
(tile = 'gpw_v4_population_density_rev11_2020_30_sec_1.asc' OR
tile = 'gpw_v4_population_density_rev11_2020_30_sec_2.asc'
)
AND
population_density > 10 -- persons per square kilometer
)
SELECT
state_fips_code, county_fips_code,
ANY_VALUE(county_geom) AS tract_geom,
ST_UNION(ARRAY_AGG(bounds)) AS populated_tract_geom
FROM
`bigquery-public-data`.geo_us_boundaries.counties,
populated_areas
WHERE
ST_CONTAINS(county_geom, bounds)
GROUP BY 1, 2

This creates a table with 3222 rows, one for each county.

I’ve made the dataset public. You can use it in BigQuery by directly querying or joining against this table:

ai-analytics-solutions.publicdata.us_tracts

For your convenience, that table also has the state code, county name, and population. This was the query I used:

Query to create county boundaries where the boundaries of populated pixels within each county.

Example of using the county boundaries to plot weather-related fatalities

WITH historical_weather AS (
SELECT
CONCAT(LPAD(state_fips_code, 2, '0'), LPAD(cz_fips_code, 3, '0')) AS county_fips_code,
SUM( injuries_direct + deaths_direct + deaths_indirect + injuries_indirect ) AS num_impacted
FROM `bigquery-public-data.noaa_historic_severe_storms.storms_201*`
GROUP BY 1
)
SELECT
tract_geom, populated_tract_geom, (num_impacted/population) AS impacted
FROM
historical_weather
JOIN
`ai-analytics-solutions.publicdata.us_tracts`
USING(county_fips_code)

In the query above, the historical data is the number of people impacted in each county. We then join it against the populated us tracts data using the county code. For plotting, we need the populated tract geometry and the number of people impacted per capita (so we divide by population).

Plotting this using BigQuery GeoViz, we get:

and zooming in on the midwest:

Had we not done populated tracts, we would have gotten an misleading county map like this (showing you just California):

Exporting table AS CSV

SELECT
state_fips_code, county_fips_code,
ST_AsGeoJSON(populated_tract_geom) AS county_tracts
FROM `ai-analytics-solutions.publicdata.us_tracts`

And click on “Save Results” to save it to Google Sheets or CSV file or some other format (it’s 3300 rows and 32 MB):

Enjoy!

The Startup

Get smarter at building your thing. Join The Startup’s +792K followers.

Sign up for Top 10 Stories

By The Startup

Get smarter at building your thing. Subscribe to receive The Startup's top 10 most read stories — delivered straight into your inbox, once a week. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +792K followers.

Lak Lakshmanan

Written by

Data Analytics & AI @ Google Cloud

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +792K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store