The Startup
Published in

The Startup

How to Create County Boundary Maps Only of Populated Areas

Use BigQuery GIS to trim county boundaries to include only pixels with population density > 10 people per sq km and use these boundaries in your plotting programs.

Recently, as pointed out by Abhishek Nagaraj on Twitter, the New York Times has started publishing dasymetric maps where they show Covid data but color each county based only the pixels within the county that are actually populated:

Compare that against their older maps:

And you can see how much better this is. The key thing to notice is that both of these are county-level data. But counties in the dasymetric map are colored only by the pixels within them that have > 10 people per square kilometer — sparsely populated parts of the county, such as forests, are ignored and this makes the resulting visualization more representative of things that affect people (and not land). In both maps, every county is represented.

It would be so much easier if there were a readily accessible dataset that gives you county boundaries that include only the populated areas. In this article, I will show you to create such a dataset in BigQuery and provide instructions on how to extract that data into a CSV file with GeoJSON or WKT to plot it in other packages.

BigQuery has a generous free tier (1 TB/month of querying) and each of the queries in this article processes 2 to 5 GB. They take 15 to 45 seconds to run and even without the free tier, would cost you about 1c each (cloud computing is cheap!). Given that, I suggest that you take these queries and adapt them for your needs rather than simply download my result.

Population Density Raster Data

Joining a vector dataset of county boundaries against a raster dataset of population density can be quite hard. Earlier, I have written about how you can efficiently query raster data in BigQuery by demonstrating on the global population density dataset published by Columbia University. Here, I am using that raster dataset (I’ve made it public) and joining it against US count-level data to trim the county boundaries in Washington State to just populated areas:

WITH populated_areas AS (
SELECT bounds
FROM `ai-analytics-solutions`.publicdata.popdensity_sedac_rectangles
WHERE
year = 2020 AND
tile = 'gpw_v4_population_density_rev11_2020_30_sec_1.asc' AND
population_density > 10 -- persons per square kilometer
)
SELECT
state_fips_code, county_fips_code,
ANY_VALUE(county_geom) AS tract_geom,
ST_UNION(ARRAY_AGG(bounds)) AS populated_tract_geom
FROM
`bigquery-public-data`.geo_us_boundaries.counties,
populated_areas
WHERE
state_fips_code = '53' AND
ST_CONTAINS(county_geom, bounds)
GROUP BY 1, 2

Notice that I am passing in a tile. You want to look at the tile boundaries for details, but tile 1 covers the Western US (west, approximately of Ohio) and tile 2 covers the Eastern US. To query the entire country or query states in the transition area, specify tile 1 (_sec_1.asc) and tile 2 (_sec_2.asc).

The resulting table looks like this:

The key things to notice:

  • You get 39 results, one for each county in Washington
  • While the original tract_geom is a POLYGON, the populated_tract_geom is a MULTI_POLYGON since there are multiple pixels in each county

The result, visualized in BigQuery GeoViz:

Let’s look at an individual county (Asotin county in the Southeast corner of the state). Here’s the original boundary:

And here’s the populated areas — if we are drawing Asotin County, these are the only pixels we’d color:

Full dataset (for whole US)

Let’s do this for the whole country, not just Washington, and send the output to a table for easy querying:

CREATE OR REPLACE TABLE publicdata.us_tracts AS
WITH populated_areas AS (
SELECT bounds
FROM `ai-analytics-solutions`.publicdata.popdensity_sedac_rectangles
WHERE
year = 2020 AND
(tile = 'gpw_v4_population_density_rev11_2020_30_sec_1.asc' OR
tile = 'gpw_v4_population_density_rev11_2020_30_sec_2.asc'
)
AND
population_density > 10 -- persons per square kilometer
)
SELECT
state_fips_code, county_fips_code,
ANY_VALUE(county_geom) AS tract_geom,
ST_UNION(ARRAY_AGG(bounds)) AS populated_tract_geom
FROM
`bigquery-public-data`.geo_us_boundaries.counties,
populated_areas
WHERE
ST_CONTAINS(county_geom, bounds)
GROUP BY 1, 2

This creates a table with 3222 rows, one for each county.

I’ve made the dataset public. You can use it in BigQuery by directly querying or joining against this table:

ai-analytics-solutions.publicdata.us_tracts

For your convenience, that table also has the state code, county name, and population. This was the query I used:

Query to create county boundaries where the boundaries of populated pixels within each county.

Example of using the county boundaries to plot weather-related fatalities

As an example of using the county boundaries, let’s plot the number of weather-related fatalities in the US in the 2010s:

WITH historical_weather AS (
SELECT
CONCAT(LPAD(state_fips_code, 2, '0'), LPAD(cz_fips_code, 3, '0')) AS county_fips_code,
SUM( injuries_direct + deaths_direct + deaths_indirect + injuries_indirect ) AS num_impacted
FROM `bigquery-public-data.noaa_historic_severe_storms.storms_201*`
GROUP BY 1
)
SELECT
tract_geom, populated_tract_geom, (num_impacted/population) AS impacted
FROM
historical_weather
JOIN
`ai-analytics-solutions.publicdata.us_tracts`
USING(county_fips_code)

In the query above, the historical data is the number of people impacted in each county. We then join it against the populated us tracts data using the county code. For plotting, we need the populated tract geometry and the number of people impacted per capita (so we divide by population).

Plotting this using BigQuery GeoViz, we get:

and zooming in on the midwest:

Had we not done populated tracts, we would have gotten an misleading county map like this (showing you just California):

Exporting table AS CSV

If you are not doing your GIS analysis in BigQuery, you might want to export the populated county tracts. You can do that by doing this in the UI:

SELECT
state_fips_code, county_fips_code,
ST_AsGeoJSON(populated_tract_geom) AS county_tracts
FROM `ai-analytics-solutions.publicdata.us_tracts`

And click on “Save Results” to save it to Google Sheets or CSV file or some other format (it’s 3300 rows and 32 MB):

Enjoy!

--

--

--

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +756K followers.

Recommended from Medium

Saving Money Using Data Science

Hippocratic Oath for Data Scientists

Why does variance matter? 🤔

Discretized Conical Emission in Multimode Optical Fibers

The most overlooked opportunity for Excel users is visual design

Songstats: New Homepage & Platform Pages

Songstats: New Homepage & Platform Pages

Global Happiness During Crisis

5 Project Ideas to Kick Start Your Data Science Career

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Lak Lakshmanan

Lak Lakshmanan

Operating Executive at a technology investment firm; articles are personal observations and not investment advice.

More from Medium

How we are building Multi-Cloud Portable Workspaces at NetBook

Understand BigQuery Schema Auto-detection

Running dbt on Google Cloud’s Vertex AI Pipelines

Cloud Data Fusion: Update deployed pipelines through REST API