Published in vis.gl

Visualizing Unemployment for U.S. Counties with kepler.gl

Back in 2013, before I became interested in data visualization and pursued it for my master’s thesis, I stumbled upon a D3 block by Mike Bostock that maps unemployment rates for each county in the U.S. The simplicity of this visualization, as well as its ability to communicate detailed information, sparked my interest. Five years later, I recreated the same map in kepler.gl with far less code.

Data

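The data set that we used for this project was generated from the following sources:

  • County shapefiles from the U.S. Census (the cb_2017_us_county_20m archive used below)
  • County-level labor force data, including unemployment rates, distributed as an XLS file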

Processing

Once we had obtained both the shapefiles and the unemployment rates for all U.S. counties, we cleaned the data, combined it, and converted it to GeoJSON, a format that kepler.gl can consume.

First, we converted the Census shapefiles to a GeoJSON file. The easiest way to do this was to drop the Census zip file we previously downloaded into MapShaper. Then, from the top right corner, we clicked export and then selected GeoJSON.

For those who prefer command-line tools (like myself), ogr2ogr is a great tool for converting between various geometry file formats. You can do the conversion with the following command:

ogr2ogr -f GeoJSON -t_srs crs:84 counties.geojson cb_2017_us_county_20m/cb_2017_us_county_20m.shp

Second, to make it easier to parse the labor force data set and join it with the geometry above, we needed to convert it from XLS (a Microsoft Excel file) to CSV (a comma-separated values text file). You can do this in Excel or Google Sheets by opening the file, removing all unnecessary columns and headers, and saving the sheet as CSV. You can also do it from the command line with the in2csv utility from csvkit.

The CSV should look something like this:

Skeleton of the labor force data set per county

For the last step, we wrote a small program that iterates through the counties in the GeoJSON file we generated in the first step and adds the labor force information from the CSV we generated in the second step. Joining the data was simple because both data sets had the following identifiers for each county:

  • statefips or STATEFP: federal code for the state
  • countyfips or COUNTYFP: the federal code for the county within each state

The following Python script took the labor-force.csv data file and the counties.geojson geometry file and produced a geometry file that contains both the shape of each county and its corresponding labor force information.

Python script to join the labor force data with the geometry data for each county
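
The join described above can be sketched roughly as follows. This is a minimal illustration rather than the original script: it assumes the CSV columns are named statefips, countyfips, and unemployment_rate, and the helper names (county_key, join_labor_force, write_joined) are hypothetical.

```python
import csv
import json


def county_key(state_fips, county_fips):
    """Build a join key from a zero-padded state (2-digit) and county (3-digit) code."""
    return f"{int(state_fips):02d}{int(county_fips):03d}"


def join_labor_force(geojson, rows, rate_column="unemployment_rate"):
    """Attach each CSV row's unemployment rate to the matching county feature.

    Assumed column names: statefips, countyfips, and rate_column in the CSV;
    STATEFP and COUNTYFP in the GeoJSON feature properties.
    """
    rates = {
        county_key(row["statefips"], row["countyfips"]): float(row[rate_column])
        for row in rows
    }
    for feature in geojson["features"]:
        props = feature["properties"]
        key = county_key(props["STATEFP"], props["COUNTYFP"])
        # Counties missing from the CSV get a null rate instead of being dropped.
        props[rate_column] = rates.get(key)
    return geojson


def write_joined(geojson_path, csv_path, out_path):
    """Read both inputs, join them, and write the combined GeoJSON file."""
    with open(geojson_path) as f:
        counties = json.load(f)
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(out_path, "w") as f:
        json.dump(join_labor_force(counties, rows), f)
```

Calling write_joined("counties.geojson", "labor-force.csv", "counties_unemployment.geojson") would then produce the file used in the next section. Normalizing both codes to zero-padded strings guards against a spreadsheet export that has stripped leading zeros.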

Visualization

Now, we were finally ready for the easiest and most fulfilling step of the process: visualizing the data set. We took the generated GeoJSON file, dropped it into kepler.gl and … voilà! We had a map!

The resulting map after dropping the counties_unemployment.geojson file into kepler.gl

But, we were not quite there yet; we needed to disable Polygon Stroke and enable Polygon Fill.

Map after disabling Polygon Stroke and enabling Polygon Fill

Then, we set the fill color to be based on the unemployment_rate property. Now, we had a beautiful visualization!

Applying a color scale for the polygon fill based on the unemployment_rate property for each feature

By default, kepler.gl uses a quantile color scale. This scale takes the domain of our data set (the unemployment rates) and maps it to a discrete set of colors. A quantile scale picks its thresholds so that each color covers roughly the same number of counties. This makes it extremely useful for highlighting relative differences in values.

County unemployment rates with the polygon fill using a quantile color scale
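
To make the idea of equally populated color bins concrete, here is a small sketch of how quantile thresholds can be computed. It is an illustration of the technique only, not kepler.gl's own code; the function names and the sample rates are made up for the example.

```python
import bisect
import statistics


def quantile_thresholds(values, n_colors):
    """Return the n_colors - 1 cut points that split values into equally populated bins."""
    return statistics.quantiles(values, n=n_colors, method="inclusive")


def color_index(value, thresholds):
    """Index of the color bin a value falls into, given sorted thresholds."""
    return bisect.bisect_right(thresholds, value)


# Made-up rates for illustration only, including one extreme outlier.
rates = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 20.0]
thresholds = quantile_thresholds(rates, n_colors=4)
bins = [color_index(rate, thresholds) for rate in rates]
print(thresholds)  # [3.375, 4.25, 5.25]
print(bins)        # [0, 0, 1, 1, 2, 2, 3, 3]
```

Each of the four colors ends up with two values, and the outlier at 20.0 simply shares the top bin with 6.0, which is why this kind of scale dampens extremes.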

What if we try using a quantize color scale instead? The result is dramatically different! This is because a quantize color scale splits the range between the minimum and maximum unemployment rates into equal-width intervals, regardless of how the actual values are distributed. This color scale is great for emphasizing absolute differences in unemployment values as opposed to relative differences. For instance, using a quantize color scale quickly highlights that Kusilvak in Alaska has a very high unemployment rate compared to all other counties in the U.S.

County unemployment rates with the polygon fill using a quantize color scale
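
A quantize scale can be sketched in a few lines as well. As before, this is an illustration of the technique under simple assumptions, not kepler.gl's implementation; the function names and the sample rates are made up for the example.

```python
import bisect


def quantize_thresholds(values, n_colors):
    """Return n_colors - 1 evenly spaced cut points between the min and max value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_colors
    return [lo + step * i for i in range(1, n_colors)]


def color_index(value, thresholds):
    """Index of the color bin a value falls into, given sorted thresholds."""
    return bisect.bisect_right(thresholds, value)


# Made-up rates for illustration only, including one extreme outlier.
rates = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 20.0]
thresholds = quantize_thresholds(rates, n_colors=4)
bins = [color_index(rate, thresholds) for rate in rates]
print(thresholds)  # [6.5, 11.0, 15.5]
print(bins)        # [0, 0, 0, 0, 0, 0, 0, 3]
```

Because the thresholds depend only on the minimum and maximum, the single outlier gets the top color to itself while every other value lands in the lowest bin.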

Another way to think about the difference between the two scales is whether you want to highlight outliers in your data set (by using a quantize color scale) or dampen their effect and see a global overview of the data set (by using a quantile color scale).

And that’s it! That’s how easy it is to visualize geospatial data in kepler.gl.

See the final result in kepler.gl, or download the GeoJSON file to play with it yourself!

Wesam Manassra

Data Visualization Engineer @UberEng.
