Visualizing Unemployment for U.S. Counties with kepler.gl
Back in 2013, before I became interested in data visualization and chose to pursue it for my master’s thesis, I stumbled upon this d3 block by Mike Bostock, which maps the unemployment rate of every county in the U.S. The simplicity of this visualization, and its ability to communicate detailed information, sparked my interest. Five years later, I recreated the same map in kepler.gl while writing far less code.
The data set that we used for this project was generated from the following sources:
- United States Census Bureau: this is where we got the shapefiles (geometry files) of all U.S. counties. More specifically, we used the following resolution: cb_2017_us_county_20m.zip.
- Bureau of Labor Statistics: this is where we got the 2017 labor force data for each county, which includes unemployment rates. We used the 2017 XLS file for this visualization.
Once we had the shapefiles and unemployment rates for all U.S. counties, we cleaned the data, combined it, and converted it to GeoJSON, a format kepler.gl can consume.
First, we converted the Census shapefiles to a GeoJSON file. The easiest way to do this was to drop the Census zip file we had downloaded into MapShaper, click Export in the top-right corner, and select GeoJSON.
For those who prefer command-line tools (like myself), ogr2ogr is a great tool for converting between various geometry file formats. You can do the conversion with the following command:
ogr2ogr -f GeoJSON -t_srs crs:84 counties.geojson cb_2017_us_county_20m/cb_2017_us_county_20m.shp
Second, to make it easier to parse the labor force data set and join it with the geometry above, we needed to convert it from XLS (a Microsoft Excel file) to CSV (a comma-separated values text file). This is easy to do in Excel or Google Sheets: open the file, remove all unnecessary columns and header rows, and save the sheet as CSV. You can also do it from the command line with the in2csv utility from csvkit.
The CSV should look something like this:
For the last step, we wrote a small program that iterated through the counties in the GeoJSON file we generated in the first step and added the labor force information from the CSV we generated in the second step. Joining the two data sets was simple because both had the following identifiers for each county:
- statefips or STATEFP: the federal (FIPS) code for the state
- countyfips or COUNTYFP: the federal code for the county within each state
The following Python script took the labor-force.csv data file and the counties.geojson geometry file and produced a geometry file that contains both the shape of each county and its corresponding labor force information.
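The original script is not reproduced here, so the following is only a minimal sketch of such a join using the Python standard library, not the script we actually ran. It assumes the CSV has columns named statefips, countyfips, and unemployment_rate (the first two follow the identifiers above; the third matches the property used for coloring later) and that each GeoJSON feature carries the Census STATEFP and COUNTYFP properties. The output filename counties-labor-force.geojson is hypothetical. Because spreadsheet tools often drop leading zeros from codes such as 01, the sketch normalizes both sides to a five-digit FIPS key before matching.

```python
import csv
import json


def fips_key(state, county):
    """Build a 5-digit county FIPS key, restoring leading zeros that spreadsheets may drop."""
    return f"{int(state):02d}{int(county):03d}"


def to_number(value):
    """Convert numeric-looking strings to float so they are stored as numbers in the GeoJSON."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def join_labor_force(counties, rows):
    """Copy each CSV row's columns onto the matching GeoJSON feature's properties (in place)."""
    # Assumed CSV columns: statefips, countyfips, unemployment_rate (plus any others kept).
    by_fips = {fips_key(r["statefips"], r["countyfips"]): r for r in rows}
    for feature in counties["features"]:
        props = feature["properties"]
        # STATEFP / COUNTYFP are the Census identifiers on each county feature.
        row = by_fips.get(fips_key(props["STATEFP"], props["COUNTYFP"]))
        if row is None:
            continue  # no labor force row for this county; leave its properties unchanged
        for column, value in row.items():
            if column not in ("statefips", "countyfips"):
                props[column] = to_number(value)
    return counties


def main(csv_path="labor-force.csv",
         geojson_path="counties.geojson",
         out_path="counties-labor-force.geojson"):  # output name is hypothetical
    with open(geojson_path) as f:
        counties = json.load(f)
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    join_labor_force(counties, rows)
    with open(out_path, "w") as f:
        json.dump(counties, f)
```

With labor-force.csv and counties.geojson in the working directory, calling main() writes the joined file. Counties with no matching CSV row are simply left without labor force properties.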
Now, we were finally ready for the easiest and most fulfilling step of the process: visualizing the data set. We took the generated GeoJSON file, dropped it into kepler.gl and … voilà! We had a map!
But we were not quite there yet: we needed to disable Polygon Stroke and enable Polygon Fill.
Then, we set the fill color to be based on the unemployment_rate property. Now, we had a beautiful visualization!
By default, kepler.gl uses a quantile color scale. This scale maps the domain of our data set (unemployment rates) to a discrete set of colors, choosing the thresholds between colors so that each color is assigned roughly the same number of counties. This makes it extremely useful for highlighting relative differences in values.
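To make the idea concrete, here is a simplified Python sketch of how quantile thresholds can be computed with the standard library's statistics.quantiles. It illustrates the concept only and is not kepler.gl's actual implementation; the rates are made-up values, with 22.0 acting as an outlier.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quantile_thresholds(values, n_colors):
    """Return n_colors - 1 cut points that split the values into equal-count groups."""
    # method="inclusive" interpolates between data points (the R-7 / NumPy default);
    # the method argument requires Python 3.8+.
    return quantiles(values, n=n_colors, method="inclusive")


def color_index(value, thresholds):
    """Map a value to a color bucket in the range 0 .. len(thresholds)."""
    return bisect_right(thresholds, value)


# Made-up unemployment rates (%), purely for illustration.
rates = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 22.0]
thresholds = quantile_thresholds(rates, n_colors=4)
counts = Counter(color_index(r, thresholds) for r in rates)

print(thresholds)                     # [3.375, 4.25, 5.25]
print([counts[i] for i in range(4)])  # [2, 2, 2, 2]
```

Every color ends up with two values, and the outlier shares the top color with 6.0, which is why a quantile scale dampens extreme values.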
What if we try using a quantize color scale instead? The result is dramatically different! This is because a quantize color scale divides the range between the minimum and maximum unemployment rates into equal-width intervals, regardless of how the actual values are distributed. This color scale is great for emphasizing absolute differences in unemployment values rather than relative ones. For instance, a quantize color scale quickly highlights that Kusilvak County in Alaska has a very high unemployment rate compared to all other counties in the U.S.
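A quantize scale can be sketched in the same spirit: split the interval from the minimum to the maximum into equal-width bins. This is an illustrative sketch, not kepler.gl's code, and the rates are made-up values with 22.0 acting as an outlier.

```python
from bisect import bisect_right
from collections import Counter


def quantize_thresholds(values, n_colors):
    """Return n_colors - 1 cut points that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_colors
    return [lo + step * i for i in range(1, n_colors)]


def color_index(value, thresholds):
    """Map a value to a color bucket in the range 0 .. len(thresholds)."""
    return bisect_right(thresholds, value)


# Made-up unemployment rates (%); 22.0 plays the role of an outlier.
rates = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 22.0]
thresholds = quantize_thresholds(rates, n_colors=4)
counts = Counter(color_index(r, thresholds) for r in rates)

print(thresholds)                     # [7.0, 12.0, 17.0]
print([counts[i] for i in range(4)])  # [7, 0, 0, 1]
```

The single outlier stretches the range, so seven of the eight values share the lowest color while the outlier alone gets the highest one, which is exactly what makes it stand out on the map.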
Another way to think about the difference between the two scales is whether you want to highlight outliers in your data set (by using a quantize color scale) or dampen their effect and see a global overview of the data set (by using a quantile color scale).
And that’s it! That’s how easy it is to visualize geospatial data in kepler.gl.
See the final result in kepler.gl, or download the GeoJSON file to play with it yourself!