Command-Line Cartography, Part 4

A tour of d3-geo’s new command-line interface.

[This is Part 4 of a tutorial on making thematic maps from the command line using d3-geo, TopoJSON and ndjson-cli. Read Part 3 here.]

The word choropleth comes from the Greek khôros meaning “area or region” and plêthos meaning “a great number”. To construct one, we apply a color encoding of the population data to the geometry we prepared previously. (Be careful you don’t accidentally summon the apocryphal chloropleth, which is a great number of greens.)

As in Part 2, a reasonable starting point is a sequential scale with a perceptually-motivated color scheme such as Viridis. However, now we will apply ndjson-map to the feature collection as a single entity, rather than evaluating the expression independently for each feature; this allows global operations, such as computing the maximum density of any tract.
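To make the difference concrete, here is a minimal JavaScript sketch of a whole-collection operation. The `collection` object and its density values are hypothetical stand-ins for what topo2geo emits (geometry omitted). `maxDensity` and `normalize` are illustrative helpers, not part of ndjson-cli. Because the function receives the entire collection, it can find the maximum before assigning anything. An expression evaluated once per feature could not do this.

```javascript
// Hypothetical stand-in for the feature collection that topo2geo emits;
// geometry is omitted and the densities are made up for illustration.
const collection = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { density: 12 } },
    { type: "Feature", properties: { density: 850 } },
    { type: "Feature", properties: { density: 15200 } },
  ],
};

// A global operation: it needs to see every feature at once.
function maxDensity(d) {
  return Math.max(...d.features.map((f) => f.properties.density));
}

// Normalize each density against the global maximum, mutating the features
// in place and returning the whole collection.
function normalize(d) {
  const max = maxDensity(d);
  d.features.forEach((f) => (f.properties.t = f.properties.density / max));
  return d;
}

console.log(maxDensity(collection)); // 15200
console.log(normalize(collection).features.map((f) => f.properties.t));
```

Returning `d` at the end plays the same role as the trailing `, d` in the ndjson-map expressions below. The collection, not just the side effect, must be the expression's result.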

Use topo2geo to extract the simplified tracts from the topology, pipe to ndjson-map to assign the fill property for each tract, pipe to ndjson-split to break the collection into features, and lastly pipe to geo2svg:

topo2geo tracts=- \
< ca-topo.json \
| ndjson-map -r d3 'z = d3.scaleSequential(d3.interpolateViridis).domain([0, 4000]), d.features.forEach(f => f.properties.fill = z(f.properties.density)), d' \
| ndjson-split 'd.features' \
| geo2svg -n --stroke none -p 1 -w 960 -h 960 \
> ca-tracts-color.svg

(If you prefer, you can run these as separate commands and save the intermediate outputs to individual files. I’ve grown weary of thinking up unique file names, so I’ll use pipes.)
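Conceptually, the sequential scale in the command above does two things. It maps the domain [0, 4000] linearly onto a parameter t in [0, 1], and it passes t to an interpolator that returns a color. The sketch below re-implements that idea in plain JavaScript so that it runs without d3. Both `sequential` and the gray ramp standing in for `d3.interpolateViridis` are assumptions for illustration. The sketch clamps t, so every density above 4,000 gets the brightest color. This should match how the Viridis ramp treats out-of-range input.

```javascript
// Sketch of a sequential scale (not d3's implementation): normalize x against
// the domain [d0, d1], clamp to [0, 1], then look up a color.
function sequential(interpolator, [d0, d1]) {
  return (x) => {
    const t = (x - d0) / (d1 - d0);
    return interpolator(Math.max(0, Math.min(1, t)));
  };
}

// Stand-in for d3.interpolateViridis: a gray ramp from black (t = 0) to white (t = 1).
function gray(t) {
  const v = Math.round(255 * t);
  return `rgb(${v}, ${v}, ${v})`;
}

const z = sequential(gray, [0, 4000]);
console.log(z(0));    // rgb(0, 0, 0)
console.log(z(2000)); // rgb(128, 128, 128)
console.log(z(9000)); // rgb(255, 255, 255), clamped
```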

ca-tracts-color.svg

As before, this is a suboptimal visual encoding because of the nature of census tracts. 😩 Per the Census Bureau:

Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. A census tract usually covers a contiguous area; however, the spatial size of census tracts varies widely depending on the density of settlement.

The strong negative correlation between land area and population density is readily apparent in a scatterplot of California census tracts:

Census tract land area has a strong negative correlation with estimated population density.
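The correlation follows almost directly from the definition of density. Write ρ for a tract's density, P for its population, and A for its land area. Then apply the population range quoted above:

```latex
\rho = \frac{P}{A}
\quad\Longrightarrow\quad
\log_{10} \rho = \log_{10} P - \log_{10} A,
\qquad
\log_{10} \frac{8{,}000}{1{,}200} \approx 0.82 .
```

log P varies by less than one order of magnitude, while tract areas span several. The variation in log ρ is therefore dominated by −log A. On logarithmic axes, you would expect the points to scatter around a line of slope −1. This is a back-of-the-envelope argument, not a fit to the actual data.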

As a result, most of California is dark purple, dense urban areas are the brightest yellow, and very little lies in between. You might think to extend the domain of the color scale beyond 4,000 people per square mile, but this doesn't help. It only darkens the low-density tracts further. The problem is that the transition from low density to high density happens too quickly.
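To see why, compare where a few densities land on the color ramp. The densities below are illustrative, not census data, and the helper names are hypothetical. With the linear domain [0, 4000], a tract of 100 people per square mile sits only 2.5% of the way along the ramp. Widening the domain to [0, 10000] pushes it down to 1%. Taking the square root first, with a domain of [0, 100], lifts the same tract to 10%.

```javascript
// Illustrative densities in people per square mile (made-up values).
const densities = [10, 100, 1000, 4000];

// Position t in [0, 1] along the color ramp for a linear domain [0, max].
const linear = (x, max) => Math.min(1, x / max);

// Position after a square-root transform with domain [0, 100],
// which covers densities from 0 to 10,000 (since √10,000 = 100).
const sqrtT = (x) => Math.min(1, Math.sqrt(x) / 100);

for (const x of densities) {
  console.log(x, linear(x, 4000), linear(x, 10000), sqrtT(x));
}
```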

We need a non-linear transform to distribute the colors more equitably, making low-density areas brighter and high-density areas darker. We can implement a power transform (an exponent of ½) by passing the square root of the population density to the color scale instead, and adjusting the domain accordingly. Since √10,000 = 100, the new domain [0, 100] spans densities from 0 to 10,000 people per square mile:

topo2geo tracts=- \
< ca-topo.json \
| ndjson-map -r d3 'z = d3.scaleSequential(d3.interpolateViridis).domain([0, 100]), d.features.forEach(f => f.properties.fill = z(Math.sqrt(f.properties.density))), d' \
| ndjson-split 'd.features' \
| geo2svg -n --stroke none -p 1 -w 960 -h 960 \
> ca-tracts-sqrt.svg

ca-tracts-sqrt.svg

Better! 👏 For comparison, here’s a log transform:

topo2geo tracts=- \
< ca-topo.json \
| ndjson-map -r d3 'z = d3.scaleLog().domain(d3.extent(d.features.filter(f => f.properties.density), f => f.properties.density)).interpolate(() => d3.interpolateViridis), d.features.forEach(f => f.properties.fill = z(f.properties.density)), d' \
| ndjson-split 'd.features' \
| geo2svg -n --stroke none -p 1 -w 960 -h 960 \
> ca-tracts-log.svg

ca-tracts-log.svg
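A log scale normalizes by ratio rather than by difference, so each tenfold increase in density moves the same distance along the ramp. This is also why the command filters the densities before computing the extent. log 0 is −∞, so a zero-density tract cannot serve as the lower bound of the domain. Here is a plain-JavaScript sketch of that normalization with hypothetical densities. `t` is an illustrative helper, not d3's code.

```javascript
// Hypothetical densities; 0 stands for an uninhabited tract.
const densities = [0, 5, 50, 500, 5000];

// As in the command, only positive (truthy) densities define the extent,
// because Math.log(0) is -Infinity.
const positive = densities.filter((x) => x > 0);
const lo = Math.min(...positive);
const hi = Math.max(...positive);

// Log normalization onto [0, 1]: equal ratios become equal steps.
const t = (x) => (Math.log(x) - Math.log(lo)) / (Math.log(hi) - Math.log(lo));

console.log(positive.map(t)); // 0, then about 1/3 and 2/3, then 1
console.log(t(0));            // -Infinity
```

Zero-density tracts fall below the extent. With an interpolator that clamps its input, such as Viridis, they should end up with the darkest color.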

What if we visualize the density’s p-quantile instead of its absolute value?

topo2geo tracts=- \
< ca-topo.json \
| ndjson-map -r d3 'z = d3.scaleQuantile().domain(d.features.map(f => f.properties.density)).range(d3.quantize(d3.interpolateViridis, 256)), d.features.forEach(f => f.properties.fill = z(f.properties.density)), d' \
| ndjson-split 'd.features' \
| geo2svg -n --stroke none -p 1 -w 960 -h 960 \
> ca-tracts-quantile.svg

ca-tracts-quantile.svg

This encoding shows variation even within the most-dense urban areas. However, it lacks the mathematical simplicity of the square root transform, so it's harder to reason about what the colors mean beyond "more" or "less".
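Under the hood, a quantile scale sorts the domain values and picks thresholds so that each output color covers roughly the same number of tracts. With 256 colors, each color corresponds to about 1/256 of the tracts, however skewed the densities are. The sketch below re-implements that idea in plain JavaScript. The helper names are hypothetical. The quantile estimate interpolates linearly between closest ranks; this is intended to mirror d3.quantile, but treat it as an assumption here.

```javascript
// p-quantile of an ascending array, interpolating linearly between ranks.
function quantile(sorted, p) {
  const h = (sorted.length - 1) * p;
  const i = Math.floor(h);
  const a = sorted[i];
  const b = sorted[Math.min(i + 1, sorted.length - 1)];
  return a + (b - a) * (h - i);
}

// Sketch of a quantile scale: n colors need n - 1 thresholds at p = 1/n, 2/n, ...
function quantileScale(values, colors) {
  const sorted = values.slice().sort((a, b) => a - b);
  const thresholds = [];
  for (let i = 1; i < colors.length; ++i) {
    thresholds.push(quantile(sorted, i / colors.length));
  }
  return (x) => {
    // Index of the first threshold greater than x (a linear "bisect right").
    let k = 0;
    while (k < thresholds.length && thresholds[k] <= x) ++k;
    return colors[k];
  };
}

// Hypothetical, heavily skewed densities: the outlier does not stretch the scale.
const values = [1, 2, 3, 4, 5, 6, 7, 90000];
const z = quantileScale(values, ["a", "b", "c", "d"]);
console.log(values.map(z).join(", ")); // a, a, b, b, c, c, d, d
```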

Yet another approach is to apply a discrete color scheme instead of a continuous one, so that contiguous ranges of values share a color. This makes it easier for readers to match the color of an area on the map to a specific value using a key. It also means we can manually (or algorithmically) pick suitable thresholds (breaks) in the color scale.

Cynthia A. Brewer’s ColorBrewer provides a fantastic set of well-designed discrete color schemes. The d3-scale-chromatic package provides a convenient API for ColorBrewer in JavaScript. To install:

npm install -g d3-scale-chromatic

Then to implement a threshold scale with the OrRd color scheme:

topo2geo tracts=- \
< ca-topo.json \
| ndjson-map -r d3 -r d3=d3-scale-chromatic 'z = d3.scaleThreshold().domain([1, 10, 50, 200, 500, 1000, 2000, 4000]).range(d3.schemeOrRd[9]), d.features.forEach(f => f.properties.fill = z(f.properties.density)), d' \
| ndjson-split 'd.features' \
| geo2svg -n --stroke none -p 1 -w 960 -h 960 \
> ca-tracts-threshold.svg

ca-tracts-threshold.svg
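A threshold scale is simpler still. n thresholds split the number line into n + 1 bins, which is why eight thresholds pair with the nine-color d3.schemeOrRd[9]. Each threshold is the inclusive lower bound of its bin, so a density of exactly 10 falls in the third bin, not the second. In this plain-JavaScript sketch, `threshold` is a hypothetical helper and the color names are placeholders, not the actual OrRd values.

```javascript
// Thresholds from the command above, in people per square mile.
const thresholds = [1, 10, 50, 200, 500, 1000, 2000, 4000];

// Placeholder names for nine colors (stand-ins for d3.schemeOrRd[9]).
const colors = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"];

// Sketch of a threshold scale: count the thresholds at or below x and use
// that count as the color index; values at or above 4,000 get the last color.
function threshold(domain, range) {
  return (x) => {
    let i = 0;
    while (i < domain.length && domain[i] <= x) ++i;
    return range[i];
  };
}

const z = threshold(thresholds, colors);
console.log(z(0), z(10), z(250), z(50000)); // c0 c2 c4 c8
```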

To add our county borders to the choropleth, simply pipe those to geo2svg, too. You can run multiple commands and pipe their combined output by separating the commands with semicolons and wrapping them in parentheses: (foo; bar; baz). (You can even pipe a loop!)

(topo2geo tracts=- \
< ca-topo.json \
| ndjson-map -r d3 -r d3=d3-scale-chromatic 'z = d3.scaleThreshold().domain([1, 10, 50, 200, 500, 1000, 2000, 4000]).range(d3.schemeOrRd[9]), d.features.forEach(f => f.properties.fill = z(f.properties.density)), d' \
| ndjson-split 'd.features'; \
topo2geo counties=- \
< ca-topo.json \
| ndjson-map 'd.properties = {"stroke": "#000", "stroke-opacity": 0.3}, d') \
| geo2svg -n --stroke none -p 1 -w 960 -h 960 \
> ca.svg

Source: American Community Survey, 2014 5-Year Estimate

Lastly, you should add a key, which tells the reader what the colors mean. A choropleth lacking a key may be pretty, but it is rarely informative! 👀 My favorite style of threshold key is Ford Fessenden's, which you can generate in a browser using D3 and then paste into your SVG. (If you prefer a different style, see Susie Lu's d3-legend.)
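Whatever style you choose, a threshold key boils down to one row per bin: a swatch and the range of values it covers. As a starting point, here is a hypothetical helper, `keyRows`, that derives those rows from the same thresholds and colors the scale uses. It produces data only and leaves the drawing to D3 (or d3-legend). It is not taken from either Ford Fessenden's key or d3-legend.

```javascript
// Hypothetical helper: one [color, label] row per bin of a threshold scale.
// Bin 0 is below the first threshold; the last bin is at or above the last threshold.
function keyRows(thresholds, colors) {
  return colors.map((color, i) => {
    let label;
    if (i === 0) label = `< ${thresholds[0]}`;
    else if (i === thresholds.length) label = `≥ ${thresholds[i - 1]}`;
    else label = `${thresholds[i - 1]} to ${thresholds[i]}`;
    return [color, label];
  });
}

const rows = keyRows([1, 10, 50], ["c0", "c1", "c2", "c3"]);
for (const [color, label] of rows) console.log(color, label);
// c0 < 1
// c1 1 to 10
// c2 10 to 50
// c3 ≥ 50
```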


Questions or comments? Reply below or on Twitter. Thank you for reading!