Command-Line Cartography, Part 3

A tour of d3-geo’s new command-line interface.

[This is Part 3 of a tutorial on making thematic maps from the command line using d3-geo, TopoJSON and ndjson-cli. Read Part 2 and Part 4 here.]

The GeoJSON feature collection of census tracts we constructed previously was 13.6M. That’s fine for local viewing, but a bit large for the web! Fortunately there are ways to shrink geometry without apparent loss of detail. We can:

  • Simplify (e.g., remove coordinates per Visvalingam).
  • Quantize (e.g., remove digits, say 224.3021507494117 to 224.3).
  • Compress (e.g., remove redundant geometry).

These are possible with GeoJSON, but we can do even better if we switch to a JSON dialect designed for efficient transport: TopoJSON. TopoJSON files are often 80% smaller than GeoJSON files, even without simplification. How is it so concise?

First, TopoJSON represents lines and polygons as sequences of arcs rather than sequences of coordinates. Contiguous polygons (census tracts, counties, states, etc.) have shared borders whose coordinates are duplicated in GeoJSON. With hierarchical geometry, such as counties that compose into states, there’s even more duplication! Because TopoJSON stores each arc once and lines and polygons refer to arcs by index, reusing an arc does not require repeating its coordinates. (For more, see How to Infer Topology.)

A topology of the United States; dots indicate arc endpoints.
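
To make the arc idea concrete, here is a minimal sketch of a hand-written topology: two unit squares that share one border. The layout follows the TopoJSON format (an arcs array, plus polygons that list arc indexes, where a negative index ~i means arc i traversed in reverse). The object name squares, the ids and the helper ringCoordinates are made up for illustration, and the coordinates are left unquantized to keep things readable.

```javascript
// Two unit squares, "left" and "right", sharing the vertical border x = 1.
const topology = {
  type: "Topology",
  arcs: [
    [[1, 0], [1, 1]],                 // arc 0: the shared border, bottom to top
    [[1, 1], [0, 1], [0, 0], [1, 0]], // arc 1: the rest of the left square
    [[1, 0], [2, 0], [2, 1], [1, 1]]  // arc 2: the rest of the right square
  ],
  objects: {
    squares: {
      type: "GeometryCollection",
      geometries: [
        {type: "Polygon", id: "left", arcs: [[0, 1]]},
        {type: "Polygon", id: "right", arcs: [[2, ~0]]}
      ]
    }
  }
};

// Stitch a ring's arcs back into a plain coordinate array.
function ringCoordinates(topology, ring) {
  const points = [];
  for (const index of ring) {
    // A negative index refers to arc ~index, traversed in reverse.
    const arc = index < 0
      ? topology.arcs[~index].slice().reverse()
      : topology.arcs[index];
    // Every arc after the first starts where the previous one ended,
    // so skip its first point to avoid a duplicate.
    points.push(...(points.length ? arc.slice(1) : arc));
  }
  return points;
}

const [left, right] = topology.objects.squares.geometries
  .map((polygon) => ringCoordinates(topology, polygon.arcs[0]));

console.log(JSON.stringify(left));
console.log(JSON.stringify(right));
```

Both rings pass through [1, 0] and [1, 1], but those coordinates are stored only once, in arc 0. In practice you would let geo2topo infer the arcs rather than write them by hand.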

Second, TopoJSON can be quantized, where coordinates are represented as small integers instead of floating-point values with many decimal places. For example, a sequence of points:

[545.7796789342211, 348.96136952241613]
[545.9825061954095, 349.29419494812123]
[546.3281879653109, 349.53210438248560]
[546.3147336879572, 348.77969898749300]
[546.5844757927035, 348.76960903081610]
[546.5889751031176, 348.76842131978400]

Is first converted to integers by scaling, translating and rounding:

[  0, 403]
[231, 741]
[625, 982]
[610, 219]
[917, 208]
[922, 207]

And then delta-encoded such that each successive x- and y-value is relative to the previous one:

[  0, 403]
[231, 338]
[394, 241]
[-15,-763]
[307, -11]
[  5,  -1]
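
The two steps above can be sketched in a few lines of JavaScript. This is an illustration rather than the TopoJSON implementation: the function names quantize, deltaEncode and deltaDecode are made up here, and quantize takes the bounding box as an explicit argument because the box behind the example numbers above is not given.

```javascript
// Map floating-point coordinates onto an n-by-n integer grid spanning the
// bounding box [x0, y0, x1, y1] (assumed to have nonzero width and height).
function quantize(points, [x0, y0, x1, y1], n) {
  const kx = (n - 1) / (x1 - x0);
  const ky = (n - 1) / (y1 - y0);
  return points.map(([x, y]) => [
    Math.round((x - x0) * kx),
    Math.round((y - y0) * ky)
  ]);
}

// Store each point relative to the previous one; the first stays absolute.
function deltaEncode(points) {
  let px = 0, py = 0;
  return points.map(([x, y]) => {
    const delta = [x - px, y - py];
    px = x;
    py = y;
    return delta;
  });
}

// Undo deltaEncode by keeping a running sum.
function deltaDecode(deltas) {
  let x = 0, y = 0;
  return deltas.map(([dx, dy]) => [x += dx, y += dy]);
}

// The integer coordinates from the example above.
const integers = [[0, 403], [231, 741], [625, 982], [610, 219], [917, 208], [922, 207]];
const deltas = deltaEncode(integers);

console.log(JSON.stringify(deltas));
// [[0,403],[231,338],[394,241],[-15,-763],[307,-11],[5,-1]]
```

Delta-encoding is lossless (deltaDecode recovers the integers exactly); only the rounding in the quantization step discards information.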

Quantization does lose information, but typically a small-scale map does not require the full precision of the original geometry. For fun, though, here’s what it looks like when you over-quantize TopoJSON:

geo2topo -q 1e2

Best of all, TopoJSON facilitates topology-preserving simplification: we can simplify geometry without detaching shared borders. To get started, install the TopoJSON CLI:

npm install -g topojson

Use geo2topo to convert the GeoJSON to TopoJSON, reducing the file size to 8.1M:

geo2topo -n \
tracts=ca-albers-density.ndjson \
> ca-tracts-topo.json

The slightly peculiar syntax, tracts=…, allows you to specify multiple named GeoJSON inputs, resulting in a topology with multiple named objects (or “layers”). Arcs can be shared across all objects in a topology.

Now to toposimplify, further reducing to 3.1M:

toposimplify -p 1 -f \
< ca-tracts-topo.json \
> ca-simple-topo.json

The -p 1 argument tells toposimplify to use a planar area threshold of one square pixel when implementing Visvalingam’s method; this is appropriate because we previously applied a conic equal-area projection. If simplifying before projecting, use -s and specify a minimum-area threshold in steradians instead. The -f says to remove small, detached rings (little islands, but not contiguous tracts), further reducing the output size.
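
For intuition about what an area threshold means, here is a deliberately naive sketch of Visvalingam’s idea applied to a single open line. It is not how toposimplify is implemented (a practical version would use a priority queue and operate on shared arcs); the names triangleArea and simplifyLine and the sample line are invented for this example.

```javascript
// Area of the triangle formed by three [x, y] points.
function triangleArea([ax, ay], [bx, by], [cx, cy]) {
  return Math.abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2;
}

// Repeatedly drop the interior point whose triangle (formed with its two
// current neighbors) has the smallest area, until every remaining triangle
// is at least minArea. The two endpoints are always kept.
function simplifyLine(points, minArea) {
  const line = points.slice();
  while (line.length > 2) {
    let smallest = Infinity;
    let index = -1;
    for (let i = 1; i < line.length - 1; ++i) {
      const area = triangleArea(line[i - 1], line[i], line[i + 1]);
      if (area < smallest) {
        smallest = area;
        index = i;
      }
    }
    if (smallest >= minArea) break;
    line.splice(index, 1);
  }
  return line;
}

// A line with one barely perceptible bump and one prominent peak.
const line = [[0, 0], [1, 0.1], [2, 0], [3, 2], [4, 0]];
const simplified = simplifyLine(line, 1);

console.log(JSON.stringify(simplified));
// [[0,0],[2,0],[3,2],[4,0]]
```

With a threshold of one square unit, the bump at [1, 0.1] (triangle area 0.1) is removed while the peak at [3, 2] (triangle area 2) survives.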

Lastly to topoquantize and delta-encode, reducing to 1.6M:

topoquantize 1e5 \
< ca-simple-topo.json \
> ca-quantized-topo.json

[Figure: ca-quantized-topo.json]

As you can see, this is visually identical to the original, yet roughly an eighth the size (1.6M versus 13.6M)! Gzip (performed automatically by most servers) further reduces the transfer size to a svelte 390K.

Now suppose we want to overlay county borders on our choropleth map of census tracts. Most readers probably aren’t familiar with the geography of census tracts, so county outlines provide a helpful cue. (If we were making a national choropleth, we might similarly want state borders.)

The Census Bureau also publishes county boundaries, but we don’t actually need them. TopoJSON has another powerful trick up its sleeve: since census tracts compose hierarchically into counties, we can derive county geometry using topomerge!

topomerge -k 'd.id.slice(0, 3)' counties=tracts \
< ca-quantized-topo.json \
> ca-merge-topo.json

[Figure: ca-merge-topo.json’s counties]

The -k argument defines a key expression that topomerge will evaluate to group features from the tracts object before merging. (It’s similar to nest.key in d3-collection.) The first three digits of the census tract id represent the state-specific part of the county FIPS code, so the census tracts for each county will be merged, resulting in county polygons. The result forms a new counties object on the output topology.
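
The grouping step alone can be sketched in plain JavaScript. The tract ids below are illustrative stand-ins in the same shape as ours (three county digits followed by six tract digits), and the sketch only groups ids; the actual merging of geometry is what topomerge does.

```javascript
// Illustrative tract ids: three county digits, then six tract digits.
const tracts = [
  {id: "001400100"},
  {id: "001400200"},
  {id: "075010100"},
  {id: "075010200"},
  {id: "037101110"}
];

// The same expression passed to topomerge -k, written as a function.
const key = (d) => d.id.slice(0, 3);

// Group tract ids by county key, preserving first-seen order.
const counties = new Map();
for (const d of tracts) {
  const k = key(d);
  if (!counties.has(k)) counties.set(k, []);
  counties.get(k).push(d.id);
}

for (const [county, ids] of counties) {
  console.log(county, ids.join(", "));
}
```

Each group of tracts then becomes a single merged polygon in the counties object.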

Now, we don’t actually want the full county polygons; we want only the internal borders—the ones separating counties. (Stroking exterior borders tends to lose detail along coastlines.) We can also compute these with topomerge. A filter (-f) expression is evaluated for each arc, given the arc’s adjacent polygons a and b. By convention, a and b are the same on exterior arcs, and thus we can overwrite the counties object with a mesh of the internal borders like so:

topomerge --mesh -f 'a !== b' counties=counties \
< ca-merge-topo.json \
> ca-topo.json

[Figure: ca-topo.json’s counties]
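
The logic of the a !== b filter can be sketched without any library. The example assumes a tiny made-up topology of two polygons that share arc 0 (a negative index ~i means arc i traversed in reverse); the names polygons, neighbors and interior are hypothetical, and this only illustrates the idea rather than topomerge’s internals.

```javascript
// Arc indexes used by each polygon; ~i means arc i traversed in reverse.
const polygons = [
  {id: "left", arcs: [0, 1]},
  {id: "right", arcs: [2, ~0]}
];

// Collect the polygons adjacent to each arc, ignoring direction.
const neighbors = new Map();
for (const polygon of polygons) {
  for (const index of polygon.arcs) {
    const arc = index < 0 ? ~index : index;
    if (!neighbors.has(arc)) neighbors.set(arc, []);
    neighbors.get(arc).push(polygon);
  }
}

// The same test passed to topomerge -f: keep arcs that separate two
// different polygons.
const filter = (a, b) => a !== b;

const interior = [];
for (const [arc, [a, b = a]] of neighbors) {
  // An arc touched by only one polygon is exterior, so b defaults to a.
  if (filter(a, b)) interior.push(arc);
}

console.log(interior); // [ 0 ]
```

Only the shared arc survives the filter; the exterior arcs 1 and 2 are dropped, which is why the resulting mesh does not stroke the coastline.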

If you followed along on the command line, you hopefully learned how to convert GeoJSON to TopoJSON, to simplify and quantize topologies, and to merge features.

In Part 4, I’ll cover implementing effective color encodings using d3-scale, and rendering the choropleth to SVG using d3-geo.

Ready for more? Continue to Part 4.

Questions or comments? Reply below or on Twitter. Thank you for reading!