Command-Line Cartography, Part 2

A tour of d3-geo’s new command-line interface.

[This is Part 2 of a tutorial on making thematic maps from the command line using d3-geo, TopoJSON and ndjson-cli. Read Part 1 or Part 3 here.]

Often the data we find online isn’t precisely what we want. Perhaps we need to join multiple data sources, or to convert formats (say from fixed-width text to CSV), or even just to drop unneeded fields to produce smaller files that are faster to download.

You can always write scripts to transform data in arbitrary ways, but writing (and debugging) scripts can become tedious as transformations become more complex. What we want is an iterative, exploratory approach, where at every step we can see what the data looks like; by inspecting as we go, we can fix mistakes as we make them, before they get buried in complexity. And when we’re done, we want to capture our workflow as machine-readable documentation that can be easily reproduced.

The command line is great for this. UNIX has a well-established philosophy of small, decoupled tools that allow powerful manipulation of data. To leverage the power of the command line, we simply need to convert our data into a format that fits a UNIX convention: lines of text. And since JSON is already text, we just need each line of text to be a valid JSON value.

Enter newline-delimited JSON (NDJSON), which is simply JSON values separated by newlines (\n). NDJSON combines the best of both worlds: the convenience of the command line for working with files, and the power of JavaScript and its myriad open-source modules. My ndjson-cli module provides tools for converting JSON to NDJSON, for manipulating NDJSON streams (filtering, mapping, joining) and more. To install:

npm install -g ndjson-cli
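To make the format concrete, here is a minimal Node.js sketch (not part of ndjson-cli) of how an NDJSON stream can be read: split on newlines and parse each line on its own. The sample records are made up for illustration.

```javascript
// A made-up NDJSON stream: one JSON value per line.
const ndjson = '{"id":"a","value":1}\n{"id":"b","value":2}\n';

// Parse each non-empty line independently; no enclosing array is needed.
const values = ndjson
  .split("\n")
  .filter((line) => line.length > 0)
  .map((line) => JSON.parse(line));

console.log(values.length); // 2
console.log(values[1].id); // b
```

Because every line stands alone, line-oriented tools such as head or grep can operate on the stream without breaking the JSON.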

To convert a GeoJSON feature collection to a newline-delimited stream of GeoJSON features, use ndjson-split:

ndjson-split 'd.features' \
< ca-albers.json \
> ca-albers.ndjson
[Image: ca-albers.ndjson]
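Conceptually, ndjson-split evaluates the expression against each input value d and writes each element of the resulting array on its own line. A rough equivalent in plain JavaScript, using a tiny made-up feature collection in place of ca-albers.json:

```javascript
// Hypothetical stand-in for ca-albers.json, reduced to two features.
const collection = {
  type: "FeatureCollection",
  features: [
    {type: "Feature", properties: {GEOID: "06001400100"}, geometry: null},
    {type: "Feature", properties: {GEOID: "06001400200"}, geometry: null}
  ]
};

// Equivalent of the 'd.features' expression, then one JSON value per line.
const splitLines = collection.features.map((f) => JSON.stringify(f));

console.log(splitLines.join("\n"));
```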

If you haven’t seen redirection before, the < operator tells a command (ndjson-split) to read from the specified file, while the > operator tells it to write to the specified file. There’s also the | operator, which lets you pipe the output of one command as input to another command, obviating the need for an intermediate file. You may sometimes want an intermediate file, but you can also pipe to head (| head) or less to quickly inspect the output of any command.

The output here looks underwhelmingly similar to the ca-albers.json we saw previously; the only difference is that there is one feature (one census tract) per line. But this is huge—it means we can now manipulate individual features! For example, to set each feature’s id using ndjson-map:

ndjson-map 'd.id = d.properties.GEOID.slice(2), d' \
< ca-albers.ndjson \
> ca-albers-id.ndjson
[Image: ca-albers-id.ndjson]
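The expression relies on JavaScript’s comma operator: the assignment runs first, and the trailing d becomes the value that ndjson-map writes out. The sketch below applies the same expression to a single made-up feature. It assumes the GEOID is the two-digit state code followed by the county and tract codes, so slice(2) drops the state prefix.

```javascript
// Made-up feature; the GEOID value is an illustrative assumption.
const feature = {type: "Feature", properties: {GEOID: "06001400100"}, geometry: null};

// Same shape as the ndjson-map expression: assign, then return the feature.
const setId = (d) => (d.id = d.properties.GEOID.slice(2), d);

const withId = setId(feature);
console.log(withId.id); // 001400100
```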

This id will be needed to join the geometry with the population estimates, which we will now download from the Census Bureau’s API using curl:

curl 'http://api.census.gov/data/2014/acs5?get=B01003_001E&for=tract:*&in=state:06' -o cb_2014_06_tract_B01003.json
[Image: cb_2014_06_tract_B01003.json]

(Note: please request a key when using the Census API.) The B01003_001E in the URL specifies the total population estimate, while the for and in values specify that we want data for each census tract in California. See the API documentation for details.

The resulting file is a JSON array. To convert it to an NDJSON stream, use ndjson-cat (to remove the newlines), ndjson-split (to separate the array into multiple lines) and ndjson-map (to reformat each line as an object). You can run these individually, but here’s how to do it all in one go:

ndjson-cat cb_2014_06_tract_B01003.json \
| ndjson-split 'd.slice(1)' \
| ndjson-map '{id: d[2] + d[3], B01003: +d[0]}' \
> cb_2014_06_tract_B01003.ndjson
[Image: cb_2014_06_tract_B01003.ndjson]
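In plain JavaScript, the split and map steps amount to dropping the header row and turning each remaining row into an object. The rows below are invented, and the column order (estimate, state, county, tract) is inferred from the expression above rather than taken from the API documentation.

```javascript
// Invented sample shaped like the downloaded JSON array: header row, then data rows.
const rows = [
  ["B01003_001E", "state", "county", "tract"],
  ["2937", "06", "001", "400100"],
  ["1974", "06", "001", "400200"]
];

const records = rows
  .slice(1) // ndjson-split 'd.slice(1)': skip the header row
  .map((d) => ({id: d[2] + d[3], B01003: +d[0]})); // county + tract, numeric estimate

console.log(records.map((r) => JSON.stringify(r)).join("\n"));
// {"id":"001400100","B01003":2937}
// {"id":"001400200","B01003":1974}
```

The unary plus matters: the estimates arrive as strings in this sample, and +d[0] coerces them to numbers so that later arithmetic works as expected.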

(You can even pipe the above to json2csv to produce CSV!)

Now, magic! Join the population data to the geometry using ndjson-join:

ndjson-join 'd.id' \
ca-albers-id.ndjson \
cb_2014_06_tract_B01003.ndjson \
> ca-albers-join.ndjson
[Image: ca-albers-join.ndjson]

It may be hard to see in the screenshot, but each line in the resulting NDJSON stream is a two-element array. The first element (d[0]) is from ca-albers-id.ndjson: a GeoJSON Feature representing a census tract polygon. The second element (d[1]) is from cb_2014_06_tract_B01003.ndjson: an object representing the population estimate for the same census tract.
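The join can be pictured as an index lookup on the shared key. This sketch is not ndjson-join’s implementation; it is a simplified illustration that assumes unique ids on the right side and drops left-side records with no match. All values are made up.

```javascript
// Made-up inputs standing in for the two NDJSON files.
const left = [
  {type: "Feature", id: "001400100", properties: {ALAND: 6894340}, geometry: null},
  {type: "Feature", id: "001400200", properties: {ALAND: 586560}, geometry: null}
];
const right = [
  {id: "001400200", B01003: 1974},
  {id: "001400100", B01003: 2937}
];

// Index the right side by key, then pair each left record with its match.
const byId = new Map(right.map((d) => [d.id, d]));
const joined = left
  .filter((d) => byId.has(d.id))
  .map((d) => [d, byId.get(d.id)]);

console.log(joined[0][1].B01003); // 2937
```

Each element of joined is a two-element array, mirroring the d[0] and d[1] used in the next step.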

To compute the population density using ndjson-map, and to remove the additional properties we no longer need:

ndjson-map 'd[0].properties = {density: Math.floor(d[1].B01003 / d[0].properties.ALAND * 2589975.2356)}, d[0]' \
< ca-albers-join.ndjson \
> ca-albers-density.ndjson
[Image: ca-albers-density.ndjson]

The population density is computed as the population estimate B01003 divided by the land area ALAND. The land area is in square meters, so multiplying by the constant 2589975.2356 = 1609.34² (the number of square meters in a square mile) gives the density in people per square mile.
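As a worked example with invented numbers, consider a tract with 3,000 residents on exactly one square kilometer (1,000,000 square meters) of land:

```javascript
const SQUARE_METERS_PER_SQUARE_MILE = 2589975.2356; // 1609.34 * 1609.34

// Invented values: population estimate and land area in square meters.
const population = 3000;
const landAreaSquareMeters = 1e6;

const density = Math.floor(
  population / landAreaSquareMeters * SQUARE_METERS_PER_SQUARE_MILE
);

console.log(density); // 7769 people per square mile
```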

Note that the density value is floored rather than rounded. We don’t need the extra precision, so either operation would yield a smaller output file. But since we will later apply a threshold color encoding in the choropleth, rounding would be inappropriate: a density of 3,999.5 would round up to 4,000, which would change an effective threshold of 4,000 to 3,999.5!
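The difference shows up right at a class boundary. In this sketch the threshold of 4,000 and the test value are chosen purely for illustration, with densities at or above the threshold falling in the upper class.

```javascript
// Illustrative two-class threshold: true when the value is in the upper class.
const THRESHOLD = 4000;
const inUpperClass = (value) => value >= THRESHOLD;

const exact = 3999.6; // true density, just under the threshold

console.log(inUpperClass(exact)); // false
console.log(inUpperClass(Math.floor(exact))); // false: flooring preserves the class
console.log(inUpperClass(Math.round(exact))); // true: rounding moves it up a class
```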

To convert back to GeoJSON, use ndjson-reduce and ndjson-map:

ndjson-reduce \
< ca-albers-density.ndjson \
| ndjson-map '{type: "FeatureCollection", features: d}' \
> ca-albers-density.json

Or, using ndjson-reduce alone:

ndjson-reduce 'p.features.push(d), p' '{type: "FeatureCollection", features: []}' \
< ca-albers-density.ndjson \
> ca-albers-density.json
[Image: ca-albers-density.json]
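The second form follows the same pattern as Array.prototype.reduce with an explicit initial value: p is the accumulator and d is the current line. A minimal sketch with placeholder features:

```javascript
// Placeholder features standing in for the lines of ca-albers-density.ndjson.
const densityFeatures = [
  {type: "Feature", properties: {density: 7769}, geometry: null},
  {type: "Feature", properties: {density: 120}, geometry: null}
];

// Push each feature onto the accumulator, then return the accumulator.
const featureCollection = densityFeatures.reduce(
  (p, d) => (p.features.push(d), p),
  {type: "FeatureCollection", features: []}
);

console.log(featureCollection.features.length); // 2
```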

We can quickly preview the joined geometry on Mapshaper:

[Image: ca-albers-density.json on mapshaper.org]

(Yes, California is upside-down. Mapshaper treats +y as up, as it normally is when y represents latitude. The coordinate system used by Canvas and SVG, however, by default treats +y as down.)

Even better, we can use d3-geo-projection to quickly generate an SVG choropleth from the command line! To do that, first install D3:

npm install -g d3

Next use ndjson-map, requiring D3 via -r d3, and defining a fill property using a sequential scale with the Viridis color scheme:

ndjson-map -r d3 \
'(d.properties.fill = d3.scaleSequential(d3.interpolateViridis).domain([0, 4000])(d.properties.density), d)' \
< ca-albers-density.ndjson \
> ca-albers-color.ndjson

To convert the newline-delimited GeoJSON to SVG using geo2svg:

geo2svg -n --stroke none -p 1 -w 960 -h 960 \
< ca-albers-color.ndjson \
> ca-albers-color.svg
[Image: ca-albers-color.svg]

Ta da! 🎉

But now, a little wet blanket. This is not yet a good choropleth: as population density is not uniformly distributed, the color encoding should be transformed to better show the data, say by explicit thresholds or a power scale. A choropleth should have a title, a key, and contextual cues to help identify geography, such as county borders. But hey, at least we’ve confirmed our data looks reasonable!


If you followed along on the command line, you hopefully learned how to download data from the U.S. Census Bureau, join it to geometry, compute derived data, and preview it.

In Part 3, I’ll cover simplifying geometry and merging features using topojson-server, topojson-simplify and topojson-client.

In Part 4, I’ll cover implementing effective color encodings using d3-scale, and rendering the choropleth to SVG using d3-geo.

Ready for more? Continue to Part 3.

Questions or comments? Reply below or on Twitter. Thank you for reading!