Creating TopoJSON using D3 v4

As somebody currently learning the ins and outs of D3 v4, I’ve struggled to find a good resource for converting the Natural Earth ESRI Shapefiles to TopoJSON. This quick post will go through how to do this using the current versions of the D3 geography utilities.

This was one of the D3-based maps my team at the Financial Times created for the 2016 U.S. election. We didn’t use TopoJSON because it was static and rendered on the server, but if we were doing a lot of interaction client-side we probably would have. Gif created by Joanna S. Kao.

Download Natural Earth

Natural Earth is a community-sourced set of global geographic geometry that can be downloaded from the Natural Earth website. You’ll want the 279 MB ZIP archive containing the SHP/GeoDB format of the data. You can also just grab it using the following shell command¹, though please read the Natural Earth Terms of Use before you do so.

$ curl -LOk http://naciscdn.org/naturalearth/packages/natural_earth_vector.zip

(The above is entered all as one line; Medium wraps code snippets, but you should still be able to copy and paste it without any problems. In most shells, such as Bash or Zsh, you can also put a backslash at the very end of a line to continue a command on the next one, as I do in the later examples.)
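If you haven’t used line continuations before, here’s a small check you can paste into Bash or Zsh. Both variables end up holding the same text, because a backslash immediately before the newline joins the two lines into one command. One gotcha: if anything, even a single space, follows the backslash, the continuation breaks and the second line runs as a separate command.

```shell
# The same command written on one line...
one_line=$(echo shp2json ne/50m_cultural/ne_50m_admin_0_countries.shp)

# ...and split across two lines with a trailing backslash.
two_lines=$(echo shp2json \
  ne/50m_cultural/ne_50m_admin_0_countries.shp)

printf '%s\n' "$one_line" "$two_lines"
```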

You can then unarchive it to a sub-directory named ne/ using the following:

$ unzip natural_earth_vector.zip -d ne

This should work in OS X and most Linux distributions. Do whatever people do to unzip archives in Windows if you use that OS (though you may find the remainder of this tutorial a touch difficult as it assumes a shell like Bash or Zsh).

Install tooling

All the D3 community tools for TopoJSON are located in the following packages:

  • topojson: The main library, needed for consuming TopoJSON in D3. Also contains geo2topo, which we use for converting GeoJSON to TopoJSON.
  • shapefile: Contains shp2json, which converts a Shapefile to GeoJSON.
  • d3-geo-projection: Contains geostitch, used for normalizing shapes before conversion to TopoJSON. If we were reprojecting our coordinates, we’d use geoproject from this package.
  • topojson-client: Contains topomerge, for merging polygons and mesh lines into a single TopoJSON collection object.
  • ndjson-cli: We manipulate internal JSON data with ndjson-map.
  • topojson-simplify: We can reduce our filesize using toposimplify.

Using npm, install all of them globally in one go:

$ npm install --global topojson shapefile d3-geo-projection \
topojson-client ndjson-cli topojson-simplify

If you get a permissions error when running the above, you may need superuser privileges to install NodeJS packages globally. Try inserting sudo before npm in the above snippet, entering your user password when prompted.

Note that there’s an npm package called shp2json, which is a totally different package from mbostock/shapefile. For the purposes of this tutorial, when I mention shp2json, I’m talking about Mike Bostock’s version. James Halliday’s version is great, but it requires node-gdal, which I find inordinately hard to compile properly in OS X. Note also that the topojson/topojson repository no longer contains the command-line tools for manipulating TopoJSON; those have been split out into separate repositories such as topojson/topojson-client.
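Before building anything, it’s worth confirming that every command-line tool actually ended up on your PATH. Here’s a minimal Bash sketch; missing_tools is simply a name I’ve made up for the check, and the list of executables comes from the package list above.

```shell
# Print every command from the argument list that isn't on your PATH.
missing_tools() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# The executables this tutorial relies on.
missing=$(missing_tools shp2json geo2topo geostitch topomerge ndjson-map toposimplify)
if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing" >&2
fi
```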

You need to set up some sort of pipeline to move data between the various tools I mention above. I’m going to do each step separately with one file, creating a new file at every step, then give you a few one-liners to do everything at once. You can do all of this in a Bash script if you find that more convenient; or, if you’re feeling ambitious, all of the above packages expose a NodeJS API you can use with something like Gulp.

Most of this is adapted from the topojson/world-atlas prepublish script; if you get stuck, try looking at that for clues.

Convert from ESRI Shapefile to GeoJSON

Most world geometry you find online is in the Esri Shapefile (.shp) format. This is a binary format that is really not optimised for use in online web graphics, and it’s difficult to manipulate with JavaScript. We’re first going to convert it to GeoJSON, an open, JSON-based format for geographic data, before converting that to TopoJSON. TopoJSON is an extension of GeoJSON that encodes topology: borders shared by neighbouring shapes are stored only once, as arcs, which makes the files significantly smaller.

The file we’re going to convert in this example is ne/50m_cultural/ne_50m_admin_0_countries.shp.
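A Shapefile is really a small family of files sharing one base name. As far as I know, shp2json reads the geometry from the .shp file and the attributes (including the iso_a2 country codes we’ll use later) from the matching .dbf file, so it’s worth checking both are there before going further. check_shapefile is a made-up helper for that:

```shell
# check_shapefile PATH.shp
# Report whether the .shp (geometry) and .dbf (attribute table) files that
# share PATH's base name exist; return non-zero if either is missing.
check_shapefile() {
  local base=${1%.shp} ext status=0
  for ext in shp dbf; do
    if [ -f "$base.$ext" ]; then
      printf 'ok       %s\n' "$base.$ext"
    else
      printf 'missing  %s\n' "$base.$ext"
      status=1
    fi
  done
  return "$status"
}

# Example: check_shapefile ne/50m_cultural/ne_50m_admin_0_countries.shp
```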

In a collection like Natural Earth, you’ll often want several shapefiles in the same TopoJSON file. We’ll look at combining multiple files later on.

First though, let’s create an output directory called build/:

$ mkdir ./build

Convert Shapefile to GeoJSON

The next step is to put your Shapefile through shp2json, which will result in a GeoJSON file you can optimise further.

$ shp2json ne/50m_cultural/ne_50m_admin_0_countries.shp > \
build/ne_50m_admin_0_countries.geojson

This will create a GeoJSON file named ne_50m_admin_0_countries.geojson in your build/ directory. You can then simply run:

$ geo2topo build/ne_50m_admin_0_countries.geojson > \
build/ne_50m_admin_0_countries.topojson

…To get a workable TopoJSON file. This won’t be optimised at all and is basically as straight a conversion as you can get from a Shapefile. We’ll go into how to optimise and improve metadata next.

Simplify the GeoJSON metadata

This is somewhat optional, but you’ll reduce the filesize of your output further if you manage the GeoJSON’s internal data properly. In this instance, we want to map the ISO country code to the shape’s id attribute. First though, we need to convert our .shp file to newline-delimited JSON, or ndjson, which allows us to use the fantastic ndjson-cli tools. We do this by supplying the -n flag to shp2json:

$ shp2json -n ne/50m_cultural/ne_50m_admin_0_countries.shp > \
build/ne_50m_admin_0_countries-ndjson.geojson
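The reason ndjson is so handy is that each feature sits on its own line, so ordinary line-based tools work on it. The snippet below uses a tiny hand-written stand-in for the real file (the two features and their properties are made up for illustration) to count features with wc -l and pull out the first one with head:

```shell
# A made-up, two-feature stand-in for the file shp2json -n produces:
# one complete GeoJSON Feature per line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"type":"Feature","properties":{"iso_a2":"FR","name":"France"},"geometry":null}
{"type":"Feature","properties":{"iso_a2":"NO","name":"Norway"},"geometry":null}
EOF

# One line per feature, so counting lines counts features.
feature_count=$(wc -l < "$sample" | tr -d ' ')

# The first feature on its own, still a valid standalone JSON object.
first_feature=$(head -n 1 "$sample")

rm -f "$sample"
printf '%s features; first: %s\n' "$feature_count" "$first_feature"
```

Run against the real build/ne_50m_admin_0_countries-ndjson.geojson, the same wc -l should report how many features the shapefile contains.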

Next we use ndjson-map to run a few operations on the JSON properties:

$ ndjson-map '(d.id = d.properties.iso_a2, delete d.properties, d)'\
< build/ne_50m_admin_0_countries-ndjson.geojson \
> build/ne_50m_admin_0_countries_country_codes.json

There’s a lot going on in here, so let’s break it down into three parts:

$ ndjson-map '(d.id = d.properties.iso_a2, delete d.properties, d)'

1. This tells the command-line tool ndjson-map to evaluate the following JavaScript expression once for every line of its input stream, with d bound to that line’s parsed JSON, and to write each result back out as a line of JSON. The expression uses JavaScript’s comma operator: each part runs in turn, and the value of the last part (d itself) is what gets written out. In this case, we assign the ISO 3166-1 alpha-2 code (that is, the standard two-letter country code) from each feature’s properties property to each geographic feature’s top-level id property. We then delete the properties property because it contains a lot of superfluous data we don’t need (which, in turn, increases filesize). Other useful properties included with Natural Earth are the iso_n3 property (to map ISO 3166-1 numeric codes), the name property (for the geographic feature’s common name) and a few others listed in the Excel file available here. We’re keeping it simple with just the country code, but Natural Earth also includes things like population and GDP data.

< build/ne_50m_admin_0_countries-ndjson.geojson \

2. Here we feed the contents of the file we just built to ndjson-map using the < input redirection operator. We could also have done something like:

cat <filename> | ndjson-map <expression>

…Instead, to similar effect.

> build/ne_50m_admin_0_countries_country_codes.json

3. Lastly we use the > output operator to direct the output of ndjson-map into a new file, build/ne_50m_admin_0_countries_country_codes.json.

We now have a new file in our build/ directory named ne_50m_admin_0_countries_country_codes.json with far less metadata attached.
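To see what that expression does to a single feature, you can feed ndjson-map one hand-written line (made up here, with the geometry left out for brevity). The sketch runs it both ways, with < and with cat, to show they give the same result, and it only calls ndjson-map if it’s installed. You should see the properties object disappear and an "id":"FR" appear at the top level.

```shell
feature='{"type":"Feature","properties":{"iso_a2":"FR","name":"France"},"geometry":null}'
map_expr='(d.id = d.properties.iso_a2, delete d.properties, d)'

via_redirect=''
via_cat=''
if command -v ndjson-map >/dev/null 2>&1; then
  tmp=$(mktemp)
  printf '%s\n' "$feature" > "$tmp"
  # Input redirection: the shell opens the file and hands it to ndjson-map.
  via_redirect=$(ndjson-map "$map_expr" < "$tmp")
  # The same bytes, delivered through a pipe from cat instead.
  via_cat=$(cat "$tmp" | ndjson-map "$map_expr")
  rm -f "$tmp"
  printf '%s\n' "$via_redirect"
fi
```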

Geostitch geometry

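Before we turn our GeoJSON into TopoJSON, we should stitch it with geostitch. Shapefiles such as Natural Earth’s cut polygons along the antimeridian (±180° longitude) and around the poles so that the shapes can be drawn on a flat plane. geostitch removes those antimeridian and polar cuts, so D3’s spherical projections can handle the geometry properly (and re-cut it wherever a given projection needs). Let’s do that now, using a similar format to the last command: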

$ geostitch -n \
< build/ne_50m_admin_0_countries_country_codes.json \
> build/ne_50m_admin_0_countries_geostitched.json

Convert GeoJSON to TopoJSON

Finally, we’re ready to convert to TopoJSON:

$ geo2topo -q 1e5 -n countries=\
build/ne_50m_admin_0_countries_geostitched.json \
> build/ne_50m_admin_0_countries.topojson

What we do here is quantize the coordinates to a grid of 10⁵ values per axis (-q 1e5) and tell geo2topo (-n) to expect the newline-delimited format we’ve been using throughout. The countries= prefix in front of our geostitched GeoJSON file names the resulting object in the topology, so all the country shapes end up under objects.countries. The result is written to build/ne_50m_admin_0_countries.topojson.
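If -q 1e5 feels abstract, here’s the arithmetic as I understand geo2topo’s quantization (worth checking against the topojson-server source): each coordinate is scaled from the data’s bounding box onto a grid of 10⁵ integer steps per axis, then rounded. The whole-world bounding box and the London coordinates are illustrative assumptions, and quantize_point is a made-up helper name.

```shell
# quantize_point N X0 Y0 X1 Y1 LON LAT
# Map a lon/lat pair inside the bounding box [X0,Y0]-[X1,Y1] onto an
# N-step integer grid, rounding to the nearest step.
quantize_point() {
  awk -v n="$1" -v x0="$2" -v y0="$3" -v x1="$4" -v y1="$5" \
      -v x="$6" -v y="$7" 'BEGIN {
    kx = (x1 - x0) ? (n - 1) / (x1 - x0) : 1
    ky = (y1 - y0) ? (n - 1) / (y1 - y0) : 1
    # Points are at or above the box minimum, so int(v + 0.5) rounds.
    printf "%d %d\n", int((x - x0) * kx + 0.5), int((y - y0) * ky + 0.5)
  }'
}

# London, in a whole-world bounding box, on the 1e5 grid.
quantize_point 1e5 -180 -90 180 90 -0.1276 51.5072
```

Going the other way, one grid step on this box is 360 / 99999, or about 0.0036° of longitude (roughly 400 m at the equator), which gives a feel for how much precision -q 1e5 keeps.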

Lastly, we merge all the countries into a single land object using topomerge:

$ topomerge land=countries \
< build/ne_50m_admin_0_countries.topojson \
> build/ne_50m_admin_0_countries_merged.topojson

Here we add a new object called land to the topology, built by merging all the polygons in the countries object we just generated. The countries object stays in the file alongside it.

Doing this makes it easier to work with a group of features at once, for instance drawing all the land as one shape without internal borders. Another good use for topomerge would be combining all the features for an entire region into a single object; ultimately, how you use it is pretty dependent upon your use case.
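A quick way to confirm the merge worked is to list the object names at the top level of the topology; after topomerge you should see land alongside countries. This small sketch shells out to Node, which you already have if npm works. list_objects and the TOPOLOGY_FILE variable are names I’ve made up.

```shell
# list_objects FILE: print the names under a topology's top-level "objects".
list_objects() {
  TOPOLOGY_FILE="$1" node -e '
    const fs = require("fs");
    const topology = JSON.parse(fs.readFileSync(process.env.TOPOLOGY_FILE, "utf8"));
    console.log(Object.keys(topology.objects).join(" "));
  '
}

# Run it against the tutorial's merged output, if you've built it.
merged=build/ne_50m_admin_0_countries_merged.topojson
if [ -f "$merged" ]; then
  list_objects "$merged"
fi
```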

All at once, now!

Let’s do this in one fell swoop:

$ env INPUT_FILE=ne/50m_cultural/ne_50m_admin_0_countries.shp \
OUTPUT_FILE=build/ne_50m_admin_0_countries.topojson \
bash -c 'geo2topo -q 1e5 -n countries=<(shp2json -n $INPUT_FILE \
| ndjson-map "(d.id = d.properties.iso_a2,delete d.properties,d)" \
| geostitch -n) \
| topomerge land=countries > $OUTPUT_FILE'

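It looks like a lot is going on, but it’s only the last few steps strung together. We set $INPUT_FILE and $OUTPUT_FILE as environment variables at the beginning to make this easier to use (or possibly put into a shell script). We then run the same chain of commands as above, streaming between them instead of storing to a file as we did before. We wrap the whole command in a string and hand it to Bash because the <( ) process substitution it relies on is a Bash (and Zsh) feature that plain POSIX sh lacks, so running it through bash -c means it works even if your default shell is something else. The single quotes also stop your current shell from expanding $INPUT_FILE and $OUTPUT_FILE itself; the inner Bash reads them from the environment that env sets up.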
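If you’d rather keep this in a script, one way is to wrap the same pipeline in a Bash function that checks its arguments first. This is a sketch rather than a drop-in tool: shp_to_topojson is a name I’ve made up, and the body is just the one-liner above with quoting added. The function body runs in a subshell, so set -euo pipefail doesn’t leak into your interactive shell.

```shell
# shp_to_topojson INPUT.shp OUTPUT.topojson
# The ( ... ) body runs in a subshell, so the strict-mode options stay local.
shp_to_topojson() (
  set -euo pipefail

  if [ "$#" -ne 2 ]; then
    echo "usage: shp_to_topojson INPUT.shp OUTPUT.topojson" >&2
    exit 64
  fi
  input=$1
  output=$2
  if [ ! -f "$input" ]; then
    echo "shp_to_topojson: no such file: $input" >&2
    exit 66
  fi

  # Same chain as the one-liner: shapefile -> ndjson -> trimmed properties
  # -> stitched -> TopoJSON with a merged "land" object.
  geo2topo -q 1e5 -n countries=<(shp2json -n "$input" \
      | ndjson-map '(d.id = d.properties.iso_a2, delete d.properties, d)' \
      | geostitch -n) \
    | topomerge land=countries > "$output"
)
```

You’d call it as shp_to_topojson ne/50m_cultural/ne_50m_admin_0_countries.shp build/ne_50m_admin_0_countries.topojson. One caveat: pipefail only covers the geo2topo | topomerge pipeline, so a failure inside the <( ) part won’t necessarily show up in the exit status. It’s still worth opening the output in a viewer afterwards.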

Combining multiple shapefiles

If you want to combine multiple .shp files into a single TopoJSON file, you’ll need to add a step to the above workflow. This time, we’re going to combine ne_50m_rivers_lake_centerlines.shp and ne_50m_ocean.shp into one water.topojson file.

$ env OUTPUT_FILE=build/water.topojson \
bash -c 'geo2topo -q 1e5 -n water=\
<(\
shp2json -n ne/50m_physical/ne_50m_rivers_lake_centerlines.shp | geostitch -n
shp2json -n ne/50m_physical/ne_50m_ocean.shp | geostitch -n
) \
> $OUTPUT_FILE'

If you look above, there are two shp2json commands separated by a newline. Think of each line as a separate shp-to-GeoJSON workflow; their outputs are streamed into geo2topo one after the other. Each line gets its own geostitch, because geostitch works on GeoJSON and so has to run before geo2topo converts everything to TopoJSON. If you wanted to do something like run ndjson-map on either or both of the shapefiles, you’d simply pipe the output of each to that like so:

$ env OUTPUT_FILE=build/water_no_metadata.topojson \
bash -c 'geo2topo -q 1e5 -n water=\
<(\
shp2json -n ne/50m_physical/ne_50m_rivers_lake_centerlines.shp \
| ndjson-map "(d.id = d.properties.name,delete d.properties,d)" \
| geostitch -n;
shp2json -n ne/50m_physical/ne_50m_ocean.shp \
| ndjson-map "(d.id = d.properties.name,delete d.properties,d)" \
| geostitch -n
) \
> $OUTPUT_FILE'

Note that I’ve put a semicolon after the first workflow; I’ve merely done so to indicate where the first command ends and the second begins. You can copy and paste the above verbatim either way: inside the parentheses a newline separates commands just as a semicolon does, so here the semicolon is optional.
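If the way those two workflows combine feels like magic, you can check it with printf standing in for shp2json. Commands on separate lines (or separated by semicolons) inside <( ) write one after the other into the same stream, which is why geo2topo sees both shapefiles’ features as one collection. A pipe only applies to the command on its own line. The last line shows what geo2topo actually receives as its argument: the name, then a file path that Bash creates for the stream.

```shell
# Two commands inside <( ): their outputs arrive one after the other.
both=$(bash -c 'cat <(
printf "river\n"
printf "ocean\n"
)')

# A pipe binds only to the command it follows: only "ocean" goes through tr.
piped=$(bash -c 'cat <(printf "river\n"; printf "ocean\n" | tr a-z A-Z)')

# What geo2topo sees: water= followed by a path standing in for the stream
# (typically something like /dev/fd/63).
arg=$(bash -c 'echo water=<(true)')

printf '%s\n' "$both" "$piped" "$arg"
```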

Simplify your TopoJSON

Lastly, it’s worth simplifying your geometry so that it isn’t as big a download when delivered to web browsers. Run your new TopoJSON through toposimplify to reduce its filesize. We’re going to use build/water_no_metadata.topojson, the version we created without the properties property:

$ toposimplify -f -p 0.01 \
< build/water_no_metadata.topojson \
> build/water_no_metadata_simplified.topojson

Just by doing that, the filesize has gone from ~782 KB to ~398 KB, without any obvious difference in quality when the geometry is scaled to our display extents. That’s pretty good!

I’m not even going to try and explain what each toposimplify option does, because I have literally no idea. From my very brief ad-hoc experiments, passing a value of 0.01 to -p results in about half the filesize of 0.001.
¯\_(ツ)_/¯
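If you want to repeat that kind of experiment without retyping the command, a loop like this prints the output size for each -p value. sweep_simplify is a made-up name, and it assumes the same -f -p options used above.

```shell
# sweep_simplify INPUT.topojson VALUE...
# Print each -p value next to the size in bytes of the simplified output.
sweep_simplify() {
  local input=$1 p bytes
  shift
  for p in "$@"; do
    bytes=$(toposimplify -f -p "$p" < "$input" | wc -c | tr -d ' ')
    printf '%s\t%s\n' "$p" "$bytes"
  done
}
```

For example, sweep_simplify build/water_no_metadata.topojson 0.001 0.01 0.1 prints one line per value, so you can see how quickly the size drops before checking the result visually.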

Check the topojson-simplify docs for further details. Or, if like me you don’t know what a planar quantile is, try a web-based tool like one of the ones below to visually simplify your geometry to the level of quality needed.
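For what it’s worth, my reading of the topojson-simplify README (please double-check it) is that lowercase -p sets a minimum planar triangle area, with the quantile version on uppercase -P, and that -f drops detached rings that end up below the threshold. That would explain the observation above: a bigger area threshold throws away more points. As I understand it, the idea comes from Visvalingam’s method, where each point is weighted by the area of the triangle it forms with its two neighbours. The awk sketch below is a deliberately simplified, single-pass version with made-up coordinates; the real implementation removes points one at a time and recomputes the neighbours’ areas as it goes.

```shell
# simplify_once MIN_AREA < "x y" lines
# Keep both endpoints, plus every interior point whose triangle with its
# original neighbours has an area of at least MIN_AREA (single pass only).
simplify_once() {
  awk -v min="$1" '
    { x[NR] = $1; y[NR] = $2 }
    END {
      for (i = 1; i <= NR; i++) {
        if (i == 1 || i == NR) { print x[i], y[i]; continue }
        a = x[i-1] * (y[i] - y[i+1]) + x[i] * (y[i+1] - y[i-1]) + x[i+1] * (y[i-1] - y[i])
        if (a < 0) a = -a
        a /= 2
        if (a >= min) print x[i], y[i]
      }
    }'
}

# The nearly flat point at x=1 (triangle area 0.005) is dropped at 0.01.
printf '0 0\n1 0.005\n2 0\n3 1\n4 0\n' | simplify_once 0.01
```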

What now?

Before you do anything serious with your resulting TopoJSON, it’s an exceptionally good idea to load it in some kind of visual tool to inspect the resulting geometry. It’s possible you may need to reproject your output, or add some similar sort of additional processing step to your workflow.

A good site for such a thing is Mapshaper.org, which you can also use to convert Shapefiles to TopoJSON if all the Command-Line-Fu above is causing you issues. You can also visually simplify your geometry there, which is quite a lot less frustrating than guessing which command-line option values to use with toposimplify.

You have no idea how frustrating it was to get those two Shapefiles into one TopoJSON file. Hats off to Martín González on the D3 Slack team for helping me out!

Another good tool that can be useful if you also need to inspect the resulting properties is Mapstarter.com.

Mapstarter defaults to Robinson, but you can use other cartographic projections like Mercator (if you hate maps, that is).

In closing, I hope you find this helpful and it saves you time figuring out how to get new geometry into D3. Did I miss something or have a tip? Please leave a response and I’ll do my best to reply!

Ændrew Rininsland is the author of Data Visualization with D3.js, 2nd edition from Packt Books and a newsroom developer at the Financial Times
He tweets as @aendrew.
Many thanks to Martín González, Micah Stubbs, Mike Bostock, Tom Pearson and Kshitij Aranke for providing feedback on this post! Thanks also to Mike Bostock for creating all these tools in the first place! 😄