Using spatial data and Mapshaper to answer questions

Andrell Bower
DataSD
Published in
3 min readJan 8, 2018

Sure, you could simply display spatial data on a map. But you’d be limiting yourself. Spatial data can also help you answer questions about all sorts of topics. To demonstrate it, let’s answer a question with spatial data.

The question

How many times did someone report a pothole to the City through the Get It Done applications in 2017 for the Downtown area?

The tool

Although we could use various GIS tools to answer this question, here we are using Mapshaper. Why? Because you will be able to reproduce the workflow without having to install anything, and you’ll be surprised that an online tool can handle this sort of analysis.

The steps

  1. Download data

To start with, download the following two datasets from our open data portal:

2. Import data into Mapshaper

Go to Mapshaper and select/import the Get It Done dataset. First thing’s first, let’s rename the dataset to ‘gid’ for easier processing by opening the console (top-right button) and typing the following command: -rename-layers gid

At this point, this dataset is not interpreted as spatial data, and all you can see is a grid of points. To solve this issue, we are going to tell Mapshaper which columns of the .csv file contain spatial data by typing the following command into the console: -points x=long y=lat

If you’re curious about all the commands you can use in Mapshaper, check out this reference.

3. Filter data

Two elements of the question we are attempting to answer can be addressed by filtering the Get It Done data:

  • ‘In 2017’ with the following command: -filter ‘requested_datetime > “2016–12–31” && requested_datetime < “2018–01–01”’
  • ‘Requests for potholes’ with the following command: -filter “service_name == ‘Pothole’”

Now your screen should look like this:

Mapshaper now shows just pothole service requests made in 2017.

4. Import more data into Mapshaper

To answer the last element of the question, ‘in the Downtown area’, we need to perform a spatial join between the neighborhoods from the Community Planning shapefile and the filtered Get It Done data.

To do so, first import the Community shapefile by clicking on ‘gid’ (top-center), then ‘add a file’, then ‘import’. Rename the layer to ‘neighborhoods’: -rename-layers neighborhoods target=cmty_plan_datasd

One last thing — reprojection. If you type -info in the console, you will notice that both spatial layers are using different coordinates systems. All you need to do here is to put both layers in the same system with the following command: -proj wgs84

5. Perform a spatial join

You are now all set to operate a spatial join between the two layers. With a spatial join, we will end up with a column that contains the neighborhood name from the ‘neighborhoods’ layer for each row of the ‘gid’ layer.

To perform the join, select the ‘gid’ layer from the layer switcher (top-center), and type the following line: -join neighborhoods fields=‘cpname’

If you now explore the gid data, you will notice an extra field at the bottom with the neighborhood information for each pothole:

Each row, which represents one pothole service request, now has an associated neighborhood name.

6. Answer the question

You can now answer the initial question with a last command: -calc ‘count()’ where=“cpname == ‘Downtown’”

The findings

At the time of this writing, 814 potholes were reported with the Get It Done applications for the downtown area in 2017. Put that in your report or slide deck!

Do you have more questions? I did.

What about Uptown (1059), or Golden Hill (194), or La Jolla (725)? What is Uptown, anyway (Mission Hills + Hillcrest?)? Have the number of potholes reported per month citywide gone down (yes! See the table below).

Pothole requests per month, courtesy of Mapshaper

For that last one, I had to run multiple calculations and use requested_datetime in the where clause. There might be a better way to do that, but I don’t know it.

What I do know, is that spatial data has all the answers.

Arnaud Vedy contributed to this article.

--

--