Using spatial data and Mapshaper to answer questions
Sure, you could simply display spatial data on a map. But you’d be limiting yourself. Spatial data can also help you answer questions about all sorts of topics. To demonstrate it, let’s answer a question with spatial data.
The question
How many times did someone report a pothole to the City through the Get It Done applications in 2017 for the Downtown area?
The tool
Although we could use various GIS tools to answer this question, here we are using Mapshaper. Why? Because you will be able to reproduce the workflow without having to install anything, and you’ll be surprised that an online tool can handle this sort of analysis.
The steps
- Download data
To start with, download the following two datasets from our open data portal:
2. Import data into Mapshaper
Go to Mapshaper and select/import the Get It Done dataset. First thing’s first, let’s rename the dataset to ‘gid’ for easier processing by opening the console (top-right button) and typing the following command: -rename-layers gid
At this point, this dataset is not interpreted as spatial data, and all you can see is a grid of points. To solve this issue, we are going to tell Mapshaper which columns of the .csv file contain spatial data by typing the following command into the console: -points x=long y=lat
If you’re curious about all the commands you can use in Mapshaper, check out this reference.
3. Filter data
Two elements of the question we are attempting to answer can be addressed by filtering the Get It Done data:
- ‘In 2017’ with the following command:
-filter ‘requested_datetime > “2016–12–31” && requested_datetime < “2018–01–01”’
- ‘Requests for potholes’ with the following command:
-filter “service_name == ‘Pothole’”
Now your screen should look like this:
4. Import more data into Mapshaper
To answer the last element of the question, ‘in the Downtown area’, we need to perform a spatial join between the neighborhoods from the Community Planning shapefile and the filtered Get It Done data.
To do so, first import the Community shapefile by clicking on ‘gid’ (top-center), then ‘add a file’, then ‘import’. Rename the layer to ‘neighborhoods’: -rename-layers neighborhoods target=cmty_plan_datasd
One last thing — reprojection. If you type -info in the console, you will notice that both spatial layers are using different coordinates systems. All you need to do here is to put both layers in the same system with the following command: -proj wgs84
5. Perform a spatial join
You are now all set to operate a spatial join between the two layers. With a spatial join, we will end up with a column that contains the neighborhood name from the ‘neighborhoods’ layer for each row of the ‘gid’ layer.
To perform the join, select the ‘gid’ layer from the layer switcher (top-center), and type the following line: -join neighborhoods fields=‘cpname’
If you now explore the gid data, you will notice an extra field at the bottom with the neighborhood information for each pothole:
6. Answer the question
You can now answer the initial question with a last command: -calc ‘count()’ where=“cpname == ‘Downtown’”
The findings
At the time of this writing, 814 potholes were reported with the Get It Done applications for the downtown area in 2017. Put that in your report or slide deck!
Do you have more questions? I did.
What about Uptown (1059), or Golden Hill (194), or La Jolla (725)? What is Uptown, anyway (Mission Hills + Hillcrest?)? Have the number of potholes reported per month citywide gone down (yes! See the table below).
For that last one, I had to run multiple calculations and use requested_datetime in the where clause. There might be a better way to do that, but I don’t know it.
What I do know, is that spatial data has all the answers.
Arnaud Vedy contributed to this article.