Making Sense of Spatial Data with Maps

Brittany Fowle
5 min read · Dec 19, 2019

Maps in Data Science

According to National Geographic, maps are symbolic representations of selected characteristics of a place, emphasizing the relationships between elements in space. Maps rely on Tobler’s First Law of Geography, which states that “everything is related to everything else, but near things are more related than distant things.” Put simply, the relationship between objects depends on how close they are to each other, a property known as spatial dependence. Space is also heterogeneous: conditions vary from one location to another, so a pattern observed in one place will not necessarily hold in another.
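Tobler’s law is what spatial statisticians measure as spatial autocorrelation. A common summary statistic is global Moran’s I, which is positive when neighboring values are similar, negative when they alternate, and close to -1/(n-1) when there is no spatial pattern. Below is a minimal pure-Python sketch, assuming a tiny hypothetical layout of locations in a single row where each location neighbors only the ones directly beside it. For real data you would typically build the weights from polygons or distances with a dedicated library such as PySAL.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w[i][j].

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    return (n / total_weight) * (cross / spread)


def row_neighbors(n):
    # Hypothetical layout: n locations in a row; each neighbors the one(s) beside it.
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


clustered = [1, 2, 3, 10, 11, 12]    # low values on one side, high values on the other
alternating = [1, 10, 1, 10, 1, 10]  # every neighbor differs sharply
print(round(morans_i(clustered, row_neighbors(6)), 3))    # → 0.657
print(round(morans_i(alternating, row_neighbors(6)), 3))  # → -1.0
```

The clustered row, where near things really are more alike, scores well above zero, while the alternating row scores the minimum for this layout.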

These principles are fundamental to the field of Geographic Information Systems (GIS), a framework for gathering, managing, and analyzing data that is rooted in geography. GIS layers additional data on top of spatial locations to visualize patterns, relationships, and situations. An early example of spatial analysis in practice is the work of John Snow (1813–1858), an English physician who used spatial data in 1854 to trace the source of a cholera outbreak in London to a single water pump, before germ theory was developed. To do this, Snow spoke to residents of the neighborhood and plotted the cholera cases on a map, showing that they were clustered around one water pump.

John Snow’s dot map of the 1854 cholera outbreak surrounding the Broad Street water pump in Soho, London. (John Snow [Public domain])
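One simple way to quantify the kind of clustering Snow saw is to assign each case to its nearest pump and count the cases per pump. The sketch below uses straight-line distance on made-up planar coordinates in metres. The names “Pump B” and “Pump C” and every coordinate are hypothetical and are not Snow’s data; a more faithful analysis would measure walking distance along the streets.

```python
import math
from collections import Counter

# Hypothetical planar coordinates in metres; NOT Snow's survey data.
PUMPS = {
    "Broad Street": (0.0, 0.0),
    "Pump B": (400.0, 50.0),
    "Pump C": (-150.0, 380.0),
}
CASES = [(10, 20), (-30, 5), (25, -40), (60, 70), (-45, -30), (390, 40), (-140, 360)]


def nearest_pump(point, pumps):
    """Name of the pump with the smallest straight-line distance to point."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


counts = Counter(nearest_pump(case, PUMPS) for case in CASES)
print(counts.most_common())  # → [('Broad Street', 5), ('Pump B', 1), ('Pump C', 1)]
```

`math.dist` requires Python 3.8 or later.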

Other than epidemiology, GIS is useful in any field where spatial location indicates a relationship, including education, public safety, real estate, retail, transportation, natural resources, government, utilities, and manufacturing.

In this example from Esri, GIS helped public safety officials in Philadelphia set priorities by analyzing crime data. Based on the locations of robberies, officials can identify areas that require a larger police presence. To approach the problem:

  1. A hotspot analysis was conducted to determine where robberies increased.
  2. Based on the analysis, target areas were identified where anti-robbery patrols should increase.
  3. Additional data was collected in the field through security surveys to identify locations at higher risk for robbery.
  4. A dashboard was used to monitor police activity, collecting information on new crimes and overseeing the security surveys.
  5. After the study period, the data was analyzed to evaluate the success of the new strategy.
  6. The insights gained from the study were communicated to the public, showing that increased police presence and security surveys reduced the number of robberies in the target areas.
Map of results from the crime reduction study in Philadelphia.
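The hotspot analysis in step 1 can be approximated with the Getis-Ord Gi* statistic, which underlies ArcGIS’s Hot Spot Analysis tool; the source does not say exactly how the Philadelphia study was configured. The sketch below assumes robbery counts have already been aggregated into a grid of cells, treats each cell plus its up-to-eight surrounding cells as its neighborhood (every weight equal to 1), and flags cells whose Gi* z-score exceeds 1.96, a common cutoff for roughly 95% confidence. The grid values are made up for illustration.

```python
import math


def gi_star(grid):
    """Getis-Ord Gi* z-scores for a grid of counts.

    Weights: w_ij = 1 for cell i itself and its (up to) 8 surrounding cells, else 0.
    Gi* = (sum_j w_ij x_j - mean * W_i) / (S * sqrt((n * sum_j w_ij^2 - W_i^2) / (n - 1)))
    where W_i = sum_j w_ij and S = sqrt(sum_j x_j^2 / n - mean^2).
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    if s == 0:
        raise ValueError("all cells have the same count; Gi* is undefined")
    scores = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighborhood = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            # Binary weights: both sum(w) and sum(w^2) equal the neighborhood size.
            w_i = len(neighborhood)
            numerator = sum(neighborhood) - mean * w_i
            denominator = s * math.sqrt((n * w_i - w_i ** 2) / (n - 1))
            scores[r][c] = numerator / denominator
    return scores


# Hypothetical robbery counts aggregated to a 5 x 5 grid of city blocks.
counts = [
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 8, 9, 1],
    [0, 0, 7, 9, 0],
    [1, 0, 0, 1, 0],
]
z = gi_star(counts)
hot_spots = [(r, c) for r, row in enumerate(z) for c, score in enumerate(row) if score > 1.96]
print(hot_spots)  # → [(2, 2), (2, 3), (3, 2), (3, 3)]
```

The grid must be larger than a single neighborhood; otherwise the denominator is zero. Production tools also offer corrections for testing many cells at once, which this sketch omits.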

Why Use A Map?

In the final step of this process, it is important to keep in mind the stakeholders invested in the outcome of your data science experiment. In this case, they are community members concerned with public safety, as well as local officials who may share the outcome with state officials to secure funding for their department. A visual element like a map can help communicate the impact of the experiment. When data is superimposed on a map, viewers can relate it to places they already know, and because nearby locations tend to have similar values (spatial autocorrelation), clusters and gradients are often visible at a glance. Charts, graphs, and other forms of data visualization can be compelling, but numbers can feel abstract to a person unfamiliar with the dataset and the computations that were performed to arrive at a conclusion. Maps, however, are a familiar representation that helps the viewer understand how the variable in question changes over space. Maps are also generally built with shared conventions, such as an indicator of scale and orientation, which makes them a great base on which to add data and tell a story about a place.

Tools for Spatial Analysis

To conduct spatial analysis, you need geolocated data. Many governments and cities make their data publicly available for anyone to use for research, app development, data visualization, and more. In the US, the main portal for this is Data.gov, where you can find datasets on everything from agriculture and consumer data to health and local government. Many EU countries, as well as India, have similar portals for urban data.
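Open data portals often offer point datasets as CSV files with latitude and longitude columns, and it helps to drop records whose coordinates are missing or impossible before mapping them. This is a minimal sketch using only Python’s standard library. The column names `latitude` and `longitude` and the sample rows are assumptions; check each dataset’s documentation for its actual field names and coordinate system.

```python
import csv
import io


def read_points(fileobj, lat_col="latitude", lon_col="longitude"):
    """Yield (lat, lon, row) for CSV rows with usable decimal-degree coordinates."""
    for row in csv.DictReader(fileobj):
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row, or non-numeric value: skip the record
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            yield lat, lon, row


# Made-up sample rows in the shape of a typical open-data export.
SAMPLE = """id,category,latitude,longitude
1,robbery,39.9526,-75.1652
2,robbery,,
3,robbery,39.9612,-75.1560
4,robbery,999,-75.1560
"""

points = list(read_points(io.StringIO(SAMPLE)))
print(len(points))  # → 2
```

Row 2 is skipped because its coordinates are blank, and row 4 because 999 is not a valid latitude.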

Map illustrating temperature anomalies in Europe, made with CARTOframes (By Javier Perez Trufero and Giulia Carella).

The Philadelphia robbery data above was visualized in Esri’s ArcGIS, but other tools can be used to make your own maps. CARTO is a pared-down, web-based alternative to ArcGIS; ArcGIS is powerful, but not as intuitive or user-friendly. In CARTO, users can bring their own libraries, functions, and workflows to visualize data straight out of a Jupyter Notebook, performing simple data transformations with Python and SQL. CARTO also offers CARTOframes, a Python package for interacting with CARTO that works with many data science tools, including pandas, Matplotlib, and PostGIS.
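To illustrate the kind of simple Python-and-SQL transformation described above, the sketch below uses Python’s built-in `sqlite3` module as a stand-in; it is not CARTO’s API. The table name `incidents`, its columns, the rows, and the bounding box are all hypothetical. The query keeps only points inside a rectangular area and counts incidents by type. In a PostGIS database, the same filter would normally use spatial functions on a geometry column rather than plain latitude and longitude comparisons.

```python
import sqlite3

# Hypothetical incident points: (latitude, longitude, kind).
rows = [
    (39.9526, -75.1652, "robbery"),
    (39.9531, -75.1640, "robbery"),
    (39.9540, -75.1630, "burglary"),
    (40.0500, -75.0000, "robbery"),  # falls outside the box below
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE incidents (lat REAL, lon REAL, kind TEXT)")
con.executemany("INSERT INTO incidents VALUES (?, ?, ?)", rows)

# Bounding box as (min_lat, max_lat, min_lon, max_lon); values are illustrative.
bbox = (39.95, 39.96, -75.17, -75.16)
query = """
    SELECT kind, COUNT(*) AS n
    FROM incidents
    WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?
    GROUP BY kind
    ORDER BY n DESC
"""
result = con.execute(query, bbox).fetchall()
print(result)  # → [('robbery', 2), ('burglary', 1)]
con.close()
```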

Interested in working with some spatial data? Here is a tutorial from DataCamp by Duong Vu that walks you step by step through visualizing a dataset in Python using many of the packages we’ve begun to use (including pandas, NumPy, and Matplotlib), plus several more that help read and analyze geometric data. The tutorial explores data from Hurricane Florence in 2018 to determine where the storm was strongest.
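Once a storm’s track is loaded as a table of points, finding where it was strongest comes down to picking the point with the highest sustained wind. The sketch below does this in plain Python and also labels a wind speed with its Saffir-Simpson category. It is not the tutorial’s code, and the track points are made up rather than Hurricane Florence observations.

```python
# Saffir-Simpson thresholds in knots (sustained wind); below 64 kt is not a hurricane.
SAFFIR_SIMPSON_KT = [(137, 5), (113, 4), (96, 3), (83, 2), (64, 1)]


def category(wind_kt):
    """Return the Saffir-Simpson category (1 to 5), or 0 below hurricane strength."""
    for threshold, cat in SAFFIR_SIMPSON_KT:
        if wind_kt >= threshold:
            return cat
    return 0


# Made-up track points (label, latitude, longitude, wind in knots); NOT real Florence data.
TRACK = [
    ("p1", 20.0, -40.0, 55),
    ("p2", 24.0, -50.0, 120),
    ("p3", 28.0, -65.0, 100),
    ("p4", 33.0, -77.0, 80),
]

strongest = max(TRACK, key=lambda point: point[3])
print(strongest, category(strongest[3]))  # → ('p2', 24.0, -50.0, 120) 4
```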

Everyday application: Google Maps

Aside from being a great tool for spatial analysis, maps are a great interface for interacting with spatial data. I route my journey on Google Maps before leaving the house, even when traveling to a location I have visited before. This is not because I am unsure how to get to my destination (and not just because the G train is a fickle beast), but because Google Maps has integrated so many other useful data points into its service. Though imperfect, its integration with MTA service information is essential to arriving anywhere on time in New York City.

Initially, Google Maps collected traffic data from sensors and cameras installed by government transportation agencies and private companies. These sensors use laser radar or active infrared technology to detect how fast traffic is moving overall. Since 2009, Google Maps has also used crowdsourcing to collect up-to-date data and improve its service. When you use the app with location services turned on, it sends anonymous data to Google about how fast your device (and therefore your car) is moving. Google then processes that data together with data from other devices using the app and sends it back to those devices as real-time traffic updates. It’s not just a song by Parquet Courts: the more you use it, the more it works! The more users send this data to Google, the better the service becomes.
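Google has not published its traffic pipeline in detail, so the following is only a simplified illustration of the idea, not Google’s method. It assumes each anonymous device reports timestamped positions, estimates speed from two consecutive pings using the haversine (great-circle) distance, and then summarizes many reports per road segment with the median, which is less sensitive than the mean to outliers such as a phone sitting in a parked car. The function names and segment IDs are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(ping_a, ping_b):
    """Speed between two pings from one device; each ping is (unix_seconds, lat, lon)."""
    (t1, lat1, lon1), (t2, lat2, lon2) = ping_a, ping_b
    metres_per_second = haversine_m(lat1, lon1, lat2, lon2) / (t2 - t1)
    return metres_per_second * 3.6


def segment_speeds(reports):
    """Median speed per road segment from (segment_id, speed_kmh) reports."""
    by_segment = defaultdict(list)
    for segment_id, speed in reports:
        by_segment[segment_id].append(speed)
    return {segment_id: median(speeds) for segment_id, speeds in by_segment.items()}


# 0.01 degrees of longitude at the equator is about 1.11 km, covered here in 60 s.
print(round(speed_kmh((0, 0.0, 0.0), (60, 0.0, 0.01)), 1))  # → 66.7
# Hypothetical reports: three devices on "seg-A" (one nearly stopped) and one on "seg-B".
print(segment_speeds([("seg-A", 48.0), ("seg-A", 52.0), ("seg-A", 2.0), ("seg-B", 15.0)]))
# → {'seg-A': 48.0, 'seg-B': 15.0}
```

More reports per segment make the median more stable, which is one concrete reading of “the more users send this data, the better the service becomes.”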

Resources:

Laws of Geography. Oxford Research Encyclopedias: Criminology and Criminal Justice.

What is GIS? Esri. esri.com

CARTO. https://carto.com/

Duong Vu. DataCamp tutorial. https://www.datacamp.com/community/tutorials/geospatial-data-python
