GeoSpatial Data and its Role in Data Science

Published in Analytics Vidhya

4 min read · Oct 17, 2019

Using location-based data in data science has powerful implications spanning this world and beyond.

Data comes in many forms, each suited to different purposes. But geographical and spatial data may hold some of the most powerful implications for data science and the world.

Geospatial data is defined as data holding an implicit or explicit association with a location relative to Earth. In simpler terms, according to DataCamp, it's data that tells us where a town, city, building, car, person, or other physical object may be. And it can tell us not only the location of objects, but also their size, area, or shape.

GeoJSON file formatting | Source: GeoJson.org

Locational data is commonly saved in GeoJSON files, which, once loaded into Python, often appear as collections of dictionaries storing coordinates. But they can also combine lists and nest several of these elements inside one another. And they can become very complicated very quickly.
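To make that nesting concrete, here is a minimal sketch that loads a small GeoJSON document with Python's built-in json module and walks its features. The FeatureCollection is hand-written for illustration, the coordinates and names are made up, and summarize_features is a hypothetical helper rather than part of any library.

```python
import json

# A hand-written, hypothetical FeatureCollection; coordinates are made up.
# GeoJSON positions are ordered [longitude, latitude].
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-79.05, 35.91]},
      "properties": {"name": "example point"}
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]]
      },
      "properties": {"name": "example polygon"}
    }
  ]
}
"""


def summarize_features(text):
    """Return a (name, geometry type, number of positions) tuple per feature."""
    collection = json.loads(text)  # dicts and lists, nested several levels deep
    summary = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        coords = geometry["coordinates"]
        if geometry["type"] == "Point":
            count = 1  # a Point holds a single position
        elif geometry["type"] == "Polygon":
            # a Polygon is a list of rings, each ring a list of positions
            count = sum(len(ring) for ring in coords)
        else:
            count = None  # other geometry types are not handled in this sketch
        summary.append((feature["properties"]["name"], geometry["type"], count))
    return summary


feature_summary = summarize_features(geojson_text)
for row in feature_summary:
    print(row)
```

Even in this tiny example the polygon's coordinates sit three lists deep, which hints at how quickly real files become hard to read by eye.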

Analysis of geographic data in Python is commonly done with Rtree, GDAL, Fiona, Shapely, PySAL, and GeoPandas, an extension of Python's pandas library that allows spatial operations on geometric information.
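As a rough illustration of what a "spatial operation" is, the dependency-free sketch below implements two classic ones by hand: a ray-casting point-in-polygon test and the shoelace formula for area. The function names and the rectangle are hypothetical, and the code assumes flat (planar) coordinates. Libraries such as Shapely and GeoPandas offer comparable operations in far more robust and optimized form, so treat this as a teaching aid rather than a replacement.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the closed ring.

    Points exactly on an edge are not handled reliably by this sketch.
    """
    x, y = point
    inside = False
    # Pair each vertex with the next one to walk the ring edge by edge
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray cast to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def ring_area(ring):
    """Shoelace formula for the planar area of a closed ring."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical 4 x 3 rectangle; the first vertex is repeated to close the ring
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

print(point_in_polygon((1.0, 1.0), rectangle))
print(point_in_polygon((5.0, 1.0), rectangle))
print(ring_area(rectangle))
```

Real coordinates on the Earth's surface would need a map projection before a planar area like this is meaningful.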

GeoJSON file holding the locations of bicycle crashes in Chapel Hill, North Carolina from 2007 to 2014 | Data.gov

But GeoJSON files are not the only type of file used to store location data. TopoJSON is an extension of GeoJSON that encodes topology, meaning boundaries shared between neighboring shapes are stored once rather than repeated for each shape. These formats can describe a wide range of things, from charting the surface of Mars to telling people where the scooters they want to rent are located. But where is all of this information coming from?

Visualization showing population density in San Francisco | Mapbox

Location data is heavily used by both private companies and public entities. Among the most notable holders of locational data are Mapbox, Google Maps, and the US government. There are also open source providers such as OpenStreetMap, which is constantly updated through user contributions.

New companies with creative uses for locational data are also being founded all the time.

For example, Canadian company GHGSat uses location data to monitor emissions output from industrial facilities. And Drive.ai is one of the many companies using location data for self-driving cars, determining how far objects are from the car.

And while the analysis of geospatial data has become a multibillion-dollar industry responsible for innovations ranging from mapping the ocean floor to showing people where their Uber is, it didn’t start that way.

Map of London Cholera Outbreak | Source: Wiki Commons

The mapping of geospatial data is often traced back to London, England, in 1854. In the midst of that year's cholera outbreak, British physician Dr. John Snow used a map of London to chart outbreak locations against roads, property boundaries, and water lines. In doing so, Snow discovered that cases of the disease were clustered around water pumps, challenging the commonly held view that it was spreading through the air. His work gave rise to both one of the first use cases of spatial analysis and the study of how disease spreads.

After Snow, one of the pioneering figures to use location-based data was Roger Tomlinson. Working for the Canadian government, he created a database of soil, drainage, and climate characteristics of land across Canada in 1971 to determine which crop types were suitable in different areas of the country. Since then, location-based data has been collected and used by many other countries, including the US and the United Kingdom.

Written by Garrett Keyes