Location Is Everything: Getting Into ArcGIS in 5 Minutes

all data has a location if you look hard enough

Mark Cleverley
Dec 7, 2020 · 5 min read

In the digital age we never lack for information, but often have trouble finding useful knowledge. The answer is nuance.

Context makes data truly useful.

Add the Pareto Principle: roughly 80% of the “useful knowledge” about a topic is contained in about 20% of the total information.
Some context matters more than the rest.

You will often be reminded of this. We don’t mind petting a salamander but steer clear of alligators, even though their shape and function are largely the same.

It’s why Facebook knows what ads to show you, even though it doesn’t have access to all the thoughts in your brain (yet).

They don’t know “all that’s on your mind”. But analyzing your GPS data, search history (they own Instagram, remember) and, if you believe the popular but unproven suspicion, listening through your phone’s microphone is enough to understand “what’s on your mind enough that you bother talking and writing about”.

That’s why you see strangely specific ads for something you talked about with your friend but never searched for. This is context at work.

And as any real estate agent would tell you, there’s one type of context that usually trumps all others: location.

Geospatial Data

Everything physical (that we know of, at least) exists somewhere in space. If the object or topic you’re studying can be geotagged and recorded on a map, it’s often surprisingly helpful to do so.

Google Maps gives you a handy estimate of traffic on your route: It’s able to do this by tracking (hopefully anonymized) GPS data from phones it’s installed on.
By comparing the distance between consecutive location update “pings” with the time between them, it can discern in real time when cars are backed up on a highway where they normally travel at 60 mph.
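
To make the ping arithmetic concrete, here is a minimal sketch in plain Python. It is not Google's actual method: the function names, the (lat, lon, unix_seconds) ping format and the "half the usual speed" threshold are all assumptions made up for illustration.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def speed_mph(ping_a, ping_b):
    """Average speed between two pings, each a (lat, lon, unix_seconds) tuple."""
    lat1, lon1, t1 = ping_a
    lat2, lon2, t2 = ping_b
    hours = (t2 - t1) / 3600
    if hours <= 0:
        raise ValueError("pings must be in increasing time order")
    return haversine_miles(lat1, lon1, lat2, lon2) / hours


def is_congested(ping_a, ping_b, usual_mph=60, threshold=0.5):
    """Hypothetical rule: flag traffic when speed drops below half the usual."""
    return speed_mph(ping_a, ping_b) < usual_mph * threshold
```

Two pings from the same spot a minute apart work out to 0 mph, which this rule flags as congestion. A real system would average over many phones and smooth out GPS noise.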

However, working with GPS data seems like an awfully big field to get into, technically speaking. Fortunately, other people have already done most of the heavy lifting regarding coordinate-friendly data structures and packaging:

ArcGIS, developed by Esri, is one of the more robust geospatial analysis software suites out there. It has a host of features, including data storage, server integration, built-in deep learning and smooth visualizations.

[Image: the various modules of ArcGIS]

It is, without a doubt, the next clear step on our journey to understand why hexagons are so prevalent in nature, and why they’re so useful for geographic data modeling.

If that seems like a stretch, here’s a module diagram straight from their tutorial page. Tessellation is just another word for “efficient Euclidean data compression”.

They have various pricing plans for their full desktop & business suites, but the easiest way forward is to create a free Developer account to use with the API and Python modules.

They have a number of setup options, but I elected for the clean Anaconda command line installation: conda install -c esri arcgis. If you’re like me and enjoy living dangerously, make sure to install all new packages directly in your main environment.

As is tradition for messing around with a new data science toolkit, open up a Jupyter notebook and install a good-looking theme, as only the most terrifying of beasts code against a bright white background.

Data in Frame

Their tutorial is quite comprehensive. We’ll start by importing the base library and instantiating a GIS object using the account we made earlier.

from arcgis.gis import GIS
gis = GIS("https://www.arcgis.com", "username", "password")

This gis object will be the gateway to access most of the module’s content. We can load a default satellite overview by calling gis.map('City, State'):


You can click to drag and zoom in the cell output, which is neat. We could draw all sorts of coordinates and lines on this map. But it doesn’t really feel like data science without a DataFrame (or at least some matrix serving the same purpose).

They’ve actually developed a Spatially Enabled DataFrame that extends pandas. Tremendous.

# get your imports in order:
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor

Already we find ourselves in familiar territory: gis.content.get('item_string') grabs a publicly hosted map-layer item, and its data can be held in memory much like a regular pandas DataFrame. The spatial version directly extends pandas, so go ahead and try out the regular operations; slicing and subselection work much the same way.
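
As a rough sketch of that flow: the helper below is hypothetical (layer_to_sdf is not part of arcgis), and the item ID is whatever string you have on hand. It leans on FeatureLayer.query() returning a FeatureSet whose .sdf property is the DataFrame view.

```python
def layer_to_sdf(gis, item_id, layer_index=0, where="1=1"):
    """Fetch one layer of a hosted item as a Spatially Enabled DataFrame.

    `gis` is a connected arcgis.gis.GIS object. `item_id` is a placeholder
    for the ID string of a publicly hosted feature layer item.
    """
    item = gis.content.get(item_id)
    if item is None:  # assumption: a missing item comes back as None
        raise LookupError(f"no item found for id {item_id!r}")
    layer = item.layers[layer_index]
    # query() returns a FeatureSet; its .sdf property is the DataFrame view
    return layer.query(where=where).sdf


# In a notebook, assuming arcgis is installed and `gis` exists:
#   sdf = layer_to_sdf(gis, "item_string")
#   sdf.head()      # regular pandas from here on
```

The geometry usually sits in a column named SHAPE, and everything else behaves like ordinary columns.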

Cellular Mapping

Rasters are essentially a cell grid where data can be stored in each cell.
A honeycomb is a raster, and fishnet stockings are 3-dimensional raster manifolds. Math is consistent, even when it’s not.
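
The cell-grid idea fits in a few lines of plain Python. This is a toy square-cell model, not how ArcGIS stores rasters: the Raster class, its field names and the top-left origin convention are assumptions for illustration.

```python
class Raster:
    """A toy raster: a rows x cols grid of square cells with a top-left origin."""

    def __init__(self, rows, cols, origin_x, origin_y, cell_size, fill=0):
        self.rows, self.cols = rows, cols
        self.origin_x, self.origin_y = origin_x, origin_y
        self.cell_size = cell_size
        self.cells = [[fill] * cols for _ in range(rows)]

    def cell_index(self, x, y):
        """Map a map coordinate to the (row, col) of the cell containing it."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)  # y shrinks as rows grow
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError(f"({x}, {y}) falls outside the raster")
        return row, col

    def set_value(self, x, y, value):
        row, col = self.cell_index(x, y)
        self.cells[row][col] = value

    def value_at(self, x, y):
        row, col = self.cell_index(x, y)
        return self.cells[row][col]
```

Any two points that land in the same cell share one stored value, which is the whole trade: you give up precision inside a cell in exchange for a compact, regular structure.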

Let’s draw an image from the NASA-USGS Landsat-8 satellite and unpack the first layer:
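
A minimal sketch of that step, assuming the imagery item can be found with a title search. The helper name and the query string are assumptions (the query follows the pattern in Esri's imagery guide, not anything confirmed in this post), so swap in whatever item you actually use.

```python
def first_imagery_layer(gis, query="title:Multispectral Landsat"):
    """Search public content for an imagery item and return its first layer.

    `gis` is a connected arcgis.gis.GIS object; the default query string
    is an assumption and may need adjusting.
    """
    items = gis.content.search(query, item_type="Imagery Layer", outside_org=True)
    if not items:
        raise LookupError(f"no imagery layer matched {query!r}")
    return items[0].layers[0]


# In a notebook, assuming arcgis is installed and `gis` exists:
#   l8_lyr = first_imagery_layer(gis)
```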


Calling l8_lyr.properties['description'] tells us that it’s an image analysis service covering most of the world’s landmass at 30-meter resolution, and can be used for purposes such as vegetation, agriculture and boundary studies.

ArcGIS handles the intermediate work of computing raster functions directly on these map layers, which saves a ton of local memory.

Let’s apply some to the nyc map from earlier (the result of a gis.map() call stored as nyc). Because the loop calls nyc.add_layer(), it updates that map’s cell output in place.


We essentially just loop through a list of raster functions (agriculture, bathymetric, infrared, etc.) from the Landsat item’s first image layer, adding each one to the map and then removing it in turn.
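
One way such a loop might look, hedged heavily: cycle_raster_functions and raster_function_names are hypothetical helpers, and add_layer / remove_layers are the map widget methods from the 1.x Python API (newer releases reorganized the widget, so the names may differ). The apply_fn parameter exists only so the helper can run without arcgis installed; by default it falls back to arcgis.raster.functions.apply.

```python
import time


def raster_function_names(layer):
    """List the raster function names an imagery layer advertises."""
    return [info.name for info in layer.properties.rasterFunctionInfos]


def cycle_raster_functions(map_widget, layer, names=None, pause=2.0, apply_fn=None):
    """Show each raster function on the map for `pause` seconds, one at a time."""
    if apply_fn is None:
        # imported lazily so the helper can be exercised without arcgis
        from arcgis.raster.functions import apply as apply_fn
    if names is None:
        names = raster_function_names(layer)
    shown = []
    for name in names:
        map_widget.add_layer(apply_fn(layer, name))  # render this function
        shown.append(name)
        time.sleep(pause)                            # leave it up briefly
        map_widget.remove_layers()                   # clear before the next one
    return shown


# In a notebook, assuming `nyc` and `l8_lyr` exist:
#   cycle_raster_functions(nyc, l8_lyr)
```

Calling remove_layers() with no arguments clears everything that was added, which is fine here because the map holds only one styled layer at a time.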

The Earth Opens Up Before You

You’re now working with ArcGIS data. There are plenty of other things to do with these tools; you could integrate live ocean buoy data feeds into a live wave-height map or conduct deep learning on Yellowstone wolf movements.

To search for more data, you may want to check out the API’s search functions:

# search for feature layers relating to california
my_content = gis.content.search(query='california',
                                item_type="Feature Layer",
                                max_items=20)

These objects can generally be explored in the same way as above. Next time we’ll look into hexagonal rasters.
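
Before picking one to explore, it can help to skim what came back. A small sketch, assuming each result exposes title, type and id attributes (the summarize_items helper itself is made up for this example):

```python
def summarize_items(items, limit=5):
    """Return (title, type, id) for the first `limit` search results."""
    return [(item.title, item.type, item.id) for item in items[:limit]]


# In a notebook, with `my_content` from the search above:
#   for title, kind, item_id in summarize_items(my_content):
#       print(f"{title} [{kind}] {item_id}")
```

Any id printed this way can be handed to gis.content.get to pull that item.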

The Startup

Medium's largest active publication, followed by +771K people. Follow to join our community.

Mark Cleverley

Written by

data scientist, machine learning engineer. passionate about ecology, biotech and AI. https://www.linkedin.com/in/mark-s-cleverley/
