Wrangling and Visualizing Geospatial Data — Create your First Map (… 1)

6 min readMar 11, 2024

Managing project dependencies can often feel like navigating through a maze in data science and geospatial analysis. Each library, with its web of dependencies, holds the potential to conflict with another, turning your project environment into a battleground of compatibility issues. This is particularly true when working with comprehensive spatial analysis libraries such as geopandas, along with its associated tools like fiona, pyproj, and GDAL. These tools are powerful for geospatial data processing but come with a complex set of dependencies that need careful handling.

It's advisable to create a new, isolated environment that emerges not just as a good practice, but as a necessity. An isolated environment serves multiple critical functions:

It streamlines dependency management,
Ensures project reproducibility,
Maintains stability across your development work,
Provides a sandbox for testing, and
Simplifies the deployment process.

The conda create command is used to create a new environment, and conda activate is a separate command used to activate the environment you've just created. Here's how you should execute these commands separately:

First, create the new environment with the necessary packages:

conda create --name NAME_OF_ENV geopandas pyogrio fiona gdal -c conda-forge

2. After the environment has been successfully created, you activate it with the following command:

conda activate NAME_OF_ENV

Reading the Data and Extracting Columns

To read the Geospatial data, use the GeoPandas library, a powerful tool designed specifically for geospatial data processing. GeoPandas extends the functionalities of Pandas, allowing for the easy handling of geospatial data in Python.

Read the data using the read_file method. This method is versatile, supporting a wide range of geospatial file formats. Read data and display the first five rows to get an initial glimpse of the dataset: (path to the data used — Link):

# Read data (DEC_Lands)
data = gpd.read_file(r"PATH TO THE  DATA FILE")

# View data - 5 rows of the data
data.head()

Upon executing this code, you’ll be presented with the first five rows of the dataset. This is the structure, columns, and types of data.

Replace `"PATH TO THE DATA FILE"` with the actual path to your geospatial dataset.

The data contain information (i.e. column data) that isn’t needed to accomplish the goal of this exercise.

The goal of this exercise is to create your first map.

Extract the needed information or even the subsets of the data:

# Extract columns 'CLASS','COUNTY', and  'geometry'

data_2 = data.loc[:,['CLASS','COUNTY', 'geometry']].copy()
data_2.CLASS.value_counts()

# Subsets

# Select lands that fall under the "WILD FOREST" or "WILDERNESS" category

wild_lands = data_2.loc[data_2.CLASS.isin( ["WILD FOREST", "WILDERNESS"])].copy()
wild_lands.head(5)

Create the First Map

To visualize the geospatial data useplot() method directly on the GeoDataFrame, i.e. : wild_lands.plot(). This code generates an immediate visual representation of the spatial data, offering a visual insight into the distribution and layout of the wild lands.

Each GeoDataFrame houses a unique “geometry” column, a feature that stores the geometric objects — points, lines, or polygons — representing the spatial data. These geometric objects are the very essence of what is visualized when invoking the plot() method. To see the structure and content of these geometric entities, examine the first five entries of the "geometry" column with the following snippet:

# View the first five entries in the "geometry" column
wild_lands.geometry.head()

The above code, results in these outputs:

In a GeoDataFrame, the “geometry” column houses the geometric objects representing each record's spatial attributes. Depending on the nature of the geospatial data, these objects could be points, lines, polygons, or a mix.

In the case of the original dataset, the "geometry" column comprises 2983 different Polygon objects. Each of these polygons represents a unique area on the map.

To enrich the analysis, let’s expand the dataset to include additional geographical features. Specifically, introduce three more GeoDataFrames, each dedicated to a different type of geospatial feature relevant to the study area:

Campsite Locations are represented as Point objects. Points are the simplest form of geometry, marking specific coordinates on a map.
Foot Trails are represented as LineString objects. LineStrings connect multiple points in a sequence to form lines, useful for mapping paths or routes.
County Boundaries are represented as Polygon objects. Similar to thewild_lands dataset but focusing on administrative boundaries.

Here’s how to create these GeoDataFrames:

# Campsite in New York state - Points
POI_data = gpd.read_file(r"data/DEC_pointsinterest/DEC_pointsinterest/Decptsofinterest.shp")
campsite = POI_data.loc[POI_data.ASSET=="PRIMITIVE CAMPSITE"].copy()

# Foot trails in New York state - Linestrings
road_trails = gpd.read_file(r"data/DEC_roadstrails/DEC_roadstrails/Decroadstrails.shp")
trails = road_trails.loc[road_trails.ASSET=='FOOT TRAIL'].copy()

# County boundaries in New York state - Polygon
counties = gpd.read_file(r"data/NY_county_boundaries/NY_county_boundaries/NY_county_boundaries.shp")

# Display the first five entries of the geometry column for each GeoDataFrame to verify
print("Campsites:\n", campsites.head())
print("\nFoot Trails:\n", trails.head())
print("\nCounty Boundaries:\n", counties.head())

Create the map

Creating a map that integrates data from multiple GeoDataFrames is a way to visualize complex spatial relationships and patterns within an area of interest. By overlaying data from different sources — such as wild lands, campsites, foot trails, and county boundaries — we can gain comprehensive insights into the spatial layout and how various elements interact with each other.

In geopandas, the plot() method can be used not only to visualize individual GeoDataFrames but also to overlay multiple datasets on the same map. This is achieved through the use of the ax parameter, which allows different plots to share the same axes, effectively layering them on top of each other.

Here’s how to create a combined map from all four GeoDataFrames:

# Create a figure and axis object with a specified size
fig, ax = plt.subplots(figsize=(10, 10))

# Plot each GeoDataFrame, specifying the order of layers using 'zorder' for clarity
counties.plot(ax=ax, color='none', edgecolor='gainsboro', zorder=1, label='County Boundaries')
wild_lands.plot(ax=ax, color='green', zorder=2, label='Wild Lands')
trails.plot(ax=ax, color='blue', linestyle='-', linewidth=1, zorder=3, label='Foot Trails')
campsite.plot(ax=ax, color='red', marker='o', markersize=2, zorder=4, label='Campsites')

# Customization: Set a title for the map
ax.set_title('Map of Wild Lands, Campsites, Foot Trails, and County Boundaries')

# Create custom legend handles for each layer
campsite_legend = mlines.Line2D([], [], color='red', marker='o', linestyle='None', markersize=2, label='Campsites')
foot_trail_legend = mlines.Line2D([], [], color='blue', marker='_', linestyle='None', markersize=2, label='Foot Trails')
county_boundary_legend = mlines.Line2D([], [], color='gainsboro', marker=None, linestyle='-', linewidth=2, label='County Boundaries')
wild_land_legend = mlines.Line2D([], [], color='green', marker=None, linestyle='-', linewidth=2, label='Wild Lands')

# Add the custom legend to the plot, including all custom legend handles
ax.legend(handles=[campsite_legend, foot_trail_legend, county_boundary_legend, wild_land_legend])

# Remove the x and y axis for a cleaner look
ax.set_axis_off()

# Display the plot
plt.show()

In this script:

Layer Order: The zorder parameter controls the order in which the layers are plotted. Layers with higher zorder values are plotted on top of those with lower values.
Customization: The map’s appearance is customized for readability and visual appeal. The ax.set_title() method is used to give the map a title.
Custom Legend Handles: Custom legend handles are created for each geospatial layer using mlines.Line2D. This step is crucial for creating a legend that accurately represents each dataset, especially when default legend symbols do not match the plotted data.
Legend Placement: The custom legend handles are added to the map using ax.legend(handles=[...]). Ensure that ellipsis (...) is removed or replaced with actual legend handles to avoid errors.
Axis Removal: The axes are hidden with ax.set_axis_off() to focus attention on the spatial data.

The resulting map:

Data-Driven Decision Making:

Exploring the landscapes of the northeastern part of the state reveals it as the best spot for camping enthusiasts and nature lovers. The region has lush greenery, serene trails, and campsites, offering a perfect retreat from the hustle and bustle of daily life.

The abundance of wild lands in the northeast provides a diverse habitat for wildlife and flora, making it an ideal spot for those seeking to connect with nature. The trails, ranging from leisurely walks to more challenging hikes, cater to adventurers of all levels. Moreover, the strategically located campsites, ensure that every traveler can find a spot that feels custom-made for their camping experience.

Wrangling and Visualizing Geospatial Data — Create your First Map (… 1)

Reading the Data and Extracting Columns

Data-Driven Decision Making:

Written by Ochwada