Graphing Latitudes and Longitudes using Python
A beginner’s introduction to mapping with the GeoPandas library
At first, converting latitudes and longitudes in a dataset to points on a map seems like a daunting task.
However, Python’s GeoPandas library exists for this exact purpose, amongst many others.
This article is a brief introduction to converting latitude and longitude features into point features, and then graphing those point features using GeoPandas!
Initial Data Import
Adding latitudes and longitudes to a map in Python involves two steps:
- import the data file containing the latitude and longitude features
- import the map as a .shp file
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv('../data/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) &
        (df['price'] <= np.percentile(df['price'], 99.5)) &
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) &
        (df['latitude'] < np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) &
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]
First, we will import apartment rental data for New York City for the months of April, May, and June 2016. The data comes from renthop.com, and the initial code comes courtesy of Ryan Herr. The code above does the following:
- import numpy for numerical operations
- import pandas for dataframe manipulation
- read the data file with pandas' read_csv() function and assign the result to the df variable
- the last step is specific to this dataset: it uses numpy's percentile() function to remove the dataset's price, latitude, and longitude outliers
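To see what the percentile filter does in isolation, here is a minimal sketch on synthetic data. The helper name trim_percentiles and the toy prices are hypothetical and not part of the original notebook; the idea is simply to keep rows whose value falls between a lower and an upper percentile.

```python
import numpy as np
import pandas as pd


def trim_percentiles(frame, column, lower, upper):
    """Keep rows whose `column` value lies within the [lower, upper] percentiles."""
    low = np.percentile(frame[column], lower)
    high = np.percentile(frame[column], upper)
    return frame[(frame[column] >= low) & (frame[column] <= high)]


# Toy data: 1,000 plausible prices plus two obvious outliers (made-up values).
rng = np.random.default_rng(0)
prices = np.concatenate([rng.uniform(1500, 6000, size=1000), [1.0, 4_000_000.0]])
toy = pd.DataFrame({"price": prices})

# Same bounds as the price filter above: drop the lowest and highest 0.5%.
trimmed = trim_percentiles(toy, "price", 0.5, 99.5)
print(len(toy), "->", len(trimmed))
print(trimmed["price"].min(), trimmed["price"].max())
```

Both artificial outliers fall outside the 0.5 to 99.5 percentile band, so they are removed along with a handful of ordinary rows at each tail.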
Calling df.head() will display our initial dataframe. Note the latitude, longitude, and price columns of the rental listings; they will come into play later.
Next, we must import our map as a .shp file. Since we are graphing points representing NYC rental listings, it will probably be useful to use a NYC map! One can be found at https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm.
.shp files will often come in a zipped file containing three other file types:
- .dbf file
- .shx file
- .prj file
For the map import to work, all four files must be stored in the same directory. If you are working locally, keep them in one folder; if you are using a hosted notebook like Google Colab, make sure all four files are uploaded.
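Before reading the map, it can help to confirm that the companion files really do sit next to the .shp file. The sketch below is one way to check, using only the standard library; the function name missing_shapefile_parts and the file name boroughs.shp are hypothetical, and the demo creates empty placeholder files in a temporary folder rather than touching real data.

```python
import tempfile
from pathlib import Path

# The four extensions discussed above; a download may contain other files too.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to `shp_path`."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not base.with_suffix(ext).exists()]


# Demo: create three of the four files, then ask which one is missing.
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".shx", ".dbf"):
        (Path(folder) / f"boroughs{ext}").touch()
    missing = missing_shapefile_parts(Path(folder) / "boroughs.shp")

print(missing)
```

An empty list means all four files are present in the same folder.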
Once the files are downloaded and organized, it's time to import our .shp file! This is accomplished using the GeoPandas library, the read_file() function to be specific. We will assign the result to the variable street_map. We will also import Point and Polygon from shapely.geometry, along with matplotlib.pyplot, which will be used later:
# import libraries
import geopandas as gpd
from shapely.geometry import Point, Polygon
import matplotlib.pyplot as plt

# import street map
street_map = gpd.read_file('/content/geo_export_1f88d1b8-51fd-42aa-84b0-22d7bad6bc6f.shp')
Creating GeoPandas DataFrame
# designate coordinate reference system (EPSG:4326 is plain latitude/longitude)
crs = 'EPSG:4326'

# zip x (longitude) and y (latitude) coordinates into single feature
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])]

# create GeoPandas dataframe
geo_df = gpd.GeoDataFrame(df,
                          crs=crs,
                          geometry=geometry)
Once the map and data files are loaded, it's time for the next steps:
- designate a coordinate reference system and assign it to the crs variable. For this example we will be using EPSG:4326. For more information visit http://geopandas.org/projections.html
- add a 'geometry' column to the dataframe. The 'geometry' column contains the dataframe's 'longitude' & 'latitude' columns zipped together and converted to points using shapely.geometry's Point function
- create the GeoPandas dataframe! This is accomplished using GeoPandas' GeoDataFrame() constructor, which takes the dataframe df, the coordinate reference system crs, and our new geometry list geometry as inputs
geo_df.head() shows us our new GeoDataFrame with the 'geometry' column added.
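As an aside, reasonably recent GeoPandas releases also provide gpd.points_from_xy, which builds the same point geometries without an explicit list comprehension. The sketch below uses a tiny made-up dataframe (the listings values are illustrative, not taken from the dataset) and assumes your installed GeoPandas version includes points_from_xy.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical listings; coordinates and prices are illustrative only.
listings = pd.DataFrame({
    "price": [3000, 5465, 2850],
    "latitude": [40.7145, 40.7947, 40.7388],
    "longitude": [-73.9425, -73.9667, -74.0018],
})

# x is longitude and y is latitude, the same order as zip(longitude, latitude).
geo_listings = gpd.GeoDataFrame(
    listings,
    geometry=gpd.points_from_xy(listings["longitude"], listings["latitude"]),
    crs="EPSG:4326",
)

first_point = geo_listings.geometry.iloc[0]
print(first_point.x, first_point.y)
print(geo_listings.crs)
```

Whichever approach you use, longitude must come first: points are stored as (x, y), and swapping the two places every listing in the wrong part of the world.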
Time to graph!
# create figure and axes, assign to subplot
fig, ax = plt.subplots(figsize=(15, 15))

# add .shp mapfile to axes
street_map.plot(ax=ax, alpha=0.4, color='grey')

# add geodataframe to axes
# color the points by the 'price' column
# add legend
# make datapoints transparent using alpha
# assign size of points using markersize
geo_df.plot(column='price', ax=ax, alpha=0.5, legend=True, markersize=10)

# add title to graph
plt.title('Rental Prices in NYC', fontsize=15, fontweight='bold')

# set latitude and longitude boundaries for map display
plt.xlim(-74.02, -73.925)
plt.ylim(40.7, 40.8)

# show map
plt.show()
Steps:
1.) create a figure and add axes onto it using fig, ax = plt.subplots()
2.) add street_map to the axes. Remember, street_map contains our .shp file
3.) add geo_df to the axes.
   - column='price' tells GeoPandas to color the geometric points by geo_df's 'price' column
4.) add a title, set the latitude and longitude limits, and show the graph!
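One detail worth knowing: plt.xlim() and plt.ylim() only change the visible window, so listings outside it are still part of the plot and still influence the color scale in the legend. If you would rather drop them first, a bounding-box filter is one option. This is a minimal sketch on a made-up dataframe (the listings values are hypothetical); the same boolean mask should work on geo_df, since a GeoDataFrame supports ordinary pandas filtering.

```python
import pandas as pd

# Window used in the plotting code: (min, max) for longitude and latitude.
LON_RANGE = (-74.02, -73.925)
LAT_RANGE = (40.7, 40.8)

# Hypothetical listings; the last one lies south of the plotted window.
listings = pd.DataFrame({
    "price": [3000, 5465, 2850, 2400],
    "latitude": [40.7145, 40.7947, 40.7388, 40.6500],
    "longitude": [-73.9425, -73.9667, -74.0018, -73.9500],
})

# Series.between is inclusive on both ends by default.
in_window = (
    listings["longitude"].between(*LON_RANGE)
    & listings["latitude"].between(*LAT_RANGE)
)
visible = listings[in_window]
print(len(visible), "of", len(listings), "listings fall inside the window")
```

Plotting only the filtered rows keeps the color scale tied to the prices that actually appear on the map.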
That’s all there is to it! Big thanks to Ryan Stewart and his article https://towardsdatascience.com/geopandas-101-plot-any-data-with-a-latitude-and-longitude-on-a-map-98e01944b972 for the inspiration!