Geocoding the World using Google API and Python

Manil Wagle
5 min read · Jan 29, 2022


One of the common challenges when dealing with geospatial data is turning address data into coordinate values so that it can be mapped, analyzed, joined with other data sources, and so on. Getting the right coordinates for addresses was certainly a struggle for me when I first started my career in commercial real estate. So, the goal of this blog is to share some of what I have learned, along with the method I have developed to geocode any location in the world. For those of you who are interested in reverse geocoding, I will be writing a separate blog in the coming days.

Let’s start with a simple definition of geocoding: “Geocoding is the process of transforming a description of a location — such as a pair of coordinates, an address, or a name of a place — to a location on the earth’s surface.” (ArcGIS)

In the sections below, I will walk through the entire workflow step by step. I will be using a sample CSV file, which can be found in my GitHub repo along with the code.

1. Set up the Google API key

This is a straightforward process: sign in to the Google Developer Console and enable the Geocoding API. This will let you generate the API key that is required to get coordinates for the list of addresses.

2. Set up the packages

Let’s start by loading the necessary packages into our environment.

# Load the required libraries
import pandas as pd
from pandas_profiling import ProfileReport
from googlemaps import Client as GoogleMaps
import googlemaps
import gmaps
from keplergl import KeplerGl
import geopandas as gpd

The libraries in the first part are required for geocoding, whereas Kepler and geopandas are required for mapping. If you don’t have these packages installed, you will need to install them using pip install.
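Before running the rest of the workflow, it can be handy to see which of these packages are actually missing. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for illustration, and the list holds import names, which may differ from the names you pass to pip.

```python
from importlib.util import find_spec

# Import names used in this post (assumption: pip package names may differ).
REQUIRED = ["pandas", "pandas_profiling", "googlemaps", "gmaps", "keplergl", "geopandas"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


print("Missing:", missing_packages(REQUIRED))
```

Anything printed in the list still needs to be installed before the import cell will run.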

3. Load and explore the data

# Load and explore the data
addresses = pd.read_csv("data.csv")
# Look at the top few rows
addresses.head()
# Summary statistics for each column
addresses.describe()

First few rows

The data looks clean 😊

Understanding column values

From the describe function, we can see that there are 13 unique addresses, 5 unique cities, and 2 unique countries in the dataset.
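To make it clearer where those unique counts come from, here is a small sketch on made-up data. The three rows are hypothetical; only the column names (Address, City, Country) follow the sample file. For text columns, describe reports count, unique, top, and freq, and nunique gives the unique counts on their own.

```python
import pandas as pd

# Hypothetical toy rows; only the column names match the sample file.
sample = pd.DataFrame({
    "Address": ["1 Main St", "2 High St", "1 Main St"],
    "City": ["Toronto", "London", "Toronto"],
    "Country": ["Canada", "UK", "Canada"],
})

# For text columns, describe() returns count, unique, top and freq
summary = sample.describe()

# nunique() returns just the number of distinct values per column
unique_counts = sample.nunique()

print(summary)
print(unique_counts)
```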

Pandas profiling is one of my favorite tools. It gives you the opportunity to explore the entire dataset with just a single line of code.

# Pandas profiling
prof = ProfileReport(addresses)
prof

Snapshot of pandas profiling

4. Combine all the columns

Here, I will combine all the address-related columns so that it’s easier for the Google API to interpret each address. It also increases the accuracy of geocoding.

# Create a new column called Full_Address
addresses['Full_Address'] = addresses['Address'].astype(str) + ',' + \
    addresses['City'] + ',' + \
    addresses['Country']
addresses.head()

New column Full Address can be seen on the far right
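One thing to watch with plain string concatenation in pandas: if the City or Country cell is missing for a row, the combined value for that row comes out missing too. The helper below is a minimal sketch of a more forgiving join; `build_full_address` is a hypothetical name, and it only handles None, NaN, and blank strings.

```python
def build_full_address(*parts):
    """Join the non-empty address parts with ', ', skipping blanks, None and NaN."""
    cleaned = []
    for part in parts:
        # part != part is True only for NaN values
        if part is None or part != part:
            continue
        text = str(part).strip()
        if text:
            cleaned.append(text)
    return ", ".join(cleaned)


print(build_full_address("1 Main St", " Toronto ", "Canada"))
print(build_full_address("1 Main St", float("nan"), "Canada"))
```

With a data frame, this could be applied row by row, for example `addresses.apply(lambda r: build_full_address(r['Address'], r['City'], r['Country']), axis=1)`.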

5. The Google API Key

This is where we need the Google API key. Please substitute your own generated key for the placeholder below; the key I originally used has been deactivated, so it won’t work 😊

# This is where we will need the API key
gmaps = googlemaps.Client(key='YOUR_API_KEY')
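Hard-coding a key in a notebook makes it easy to share by accident. One common alternative is to read it from an environment variable. The sketch below assumes a variable named `GOOGLE_MAPS_API_KEY`; both that name and the `load_api_key` helper are my own choices for illustration, not something the Google client requires.

```python
import os


def load_api_key(env_var="GOOGLE_MAPS_API_KEY"):
    """Read the API key from an environment variable (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The client can then be created with `googlemaps.Client(key=load_api_key())`, and the key never appears in the notebook itself.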

6. Pass the full address column through the API

It might not be necessary to create a copy of the data, but I have found that doing so speeds up the process, especially when you are working with thousands of records. So, basically, I am just dropping all the columns except the Full_Address column.

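# Let's pass just the full address column to the Google API
# .copy() gives an independent data frame, so later assignments don't trigger SettingWithCopyWarning
addresses1 = addresses.iloc[:, -1:].copy()
addresses1.head()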
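When you are working with thousands of records, another way to save time is to avoid looking up the same address twice. This is a different technique from the copy above, sketched here under the assumption that repeated addresses should get identical results. `geocode_once_each` and its `lookup` parameter are hypothetical names; `lookup` stands in for whatever function calls the API.

```python
def geocode_once_each(addresses, lookup):
    """Call lookup once per distinct address and return results aligned with the input."""
    cache = {}
    for address in addresses:
        if address not in cache:
            cache[address] = lookup(address)
    return [cache[address] for address in addresses]


# Stand-in lookup that records how often it is called (no real API request is made)
calls = []

def fake_lookup(address):
    calls.append(address)
    return len(address)

results = geocode_once_each(["A St", "B Ave", "A St"], fake_lookup)
print(results, "lookups made:", len(calls))
```

In the real workflow, `lookup` could be `gmaps.geocode`, so each distinct address costs only one request.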

Here, I have created two empty columns called long and lat to store the coordinates that come back from running the addresses through the API.

addresses1['long'] = ""
addresses1['lat'] = ""

The loop below sends each address to the Google API and stores the returned coordinates in the data frame.

for x in range(len(addresses1)):
    # Ask the API for matches for this address
    geocode_result = gmaps.geocode(addresses1['Full_Address'][x])
    # Write the first match back with .loc to avoid chained assignment
    addresses1.loc[x, 'lat'] = geocode_result[0]['geometry']['location']['lat']
    addresses1.loc[x, 'long'] = geocode_result[0]['geometry']['location']['lng']
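The loop assumes every address returns at least one match. When the API finds nothing, the result is an empty list and indexing it with [0] raises an IndexError. A minimal sketch of a safer extraction step follows; `extract_lat_lng` is a hypothetical helper, and the sample response only mimics the nested keys used in the loop.

```python
def extract_lat_lng(geocode_result):
    """Return (lat, lng) from a geocoding response list, or (None, None) if it is empty."""
    if not geocode_result:
        return None, None
    location = geocode_result[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Made-up response with the same nested structure the loop reads from
sample_response = [{"geometry": {"location": {"lat": 43.65, "lng": -79.38}}}]

print(extract_lat_lng(sample_response))
print(extract_lat_lng([]))
```

Inside the loop, this could be used as `lat, lng = extract_lat_lng(gmaps.geocode(address))`, leaving unmatched rows empty instead of stopping the run.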

In the block of code below, I add the results back to the original data frame. And just like that, we now have the latitude and longitude of all the addresses.

# Let's join the results with the original data frame
addresses['Lat'] = addresses1['lat']
addresses['Lon'] = addresses1['long']
addresses.head()

The final data frame
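Before mapping, it can help to confirm that every row received a plausible coordinate pair, since an empty string or a swapped latitude and longitude will put points in the wrong place or break the plot. The check below is a rough sketch; `is_valid_coordinate` is a hypothetical helper that only tests the numeric ranges for latitude and longitude.

```python
def is_valid_coordinate(lat, lon):
    """Return True when lat is within [-90, 90] and lon is within [-180, 180]."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        # Empty strings, None and other non-numeric values end up here
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(is_valid_coordinate(43.65, -79.38))
print(is_valid_coordinate("", ""))
print(is_valid_coordinate(-79.38, 143.65))
```

Rows that fail the check are worth inspecting by hand before they go on the map.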

7. Plotting the coordinates on the map

Let’s have some fun by plotting the coordinates on the map. Kepler.gl is one of my go-to mapping packages. We need to convert the data frame into a GeoDataFrame to plot the map in Kepler.

# Create a basemap
map = KeplerGl(height=600, width=800)
# Create a GeoDataFrame (x is longitude, y is latitude)
gdf = gpd.GeoDataFrame(addresses, geometry=gpd.points_from_xy(addresses.Lon,
                                                              addresses.Lat))
# Add the geo-enabled data frame to Kepler
map.add_data(data=gdf, name='kepler map')
# Show the map
map

And now we have a nice-looking map. Kepler offers plenty of customization options to make the map more eye-catching.

Plotting the coordinates on the map

Summary

I hope you found this article useful and enjoyed reading it as much as I enjoyed writing it.
