Geosupport ❤️ Python!

sptkl
Published in NYC Planning Tech
3 min read · Jul 26, 2019

Geocoding has long been a difficult step in our data production processes because we lacked an easy, scalable, and reproducible geocoding interface. Our old geocoding process relied heavily on Geoclient, an API service created by the Department of Information Technology and Telecommunications (DoITT). For larger datasets, we used mainframe computers running Geosupport for batch processing. (If you are not familiar with NYC’s geocoding products, read this quick overview of NYC Geocoding tools.) Over the years, different geocoding solutions emerged for different purposes, yet we couldn’t find a single tool that gave us complete control over efficient, high-volume geocoding. That changed when we came across the python-geosupport package, a Python binding for NYC Planning’s Geosupport Desktop Edition.

This tutorial walks you through setting up the python-geosupport environment on your machine and shows how you can use Python to geocode at mainframe speed!

1. First, download the latest version of Geosupport Desktop Edition and install Python 3 if you don’t have it already.

Note: the above instructions are written for Linux users working in a terminal. If you are a Windows user, the setup process is similar; check out the python-geosupport repo for more information.
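Before moving on, a quick check from Python can confirm that the environment is in place. This is a minimal sketch, assuming the Linux setup described in the python-geosupport README, where `GEOFILES` points at the Geosupport data files and `LD_LIBRARY_PATH` includes the Geosupport library directory. The helper name `missing_geosupport_settings` is our own illustration, not part of the package.

```python
import os


def missing_geosupport_settings(environ=os.environ):
    """Return the names of required environment variables that are unset or empty."""
    # Assumption: these are the two variables the Linux setup relies on.
    required = ("GEOFILES", "LD_LIBRARY_PATH")
    return [name for name in required if not environ.get(name)]
```

An empty list means both variables are set. It does not verify that the paths themselves are correct.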

2. To test the package, launch Python and try the following commands:
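A minimal sketch of such a test might look like this. It assumes the package is installed and that function codes can be looked up with square brackets, as shown in the python-geosupport README. The helper names `geocode_1b` and `demo`, the keyword argument names, and the sample address are our own illustration.

```python
def geocode_1b(geo, house_number, street_name, borough_code):
    """Call Geosupport function 1B on an already-created Geosupport object."""
    # Assumption: geo["1B"] returns a callable that accepts these keyword
    # arguments and returns a dictionary of attributes.
    return geo["1B"](
        house_number=house_number,
        street_name=street_name,
        borough_code=borough_code,
    )


def demo():
    """Geocode one sample address; requires a working Geosupport setup."""
    # Imported lazily so this snippet loads even where Geosupport is not installed.
    from geosupport import Geosupport

    g = Geosupport()
    return geocode_1b(g, "120", "Broadway", "MN")
```

Calling `demo()` on a machine with Geosupport set up should return the full result dictionary for that address.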

3. If the setup was successful, the geocoded result is returned and should look something like the following. The output is truncated because function 1B returns 193 different attributes for each address:

4. Usually, we only need a subset of these fields. The following shows how we parse the geocoding result:
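One way to do this parsing is sketched below. The dictionary keys are assumptions about how Geosupport labels its attributes, so check them against your own output and swap in the fields you need. The function name `parse_result` and the output field names are hypothetical.

```python
def parse_result(geo):
    """Keep a handful of fields from a function 1B result dictionary."""
    # Assumption: the BBL is returned as a nested dictionary; fall back to an
    # empty one if it is missing or has another shape.
    bbl = geo.get("BOROUGH BLOCK LOT (BBL)", {})
    if not isinstance(bbl, dict):
        bbl = {}
    # Missing attributes default to an empty string instead of raising KeyError.
    return {
        "house_number": geo.get("House Number - Display Format", ""),
        "street_name": geo.get("First Street Name Normalized", ""),
        "bbl": bbl.get("BOROUGH BLOCK LOT (BBL)", ""),
        "latitude": geo.get("Latitude", ""),
        "longitude": geo.get("Longitude", ""),
        "return_code": geo.get("Geosupport Return Code (GRC)", ""),
    }
```

Using `.get` with a default keeps the parser from failing on records where Geosupport could not fill in every attribute.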

If you got python-geosupport working by following the instructions above, you have seen how fast and flexible geocoding with it can be. The next part of this tutorial discusses how to scale this process up.

Since geocoding is a CPU-intensive task (you can observe spikes in CPU activity in your task manager during geocoding), we introduced multiprocessing to make it much faster. By default, a Python program runs in a single process on one CPU core, but the multiprocessing module lets you spread the work across all of your cores. You can check out our geocoding workflow that leverages multiprocessing below:
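The general pattern can be sketched with the standard library's `multiprocessing.Pool`. This is an illustration of the technique rather than our exact workflow: the function name `geocode_in_parallel`, its parameters, and the default chunk size are assumptions, and `worker` stands for any top-level function that geocodes and parses a single record.

```python
import multiprocessing


def geocode_in_parallel(records, worker, processes=None, chunksize=1000,
                        start_method=None):
    """Apply `worker` to every record using a pool of worker processes.

    `worker` must be a picklable, top-level function that takes one record.
    Results are returned in the same order as `records`.
    """
    # start_method=None uses the platform's default way of starting processes.
    ctx = multiprocessing.get_context(start_method)
    # processes=None lets the pool choose a worker count based on available CPUs.
    with ctx.Pool(processes=processes) as pool:
        # chunksize batches records so each task handed to a worker is larger.
        return pool.map(worker, records, chunksize)
```

On platforms that start workers by spawning a fresh interpreter, the call should sit under an `if __name__ == "__main__":` guard. A larger `chunksize` reduces the overhead of passing records between processes, which matters when each individual geocoding call is fast.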

Currently, using the python-geosupport bindings together with multiprocessing, we can geocode 1 million address records in 3 minutes (roughly 5,500 records per second) on a 2.7GHz Core i7-7500U computer running Ubuntu 18.04.

The python-geosupport package has opened up a lot of opportunities for our overall data production process. For the first time, geocoding is not a bottleneck in our workflows, which means we can iteratively improve our geocoding process. Geocoding in Python also lets us take advantage of a robust ecosystem of Python packages for related tasks, including address parsing and cleaning.

By using Flask, a micro web framework, we can easily serve the python-geosupport package as a web service (similar to running our own Geoclient), so we don’t have to deal with the complicated setup process ever again! Learn more by reading our tutorial on how to create a Geosupport web service.
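To give a sense of what such a service involves, here is a minimal sketch. It is not the implementation from the linked tutorial. It assumes Flask is installed; the `/geocode` route, the query parameter names, and the helper names are hypothetical, and `geocode` stands for any callable that wraps the python-geosupport call and returns a dictionary.

```python
def read_address(args):
    """Pull the three address parts out of a mapping of query parameters."""
    fields = ("house_number", "street_name", "borough_code")
    missing = [name for name in fields if not args.get(name)]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return {name: args[name] for name in fields}


def create_app(geocode):
    """Build a Flask app around `geocode`, a callable taking keyword arguments."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/geocode")
    def geocode_endpoint():
        try:
            address = read_address(request.args)
        except ValueError as error:
            # Incomplete requests get a 400 response with an explanation.
            return jsonify(error=str(error)), 400
        return jsonify(geocode(**address))

    return app
```

Passing in a wrapper around function 1B and calling `create_app(...).run()` would start Flask's development server, which is enough for trying the idea out locally.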
