Generating spatial data from free form text

How NYC Dept. of City Planning puts non-spatial data on a map

Amanda Doyle
NYC Planning Tech
5 min read · Feb 14, 2018


New York City has 59 Community Boards, each representing the residents of a geographic area of the city. Board members also live within the district they serve, and they play an integral role in connecting residents with City agencies. One way Community Boards do this is through what we call Community Board Budget Requests.

Community Board Budget Requests (Budget Requests for short) are submitted by each Community Board every October. Each Board’s submission lists that community’s priority capital and programmatic investments for improving the neighborhood. The lists go to the Office of Management and Budget, which distributes each request to the responsible City agency. The agencies then take these requests into consideration when developing the upcoming Capital Budget.

By sharing their neighborhoods’ priority needs for services and infrastructure with City agencies, Community Boards can use Budget Requests to influence and shape investments across the city.

New York City’s 59 Community Boards


Historically, these Budget Requests have been shared with City agencies as spreadsheets, which are not always the easiest way to consume this information. City agencies know where their assets are, so seeing the requests on a map is a more intuitive and helpful way for them to explore. Viewing the requests spatially also helps an agency understand what is being requested within a given community, what is being requested of it across the city, and what is being requested of its partner agencies.

The problem: the submitted Budget Requests contain no spatial data, so before we could make any maps, we had to generate our own.

Generating Spatial Data

To geolocate these Budget Requests, we used three methods: automated georeferencing, fuzzy string matching, and manual data creation.

After Community Boards submit their Budget Requests, our colleagues on the Planning Coordination team receive them as Excel spreadsheets. The requests occasionally include addresses and often include descriptive information that can be matched to a location but, again, they contain no spatial data.
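The three methods work as a cascade. Each record tries the automated method first, and only whatever is left over falls through to the next one. Below is a minimal Python sketch of that control flow. It is our illustration, not the team’s code: the record fields and the `locate_requests` helper are hypothetical, and the real locating steps (geocoding, name matching) are passed in as plain functions.

```python
from typing import Callable, Optional

# A locator takes a request record and returns a GeoJSON-like geometry dict,
# or None when it cannot place the record.
Locator = Callable[[dict], Optional[dict]]


def locate_requests(requests: list[dict], methods: list[tuple[str, Locator]]) -> dict[str, int]:
    """Try each (name, locator) in order and stop at the first geometry found.

    Every record gets a "method" field naming the step that placed it, or
    "unmapped" if none did. Returns a count of records per method.
    """
    tally: dict[str, int] = {}
    for req in requests:
        req["method"] = "unmapped"
        for name, locate in methods:
            geometry = locate(req)
            if geometry is not None:
                req["geometry"] = geometry
                req["method"] = name
                break
        tally[req["method"]] = tally.get(req["method"], 0) + 1
    return tally
```

Any record still tagged `unmapped` at the end would be a candidate for manual data creation.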

Automated georeferencing

For records with free-form address, intersection, or on-to-from information (e.g., on 5th Avenue between 42nd and 48th Street), we used City Planning’s geocoder through its web API, Geoclient. Geoclient returns the latitude and longitude (lat/long) of an address or intersection, and the beginning and ending lat/long of a street segment. (Unfortunately, we weren’t able to link the information returned by the Geoclient API to LION, the city’s street centerline database, but that’s another blog post.)
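Before a segment can be looked up, free-form on-to-from text has to be split into its three street names. The sketch below is our illustration, not the team’s code. It handles only the “on X between Y and Z” phrasing with a regular expression, and `parse_on_from_to` is a hypothetical helper name.

```python
import re
from typing import Optional

# Hypothetical pattern for phrasing like "on 5th Avenue between 42nd and 48th Street".
# Real request text is messier; anything else returns None.
ON_FROM_TO = re.compile(
    r"\bon\s+(?P<on_street>.+?)\s+between\s+(?P<from_street>.+?)"
    r"\s+and\s+(?P<to_street>.+?)[.,;]?$",
    re.IGNORECASE,
)


def parse_on_from_to(text: str) -> Optional[dict]:
    """Split an on-to-from description into its on, from, and to streets."""
    match = ON_FROM_TO.search(text.strip())
    if match is None:
        return None
    return {
        "on": match["on_street"],
        "from": match["from_street"],
        "to": match["to_street"],
    }
```

For the example in the text, the first cross street comes back as “42nd”, because the description shares the word “Street” between both cross streets. A real pipeline would need to carry the street type over before geocoding.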

Using the lat/long data returned by the Geoclient API, we made spatial data! This method mapped some of the requests, but many more remained.
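Turning geocoder output into spatial data is mostly a matter of wrapping coordinates in a standard format. The sketch below produces GeoJSON through a hypothetical `to_feature` helper. It assumes each result has already been reduced to a (lat, long) pair, or to a start and end pair for a segment. The `request_id` property name is our own placeholder.

```python
import json
from typing import Optional

LatLong = tuple[float, float]


def to_feature(request_id: str, start: LatLong, end: Optional[LatLong] = None) -> dict:
    """Build a GeoJSON Feature from (lat, long) pairs.

    One pair (address or intersection) becomes a Point. A start and end pair
    (street segment) becomes a two-vertex LineString. GeoJSON orders
    coordinates as [longitude, latitude], so each pair is flipped.
    """
    def lonlat(p: LatLong) -> list[float]:
        return [p[1], p[0]]

    if end is None:
        geometry = {"type": "Point", "coordinates": lonlat(start)}
    else:
        geometry = {"type": "LineString", "coordinates": [lonlat(start), lonlat(end)]}
    return {"type": "Feature", "geometry": geometry, "properties": {"request_id": request_id}}


def feature_collection(features: list[dict]) -> str:
    """Serialize features as a GeoJSON FeatureCollection string."""
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Drawing a segment as a straight line between its two endpoints is only an approximation. Unlike a LION centerline, it will not follow a curving street.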

Fuzzy string matching

This method may be the most fun.

For records that couldn’t be georeferenced automatically with the method above, we had to get creative: the only location information was buried in the description of the requested project.

For context, here are some sample descriptions:

Build the Green Outlook, Riverside Park (DPR)
Bedford Branch Library needs a new boiler and window restoration
Upgrade the FDNY Engine 307 kitchen.
P.S. 125 Playground
Handicap Accessibility in Front of the 46th Precinct.

Now, you (maybe with the help of Google) can read “Handicap Accessibility in Front of the 46th Precinct” and know where that project should be mapped. The trick is getting a computer to map that description to NYPD’s 46th Precinct station house. Inconsistent site naming, such as “46 PCT.” vs. “46th Precinct”, added to the challenge of automatically mapping projects from their descriptions.
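One way to blunt inconsistent naming is to normalize both sides before comparing them. The sketch below lowercases the text, strips punctuation and ordinal suffixes, and expands a small abbreviation table. The table entries and the `normalize` function are illustrative assumptions, not the lists City Planning actually used.

```python
import re

# Hypothetical abbreviation table. A real one would be built by inspecting the
# names that actually appear in the requests and the reference datasets.
ABBREVIATIONS = {
    "pct": "precinct",
    "eng": "engine",
    "lad": "ladder",
    "ps": "public school",
}


def normalize(name: str) -> str:
    """Reduce a site name to a canonical, comparable form."""
    text = name.lower().replace(".", "")                       # "P.S." -> "ps", "PCT." -> "pct"
    text = re.sub(r"[^\w\s]", " ", text)                       # other punctuation -> space
    text = re.sub(r"\b(\d+)(?:st|nd|rd|th)\b", r"\1", text)    # "46th" -> "46"
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)
```

With this normalization, “46 PCT.” and “46th Precinct” both become “46 precinct”.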

Given these factors, we developed a series of SQL LIKE statements and fuzzy string matching algorithms that matched words in the descriptions to the names of places in other spatial datasets. We compared against two reference datasets. The first is City Planning’s Facilities Database, which maps more than 36,000 government facilities and program sites throughout NYC. The second is NYC Parks Properties, which has polygon geometries for all 1,969 NYC parks. Together, these two datasets capture the majority of the City’s building- and park-based fixed assets.
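Below is a simplified Python version of this two-step approach. The exact-containment step mimics a case-insensitive SQL `LIKE '%name%'` test. The post doesn’t say which fuzzy algorithm was used, so the fuzzy step substitutes the standard library’s `difflib.SequenceMatcher`, comparing the place name against word windows of the description. The function names, the 0.8 cutoff, and the sample place names are assumptions.

```python
import difflib
from typing import Optional


def like_match(description: str, place_names: list[str]) -> Optional[str]:
    """Case-insensitive containment, like SQL `description ILIKE '%' || name || '%'`.

    When several names appear, prefer the longest (most specific) one.
    """
    desc = description.lower()
    hits = [name for name in place_names if name.lower() in desc]
    return max(hits, key=len) if hits else None


def best_window_ratio(description: str, name: str) -> float:
    """Best similarity between `name` and any run of the same number of words in `description`."""
    words = description.lower().split()
    target = name.lower()
    n = len(target.split())
    best = 0.0
    for i in range(max(1, len(words) - n + 1)):
        window = " ".join(words[i:i + n])
        best = max(best, difflib.SequenceMatcher(None, window, target).ratio())
    return best


def match_place(description: str, place_names: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Try exact containment first, then fall back to the best fuzzy match at or above `cutoff`."""
    if not place_names:
        return None
    exact = like_match(description, place_names)
    if exact is not None:
        return exact
    score, name = max((best_window_ratio(description, n), n) for n in place_names)
    return name if score >= cutoff else None
```

In practice, names on both sides would typically be normalized first (for example, expanding abbreviations like “PCT.”), and the cutoff tuned by reviewing borderline matches by hand.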

This process mapped a fair number of projects. For example, it matched “Build the Green Outlook, Riverside Park (DPR)” to “Riverside Park” in the NYC Parks Properties dataset, and “Upgrade the FDNY Engine 307 kitchen” to “Eng 307, Lad 154” in the Facilities Database. It didn’t map everything, though…

Manual data creation

Short of over-engineering our solution, we could not automatically map every request, such as this one:

Provide funds for improvements to areas under and surrounding the Brooklyn Bridge, including rebuilding active recreation space underneath the bridge as well as repairs to the staircase on Frankfort Street.

So we had to do some manual work.

Budget Requests are classified into two categories: Capital and Expense. Generally, Capital requests are for large-scale investments in infrastructure that impact the built environment, while Expense requests fund programs and other recurring costs of government services.

Additionally, Budget Requests are grouped into two impact types: Site specific and Non-site specific. Non-site specific requests are for projects that are not necessarily tied to a discrete fixed asset, such as:

Increase funding for employment programs, particularly for tech-industry opportunities.

Site specific requests, by contrast, can be associated with a known location, like:

Ravenswood Playground: Repaving & Resurfacing

We focused our manual mapping efforts on Site specific Capital requests.
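In code, that triage is a simple filter. The sketch assumes each request carries its budget type and impact type as plain strings. The field names, the `BudgetRequest` class, and the `mapped_ids` set of already-placed requests are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BudgetRequest:
    request_id: str
    description: str
    budget_type: str  # assumed values: "Capital" or "Expense"
    impact: str       # assumed values: "Site specific" or "Non-site specific"


def manual_mapping_queue(requests: list[BudgetRequest], mapped_ids: set[str]) -> list[BudgetRequest]:
    """Return Site specific Capital requests that the automated methods did not place."""
    return [
        r for r in requests
        if r.budget_type == "Capital"
        and r.impact == "Site specific"
        and r.request_id not in mapped_ids
    ]
```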

To map these records, we used a simple Leaflet Draw GeoJSON creator built by our colleague Chris Whong, which outputs the contents of a GeoJSON file for whatever you draw. The tool is simple enough that most people can create spatial data, regardless of their GIS skills.

Repo: https://github.com/NYCPlanning/simple-geom-editor
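Once a shape is drawn, it still has to be tied back to its request. The sketch below uses a hypothetical `attach_drawn_geometry` helper. It assumes each drawing is saved as either a bare GeoJSON geometry or a Feature; we haven’t verified the editor’s exact output format.

```python
import json

GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}


def attach_drawn_geometry(request_id: str, geojson_text: str) -> dict:
    """Wrap a hand-drawn GeoJSON geometry (or Feature) in a Feature tagged with its request id."""
    obj = json.loads(geojson_text)
    # Accept either a full Feature or a bare geometry object.
    geometry = obj.get("geometry") if obj.get("type") == "Feature" else obj
    if not isinstance(geometry, dict) or geometry.get("type") not in GEOMETRY_TYPES:
        raise ValueError(f"not a GeoJSON geometry: {geojson_text[:60]!r}")
    return {"type": "Feature", "geometry": geometry, "properties": {"request_id": request_id}}
```

Rejecting anything that isn’t a recognizable geometry catches pasting mistakes early, before they reach the published map.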

Final Product

In the end, we mapped 95% of all Site specific records and published them on an online map that is currently accessible to City employees. Using this map, planners from all City agencies can filter and search to explore projects requested by Community Boards across the city.

Our goal is that by making Community Board Budget Requests more accessible and easier to explore through a map, City agencies will discover synergies between existing capital projects and these Budget Requests, so that in the end more Budget Requests come to fruition.

Mapped Community Board Budget Requests!

We plan to make the data and map public after City agencies have reviewed and responded to all Community Board Budget Requests.


Amanda Doyle
NYC Planning Tech

Urban scientist / Geographer / Data engineer / City enthusiast