Geospatial Natural Language Processing

Shairoz Sohail · Published in GeoAI · Apr 28, 2020 · 13 min read

How much text and audio content have you consumed this week? Maybe you read a news article on “Flattening the Curve” and watched a YouTube video to understand what that really means? Maybe you finally looked up the lyrics to your favorite song and realized it was actually talking about something completely different than what you expected? Maybe you looked through some forum posts about your favorite hobby, read some Amazon reviews for a purchase you’ve been on the fence about, finally responded after reading that wall of text on the group message…

Well, how’d you do it?

Natural language is a beautiful thing: it allows us to encapsulate complex concepts like the relationship between healthcare capacity and infection rate with nice terms like “flattening the curve.” It allows us to describe abstract things like the way we felt when we attended a moving concert or took the first step on Angel’s Landing. It allows us to explain to someone when they’re using irony incorrectly. But it’s also full of pitfalls: heavy reliance on prior knowledge, a dynamic dictionary of words that evolves daily, local and regional phraseology, incomplete descriptions, and complex constructs like figurative language and rhetorical questions. We often lose sight of just how hard understanding language is.

Throw the complexities of geospatial analysis into that and you end up with a lengthy blog post about an amazingly interesting area. Let’s get started.

— Table of Contents —

  • Who’s Asking? Introduction to the Challenges of Modern Day NLP Systems
  • The Geospatial Twist: The Intersections of GIS and NLP
  • Mapping Crime Reports: A Workflow to Analyze and Map Crime Reports
  • Owls and Clouds: Scaling our Workflow to build a Global Awareness Dashboard
  • Building Real Products for Really Difficult Problems: Resolving Ambiguous Location References for a Real-World Project
  • Conclusion

Who’s Asking?

There are numerous challenges that both traditional and modern NLP (Natural Language Processing) systems have to deal with. Here’s a sampling of some of the most prominent ones:

Tokenization / Boundary disambiguation: How do we tell when a particular thought is complete? Should we base our analysis on words, sentences, paragraphs, documents, or even individual letters? There is no single agreed-upon “unit” in language processing, and the choice of one affects the conclusions drawn. The most common practice is to tokenize (split) at the word level, and while this runs into issues like inadvertently separating compound words, we can leverage techniques like probabilistic language modeling or n-grams to build structure from the ground up.
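
For instance, here is a minimal sketch of word-level tokenization and bigram construction in plain Python (the report text is invented, and the regex is deliberately simplistic compared to a production tokenizer):

```python
import re

report = "Officers responded to a hit-and-run near E. Washington Ave around 2 a.m."

# Word-level tokenization with a deliberately simple regex; real tokenizers
# handle punctuation, abbreviations, and compound words far more carefully.
tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9'.\-]*", report)

# Bigrams (n=2): adjacent token pairs that help recover multi-word units
# like "Washington Ave" that word-level splitting treats as separate tokens.
bigrams = list(zip(tokens, tokens[1:]))

print(tokens)
print(bigrams)
```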

Part-of-speech tagging: How do we know what part of speech (noun, verb, adjective, etc.) a particular word belongs to? This seems like it should be a closed-form task, no more difficult than querying an existing dictionary. However, it isn’t that simple: part of speech is routinely contextual and ambiguous (consider “book” in “book a flight” versus “read a book”).

Building dependency trees: How are the different parts of speech in a sentence related?
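
To make the last two points concrete, here is a small sketch using spaCy’s pretrained English model, which assigns part-of-speech tags and builds the dependency tree in a single pass (assumes the en_core_web_sm model has been downloaded):

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("I left the keys at the bank down the street.")
for token in doc:
    # token.pos_ is the coarse part-of-speech tag, token.dep_ the dependency
    # label, and token.head the word this token attaches to in the parse tree.
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} -> {token.head.text}")
```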

Word-sense disambiguation:

“I told my uncle to check if we had enough Tuna and Bass before we left the bank”

Would this sentence make any less sense if Bass referred to the sound attribute and the bank referred to a financial institution? Not grammatically, but most people who read this will realize this is likely referring to the type of fish and a riverbank. Many words have multiple meanings, and deciding between them is hard.

Co-reference resolution:

“He told me we were a bit short on Bass but they’d been less than tasty this season anyway”

The “He” is referring to the uncle and the “they” is referring to the fish. In general, it is not obvious who the pronoun is referring to, or the subject may be much earlier in the text (such as many speakers talking in sequence, with the last one addressing the group as “we”).

Entity extraction:

“I’d venture to say my Apple earbuds would survive a dip in the river”

We realize here that the earbuds are made by a corporate entity, Apple, and not literally made from apples (although this depends on knowing more about the speaker).

The Geospatial Twist

In some ways, you can say almost all unstructured data is spatial. If there are entities in your data, those entities exist in one or more locations. To make things easier to follow through the article, we will deal with three special cases of geospatial information being embedded in unstructured data: call these the “extremely lucky,” “just plain lucky,” and “neither lucky nor unlucky” cases.

If we’re extremely lucky, locations are given coordinates.

If we’re just plain lucky, locations are given as proper place names or addresses.

If we’re neither lucky nor unlucky, those locations are given relative to other locations (“The motel down the street from the town hall”). This is pretty common in informal conversations, and we’ll talk about this difficult case in a bit.

We will not explore the unlucky cases, where something may exist “everywhere,” in the middle of nowhere, or in a separate reality (although even Middle Earth needs a GIS sometime).

Recognizing Entities

Named Entity Recognition, or NER, is a specific subset of entity recognition having to do with, you guessed it, named entities. This is the “lucky” situation from above and an extremely common use case: in almost all forms of non-secure communication it is much easier to relate information directly by name than through contextual clues. A robust NER pipeline is one of the first steps towards building a larger Natural Language Understanding (NLU) system, and it allows you to begin decomposing large volumes of unstructured information into entities and the ways in which they relate (for example, to build a link chart, as shown later in this article).

Traditionally, NER systems have been designed using grammatical rule-based techniques or statistical models such as HMMs (Hidden Markov Models) and CRFs (Conditional Random Fields).

Each of these traditional approaches came with its own set of problems. Grammatical rule-based systems required months of work by experienced computational linguists, while statistical models required large amounts of manually annotated training data. Both types of NER systems were brittle and needed huge tuning efforts to perform well in a new domain.

Recent advances in deep learning have brought a host of newer techniques for NER systems. Newer systems are based on architectures like BiLSTM-CRF and Residual Convolutional Neural Networks (CNN), which perform remarkably well at the task of named entity recognition.
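
As a quick illustration, an off-the-shelf model such as spaCy’s pretrained English pipeline already recognizes the entity types most relevant to geospatial work (the example sentence is invented, and the exact labels returned depend on the model):

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp(
    "A warehouse fire was reported on East Washington Avenue in Madison, "
    "Wisconsin on Tuesday, according to the Madison Fire Department."
)

for ent in doc.ents:
    # GPE (geopolitical entity), LOC, FAC, and ORG are the labels most useful
    # for downstream geocoding and mapping.
    print(ent.text, ent.label_)
```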

Mapping Crime Reports

Special thanks to my colleague Akhil Negi on this work

To make all this concrete, let’s build an actual workflow to do geospatial entity extraction. To do this properly and in a sustainable way, we’ll need a proper GIS (Geographic Information System). The ArcGIS suite of tools is perfect for this; in particular, the arcgis.learn API provides methods for entity extraction whose outputs can be written directly to a spatially enabled DataFrame or feature class. This way we can visualize them on a map right away and, more importantly, run real geospatial analytics to do things like map terrorism incidents or track the prevalence of fires. The goal of the pipeline we’re going to build here is to understand patterns in crime reports for Madison, WI. Let’s get started; you can follow along here or with the more detailed documentation posted here.

We will utilize the police reports from the City of Madison website. We’d like to do this without having to resort to web scraping (there are many guides out there on doing this in Python — please be respectful when scraping websites if necessary), so we provide a downloaded and labeled dataset here. Feel free to skip the next labeling portion if you utilize this labeled dataset.

Using Doccano to label your text

Doccano is an open source and minimalist web tool that allows us to attach labels to our crime reports to create training data. Here’s an example of what it looks like:

To follow along with the tutorial you can utilize the same categories (Address, Crime, Crime_datetime etc.) or if you’re doing this with your own data then create whatever labels you’d like to identify!

Once you’re done with labeling, download the corresponding .JSON file and start up a Jupyter notebook with Python (note this workflow becomes much more robust if you’re using a geospatial Jupyter deployment like ArcGIS Notebooks). The following code loads our exported data into Python and prepares it for training a model:
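
The exact call depends on your arcgis.learn version, so treat the following as a sketch rather than the published notebook; the file name and class_mapping values are placeholders for your own Doccano export:

```python
from arcgis.learn import prepare_data

# Sketch only: check the linked guide for the exact, version-specific signature.
data = prepare_data(
    path='madison_crime_reports_doccano.json',  # hypothetical name of the Doccano export
    dataset_type='ner_json',                    # treat the file as NER training data
    class_mapping={'address_tag': 'Address'}    # which label holds the location text
)

data.show_batch()  # preview a few labeled records as a DataFrame
```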

If your .JSON file was correctly loaded you should see the following DataFrame output:

Now, what we’d like to do is train a model to recognize the different entities that may exist in our current and future crime reports. We can do this via the EntityRecognizer object in the arcgis.learn API (built on spaCy’s neural-network-based recognizer). Code below:
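
A rough sketch of that training loop follows; the hyperparameters are illustrative rather than tuned:

```python
from arcgis.learn import EntityRecognizer

ner = EntityRecognizer(data)

ner.lr_find()                  # plots the learning-rate finder curve
ner.fit(epochs=30, lr=0.001)   # prints a per-epoch table of training/validation loss
ner.show_results()             # previews extracted entities on validation samples

ner.save('madison_crime_ner')  # persist the trained model for later inference
```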

This code will generate multiple intermediate graphs, such as the learning rate finder (based on this paper):

a table of training dynamics:

and a preview of the output once training is complete:

Once we’re satisfied with the model’s performance on our training and validation data, we can throw it out into the world and test it against data that wasn’t part of its training. To run inference on new text, we can use the following command:
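
Roughly, this looks like the following (the folder path is a placeholder for wherever your new, unlabeled reports live):

```python
# extract_entities accepts a folder of plain-text documents (or a list of
# strings) and returns a DataFrame of the entities found in each one.
results = ner.extract_entities('new_crime_reports/')
results.head()
```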

Lastly, we’d like to turn this into a repeatable and persistent workflow, so we’re going to add code to geocode the extracted locations, attach attributes for the type of crime and other details, and output the data to an online feature class we can pull into a map. Details of these additional steps are available in the detailed guide, and they allow us to generate dynamic information products like these:

Hotspot analysis for the crime reports
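
For reference, a rough sketch of the geocode-and-publish step described above might look like the following; the column names ('Address', 'Crime') follow the labels used earlier and are assumptions about the inference output, so adjust them to match your own DataFrame:

```python
import pandas as pd
from arcgis.gis import GIS
from arcgis.geocoding import geocode
from arcgis.features import GeoAccessor  # registers the DataFrame .spatial accessor

gis = GIS('home')  # or GIS(url, username, password)

records = []
for _, row in results.iterrows():
    matches = geocode(f"{row['Address']}, Madison, WI", max_locations=1)
    if matches:
        loc = matches[0]['location']
        records.append({'crime': row['Crime'], 'address': row['Address'],
                        'x': loc['x'], 'y': loc['y']})

# Build a spatially enabled DataFrame and publish it as a hosted feature layer.
sdf = pd.DataFrame.spatial.from_xy(pd.DataFrame(records), x_column='x', y_column='y')
crime_layer = sdf.spatial.to_featurelayer('Madison_crime_reports', gis=gis)
```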

Owls and Clouds: Scaling our Workflow

Special thanks to my colleague James Jones on this work

We’ve looked at how to build a slick workflow for training a deep learning model to extract entities, write the results to file, and automatically read these files and display analytics on a web map. This was all done using standard off-the-shelf packages. To scale this we’ll need a few more moving parts.

Microsoft and NetOwl are two of Esri’s partners doing great work in the area of NLP. To take our workflow to the next level, we are going to integrate with some of their services to provide a robust backend for the language processing and, using ArcGIS GeoEvent Server and ArcGIS Dashboards, build a near real-time dashboard that provides global situational awareness by polling whatever list of RSS feeds we provide.

Yes, global awareness.

Global situational awareness dashboard, map symbology showing clusters of negative sentiment

We won’t really be able to build something at this temporal and geographical scale on puny desktop machines. We’ll need the horsepower of not only enterprise GIS (via GeoEvent Server and ArcGIS Dashboards) but also a powerful NLP parser like NetOwl and a full-on cloud pipeline via Microsoft Azure. The above is achievable thanks to a perfectly orchestrated series of Azure Logic Apps chained together into an automation pipeline that feeds into GeoEvent Server and, finally, displays the results in the dashboard seen above. Let’s break down what this wall of terminology really means.

Firstly, here’s a picture of the pipeline:

Let’s start with the first three steps. When an HTTP request is received (when one of our subscribed RSS feeds is updated), we process the request into a JSON format that looks like this:

Then we download the posted article. Afterward, we use a short script to parse the data and send it to the Azure Cognitive Services API, particularly the Text Analytics API. This returns key information about the text, including sentiment, polarity, and subjectivity. We can combine this information with the original text and then send it on to the NetOwl entity extraction API, which gives us back the detected entities in the text and their associated ontologies:
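
In our pipeline this call happens inside a Logic App, but for illustration the same sentiment request can be made directly against the Text Analytics REST API. A minimal Python sketch, with the endpoint, key, and text as placeholders:

```python
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder resource
key = "<subscription-key>"                                        # placeholder key

payload = {"documents": [{"id": "1", "language": "en",
                          "text": "Protests continued for a third day downtown."}]}

resp = requests.post(
    f"{endpoint}/text/analytics/v3.0/sentiment",
    headers={"Ocp-Apim-Subscription-Key": key},
    json=payload,
)

doc = resp.json()["documents"][0]
print(doc["sentiment"])         # e.g. 'negative'
print(doc["confidenceScores"])  # per-class scores the dashboard can symbolize on
```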

Finally, we combine the article with the attributes attached from the Cognitive Services API and the NetOwl API and create a streaming output that can be fed to GeoEvent (more on this below). The final output looks like this:

In general, ArcGIS GeoEvent Server is designed for consuming and parsing large volumes of geospatial streaming data. In our case, this data is a REST endpoint in JSON format, generated by our parsed articles. The resulting instance, sitting on a powerful cloud or local server machine, is nothing spectacular to look at:

but the job it’s performing definitely is. Adding GeoEvent into our system lets the Azure instance focus on the heavy NLP lifting while the results are mapped in near real-time, and auxiliary data files (such as maps of emerging hotspots, feature classes of regional temporal patterns, and raw data tables of entities and sentiment) are generated along the way. We can work directly with the data in an ArcGIS Notebook or pull it into a specialized desktop application for further analysis.

Building entity link charts from news articles in ArcGIS Pro

Building Real Products for Really Difficult Problems

Special thanks to my colleague Rob Fletcher on this work

Let’s notch up the difficulty a bit more.

Imagine a government call center where citizens would call in to report all kinds of non-emergency situations: anything from potholes to having their office overrun by cats (a story for another time). However, imagine these citizens call in with only approximate or colloquial descriptions of the locations they’re referring to, something like “down the street from the cemetery,” “two blocks down from the central bank,” or “between the embassy and the park.” Note that this is basically the “neither lucky nor unlucky” scenario we described above and the central focus of a project we got from the government of Abu Dhabi.

The operators at this call center used their own knowledge of the city or had to reach out to multiple sources to help resolve the caller’s described location, and with high call volumes there was sometimes simply not enough time to pin down the specifics, so requests went into a backlog for later follow-up. They needed a way to get an approximate grasp of the locations being referred to in a call in near real-time.

There are a number of challenges with this work beyond call volume and implicit descriptions. Since this was a relatively new initiative, we had access to little to no ground-truth data on what the locations actually turned out to be. With no explicit addresses described in most calls we couldn’t just use a keyword lookup, and without a ground-truth dataset we couldn’t train a complicated model to figure out the addresses. So we turned to ideas from Bayesian modeling.

Firstly, we built a multi-scale grid over the city:

Then, we associated with each grid cell a probability of being the location of interest (1/#cells, to start). Next, we built a list of different types of location “evidence” that we’d use to update each grid cell’s probability. This evidence was separated into several sub-types, such as address evidence (an exact street address), POI evidence (a central bank, bridge, port, etc.), directional evidence (N/S/E/W), distance evidence, street evidence, and several others. A mention of any of these types of evidence would prompt a geographic search against related features (such as searching for the polyline feature designating the mentioned street) and a corresponding probability update on the grid cells.
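
A toy sketch of that update logic, with invented grid dimensions and likelihood values, looks something like this:

```python
import numpy as np

GRID_ROWS, GRID_COLS = 50, 50  # hypothetical grid resolution over the city

# Start with a uniform prior: every cell is equally likely (1 / #cells).
prior = np.full((GRID_ROWS, GRID_COLS), 1.0 / (GRID_ROWS * GRID_COLS))

def update_with_evidence(belief, likelihood):
    """Multiply the current belief by the likelihood implied by one piece of
    evidence, then renormalize so the grid sums to 1 again."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

def poi_likelihood(poi_row, poi_col, radius_cells=3, hit=1.0, background=0.05):
    """Toy 'POI evidence': cells near the mentioned landmark get a high
    likelihood, everything else a small background value."""
    rows, cols = np.indices((GRID_ROWS, GRID_COLS))
    dist = np.hypot(rows - poi_row, cols - poi_col)
    return np.where(dist <= radius_cells, hit, background)

# One mention of a point of interest updates the whole grid; street, direction,
# and distance evidence would each contribute their own likelihood the same way.
belief = update_with_evidence(prior, poi_likelihood(20, 31))
print(belief.max(), np.unravel_index(belief.argmax(), belief.shape))
```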

The grid cells would be colored by their current probability, so the operator would be able to see, in near real-time, parts of the map light up as likely candidates for the location being mentioned on the phone. In this way, the operator could help form responses directly for the caller during the call.

Using STT (Speech-To-Text) software, this would be integrated directly into the call center, and since it was built as a web app (using the ArcGIS JavaScript API) it was easy to store the intermediate results for historical processing or analysis. While our method works well heuristically, it requires a lot of discretion and fine-tuning. In a similar case where training data was available, you’d likely get even better results from training an entity extraction model or using a pre-built neural language model like BERT or GPT.

Conclusion

There is a wealth of information present in natural language that, as we’ve seen, can not only provide useful and actionable insights but also help us make beautiful and dynamic maps that highlight the current operational picture. What we’ve seen, however, is just a small sample of the true power of combining unstructured data, ArcGIS, powerful NLP engines like NetOwl, and cloud infrastructure like Azure. The real enjoyment here is the sheer volume and availability of unstructured data and the unique problems we can solve, whether it’s giving a voice to those who cannot speak, summarizing research about a spreading pandemic from medical research papers, or understanding water shortages through social media posts. A lot of these use cases are begging to be made geospatial, to really go beyond just highlighting time and place in a sentence. NLP is one of the few areas where I have to actively limit my imagination, because the number of compelling use cases far outnumbers the time I have to implement them. I hope this post has helped you understand not only the technical specifics of the field, but also inspired you to build some of the future geospatial NLP products that will change the world and the way we interact with it.

Thanks for reading.
