AI in Government — Part One: Efficiencies through NLP

RS21 Blog · Jun 12, 2019

This is the first post in a three-part blog series on the Artificial Intelligence (AI) analytical methods government organizations most need in order to enhance their AI readiness.

Yes, there are a TON of components involved in AI, including hardware (CPUs, GPUs, servers, etc.) and languages and frameworks (R, Python, TensorFlow, Keras, etc.); but this series focuses on the analytical pieces that government entities can quickly put into operation.

The series is as follows:

Part One — AI in Natural Language Processing
Part Two — AI in Data Mining
Part Three — AI in Signal Processing

“Terminator” by dmoberhaus is licensed under CC BY 2.0

A little background

Many people first encountered AI as kids, watching popular science fiction movies like The Terminator, with its post-apocalyptic world of killer robots. While science fiction images are often what first come to mind when thinking of Artificial Intelligence, a more accurate description of AI is machines learning from experience, through trial and error, until they can perform tasks the way humans would.

Alan Turing’s Bombe is a great example of very (very!) early AI that augmented human codebreakers’ capabilities in WWII, decoding German messages and helping speed the war to a favorable outcome for the Allies.

Alan Turing’s The Bombe (“Bombe” by chechar is licensed under CC BY-NC-SA 2.0)

Many people still view AI through that science fiction lens, picturing films like The Terminator, RoboCop, or even Her. But as captivating as these stories of AI are, they don’t fully capture what’s really occurring in AI today.

Sure, organizations are building robots to do mundane tasks like lifting boxes or playing drums, and that work does advance some AI research; but it doesn’t help organizations increase operational efficiency or become more knowledgeable.

For people to really want to use AI daily, it needs to be more than ‘exciting’ in a science fiction context; it needs to be approachable and practical. It needs to augment everyday tasks, not to replace human roles but to make those roles more efficient and more prominent.

Imagine reading hundreds of scientific and technical documents and trying to summarize them into topics your research could build upon. It’s an incredibly time-consuming process! With an AI system to summarize the nuggets of information in these documents, however, research could be more comprehensive and completed much sooner — allowing people to focus their efforts on other fronts.

Fortunately, that AI technology exists, and your organization could use it.

Imagine AI augmenting your workflow so that the thousands of hours spent reading memos, plans, assessments, and other documents in hopes of finding critical pieces of information shrink to minutes.

Imagine the potential impact of that time saved:

  • Is it creating better organizational efficiency?
  • Are you able to focus your efforts on new fronts?

The answer to these questions is probably ‘yes.’

If you’re in the government space, then the answer is ‘definitely yes’ and, in many cases, this type of AI system could have lifesaving capabilities.

So, let’s describe it.

A sound natural language processing (NLP) AI system needs several key features to accurately augment cross-document topic identification and summarization. And yes, these key features would have made my dissertation easier; but far more importantly, they align strongly with the needs of government organizations, departments, and teams.

Government memos, plans, assessments, and other documents can be lengthy and verbose, much like the scientific articles my dissertation drew on. What the NLP system needs to do is examine text through a contextual lens, NOT a content lens.

Content is easy to understand compared to context.

Take the most complicated book or journal article you’ve ever read. Count the paragraphs, the total words, the frequency of each word, and the number of figures and tables. You’ve just examined that document through a content lens. A contextual lens, on the other hand, requires understanding the concepts in the document. Rather than counting or tallying the frequency of words, you need to absorb the meaning of the words, the topics, and ultimately the knowledge embedded in the text. When AI can do this, it can easily augment current government workflows by analyzing text across documents in a way that is secure and far more efficient.
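The content lens is easy to automate. As a quick illustration (a minimal sketch, not part of our system; “document.txt” is a placeholder), a few lines of Python can produce all of those surface statistics, yet none of them tell you what the document means:

```python
from collections import Counter
import re

# Any plain-text document will do; "document.txt" is a placeholder name.
text = open("document.txt", encoding="utf-8").read()

# A content lens: pure surface statistics, no meaning involved.
words = re.findall(r"[A-Za-z']+", text.lower())
print("total words: ", len(words))
print("unique words:", len(set(words)))
print("most common: ", Counter(words).most_common(10))
```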

The amazing thing is that AI as NLP can do just this without starting a robot apocalypse. For example, RS21 employs a two-step method that examines multiple text documents for contextual information in much the same way that we read and understand text.

Contextual Similarities

First, the algorithm mathematically identifies words that are contextually similar.

For instance, the algorithm can identify that dog and puppy are related; but it can also identify analogies, in this case relating dog and puppy to cat and kitten. Just this initial step has a profound impact on the way we can quickly analyze text. From an infrastructure standpoint, it could relate emergency room to hospital, and substation to power station.
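How can an algorithm learn these relationships? Word embeddings are the standard technique: each word becomes a vector, and words that appear in similar contexts end up close together in that vector space. Here is a sketch using the open-source gensim library with pretrained GloVe vectors; it illustrates the idea, not our production pipeline:

```python
import gensim.downloader as api

# Pretrained GloVe word vectors (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-100")

# Contextual similarity: words used in similar contexts score high.
print(vectors.similarity("dog", "puppy"))   # high (closely related)
print(vectors.similarity("dog", "table"))   # much lower

# Analogy: dog is to puppy as cat is to ...?
print(vectors.most_similar(positive=["cat", "puppy"],
                           negative=["dog"], topn=1))  # -> kitten
```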

Part two of this AI NLP system looks beyond the contextual meaning of individual words and builds context across the sentence, just as humans do. Just as any of us would find it hard to interpret a book by reading its words in random order, so do our NLP algorithms. For example, there are significant contextual, structural, and sentiment differences between the following two sentences:

“I need a break from hammering.”

“I will break the hammer.”

AI as NLP can identify the contextual, structural, and sentiment differences between sentences (even the two presented above). It can therefore augment our ability to quickly process the large, wordy text documents that sit on our shelves collecting dust, get haphazardly analyzed, or are never cross-referenced with other documents.
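As a small illustration of the structural difference (a sketch using the open-source spaCy library, not our implementation), a part-of-speech tagger resolves “break” differently in each sentence because it reads the whole sentence rather than isolated words:

```python
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

for sentence in ("I need a break from hammering.",
                 "I will break the hammer."):
    doc = nlp(sentence)
    print([(token.text, token.pos_) for token in doc])

# "break" is tagged as a NOUN in the first sentence and a VERB in the
# second: the same word, disambiguated by sentence-level context.
```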

To put this in scale: we analyzed hundreds of thousands of sentences in infrastructure-resiliency documents, each containing multiple independent pieces of knowledge important to national security. This library of documents had never been systematically cross-analyzed, so our clients had little information about how the documents related to each other. We augmented their capability by cross-synthesizing the documents to identify new, contextually meaningful themes.

The question is, how is this done?

Our tool ingests a mix of text documents from our clients and cleans the text of pieces that don’t provide value. The cleaning process is flexible to meet each client’s needs but typically includes removing page numbers, bibliography content, figures, titles, authors, and content in headers and footers. The clean library of documents is then ready for our NLP system to process and for the user to engage with.
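As an illustration of what that cleaning step can look like (a simplified sketch; the real rules are tuned per client corpus), here is a small Python routine that drops bare page numbers and figure or table captions:

```python
import re

def clean_page(page_text: str) -> str:
    """Strip low-value lines from one page of extracted text.

    Simplified sketch: real cleaning rules are tuned per client corpus.
    """
    kept = []
    for line in page_text.splitlines():
        # Drop bare page numbers, e.g. "12" or "Page 12".
        if re.fullmatch(r"\s*(page\s+)?\d+\s*", line, re.IGNORECASE):
            continue
        # Drop figure and table captions, e.g. "Figure 3. ..."
        if re.match(r"\s*(figure|table)\s+\d+", line, re.IGNORECASE):
            continue
        kept.append(line)
    return "\n".join(kept)
```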

There are tangible outputs of our system as well: it can visualize the thematic relevance of different sentences or passages that live in otherwise independent documents.

For instance, you may want to identify the medieval battle scenes in Lord of the Rings, Game of Thrones, or Braveheart. Our NLP system can automatically search the text, identify passages, and score those passages on how much they relate to a battle scene.
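The general idea behind that scoring can be sketched with open-source sentence embeddings (illustrative only; our scoring model differs): embed the query and each passage, then rank passages by cosine similarity.

```python
from sentence_transformers import SentenceTransformer, util

# A small open-source embedding model, used here for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "a medieval battle scene"
passages = [
    "The two armies clashed at dawn, swords ringing against shields.",
    "Frodo and Sam shared a quiet meal beside the road.",
]

query_emb = model.encode(query, convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)

# Cosine similarity serves as the relevance score for each passage.
for passage, score in zip(passages, util.cos_sim(query_emb, passage_embs)[0]):
    print(f"{float(score):.2f}  {passage}")
```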

Users can query the context of their document through two powerful features:

  1. a Google-type platform where users can type questions into our search window and retrieve text that’s relevant to their search, and
  2. a voice recognition feature that uses speech-to-text algorithms, allowing users to ask the system questions aloud and retrieve relevant answers.
Query by Glenn Carstens-Peters on Unsplash

As an example, the user can ask the question, “Who is Gollum?” and the AI system will respond with, “Gollum is a fictional character in The Lord of the Rings whose purpose is twofold: to help Frodo and Sam reach Mount Doom, but also to steal the Ring for himself.”

The organic nature of the interaction between our NLP system and the user allows for fluid inquiry — a user could easily ask our system, “Where is Mount Doom?” or “What are Orcs?” and the system would respond accordingly.
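Under the hood, this kind of question answering can be approximated with an off-the-shelf extractive QA model. Here is a sketch using the open-source Hugging Face transformers library (an illustration of the technique, not our system):

```python
from transformers import pipeline

# Small pretrained extractive question-answering model.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = (
    "Gollum is a fictional character in The Lord of the Rings. He guides "
    "Frodo and Sam toward Mount Doom while secretly plotting to steal "
    "the Ring for himself."
)

for question in ("Who is Gollum?", "Where does Gollum guide Frodo and Sam?"):
    print(question, "->", qa(question=question, context=context)["answer"])
```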

It’s designed to produce flexible outputs so that our clients can have a tailored system that addresses their needs.

Given these features, our NLP system has a wide range of applications in the government space.

At a high level, the idea of our NLP system is to identify common themes across multiple documents in an automated fashion, rather than sifting and sorting through each document by hand. For instance, research-based institutions — such as the United States Geological Survey (USGS), the National Institutes of Health (NIH), and the Environmental Protection Agency (EPA) — produce hundreds of thousands of research papers and assessments each year. Being able to quickly sort through them and identify common themes is imperative to expedite research and the application of research findings.

Credit Drew Hays on Unsplash

National Institutes of Health (NIH)

Imagine the National Institutes of Health (NIH), which is responsible for biomedical and public health research, being able to comb through thousands of written reports to identify potentially emerging diseases, appropriate drugs to counter those diseases, or consensus around the treatment of certain heart conditions. Left unaugmented, researchers would need considerable effort to mount a literature review thorough enough to identify such consensus, which ultimately hinders research progress.

Department of Defense (DoD)

NLP can also be used for event extraction across government departments; take the Department of Defense (DoD) as an example. The DoD has amassed documentation for thousands of military response events in very complex environments. These documents can be mined to pinpoint where common mission failures and successes occur, and how they relate to the type of environment.

Credit Jeff Cooper on Unsplash

Furthermore, the data-rich accumulation of transcripts of recorded adversary chatter presents opportunities to find overlapping discussion points that might pertain to an adversary’s financial dealings or connections with extremist networks and entities. Screening this information by hand would be unnecessarily burdensome for analysts, but our NLP approach can augment analyst workflows to produce clear outputs faster.

Credit Taskin Ashiq on Unsplash

Federal Bureau of Investigation (FBI) + The Small Screen

If you’ve ever seen the show Manhunt: Unabomber or Mindhunter on Netflix, you’ve already been exposed to an early NLP process. As a quick synopsis, Manhunt: Unabomber follows the FBI’s effort to catch Ted Kaczynski, who was sending improvised explosive devices through the postal service. Kaczynski was a well-educated, brilliant mathematician who would often mail grammatically elegant manifesto messages to the FBI and the press. Those messages were deciphered through an early form of natural language processing and linguistic analysis that allowed the FBI to narrow down its suspect list.

As an example, Ted states:

“You can’t eat your cake and have it too,” which is counter to the common saying: “You can’t have your cake and eat it too.”

Kaczynski’s phrasing is the older form of the saying, rooted in early English usage, and technically speaking, it’s the correct version.

Mindhunter follows a similar FBI track, but in this case, the show examines the minds of serial killers through interview assessments. These interviews are dissected to find common themes among known serial killers, and to identify their cognitive mindsets.

These shows highlight two unique cases within the FBI. The FBI still faces similar challenges today in identifying serial killers, terrorist cells, and even computer hackers. To help augment its ability to identify these actors, our natural language processing approaches can take FBI inputs (such as suspect interviews, letters, etc.) and identify, compare, and contrast the contextual meaning behind those messages to determine suspect networks and aliases.

Credit pina messina on Unsplash

Food and Drug Administration (FDA) + The Opioid Crisis

One of the more pressing issues facing our nation is the opioid crisis. Opioids act as powerful pain killers, but over-prescription has led to an epidemic of drug abuse, dependency, and overdose. With millions of Americans suffering from opioid addiction each year, the FDA is taking action to find new and innovative ways to identify and address opioid misuse.

These efforts require sifting through libraries of medical studies and records, engaging with new drug development, and deploying health resources to support opioid victims. With so many competing demands, FDA analysts could be supported by new processes which would shift their energies and resources from research to action.

Instead of reviewing a nation’s worth of medical records by hand, the FDA could use NLP to quickly sift through the expanse of data so that its health professionals engage only with the most valuable records. To supplement their collective analytical power, deploying AI could allow the FDA to identify the factors that make individuals high-risk for opioid abuse and direct resources to the areas most in need. With AI-supplemented analytical power, the FDA would have stronger tools to deploy against the opioid crisis.

Moving Forward

The days of Terminator might be far into our future, but AI is here now — and it’s designed to help accelerate our everyday tasks and make a positive impact on the efficiency and effectiveness of the way we conduct business and solve problems. The government space is ripe for these opportunities.

The next installment in this series will examine AI for processing the large, messy datasets that often sit unanalyzed in government spaces. More specifically, the next blog will focus on supervised and unsupervised AI approaches to timeseries and non-timeseries data for classification, regression prediction, and tipping-point identification. Stay tuned!

RS21 develops interactive data analytics and visualization products.
We blend an advanced computational capability with a network of world-class experts to provide actionable insights to government organizations including:

  • Department of Homeland Security (DHS)
  • Cybersecurity and Infrastructure Security Agency (CISA)
  • Federal Emergency Management Agency (FEMA)
  • Transportation Security Administration (TSA)
  • United States Coast Guard (USCG)
  • United States Agency for International Development (USAID)
  • National Laboratories: Argonne National Laboratory, Idaho National Laboratory, Los Alamos National Laboratory, and Sandia National Laboratories

RS21 is a HUBZone Certified Small Business + GSA Schedule 70 Company


RS21 is revolutionizing decision-making with data + AI. We believe the power of data can unleash human potential and make a better world. Visit www.rs21.io.