GPT-4 for Defense-Specific Named Entity Extraction

The LLM OSINT Analyst Explorer Series: Episode 2

Anthony Mensier
5 min read · Apr 14, 2023

This article is the third installment in the LLM OSINT Analyst Explorer series. Be sure to check out the introduction, where it all began, or scroll to the bottom of this article to explore the entire series!

Despite their recent emergence, Large Language Models (LLMs) have already revolutionized various industries, ranging from marketing and technical writing to advanced programming. However, the potential for generating structured insights in specialized fields like defense remains predominantly untapped. Defense content is characterized by intricate terminology and expert knowledge, which may not be adequately represented in GPT-4’s training data. Intrigued, I decided to experiment and began sharing the results on my LinkedIn network. After receiving significant engagement on my initial post, a friend suggested that I transform this exploration into a series of articles, and so, here I am!

To evaluate the utility of LLMs for intelligence analysts and their potential to expedite core activities (refer to my Introduction article for more details), we require a reliable scenario. In this experiment, we aim to test the efficiency gains derived from using LLMs to rapidly construct a knowledge graph (KG) related to the latest US Defense budget publication. This KG should assist us in understanding the supply chains behind various items of interest mentioned in the document.

Our first step is to extract and disambiguate these items. In doing so, we will assess GPT-4’s capabilities in defense-focused entity extraction and disambiguation. Additionally, we will examine the challenges associated with achieving state-of-the-art results in this domain.

GPT-4 and Defense-Focused Entity Extraction Prompts

Quick disclaimer

In this experiment, I will be using the graphical user interface (GUI) of ChatGPT coupled with the most powerful model in OpenAI's LLM suite: GPT-4. The GUI is a potent tool, but it does have certain constraints, particularly when it comes to constructing custom knowledge extraction pipelines. For instance, it cannot extract text from online articles, as it is not designed to access the internet, and it does not seem to retain tokenized text in memory when asked to incrementally build its analysis from one prompt to the next (we will expand on this point later).

The ChatGPT GUI. Note the option to select the GPT version you want to use, and the current limitation of messages for GPT-4

To circumvent these limitations, I will later use ChatGPT's API to create a more robust knowledge extraction pipeline. Its first component will be a versatile HTML/JavaScript extractor, which can efficiently and accurately obtain relevant text data from a wide range of online sources.
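To give a concrete sense of what such a component could look like, here is a minimal sketch in Python. It assumes the requests and beautifulsoup4 packages; the function name and the tag-filtering rules are illustrative assumptions, not the actual extractor built later in the series.

```python
# Minimal sketch of an HTML text extractor (illustrative only).
# Assumes the requests and beautifulsoup4 packages are installed.
import requests
from bs4 import BeautifulSoup

def extract_article_text(url: str) -> str:
    """Fetch a page and return its visible paragraph-level text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style noise before collecting the text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # Keep paragraph text; real sources may need per-site rules.
    paragraphs = (p.get_text(strip=True) for p in soup.find_all("p"))
    return "\n".join(p for p in paragraphs if p)
```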

By constructing a custom knowledge extraction pipeline, we can harness the full potential of ChatGPT's advanced NLP capabilities, create our own curated knowledge base, and develop semantic searches tailored to our goals. As a result, we can avoid the more contentious applications of ChatGPT, such as direct fact-checking or insight generation. Please do read the limitations section of OpenAI's GPT-4 technical report.

First things first: extracting the entities from a targeted text

To begin, I have selected a short article from DefenseOne that discusses the latest US Defense budget publication. This article will serve as the base text for our entire experiment.

The first step in this experiment is to extract the different entities of interest from the article. By identifying these entities, we can gain insights into the priorities of the budget and which branches of the US forces will benefit the most. These entities will serve as anchors upon which we can build more complete profiles and knowledge graphs, enabling us to gain a more comprehensive understanding of the topic.

We are constructing a straightforward prompt that requests a basic extraction task for custom categories: “persons,” “civilian or military organizations,” and “military equipment.” Some of these categories are conventional classes used in building NER models, such as “Person” and “Organization.” However, two of them — military organizations and equipment — are quite unique and would likely not be included in any pre-trained NER engine. Our goal is to evaluate GPT-4’s performance in custom NER tasks without relying on specific training or examples, using a technique known as “zero-shot learning.”

This is the prompt:

"Could you extract all military equipment, civilian or military organisations and persons entities from the following text: "INSERT TEXT""

And this is the result, as returned in the ChatGPT GUI:

What can we learn from this output?

  • Firstly, GPT-4 effectively “understood” all the classes I referred to, accurately identifying both named and unnamed instances of military equipment and military organizations — two categories that I consider to be highly specialized for any NER model.
  • Secondly, its F1 score (the harmonic mean of precision and recall, i.e., how often the model extracts correct entities versus how often it misses or miscategorizes them) is quite high. By running additional prompts asking it to highlight all the extracted entities, I could visually check whether any were missed. Without delving too deeply into scientific measurement, I can confidently state that GPT-4 extracts approximately 85% of the targeted entities correctly, based on quick verifications and calculations (a simple way to compute such scores is sketched after this list). These results are truly state-of-the-art.
  • Thirdly, the model does miss some indirect mentions of specific entities. For instance, it detects “the Biden administration” but overlooks “the administration.” This issue could become problematic when transitioning to relationship extraction, as it might miss crucial insights connected to an unnamed mention of an entity.
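For those who want to put numbers on such observations, entity-level precision, recall, and F1 can be computed against a small hand-labelled gold set. Here is a minimal sketch; the entities are purely illustrative, not the actual output of the experiment.

```python
def entity_scores(predicted, gold):
    """Entity-level precision, recall and F1 against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}

# Purely illustrative entity sets, hand-labelled by the analyst.
gold = {"F-35", "Biden administration", "the administration", "US Air Force"}
predicted = {"F-35", "Biden administration", "US Air Force"}
print(entity_scores(predicted, gold))
# precision 1.00, recall 0.75, F1 ≈ 0.86
```

Note that exact-string matching like this would also penalize variant mentions (e.g., "F-35s" vs. "F-35"), which is precisely where the disambiguation step discussed next comes in.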

So what? And what’s next?

We now know that GPT-4 is setting new SOTA standards for zero-shot named and unnamed entity extraction. This is a key insight: it tells us we can start trusting this model to extract the key building blocks onto which we will build our knowledge base. These first observations will need to be validated and confirmed by further tests, but they give me the confidence to move to the next phase of the experiment: entity disambiguation against open-source knowledge bases (i.e., linking "F-35s" back to a dedicated F-35 Lightning II profile that will unlock all the information about this entity).

Stay tuned!

Other articles in this series


Anthony Mensier

Ex-Intel Analyst & AI Product Manager passionate about Foundation Models, Transformational Technologies & National Security - https://bit.ly/subscriptionAM