THE LLM OSINT ANALYST EXPLORER SERIES

Brief Introduction to LLMs and Their Transformer Logics for Defense Content Analysis

Episode 1: Powerful but senseless algorithms

Anthony Mensier
11 min read · May 5, 2023


Image generated by Stable Diffusion “AI for Defense Insights generation — Clear lines”

Attention is all you need

Throughout my journey as an Intelligence Analyst and later as a Product Manager specializing in Machine Learning applications for Defense and National Security, I’ve encountered a plethora of remarkable technologies. I even had the opportunity to build bespoke Natural Language Processing applications for some operational departments of the United Kingdom’s Ministry of Defence (UKMOD). At the time, we were mainly using the spaCy 2.0 framework, with bespoke models trained for specific Named Entity Recognition (NER), Named Entity Disambiguation (NED) and Abstractive Summarisation tasks. We were achieving some very interesting results, some close to the State of the Art (SOTA), albeit on specialised Defense content.

Then, in 2017, a research paper called “Attention Is All You Need” was released. And that changed EVERYTHING.

Before we delve into the fascinating world of LLMs for Defense and National Security applications, and how these models will soon enable all the capabilities described in my introduction article, it’s essential to understand what LLMs are, how they operate, and why they’ve revolutionized the field of Natural Language Processing.

The rise of the transformers

By now, chances are you’ve come across or dabbled with ChatGPT or even the impressive GPT-4. These so-called LLMs (Large Language Models) are part of a more extensive group known as foundation models, all powered by the remarkable Transformer architecture. GPT-4, in fact, stands for “Generative Pre-trained Transformer, version 4.” Other LLMs include Google’s Bard or Command XLarge from Cohere.

These innovative models have completely overturned the pre-2017 SOTA standards, making it possible to extract valuable insights from highly specialized content sets, such as Defense and National Security, without the need for years of data science training and development. And all of this was made possible by one discovery: the attention mechanism, the underlying logic and architecture at the heart of today’s Large Language Models.

What are they and how do they work?

Transformers, of which LLMs are the most recent evolution, are a type of neural network architecture introduced in the paper “Attention is All You Need” by Vaswani et al. in 2017. They have become the foundation for many state-of-the-art natural language processing (NLP) models, including GPT-4. The core idea behind transformers is the attention mechanism, which allows the model to weigh and focus on different parts of the input when generating an output.

Let’s break this down into 4 specific and complementary concepts (if you want to know more about the science behind these, please have a look at this excellent article from the Towards Data Science Medium community):

  1. Self-Attention: Think of this as the model’s ability to grasp how words in a sentence relate to one another. It scores the interaction between word pairs, with higher scores indicating stronger relationships. This helps transformers make sense of sentences, just like humans do! (See the minimal code sketch after this list.)
  2. Multi-Head Attention: Transformers don’t rely on just one “attention” component; they use multiple “heads” to capture different aspects of the input. This allows them to better understand intricate language patterns and make smarter decisions.
  3. Positional Encoding: Knowing the order of words is crucial for understanding a sentence. Transformers use a technique called positional encoding to keep track of each word’s position in the sequence, ensuring they don’t miss the context.
  4. Encoder-Decoder Structure: Transformers are built with two main parts — an encoder and a decoder. The encoder analyzes the input, while the decoder crafts an output based on that analysis. This structure is incredibly useful for tasks like machine translation, where the encoder processes one language and the decoder generates another.
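
To make self-attention a little more concrete, here is a minimal, single-head sketch in NumPy: toy dimensions, random weights, no training. Real transformers stack many such heads and layers and add positional encodings, so treat this as an illustration of the mechanism rather than a working model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over token embeddings X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project each token into query, key and value vectors
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how strongly each token relates to every other token
    weights = softmax(scores, axis=-1)         # each row sums to 1: the token's "attention budget"
    return weights @ V, weights                # context-aware representations + the attention map

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8            # e.g. a toy 5-token "sentence"
X = rng.normal(size=(seq_len, d_model))        # stand-in for learned token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)          # (5, 8): one new, context-aware vector per token
print(weights.round(2))   # each row shows how much one token "looks at" the others
```

Multi-head attention simply runs several of these projections in parallel and concatenates the results, and positional encodings are added to the token embeddings before any of this happens.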

As you may have realized, the key to the SOTA performance of transformers lies in their ability to weigh and retain a vast context of surrounding words for each word they analyze. This allows them to detect and recall the precise contexts in which each word has been mentioned. Such a capability is critical for understanding and analyzing niche content that contains potentially highly ambiguous entities, such as Defense and National Security document sets.

Typical case of complex disambiguation for traditional NLP systems (images courtesy of Wikipedia: Elizabeth II and the Queen Elizabeth-class aircraft carrier)

For instance, Western navies often name their vessels after places or people (such as HMS Queen Elizabeth, USS Virginia, and so on), which can confuse standard Natural Language Processing (NLP) engines. In such cases, remembering the context is crucial for disambiguation. If you detect that a mention of “Queen Elizabeth” is surrounded by maritime-related content and context (in the example above, the words “christened” and “Portsmouth” are two indicators of the maritime and Royal Navy context of this sentence), and you recall having seen this word in the same context before, you can separate the two mentions and disambiguate this one to refer to the HMS Queen Elizabeth aircraft carrier, rather than the late (and beloved; yes, I am French, living in the UK, and I loved the Queen) Queen Elizabeth II.
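
To make the intuition concrete, here is a deliberately naive sketch of context-based disambiguation as simple keyword voting. The cue lists and candidate entities are purely illustrative, not a real NER/NED pipeline; a transformer learns far richer contextual associations automatically through attention, but the underlying idea is the same.

```python
# Toy context-based disambiguation: count domain cues around an ambiguous mention.
# Cue lists and entity labels are illustrative only.
MARITIME_CUES = {"christened", "portsmouth", "carrier", "hull", "navy", "fleet", "vessel"}
ROYALTY_CUES = {"coronation", "buckingham", "reign", "throne", "monarch", "her majesty"}

def disambiguate(sentence: str) -> str:
    text = sentence.lower()
    maritime = sum(cue in text for cue in MARITIME_CUES)
    royalty = sum(cue in text for cue in ROYALTY_CUES)
    if maritime > royalty:
        return "HMS Queen Elizabeth (aircraft carrier)"
    if royalty > maritime:
        return "Queen Elizabeth II (person)"
    return "ambiguous"

print(disambiguate("The Queen Elizabeth was christened in Portsmouth before joining the fleet."))
print(disambiguate("Queen Elizabeth marked her coronation anniversary at Buckingham Palace."))
```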

In a nutshell, transformers are now the elite of the neural network world, using self-attention, multi-head attention, positional encoding, and an encoder-decoder structure to understand and generate human-like text. GPT-4, as a transformer model, brilliantly harnesses these concepts to excel at various NLP tasks. I have yet to test the other LLMs, but if they are as good as GPT-4 at the tasks I will demo in the following articles, then we are getting close to being able to semi-automatically build expert-driven, LLM-powered Intelligence applications at scale.

Now, let us offer some clarification about this technology before we explore its use cases and the development of new, LLM-powered, intelligence-focused NLP engines. A thorough understanding of LLMs’ strengths and weaknesses is crucial for:

  1. Their effective utilization and implementation within tailored intelligence workflows.
  2. Preventing the creation of use cases that expose and succumb to their weaknesses, potentially leading to the generation of inaccurate intelligence, misinterpretations and, as a result, erroneous situation assessments that may have catastrophic consequences.

Nope, Transformers are neither “cognitive” nor “sentient”

There’s been quite a stir surrounding the recent release of GPT-4 and the dawning era of “Artificial General Intelligence” (AGI). On LinkedIn and other mainstream media, we have seen words like “cognitive” or “sentient” used to describe GPT-4 and its human-like dialogue capabilities. Although this groundbreaking technology does pose considerable risks, primarily in terms of propaganda and disinformation, it’s crucial to stay grounded and understand what transformers (and therefore LLMs) are, and what they aren’t. By grasping both their immense potential and their inherent limitations, we’ll be better equipped to manage their use responsibly and effectively.

Let us dispel the myths surrounding their so-called “cognitive” and “sentient” capabilities. While these powerful models are revolutionizing the field of natural language processing, contrary to what some AI enthusiasts claim, they are still decades away from achieving artificial general intelligence (assuming their current neural network structure can even permit this). Believing otherwise paves the way for numerous catastrophic scenarios in which humans blindly trust machines to carry out our instructions, only for these machines to partially execute our directives, misunderstand and therefore miss the objective, or introduce new vulnerabilities due to insufficient oversight. Throughout this series, we will discuss these risks, as well as the potential weaponization of such systems for malicious purposes.

Talos, an ancient mythical automaton with artificial intelligence (image courtesy of Wikipedia)

The Cognitive Conundrum

Transformers may appear to exhibit cognitive abilities, but let’s unravel the secret behind their intelligence. These models are designed to identify patterns and make predictions based on vast amounts of data. Indeed, they can be prompted to embody a specific type of expert or person, as ChatGPT has been fine-tuned to assign extra importance to such “command messages.” However, there are limitations to their capabilities. At the end of the day, transformers have a singular objective: to accurately predict the next word or “token” (some say that for every word, GPT-4 attempts to predict the next 700). They do not comprehend causality, emotions, or opinions, and they are certainly not conscious. This series will further emphasize the weaknesses of these models and the potential risks that the use of large language models (LLMs) may pose to our human society.
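
To see that “word guessing” objective in action, here is a small sketch using the openly available GPT-2 model from the Hugging Face transformers library (GPT-4’s weights are not public, so GPT-2 stands in; the prompt is purely illustrative). All the model produces is a probability distribution over possible next tokens.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The aircraft carrier was christened in"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (batch, sequence_length, vocab_size)

# Probability distribution over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>12}  p={prob:.3f}")
```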

True cognition is a complex process that encompasses learning, memory, perception, problem-solving, and decision-making, all of which require a deep understanding of the world, self-awareness, and the ability to reason.

While transformers excel at processing language and mimicking human-like text, they lack the fundamental understanding of the world that drives human cognition. They are simply number-crunching machines that create outputs based on patterns in the data, without truly grasping the meaning behind them.

Sentience: A Far-Fetched Fantasy

At first glance, transformers may appear sentient, given their impressive language processing capabilities. But sentience is a whole different ball game. Sentience refers to the capacity to have subjective experiences, feelings, and self-awareness — traits that are unique to conscious beings. As much as transformers are advanced and powerful, they are still machines that operate based on mathematical algorithms, devoid of emotions or consciousness.

Transformers don’t experience joy or sadness, nor can they ponder the meaning of life. They’re engineered to analyze and generate text, but they don’t possess the emotional depth or self-awareness that defines sentience. They’re simply tools, designed to assist and augment human abilities, not to replace the rich tapestry of human consciousness.

OK, but are LLMs at least deeply knowledgeable?

The answer, much like many aspects of intelligence work, is not binary. Indeed, large language models (LLMs) can possess an impressive ability to extract relevant information, and the scope of their knowledge can be quite remarkable.

For example, if you ask GPT-4 to describe the various components of an SA-21 anti-air missile system, its response is impressive, comparable to that of a junior analyst working on the Russia desk of any Western military intelligence service, at least on this specific topic. Moreover, when asked about its range, the model also understands that the SA-21 can support and launch various missile types, which implies there is not just one answer: its effective range can vary from 40 to 400 kilometers.
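
For reference, here is a minimal sketch of the kind of call behind this test, written against the OpenAI Python library’s ChatCompletion interface as it existed at the time of writing. The prompt and system message are illustrative; adapt them to your own analytical persona.

```python
import os
import openai  # written for the openai 0.27-era client

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a military equipment analyst."},
        {"role": "user", "content": "Describe the main components of the SA-21 (S-400) anti-air missile system and its range."},
    ],
    temperature=0,  # keep the answer as deterministic as possible for analytical use
)

print(response["choices"][0]["message"]["content"])
```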

S-400 missile system, also called SA-21 Growler (image courtesy of Wikipedia)
Video courtesy of the author.

These preliminary findings should encourage us to push the information retrieval queries even further. To access the comprehensive structured knowledge base behind Wikipedia and other open-source projects, which are invaluable resources for building our expert-curated knowledge base, we need the unique ID of the SA-21’s Wikidata page. This ID is referred to as a Q-code, and it can be used to query the Wikidata database with its dedicated query language, SPARQL.
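
As an illustration, here is a minimal sketch of how one might query the public Wikidata SPARQL endpoint from Python to check what a given Q-code actually points to. The Q-code in the example is a placeholder only; substitute the one you expect for the SA-21.

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def english_label(qcode: str) -> str:
    """Return the English label of a Wikidata item, i.e. what this Q-code really refers to."""
    query = f"""
    SELECT ?label WHERE {{
      wd:{qcode} rdfs:label ?label .
      FILTER(LANG(?label) = "en")
    }}
    """
    response = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "llm-osint-demo/0.1"},
    )
    response.raise_for_status()
    bindings = response.json()["results"]["bindings"]
    return bindings[0]["label"]["value"] if bindings else "(no English label found)"

print(english_label("Q42"))  # placeholder Q-code: replace with the one returned for the SA-21
```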

Since GPT-4 is considered a state-of-the-art NLP model, it should have no problem disambiguating each piece of equipment to its correct Q-code. So let’s ask.

Video courtesy of the author.

It appears that GPT-4 has accurately identified this as well, recognizing that some of these missiles belong to the same family as variants and would consequently be grouped together on the same Wikidata/Wikipedia pages. This is because these sources provide information only at the level of the equipment family. Let’s have a look at the information contained in each missile’s Wikidata entry. To do this, just go to the Wikidata portal and enter the Q-code in the search, or use SPARQL and the project’s user interface. And this is what you get…

Well, hello, LLM hallucination!

Image courtesy of the author

In case you haven’t noticed, all these Q-codes are simply incorrect. I was under the assumption that GPT-4 would have retained some structured data associations from its training set, such as the link between a Wikipedia page and its Wikidata Q-code, but that is clearly not the case. So I went to brainstorm it with some data scientist friends, including a few working at Google DeepMind. After a good dinner and some wine, we came to the conclusion that these errors might be due to LLMs’ tendency to hallucinate on very specific prompts, particularly when they involve numbers that are not directly correlated with, or frequently seen near, the surrounding words.

In technical terms, our hypothesis is as follows: the Q-codes are not represented as their own tokens (e.g., Q1234) but are split up (e.g., Q, 12, 34). The model has certainly seen many more instances where other numbers follow 12 than the few times it might have seen 34 after 12. So, even though the context and the preceding Q likely raise the conditional probability of the right digits, that boost does not outweigh the associations between numbers the model has seen elsewhere. This also suggests that it should work better for smaller Q-codes (e.g., Q1 or Q30) than for larger ones, though this theory has not been tested yet. Either way, it does not help when we want our disambiguation prompt to return the correct Q-code every time.
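
The tokenization part of this hypothesis is easy to inspect with tiktoken, the tokenizer library used for OpenAI models. The sketch below simply prints how a few Q-code strings (the examples above, plus one arbitrary longer one) are split into sub-word tokens; it says nothing about what the model then does with them.

```python
import tiktoken

# GPT-4 uses the cl100k_base encoding.
enc = tiktoken.encoding_for_model("gpt-4")

for qcode in ["Q1", "Q30", "Q1234", "Q1048268"]:
    token_ids = enc.encode(qcode)
    pieces = [enc.decode([t]) for t in token_ids]  # the sub-word chunks the model actually sees
    print(f"{qcode:>10} -> {pieces}")
```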

This illustrates that you must exercise great caution when using LLMs for fact-checking or information retrieval. Always test and verify the facts in the LLM response that you deem critical for your workflow. In my case, I have found a solution to circumvent this significant issue, which I will share in the next article, so stay tuned! ;)

So what? And what’s next?

While transformer models like GPT-4 are undeniably impressive, it’s important to remember that they are neither cognitive nor sentient beings. They are marvels of modern technology, designed to process and generate human-like text, but they lack the complex understanding and emotional depth that define true cognition and sentience, and sometimes even basic common sense, as the Q-code errors demonstrate. To put it simply: they are VERY clever word-guessing algorithms.

As underlined in our introductory article, our primary objective is not to directly generate information using GPT-4 (and other LLMs); rather, we aim to employ these LLMs as expert NLP models for the creation of expert-curated, continually updated knowledge bases, which can then be queried and examined.

In our upcoming episode, we’ll dive into testing GPT-4’s capabilities for NER and NED, beginning with a “simple” Defense article from a reputable source. We’ll reveal its strengths, as well as its limitations, and attempt to decipher the reasons behind any mistakes we observe. Stay tuned for this enlightening exploration of GPT-4’s performance in the world of Defense content analysis!

Liked this article?

Check the other articles in the series using the links in the following section!

We will gradually get deeper into the implementation details to create an automated, LLM-powered and expert-verified knowledge base that could be used for targeted Intelligence work, so don’t miss out and let’s connect! You can find me on LinkedIn or follow me on Medium!

Thanks for your support, and I shall see you in the next one!


Anthony Mensier

Ex-Intel Analyst & AI Product Manager passionate about Foundation Models, Transformational Technologies & National Security - https://bit.ly/subscriptionAM