LLMs to map the French radical right networks

Josselin Perrus
Thoughts on Machine Learning
9 min read · Dec 20, 2023
Photo by Chris Karidis on Unsplash

Streetpress is a French independent publication with extensive coverage of French radical right activities. I used their archives to map the radical right’s networks.

In the following article I share some lessons I learned during this project (please take my assertions with a grain of salt: they reflect my mental model of how GPT works, not a demonstrated truth).

GitHub code repo

Project presentation

The project involves 3 steps:

  • Extracting data from the articles
  • Building a graph from the extracted data
  • Visualizing the graph

I’ll share the challenges I ran into in 2 areas specifically: prompt engineering for data extraction, and entity resolution.

Prompt engineering for data extraction

Structure of the prompt

I chose to go with the Instructor library:

  • It leverages OpenAI functions to produce structured output (see the minimal sketch below).
  • It is well documented. It includes examples of entity extraction and knowledge graph creation that I could use as inspiration at first.
  • There are fewer overhead concepts than in LangChain.
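
For readers new to Instructor, the basic pattern looks like this (a minimal sketch using the late-2023 Instructor API; the toy schema is mine, not from the repo):

import instructor
from openai import OpenAI
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


# patch() teaches the standard OpenAI client to accept a response_model
client = instructor.patch(OpenAI())

person = client.chat.completions.create(
    model="gpt-4",
    response_model=Person,  # Instructor converts this schema into an OpenAI function call
    messages=[{"role": "user", "content": "Anne is 32 years old."}],
)
print(person.name, person.age)  # -> Anne 32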

The prompt comes in 2 pieces that work together:

  • The schema declaration, using Pydantic. It provides structure, but also a way to give additional instructions on top of the prompt via the field descriptions.
  • The prompt itself, which includes the instructions.

The schema:

# Pydantic schema

from pydantic import BaseModel, Field
from typing import List, Literal


class Relationship(BaseModel):
    source: int
    target: int
    label: str
    rationale: str


class Actor(BaseModel):
    """
    Actor: person, group of people, club, company, administration, association, institution
    """

    id: str = Field(
        ...,
        description="Unique identifier for the actor, of the form actor_1, actor_2, ...",
    )
    name: str
    label: Literal["PERSON", "ORGANIZATION"]

    key_findings: List[str] = Field(
        ...,
        description="List of relevant findings regarding this actor and its relationships to other radical right actors",
    )

    related_actors: List[str] = Field(
        ...,
        description="List all actors this entity is involved with.",
    )

    actor_reputation: str = Field(
        ...,
        description="Reflect on the actor, and answer the following questions: Is this actor known outside the scope of this content? For people who know about it, do they think about this actor as a radical right actor?",
    )
    belongs_to_radical_right: bool = Field(
        ...,
        description="true if the actor is reputed outside the scope of this content to be a radical right entity, false otherwise",
    )
    named_actor: bool = Field(
        ...,
        description="True if the name of the actor is a rigid designator. False if it is a flaccid designator.",
    )


class DocumentExtraction(BaseModel):
    entities_list: List[str] = Field(
        ...,
        description="List all entities mentioned in the document.",
    )

    entities_roles: List[str] = Field(
        ...,
        description="For each entity in entities_list, write one sentence explaining its role in the content.",
    )

    is_it_an_actor: List[str] = Field(
        ...,
        description="For each entity in entities_roles, answer the question: is it an actor? An actor is one of the following: person, group of people, club, company, administration, association, institution.",
    )

    actors_details: List[Actor] = Field(
        ...,
        description="Each actor should be its own separate object",
    )

    relationships_list: List[Relationship] = Field(
        ...,
        description="List of relationships between actors.",
    )

The prompt, wrapped here in a small helper function so the snippet stands alone (the client setup and function name are illustrative, not necessarily the repo’s exact code):

import instructor
from openai import OpenAI

# client is the Instructor-patched OpenAI client
client = instructor.patch(OpenAI())


def extract_graph(content: str) -> DocumentExtraction:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=DocumentExtraction,
        temperature=0,
        top_p=0.0000000000000001,
        max_retries=2,
        messages=[
            {
                "role": "system",
                "content": '''
We are mapping the networks of the radical right and their activities.

To do so we will proceed in steps:
1. Think step by step, and extract all entities from the text and write them in entities_list.
2. For each entity in entities_list, add one sentence to entities_roles explaining its role in the content.
3. For each entity in entities_roles, answer the question: is it an actor?
4. If an entity is an actor, add an object to actors_details.
5. Fill in actor_reputation by answering the following questions: Is this actor known outside the scope of this content? For people who know about it, do they think about this entity as a radical right entity?
6. Mark belongs_to_radical_right as true if the actor is reputed as a radical right actor outside the scope of this content.
7. Mark named_actor as true if the name of the actor is a rigid designator, or false if it is a flaccid designator.
8. Using all the information collected, think step by step to create the list of all relationships between actors in the content.
''',
            },
            {
                "role": "user",
                "content": content,
            },
        ],
    )

Challenge #1: OpenAI models are non-deterministic

Using OpenAI functions for data extraction seems like magic at first, in particular when running the Instructor examples.

But when documents are long and have no specific structure, it turns out to be much more challenging.

There are lots of levers you can pull, many directions you can try.

  • Impact 1: because OpenAI models are non-deterministic, it takes a lot of time and effort to validate your experiments (multiple runs are needed), especially when each trial takes one to several minutes, as was my case working on full articles.
  • Impact 2: building robustness into the process is a major challenge. The quality is measured by the worst of the possible outputs.

I tried to qualify a bit better how this non-determinism impacts word predictions (Twitter thread). But having some kind of metric to measure a model’s level of determinism would be useful.
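
As a crude illustration of such a metric, here is a hypothetical helper (not part of the repo) that reruns the same extraction several times and counts the distinct serialized outputs:

from collections import Counter


def determinism_check(content: str, runs: int = 5) -> Counter:
    # Serialize each DocumentExtraction so identical outputs collapse onto
    # a single key; a deterministic model would yield one key with
    # count == runs
    outputs = [extract_graph(content).model_dump_json() for _ in range(runs)]
    return Counter(outputs)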

Also, I believe there is definitely a market for models that you can “seed” in order to make them work deterministically.

Challenge #2: quality depends on the size of the input

The first step of my prompt is to extract entities from the text. If I gave GPT a long text as input, it would discover about the same number of entities as if I gave it a shorter text:

  • In longer texts it would focus on the entities that really matter within the article context. But if the text is too long, it would also start overlooking some entities that were important for my purpose.
  • In shorter texts it would pick entities rather indiscriminately, including random noun phrases that carry little semantic weight with regard to the story.

Adding to the previous challenge, it means that experiment results with a given input size may not hold with a different input size.

I settled on 8,000-character inputs, which seem to provide enough context without GPT dropping too many important details.
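
For reference, a minimal chunking sketch (illustrative; the repo may split articles differently) that cuts text into roughly 8,000-character pieces on paragraph boundaries:

def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    # Accumulate paragraphs until adding one more would exceed the budget
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks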

Challenge #3: OpenAI struggles to switch from global to local context and back

My DocumentExtraction top-level object contains both an entities_list (a list of strings) and an actors_details (a list of objects). In my prompt I ask GPT to fill in the entities_list first.

The reason is that if I only ask for the list of objects, GPT misses many important entities.

My interpretation is that when GPT “focuses” on a single entity to flesh out the corresponding object, it loses sight of the larger context. And it struggles to retrieve it once it is done focusing on the entity.

Asking GPT to build a flat list of strings upfront makes it take note of the global context (see next challenge), preventing the context-switching issue once it starts focusing on each entity. In my experience this made GPT much more reliable.

Challenge #4: GPT thinks what it writes

GPT feels like a child with ADHD: it may perform well on an instruction, then perform badly on that first instruction once you add a second one.

I have come to think of GPT as thinking what it writes (and not the opposite, which would feel more intuitive).

That means that if you want GPT to be able to reason on some information, make it write that information out. It will be less likely to ignore it afterwards.

I have found that some “chain-of-thought”-like fields could serve for “future” reasoning.

For example, adding the entities_roles field was a game changer for identifying actors among the entities. And adding key_findings within each Actor supported the information-seeking process overall.
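
Distilled to its simplest form, the pattern looks like this (an illustrative model, not the exact repo schema): the scratchpad field is declared before the field that depends on it, so the supporting text is written into the output before the decision is made.

from pydantic import BaseModel, Field
from typing import List


class EntityScreen(BaseModel):
    # Scratchpad generated first: GPT articulates each entity's role...
    entities_roles: List[str] = Field(
        ...,
        description="One sentence per entity explaining its role in the content.",
    )
    # ...and only then decides, with that text already present in its output
    is_it_an_actor: List[str] = Field(
        ...,
        description="For each entity in entities_roles, answer: is it an actor?",
    )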

Challenge #5: GPT has “anchor-words”

It’s as much an opportunity as a challenge. It means that some words have quite specific meanings for GPT, and it is near-impossible to make it deviate from the semantics attached to them.

“Chain of thought” behaviours are elicited by such words, which prime GPT to act in a certain way.

Examples of anchor words I have observed: “summary”, “key findings”, “label”, “entity”, “belongs to”, “related to”.

To clarify, it’s not shocking that GPT associates a meaning with a word. But what is unsettling at first is that some words act a bit like black holes: the words around them get sucked in and ignored.

Challenge #6: declarative vs imperative style

I wanted GPT to output only named entities.

“Named entities” seemed to be a “concept” that I could rely on. But it turned out GPT had a rather fluctuating interpretation of it, resulting in inconsistent output.

Adding a definition with specific terms (rigid vs flaccid designator) helped guide GPT. But there is a limit to what we can declaratively define.

Sometimes it’s easier to define how to determine a piece of information, in a more procedural approach.

That’s what happened for the belongs_to_radical_right attribute: since GPT could not determine that attribute in a robust, repeatable way, I had to provide a chain of steps to be taken (determine the reputation first, then deduce the attribute from it).

The initial magic is the feeling that LLMs work in a very declarative manner. When robustness is needed, however, there is a pull towards interacting with them in a more prescriptive manner. But this mode is not ideal either, as GPT has limited procedural logic.

Challenge #7: GPT has unclear procedural logic

One might expect GPT to follow a human-like procedural logic, i.e. take and apply instructions sequentially.

It is not that clear. Forgive my own lack of clarity about the topic; I don’t yet have a mental model to reason about it. But here are some observations:

  • The output is not organized in the order given by the instructions, which means that information about the “causes” may be generated after information about the “consequences”. Part of the prompt engineering was to find ways to structure the output correctly.
  • However, it also feels like what comes after has some impact on what comes before, as if the answer were thought out as a whole before being output.

Challenge #8: Cast a wide net

Instructions for filtering lists of items based on an attribute are not consistently applied:

  • Again, the more explicit, the better (i.e. get the LLM to output a True/False value and filter on that).
  • But I don’t fully trust GPT to apply the filter itself.

So the strategy I adopted was to have GPT compute some attributes that I could filter on afterwards, and do the filtering programmatically in a post-processing step.
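
Concretely, that post-processing step boils down to something like this (an illustrative sketch; the repo’s exact filter may differ):

def keep_radical_right_actors(extraction: DocumentExtraction) -> list[Actor]:
    # Filter on the GPT-computed attributes rather than trusting GPT
    # to apply the filter itself
    return [
        actor
        for actor in extraction.actors_details
        if actor.belongs_to_radical_right and actor.named_actor
    ]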

Challenge #9 (remaining): try other strategies

  • Build a named entities list to pass to GPT: since GPT struggles on this task, we can try using a battle-tested entity extraction model and pass its output to GPT.
  • Build a summary and extract from it: amplifying the input’s signal-to-noise ratio could help stabilise the final output. Maybe using a Chain-of-Density-like strategy.
  • Use a QA strategy to extract significant facts: the following paper uses this strategy to decontextualize snippets: https://arxiv.org/pdf/2305.14772.pdf

Entity resolution

Observations:

  • Full-text search on just the name of the entity gives good enough results as a first step.
  • A second step using embedding similarity (on key_findings) was considered, as sketched below. But when the key_findings output from the LLM is succinct, it can lead to false positives.
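
As an illustration of what that second step could look like (the embedding model choice and helper names are assumptions, not the repo’s code):

import numpy as np
from openai import OpenAI

embedding_client = OpenAI()


def embed(text: str) -> np.ndarray:
    response = embedding_client.embeddings.create(
        model="text-embedding-ada-002", input=text
    )
    return np.array(response.data[0].embedding)


def key_findings_similarity(a: str, b: str) -> float:
    # Cosine similarity between the embeddings of two key_findings texts
    va, vb = embed(a), embed(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))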

Relationship resolution

Observations:

  • Deduplication happens on the rationale of the relationship: we don’t want to deduplicate on the label, as 2 entities might for example have multiple reasons to be associated with one another, and we don’t want to lose that information.
  • We do, however, still get some duplicates (a naive version of this deduplication is sketched below).
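
A naive version of that rationale-based deduplication (an illustrative sketch, not the repo’s exact code):

def dedupe_relationships(relationships: list[Relationship]) -> list[Relationship]:
    # Two relationships count as duplicates only if their rationales match
    seen: set[str] = set()
    unique = []
    for rel in relationships:
        key = rel.rationale.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rel)
    return unique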

Possible improvement: use GPT for entity resolution

While embeddings might not capture identity with sufficient certainty, maybe GPT would do a better job. The question, however, is as always: how robust would that be?
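
A sketch of what that could look like, reusing the Instructor-patched client from above (this is a hypothetical design, not something implemented in the repo):

from pydantic import BaseModel


class SameEntityVerdict(BaseModel):
    rationale: str
    same_entity: bool


def gpt_resolve(a: Actor, b: Actor) -> SameEntityVerdict:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=SameEntityVerdict,
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": f"Do these two records refer to the same real-world entity?\n"
                f"A: {a.name}: {a.key_findings}\n"
                f"B: {b.name}: {b.key_findings}",
            },
        ],
    )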

Concluding thoughts

  • Using GPT for graph extraction is not as straightforward as I initially thought.
  • GPT is capricious, and getting it to consistently deliver is a real challenge.

Sources

GitHub code repo: https://github.com/meaningfool/streetpress
