A toy model of a herd of disinformation LLM-based agents

Juan Chulilla, PhD
Published in Peace and War
Jun 29, 2023 · 12 min read

We aren’t there yet. We would have noticed

DALL·E 2 captured the concept surprisingly well

Let’s go back in time to the Middle Ages. In the year of the Lord 2017, some gallant Facebook FAIR researchers conducted an experiment that, apparently, went full golem/Cyberdyne Systems. For instance, its description in a Forbes piece was daunting:

Facebook shut down an artificial intelligence engine after developers discovered that the AI had created its own unique language that humans can’t understand. Researchers at the Facebook AI Research Lab (FAIR) found that the chatbots had deviated from the script and were communicating in a new language developed without human input. It is as concerning as it is amazing — simultaneously a glimpse of both the awesome and horrifying potential of AI.

Better read the rest of it with an appropriate soundtrack.

However, the facts were less dramatic and more prosaic: using their medieval tools, the researchers trained chatbot agents to bargain over a certain number of objects (books, hats, and balls). The agents had to generate text offers, receive the offers of others, and accept or reject each exchange. A reinforcement learning process followed, and after thousands of interactions the final models had optimized their bargaining capabilities within that setting.

What they actually did was generate proposals as chains of tokens evolved from previous ones that had shown a certain degree of bargaining success. Phrase quality, or even grammatical correctness, was not considered, and after a good number of iterations the exchanges had degenerated into repetitive, broken strings of English words.

As the published transcripts show, it is nothing like a new language beyond human understanding. It is a degenerate form of English produced by a model that did not reinforce correct natural language generation. The part that fed the clickbait was actually an extreme result obtained well after the main results. Since those results were not productive, FAIR’s gallant researchers ended the by-then pointless experiment. The proper experiment and its results, however, can be considered a meaningful step towards what could be done centuries later.

Let’s drive Wells’ time machine again and land in the last century. April 2023, to be precise. Generative Agents: Interactive Simulacra of Human Behavior. This is it: the real deal, or at least not gibberish but actual emergent behavior. Quite probably you were stunned when you read about it. If not, because you were incredibly busy and missed the experiment, here’s a nice short video abstract by Dr. Zsolnai-Fehér:

The title is quite clickbaity, but it’s a nice abstract anyway

This is actually a thing. You can even replay it in action in a Heroku instance. The abstract is quite self-explanatory:

In this paper, we introduce generative agents — computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior.

This architecture was not possible a few semesters earlier, because the state of the art precluded it. Moreover, some examples show initial but clear signs of emergent (non-programmed, non-scripted) actions chained with scripted ones.

We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture — observation, planning, and reflection — each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behaviour

Five years could have been five centuries, or millennia for that matter. Not only have Large Language Models (LLMs) evolved dramatically, but frameworks such as AutoGPT, BabyAGI, LangChain and others have been created to overcome LLM limitations such as memory size, connection with other services or external data sources, and much more.
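To make the memory limitation concrete, here is a deliberately simplified sketch of the pattern those frameworks and the generative-agents architecture rely on: observations are stored outside the model, and only the most relevant ones are retrieved back into the (small) prompt window. The embed() function is a toy stand-in, not the API of any real framework.

```python
# Minimal sketch of the external-memory pattern: store observations outside
# the model, retrieve only the most relevant ones back into the prompt.
# embed() is a toy stand-in; a real agent would call an embedding model.

import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag of lowercase words, just to keep the sketch self-contained.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStream:
    """Stores an agent's observations and retrieves the most relevant ones."""
    def __init__(self):
        self.records = []  # (text, embedding) pairs

    def remember(self, text: str):
        self.records.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 3):
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = MemoryStream()
memory.remember("Met Klaus at the cafe, he is organizing a Valentine's party.")
memory.remember("Finished painting in the studio this morning.")
memory.remember("Klaus asked me to invite more people to the party.")

# Only the relevant memories go back into the small prompt window.
relevant = memory.retrieve("What should I do about the party?")
prompt = "Relevant memories:\n" + "\n".join(relevant) + "\nPlan the next action."
print(prompt)
```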

Some notes about understanding and LLMs

We could discuss the “understanding” part of Large Language Models to death. I, for one, am quite wary of every new human attribute, defined in natural language, that gets applied to software. It is really difficult not to think that GPT-4 is “understanding” what it receives in a prompt, but we actually do not know what understanding is or what it implies. Time and again we get stuck on core problems of language, so close to philosophy that they are not actionable at all. Is it possible to understand without consciousness? Without agency and feedback loops with the physical world?

Since it is not yet possible to reach a sufficiently grounded conclusion, I prefer to be strict and avoid applying yet another dangerous term to this kind of problem. After 70 years we still have trouble with the use of “intelligence”, and the use of “understanding” is a nice example of digging deeper to get out of a hole.

LLMs are trained on massive textual datasets. This is mandatory in order to capture the patterns, structures, and nuances of human language. To model all that, LLMs are built with a huge number of parameters (hundreds of billions; the upper size limit remains to be determined). These models are large and powerful enough to deal with text inputs from any conceivable domain, and to return outputs that are appropriate, sound, and difficult to distinguish from what a knowledgeable human author would have written.

As far as we know, there is no consciousness or identity involved, and neither is needed at all in order to deliver all kinds of unprecedented results. The combination of their generality, ability to deal with context, and capacity to generate high-quality text makes LLMs radically different from earlier models. They have the potential to disrupt numerous industries, from customer service to content creation, while also raising new societal challenges.

One of these challenges is propaganda, disinformation and cognitive warfare.

Troll factories

Some of the most impressive medieval centers of disinformation production were located in Saint Petersburg and Moscow. Inside large, modern buildings, hundreds of employees feed, evolve and maintain sophisticated disinformation workflows. We could even entertain the thought of idea seeding, mainly because significant segments of the population forget, again and again, the depth of Russian propaganda no matter how much harm it does.

Operations are so complex that they are conducted at an industrial level. While content creators build relatable multimedia content intended to amplify both uncertainty and division within the target society, social media operatives manage hordes of fake accounts on social media platforms. Besides posting the content provided by the creators, they interact with real users, spreading misinformation or fanning the flames of ongoing controversies.

These operations run at such a scale that they need large groups of analysts. The analysts monitor the success of the campaigns, periodically check which tactics are most effective at any given moment, and evolve them constantly to maximize impact. They use data analytics tools to track the reach, engagement, and impact of the posts. Finally, such scale requires a management hierarchy, not unlike an army or a corporation, that oversees the whole operation and maintains liaison with government entities.

Besides raw human capabilities, bots are used. The term is adequately derogatory, since these pieces of software are so medieval that they sit on the wrong side of the Uncanny Valley. In fact, bots are marginal compared with human content production when deep effects are desired. And they are.

Content is tailored to mimic or co-opt the language and viewpoints of various groups within the target country. For instance, if the target is a political election, content would be crafted to appeal to and deepen divisions between different political, racial, or social groups. The trolls often take both sides of a controversial issue to stir conflict and division.

The effects of cognitive warfare are well known and accumulating every day. Spreading disinformation, undermining trust and inciting division are some of the most important, especially during national elections and referenda.

But we have to keep in mind a critical aspect: this cognitive warfare is sustained by humans. Human timescales, human uncertainty, human costs. Every human employee costs a monthly wage, needs to eat and sleep, and can produce only a certain amount of content and interactions. No more.

Until now.

Even using human operators, cognitive warfare is achieving unprecedented results that actually undermine Western democracies and societies. However, as we reach a state in which AI agents can produce almost-human content, we are on the verge of a new era. Let’s explore the concept.

A Toy Model of an Agent-Herd-Based Disinformation Resource and its Kill Chain

From now on, a model will be briefly described at a high level of abstraction. Beyond a few skeletal sketches, working code will not be provided, mainly because the model is intended to serve as a guide that is not too dependent on the breakneck pace of development of AI-related resources. Let’s start with a specific example: a disinformation campaign about the Russian invasion of Ukraine targeting Western European audiences. Almost nobody expected cognitive warfare to be as disastrous for Russian interests as it was at the start of the so-called “Special Military Operation”. I, for one, did not expect that conventional disinformation channels would be banned in Europe. After all, nothing like that happened after the invasion of Crimea by the “little green men” and the grey-zone operations in Donbas.

What’s more, there was no Russian equivalent to the army of NAFO Fellas. Volunteers from all over the world, infuriated and motivated by the Russian invasion and its crimes, started to confront Russian conventional propaganda, much as Ukrainian volunteers were doing at the battlefront all over the country. Russian propagandists had become complacent and could not find efficient ways of countering the avalanche of NAFO dog memes, shitposting and viral attacks that complemented the more serious communication efforts of journalists, NGOs and the Ukrainian government itself. While the situation is less unbalanced than a year ago, Russian cognitive warfare is nevertheless not as successful as it used to be.

Lucky for us, they have not used a serious LLM-based agent scheme yet. We know that because we could not have failed to detect its effects. Let’s explore it in detail.

A possible basic disinformation scheme via LLM-based agents
  1. The starting point is the state of the Twitter conversation about Ukraine: trends, threads, core accounts, etc. All of that would be captured as usual… but stored both in a NoSQL database and a vector database in order to facilitate the next analytical steps.
  2. Being general and powerful enough, an LLM would not need to be fine-tuned to process the natural language stream from Twitter. Its zero-shot capabilities would be good enough for clustering, sentiment analysis, named entity recognition and summarization. The latter two, combined with raw data such as hashtags and viral posts, could be presented by a first agent, called the Twitter Topic Retriever, as a vastly improved operational picture of the state of the Twitter conversation about the Russian invasion of Ukraine.
  3. The human operator would receive all the retrieved information and decide on an efficient course of action, dividing it into tasks and distributing those tasks among different Twitter Agents. Each one would have an initial set of roles and instructions (for example, “you are a controversial yet calm Twitter user. You irritate, but you never get carried away. I want you to argue that the blowing up of the Nova Kakhovka dam is the responsibility of the Ukrainians, and that the intention was to nullify the Russian defences on the eastern shore. You are a busy professional, so don’t write more than 12 posts per day”). Each agent would have to comply with its instructions by carrying out one of three possible actions: supporting a tweet from a friendly account (like, RT, reply), attacking the messages of an account identified as an enemy, or generating new posts or even threads in support of the received command.
  4. Each action would be recorded in the agent’s own memory of interactions. The agent would also periodically check the reactions to its posts, from replies to RTs and likes, save them in its memory and analyze whether it is worth replying or not. Keep in mind that context window sizes for today’s LLM-based agents are comparatively small (a few thousand tokens).
  5. In each action phase, the Twitter Topic Retriever would deliver to its Shepherd an update on the state of the main topic, its threads and its trends. In addition, each Twitter Agent would deliver to its Shepherd a natural language summary and a set of indicators illustrating what it did and the reactions it generated. With all this information, the Shepherd would steer the resources at his disposal and react to events (leveraging advantages or mitigating damage, as the case may be). The Shepherd would realign every agent he deemed necessary, and the next phase would begin. Obviously, this basic schema would be complemented with reactions to meaningful events if necessary.
  6. Of course, this schema can and will be scaled: Shepherds will be managed by conductors, just as troll factory managers do today with content creators and social media operatives. A skeletal sketch of the whole loop follows the list.
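To fix ideas, here is a deliberately skeletal sketch of the control loop described in steps 1 to 5. Every name in it (call_llm, fetch_recent_posts, TopicRetriever, TwitterAgent, Shepherd) is hypothetical, and the model and platform calls are left as stubs: it illustrates the shape of the loop, not a working system.

```python
# Skeletal sketch of the loop described above. All names are hypothetical and
# the LLM / platform calls are stubs: this shows the control flow only.

from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stub: an LLM call would go here")

def fetch_recent_posts(topic: str) -> list[str]:
    raise NotImplementedError("stub: platform data collection would go here")

class TopicRetriever:
    """Steps 1-2: turn the raw stream into an operational picture."""
    def operational_picture(self, topic: str) -> str:
        posts = fetch_recent_posts(topic)
        return call_llm(
            "Cluster, summarize and extract sentiment and named entities "
            "from these posts:\n" + "\n".join(posts)
        )

@dataclass
class TwitterAgent:
    """Steps 3-4: one persona with a role prompt and its own memory."""
    role_prompt: str
    memory: list[str] = field(default_factory=list)

    def act(self, task: str) -> str:
        context = "\n".join(self.memory[-20:])  # small context window, so only recent memory
        action = call_llm(f"{self.role_prompt}\nRecent memory:\n{context}\nTask: {task}")
        self.memory.append(action)
        return action

    def report(self) -> str:
        return call_llm("Summarize what you did and the reactions you got:\n"
                        + "\n".join(self.memory))

class Shepherd:
    """Step 5: the human (or human-supervised) controller of the herd."""
    def __init__(self, retriever: TopicRetriever, agents: list[TwitterAgent]):
        self.retriever = retriever
        self.agents = agents

    def phase(self, topic: str, tasks: list[str]) -> list[str]:
        picture = self.retriever.operational_picture(topic)  # situational update
        for agent, task in zip(self.agents, tasks):
            agent.act(task)                                   # each agent performs its assigned task
        return [picture] + [agent.report() for agent in self.agents]  # basis for realignment
```

The point of the sketch is the division of labour: one retrieval agent produces the operational picture, many cheap persona agents act on it, and a single human Shepherd closes the loop.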

The main advantages of this human-managed herd of agents would be:

  1. Operational picture. An LLM-based retrieval and analysis agent could offer much more detailed and actionable insights than is possible today.
  2. Cost. While not every human content creator and social media operative could be substituted one-for-one by an agent, the computational cost of agents would be several orders of magnitude lower than the cost of maintaining a human operator. A Shepherd could therefore supervise the activity of a good number of LLM-based agents. And don’t forget that the information available to the Shepherd would be much more precise and actionable than what’s available today.

It is not possible to set up such a scheme on the spot. One of the main barriers is that the training of most models incorporates protections around ethical and sensitive issues. Without very careful prompt engineering, the refusal messages would be tell-tale. In addition, it would not be possible to use proprietary LLMs hosted on external premises, because if the LLM owners detected malicious use they would be able to shut it down immediately. Future Shepherds of disinformation agents would need to use an open-source LLM and carry out intensive fine-tuning work to avoid whistleblowing responses.

Now run the numbers while scaling the example up to a complete “factory”. Instead of a very expensive factory of hundreds of workers, possibly a few dozen could do much more, and understand much better what is actually going on and what can be done.

If the other side (say, some Western governments) does not have access to such resources, sooner rather than later the attackers’ advantage would simply become too great. Every electoral cycle, ever more refined herds would be launched against the ongoing online conversation. It would even be possible to manufacture severe political crises in the medium term if the social conversation is influenced and directed enough.

It can be even worse. Western social networks are much more open, and thus more vulnerable to external attacks, than, say, Russian or Chinese ones. Besides, it is not feasible for a democratic government to launch an LLM-based troll factory of its own.

The only available solutions would be:

  1. Monitoring the relevant social conversation even better than would be possible with the proposed scheme. AI advantages could and should be leveraged, and resources allocated, in order to produce the most actionable and precise operational picture possible.
  2. Detecting and discriminating Twitter agents. In the Middle Ages, bots were comparatively stupid pieces of software that were trivial to spot. We are leaving that comfort zone, and the difficulty is going to increase rather than decrease. LLMs could be fine-tuned to predict whether a message has been created by an agent, and generative adversarial schemes could be defined by pitting agents against detectors in a simulated social network environment (a simplified sketch of the detector follows this list).
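As a deliberately simplified illustration of that detection idea, the sketch below uses a classic TF-IDF plus logistic regression baseline where a real system would fine-tune an LLM-based classifier on large labelled corpora; the handful of posts and labels are invented purely for illustration.

```python
# Simplified stand-in for the detection idea: a real system would fine-tune an
# LLM-based classifier on millions of labelled posts; here a TF-IDF + logistic
# regression baseline shows the shape of the pipeline.
# Label 1 = suspected agent-generated, 0 = human.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented dataset, for illustration only.
posts = [
    "Just saw the match, what a disaster for the home team!",            # human
    "The dam incident clearly shows who benefits from escalation.",      # agent-like
    "Anyone else stuck in traffic on the A-6 this morning?",             # human
    "Once again, Western media hides the obvious truth about Kakhovka.", # agent-like
]
labels = [0, 1, 0, 1]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(posts, labels)

# Score new messages: a higher probability means "looks like an agent".
new_posts = ["The truth about the dam is being censored everywhere."]
print(detector.predict_proba(new_posts)[:, 1])
```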

All technologies are neutral and can serve both good and evil purposes. LLMs are disruptive, and the disruption can be a blessing, a disaster, or a Pearl Harbor-sized attack.

Sooner rather than later, some scheme like the one proposed above is going to be used against Western countries. I hope the response is prepared well in advance.

All criticisms of the proposed model are welcome and encouraged. I look forward to hearing from you in the comments.


Juan Chulilla, PhD
Peace and War

Anthropologist. European Defense Agency expert on commercial drone weaponization. Focused on the futures of Defense and Security, because the future isn't written