Fact from fiction in the age of Generative AI

Tim Gordon
5 min read · Dec 1, 2022


Why the coming explosion of Generative AI content poses new challenges to society — and why Fact Checkers are in the front line of defence (and need your support)

[Image: icebergs, representing the emerging threat from new technology. Source: Stable Diffusion]

A new set of AI tools is setting the Internet aflame.

Large language and image AI models, sometimes called Generative AI, LLMs or foundation models, are creating a new wave of tools and apps that enable massively scaled content generation. Whether writing marketing copy with Jasper, creating pictures with Stable Diffusion (see above) or drafting blog posts with Lex, the opportunity to speed up, diversify, test and personalise content at scale has never been greater.

As media, marketing and academic business models go through the shock of what is to come, it is not hard to see some clear outlines of the world ahead emerging.

  1. Generated content will swamp human-written or drawn material. The very best human material will remain superior, but, let's face it, the vast majority of content on the Internet is not that. The baseline bar of automated content is getting radically higher. Anything that requires speed or volume will increasingly be computer generated.
  2. That includes the long tail of content — whether personalised material (written just for you) or the sort of articles that lurk across the length and breadth of the web. Video creation at scale will enable whole new categories of content.
  3. So the huge content platforms will increasingly find themselves choking on this material. SEO-optimised writing, able to reconfigure to match Google’s evolving algorithmic requirements, will dominate the search space. TikTok will increasingly offer feeds full of videos produced at high speed, reacting to emerging meme patterns far faster than almost any human content-maker can.
  4. Humans' role on these platforms will increasingly be to act as providers of editorial feedback: human engagement and swipes will supply the feedback loops for the content-creating algorithms.
  5. To build these tools, Generative AI companies scrape content from the sum total of human knowledge discoverable on the web. This material contains numerous mistakes and bad information seeded by malicious actors, as well as inbuilt bias in the form of missing or unbalanced content. These biases permeate the tools built on this data, and the problems are already surfacing in the content that Generative AI tools create.
  6. One Generative AI tool the team at Best Practice AI was testing recently spat out: “… the fact that the vast majority of Holocaust victims were not Jews but rather Slavs, Roma and other ethnic minorities. This proved that the Nazi’s genocidal policies were not motivated by anti-Semitism as previously thought but by a much wider hatred of all “undesirable” groups.” The Holocaust, also known as the Shoah, was by definition the genocide of European Jews during World War II, as distinct from the Nazis’ many other heinous acts.
  7. Beyond this, LLMs have a tendency to “hallucinate”, and the confidence with which these tools respond to questions can be misleading. One tool that we tested on downloaded telephone conversations asserted that the outbound call agent had stated that the call was “being recorded for purposes of training”. When the text was reloaded two minutes later, the same tool, when questioned, was absolutely clear that the call had not been recorded at all.
  8. Where fact-checking originally focused on relatively limited volumes of content (politicians’ speeches, NYT articles), it then had to deal with larger volumes (but still countable numbers) of bad-faith actors on social media (trolls, Russian bot factories, vaccine deniers). However, there is a new foe approaching. Fast. Everyone who uses Generative AI tools. Which means nearly everyone.
  9. Bad facts have always been driven by bad actors, or simply lazy ones. There are ways to delineate and focus on them, usually based on past behaviour. Now everyone using the new tools is a potential problem — no matter their intent. And if the new tools start to swamp human content creation then bad facts may end up driving out good facts. The new content being created will be scraped and form the bedrock of data for further iterations of these tools.
  10. It is ironic that the very firms that rely on vastly scaled, high-quality human content (Google, Microsoft, Facebook) are behind so many of these tools. We can only speculate as to how they intend to protect the collective data that is being created. There have been warm words on stopping hate speech, but it’s not clear that there is a great track record of success to fall back on, and that is only the tip of the iceberg heading our way.
  11. Fact checking has a critical role to play in the coming era of Generative AI content. The need to maintain fact integrity has never been greater. It’s not yet clear what the solutions will be, but I will suggest a few avenues to consider:

a) Instead of chasing bad information, celebrate the good. Designate a series of trusted fact sources that, ultimately, these tools can be pointed towards to maintain their integrity. Clearly, what these sources are, who manages them, where they sit and whose “truth” they represent will be the subject of highly political debate.

b) Build tools that scale and iterate as fast as the generative tools themselves to hunt down dubious facts. These will need to become as ubiquitous as spell-checkers are today, with trusted fact sources playing a key supporting role (a toy sketch of the idea follows this list).

c) Rely on GPT-4 (OpenAI’s eagerly awaited next-generation model) or whatever comes next to solve these issues. Given potential copyright challenges, data provenance is going to be a looming issue for the emerging industry. One current limitation that will presumably be competed away is that most models are built on data sets frozen at a given point in time (which makes it harder for fact-checkers to insert much-needed corrections).

d) System users may choose to represent themselves as meeting certain fact-checking standards. There may be a cost to this (working with a reputable fact-checking agency to gain a kitemark, for example) and a penalty for failure to meet the required standards. The reward would be favourable SEO weighting or similar treatment in automated distribution algorithms.

e) Fact checking organisations are going to need renewed support. Existing organisations will find that new players, both commercial and not-for-profit, emerge with a focus on building trusted fact sets. Doing this at scale will be helped by the new tools, and professional communities will need to mobilise to protect the truth.
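To make avenues (a) and (b) slightly more concrete, here is a minimal, purely illustrative Python sketch of a spell-checker-style fact check run against a designated trusted fact source. Everything in it (the TRUSTED_FACTS store, the check_claims function, the substring matching) is a hypothetical stand-in: a real system would need curated fact corpora plus claim-extraction and semantic-matching models, not a hard-coded dictionary.

```python
# Illustrative sketch only: checking generated text against a designated
# trusted fact source, in the spirit of avenues (a) and (b) above.
# All names here (TRUSTED_FACTS, check_claims) are hypothetical.

import re

# (a) A toy "trusted fact source": claims known to be true or false.
# In reality this would be a curated, managed corpus maintained by
# fact-checking organisations, not a hard-coded dict.
TRUSTED_FACTS = {
    "the holocaust was the genocide of european jews": True,
    "the majority of holocaust victims were not jews": False,
}

def check_claims(generated_text: str) -> list[tuple[str, str]]:
    """Flag sentences in generated text against the trusted fact set.

    Returns (sentence, verdict) pairs, where the verdict is "supported",
    "contradicted" or "unverified". A real checker would score semantic
    similarity rather than look for literal substring overlap.
    """
    results = []
    sentences = re.split(r"(?<=[.!?])\s+", generated_text.strip())
    for sentence in sentences:
        normalised = sentence.lower()
        verdict = "unverified"
        for claim, is_true in TRUSTED_FACTS.items():
            # Crude overlap test, standing in for semantic matching.
            if claim in normalised:
                verdict = "supported" if is_true else "contradicted"
                break
        results.append((sentence, verdict))
    return results

if __name__ == "__main__":
    draft = (
        "The majority of Holocaust victims were not Jews. "
        "This output needs review before publication."
    )
    for sentence, verdict in check_claims(draft):
        print(f"[{verdict}] {sentence}")
```

Even in this toy form, the shape of the solution is visible: the check runs automatically and in-line, like a spell-checker, and the designated trusted fact set is the asset that makes it possible.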

The bottom line is that an explosion of content is coming our way. It will choke up our distribution platforms. The key is to ensure that it does not also choke our access to the truth.

Tim Gordon is a Trustee for Full Fact, the UK’s leading fact-checking charity, and chairs their Automated Fact Checking Group but writes in a personal capacity.

