Moonshine: The Mandela Effect and AI — The Importance of Sanitised Training Data

Sharon Mitchell
Published in Version 1
12 min read · Aug 15, 2024

moonshine (noun) INFORMAL: foolish talk or ideas.

“…whatever I said, it was moonshine…”

Natural Language, Models & Processing

Natural language can take different forms, namely either a spoken language or a sign language. Natural languages are distinguished from constructed and formal languages such as those used to program computers or to study logic.

Natural Language Processing (NLP) is a Machine Learning (ML) technology that gives computers the ability to interpret, manipulate, and comprehend human language.

NLP, at its core, seeks to empower computers to comprehend and interact with human language in meaningful ways, and ChatGPT exemplifies this by engaging in text-based conversations, answering questions, offering suggestions, and even providing creative content.

NLP is a field of Computer Science and Linguistics that focuses on enabling machines to understand and generate human language, and it is a key part of many exciting applications such as AI and chatbots.

There are four broad types of NLP technique: statistical, stochastic (probabilistic), rule-based, and hybrid.
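
To make the rule-based family concrete, here is a minimal sketch in Python. The pattern and the example sentence are my own illustration, not taken from any particular NLP library: a hand-written rule extracts dates with no learning involved, whereas statistical and hybrid systems would infer such patterns from data.

```python
import re

# A hand-written rule: a regular expression that extracts dates of the
# form "12 September 1977". No training data is involved; the knowledge
# is encoded directly by the programmer.
DATE_RULE = re.compile(
    r"\b(\d{1,2}) (January|February|March|April|May|June|July|"
    r"August|September|October|November|December) (\d{4})\b"
)

text = "Stephen Biko died in police custody on 12 September 1977."
print(DATE_RULE.findall(text))  # [('12', 'September', '1977')]
```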

ChatGPT, specifically, is a deep learning algorithm — a form of ML — that uses unsupervised learning to pre-train a neural network on large natural language datasets, such as text corpora or speech corpora. The goal is a computer capable of “understanding” the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents, as well as categorize and organize the documents themselves.
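
As a sketch of that self-supervised idea (a toy illustration, not how GPT is actually implemented), the snippet below turns a scrap of text into (context, next-word) training pairs. Real systems do this over billions of words using subword tokenisers rather than whitespace splitting.

```python
# The corpus here is a stand-in; real pre-training uses vast web-scale text.
corpus = "the cat sat on the mat because the mat was warm"
tokens = corpus.split()  # real systems use subword tokenisers, not split()

# Every position in the text yields one training example:
# the words seen so far (context) and the word that follows (target).
training_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in training_pairs[:3]:
    print(f"context={context!r} -> target={target!r}")
```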

We can allow algorithms to roam the digital landscape and learn, just as a child does.

NLP has its roots in the 1950s. In 1950, Alan Turing published an article titled “Computing Machinery and Intelligence” which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task involving the automated interpretation and generation of natural language.

Large Language Models

Works of art make rules; rules do not make works of art — Claude Debussy

In recent times, there has been considerable interest in Large Language Models (LLMs): ML systems (such as ChatGPT) which use reams of available text, coupled with probability calculations, to create human-like text, dialogue, and writing.

The LLM achieves this by constructing a massive statistical model based on large amounts of text, mostly taken from the internet. This is done with relatively little human input; rather, the model is built from a large number of nodes, which act as probability functions estimating how likely a word is to appear in a text given its context and the text that has come before.

Researchers feed the LLM large amounts of text and train it by having it make next-word predictions about this training data. They then give it positive or negative feedback depending on whether it predicts correctly. Given enough text, the LLM can construct a statistical model giving the likelihood of the next word in a block of text all by itself.
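
A toy bigram model shows the principle in miniature. This is an illustrative sketch, not GPT's architecture: the raw counts here stand in for billions of learned parameters, but the objective — assigning a probability to the next word given what came before — is the same.

```python
from collections import Counter, defaultdict

# A toy "statistical model of the next word": bigram counts turned
# into probabilities.
corpus = "the cat sat on the mat the cat ate the cream"
tokens = corpus.split()

# Count, for each word, which words follow it and how often.
follows = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Return the empirical probability of each word following `word`."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.5, 'mat': 0.25, 'cream': 0.25}
```

An LLM replaces these lookup tables with a neural network conditioned on a long context window, but it is still, at bottom, predicting the next word.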

These systems are becoming increasingly sophisticated and convincing, to the point where some commentators suggest that we may now be approaching the creation of artificial general intelligence. Yet applications of these systems have been plagued by persistent inaccuracies in their output, often called “AI hallucinations”. These falsehoods, and the overall activity of LLMs, are better understood once one recognises that the models are indifferent to the truth of their outputs: the programs cannot themselves be concerned with truth.

Descriptions of anything not only guide understanding but also inform application: what a thing is for and what it can reasonably be expected to do.

Currently, false statements by LLMs are described as “hallucinations”, a term which suggests that these systems are misperceiving the world and describing what they “see”.

When factual errors are made, is there really an active attempt to deceive?

Can LLMs be ascribed intentions?

Both lying and hallucinating require some concern with truth. The problem isn’t that LLMs hallucinate, lie, or misrepresent the world in some way; it’s that they are not designed to represent the world at all. Instead, they are designed to produce convincing lines of text that give the impression of accurately representing the world.

Nothing in the design of LLMs (whose training task is to predict words given context) is actually designed to handle arithmetic, temporal reasoning, etc. These models aren’t designed to transmit information, so we shouldn’t be too surprised when their assertions turn out to be false.

The Mandela Effect

Nelson Mandela (1918–2013)

In the realm of psychological phenomena, the Mandela Effect stands out as one of the most curious and bewildering. Named after the widespread false memory of Nelson Mandela’s death, the Mandela Effect describes the occurrence of large groups of people remembering events or details differently than the documented reality. This phenomenon raises fascinating questions about the nature of human memory, and in recent years, it has intersected with another rapidly evolving field, AI.

The term was coined in 2009 by Fiona Broome, a self-styled “paranormal consultant,” after she discovered that she, along with a number of others, believed that Nelson Mandela had died in the 1970s or ’80s (when he actually died in 2013).

Upon learning that Mandela was still alive, she initially dismissed her memories as a result of misunderstanding; however, during a conversation with a security staff member at a science fiction and fantasy convention in 2009, Broome reportedly learned that others possessed the same false memory about Mandela’s death in prison.

Broome thought she also saw news coverage of Mandela’s funeral and burial, including a televised interview or speech by his widow, Winnie. As Broome began talking about these memories, she learned that “perhaps thousands” of other people also recalled news coverage of Mandela’s death in prison, followed by a speech by his widow.

Broome was struck by the similarities and created a website for people to share and discuss their memories that did not align with recorded history, a phenomenon she called the Mandela Effect. Through thousands of comments, people began to uncover shared memories that differed from reality, and the Mandela Effect went viral. It became an Internet phenomenon, going as far as to inspire a film titled The Mandela Effect (2019).

The Mandela Effect is essentially a collective false memory, where a significant number of people recall something differently from how it actually occurred. These discrepancies range from minor details in pop culture to more substantial historical events.

The phenomenon can be attributed to various psychological mechanisms, including confabulation (where the brain fills in gaps in memory with plausible details), social reinforcement (where shared false memories gain traction), and cognitive biases. However, the Mandela Effect’s allure lies in its mystery and the almost magical sense of an alternate reality it suggests.

Born of a Racist Blunder

Broome’s memories were not cooked up from scratch. There was in fact a black South African anti-apartheid activist whose death in prison made international news and whose widow appeared on television.

Stephen Biko (1946–1977)

The man I write of led an anti-apartheid group during the 1960s and 70s, although it was not the African National Congress. It was called the Black Consciousness Movement. The man had lost patience with the leaders of what he called “white organizations” like the National Union of South African Students (NUSAS) and similar wishy-washy anti-apartheid groups that had been annexed by well-meaning white liberals.

He saw “no participation by blacks in the articulation of their own aspirations.” He found white activists to be paternalistic and patronizing. He sought genuine egalitarianism. His idea of social justice was not multi-racial but rather a “totally non-racial society,” as he called it.

The man I write of was not called Nelson Mandela. His name was Bantu Stephen Biko, and his death in police custody on 12 September 1977 became a very big deal around the world. Especially as the unsavoury details emerged.

Biko’s death received considerable international attention, and thus did our paranormal consultant mistake Stephen Biko for Nelson Mandela: an entirely different black South African anti-apartheid activist with a prison record.

English rock musician Peter Gabriel was inspired to write the eulogy “Biko” after hearing about Biko’s death on the news. The song is book-ended with recordings of songs sung at Biko’s funeral. Released by Charisma Records as a single from Gabriel’s eponymous third album in 1980, it reached No. 38 on the British charts.

Stephen Biko is the Mandela in the Mandela Effect. But to speak of a Mandela Effect is to commit a racist blunder. The phrase is lazy; it’s an expression of racial overconfidence. It shows a lack of attention to detail and distinction when observing people one deems interchangeable and unremarkable.

Biko died young, but there is a sense in which he triumphed: the South African security forces, and the government they served, were discredited severely before the world. On the other hand, regrettably, no one was ever punished for Stephen Biko’s murder. The police officers involved applied to the South African Truth and Reconciliation Commission for amnesty in 1997; amnesty was refused, yet no prosecution ever followed. I’ll leave it for the reader to decide whether reconciliation promotes, or obstructs, justice.

Reality Degradation or Simply Other Anomalies

If you start your research descent into other memory anomalies (and I deliberately use the term descent, for you will find yourself in a veritable warren), you will find intense debate over potential causal mechanisms, many of which involve pseudoscientific explanations.

Although Broome made no causal claims in her initial post, she firmly rejected the notion that the Mandela Effect resulted from false memories, instead citing parallel realities, alternate history, and other fringe theories as her favourite science-fiction explanations. The Mandela Effect has fuelled numerous untestable theories, including that the phenomenon results from string theory and the mixing of different universes. Such fringe theories have sparked intense debate.

The curious reader with ample time on their hands may wish to research the following (in the context of the Mandela Effect):

  • Berenstein Bears
  • The Monopoly Man (did he sport a monocle?)
  • C-3PO (from the Star Wars franchise, and his silver leg)

Here’s Where It Gets Weird

Hear me out! But Dolly had braces!

Dolly

Moonraker (1979) is a spy-fi film, the eleventh in the James Bond series produced by Eon Productions, and the fourth to star Roger Moore as the fictional MI6 agent James Bond.

Bond survives attacks by Jaws during Rio Carnival and on the Sugarloaf Cable Car at Sugarloaf Mountain. After Jaws’ cable car crashes, he is rescued from the rubble by Dolly, a young woman, and the two fall in love.

Dolly and Jaws

Jaws was played by the 7' 3" tall American actor Richard Kiel (https://en.wikipedia.org/wiki/Richard_Kiel), who passed away on 10th September 2014, aged 74. The BBC ran this article the day after: https://www.bbc.co.uk/news/entertainment-arts-29160096

Dolly was played by the 5' 2" French actress Blanche Ravalec. Originally, the film-makers wanted to cast a 7' tall actress to play opposite Kiel, but it was decided that a more “realistic” relationship would perhaps present itself if a petite actress were cast.

Tiny compared to Jaws, this blonde-haired, pig-tailed, spectacle-wearing, super-strong girl makes the unusual but amusing girlfriend of Jaws. Like Sheriff J.W. Pepper before her, Dolly is an indirect ally to Bond. It is through his love for Dolly that Jaws refuses to throw Bond out of the airlock, and allows him to escape.

Excerpt from BBC article after Kiel’s death showing an image of the actor Kiel and text stating Dolly had braces.

However, the BBC article from 2014 clearly states Dolly had braces, and in the interests of full disclosure, this is the one Mandela Effect phenomenon that snagged the Author, inspired this article, and birthed its title.

Rather coincidentally, during research for this article, the Author discovered she shares a birthday with Blanche Ravalec… It had to be the one to hook me in then, didn’t it?

But, importantly, the Author doesn’t see how the scene would work without the braces. So, naturally, I asked The Hive Mind (ChatGPT)…

Screenshot of ChatGPT 3.5 output, given the input question “did Dolly have braces?”

ChatGPT 3.5 doesn’t agree, yet there persists much intense online debate about this particular memory anomaly, and many YouTube videos. Some folk have gone so far as to source many DVD copies of Moonraker, as they, like me, distinctly recall Dolly having braces — it made the dynamics of the scene where she meets Jaws work — but no copies appear to exist (anymore).

This despite the author of the 2014 BBC article being amongst our number of believers.

But that’s the thing about the Quantum Universe: every particle “knows” about every other particle.

Surely if we find ourselves hurtling down a different trouser-leg of SpaceTime — one in which Dolly didn’t have braces — then the 2014 BBC article caption would also be instantaneously altered?

It wouldn’t be possible to have a situation where only some of the occurrences had changed.

We can draw only one logical conclusion from all this, and that is: the 2014 BBC article was written by a human.

AI and Misinformation

The chief enemy of creativity is good sense — Pablo Picasso

The intersection of AI and the Mandela Effect offers an intriguing exploration into the nature of memory, both human and artificial. As AI becomes increasingly capable of mimicking human behaviour and processing vast amounts of data, it also starts to mirror some of our cognitive quirks, including memory inaccuracies.

AI algorithms, particularly those involved in ML and NLP, often rely on vast datasets scraped from the internet. These datasets may contain errors, outdated information, or outright fabrications. If an AI is trained on such data, it could inadvertently reinforce and propagate these inaccuracies, creating a digital version of the Mandela Effect. For example, a chatbot trained on incorrect historical data might confidently assert false information, similar to how people misremember events.
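
To make the sanitisation point concrete, here is a minimal, hypothetical sketch of one cleaning pass over scraped text. The KNOWN_FALSEHOODS list and the documents are invented for illustration, not a real pipeline; production systems combine deduplication, source scoring, and human review at vastly larger scale.

```python
# Illustrative assumption: a curated list of claims known to be false.
KNOWN_FALSEHOODS = [
    "nelson mandela died in prison",  # the canonical Mandela Effect claim
]

# Invented stand-ins for scraped web documents.
raw_documents = [
    "Nelson Mandela died in prison in the 1980s.",
    "Nelson Mandela died in 2013 after a long illness.",
    "Nelson Mandela died in 2013 after a long illness.",  # duplicate
]

def sanitise(documents):
    """Drop exact duplicates and documents matching a known false claim."""
    seen = set()
    kept = []
    for doc in documents:
        norm = doc.lower().strip()
        if norm in seen:
            continue  # drop exact duplicates
        if any(falsehood in norm for falsehood in KNOWN_FALSEHOODS):
            continue  # drop documents repeating a known falsehood
        seen.add(norm)
        kept.append(doc)
    return kept

print(sanitise(raw_documents))
# ['Nelson Mandela died in 2013 after a long illness.']
```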

AI as a Tool for Analysing Collective Memory

AI can help us understand the Mandela Effect by analysing large datasets of human memories and the content that influences them. Social media, forums, and other online platforms are treasure troves of collective memory. AI can sift through this data, identifying patterns in how false memories spread and evolve. This can provide insights into the psychological and sociological mechanisms behind the Mandela Effect, helping to demystify why certain false memories are so widespread.
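
As a toy illustration of that kind of analysis (the comments and the variant pattern are invented for the example), one could count competing spellings of a disputed memory across a corpus of posts:

```python
from collections import Counter
import re

# Invented stand-ins for scraped forum comments about one famous anomaly.
comments = [
    "I definitely grew up reading the Berenstein Bears.",
    "It was always Berenstain, check the book covers.",
    "Berenstein! I remember the spelling clearly.",
]

# Tally how often each spelling variant appears across the corpus.
variants = Counter()
for comment in comments:
    for variant in re.findall(r"Berenst[ae]in", comment):
        variants[variant] += 1

print(variants)  # Counter({'Berenstein': 2, 'Berenstain': 1})
```

Scaled up over millions of posts, frequency patterns like these can show where and how fast a false memory spreads.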

Synthetic Media and Deepfakes

Deepfakes — realistic but fabricated images and videos — add another layer of complexity. As AI-generated content becomes more sophisticated, distinguishing between real and synthetic memories may become increasingly challenging. Imagine a future where AI-generated videos “document” events that never happened, creating new, convincing false memories for the public. This possibility blurs the line between the Mandela Effect as a psychological phenomenon and deliberate manipulation through technology.

The Future of Memory in a Digital World

As we delve deeper into the digital age, the boundaries between human and artificial memory are becoming increasingly intertwined. The Mandela Effect, once a curiosity confined to psychology and popular culture, now has significant implications for how we understand memory and truth in a world where AI plays an ever-growing role.

To navigate this complex landscape, it’s crucial to develop robust AI systems that can distinguish between factual and false information, while also cultivating critical thinking and media literacy in society. Only by understanding the nuances of both human and artificial memory can we hope to mitigate the impact of misinformation and appreciate the complexities of our collective memories.

In this new era, the Mandela Effect serves as a fascinating case study at the intersection of psychology and technology, reminding us that our perception of reality is both fragile and susceptible to influence. As AI continues to evolve, the mysteries of memory — both human and machine — will undoubtedly deepen, offering new challenges and opportunities for exploration.

About the Author

Sharon Mitchell is an AWS DevOps Engineer at Version 1.


DevOps Engineer currently working in the AWS Public Cloud space. Holds both BSc and MSc in Computer Science. Explores Tech and AI from esoteric angles.