AI Hallucination and Big Brother

Khun Yee Fung, Ph.D.
Programming is Life
2 min read · Jun 5, 2024

I was thinking: until very recently, human knowledge was transmitted by word of mouth, by teaching, or by writing in books. Even though books can be destroyed, burned, eaten away by worms, buried, etc., they are permanent in the sense that the printed text does not change. The other two ways are fragile in the sense that each iteration changes what is being transmitted. So there are two axes of permanence: the content not changing, and the container of the content not changing.

Then the digital age came along. It was preceded by a very brief analogue age, with radio, telegraph, telegram, and television, and with recordings made on film and vinyl. That period was brief, a few decades, before we got CDs, rust, and eventually transistors. Mind you, a lot of ancient music boxes are actually digital, in the sense that they record the music in discrete elements.

Of course, this is not meant to be precise, or even very accurate; it is just a rough idea of the permanence of recording media. With digital media, we gain a third axis of permanence: the ability to decode the content. Digital media are by nature encoded. Of course, one could argue that vinyl and film are also media that encode their content, but those encodings can be easily decoded. Digital media tend to transform the content first, before encoding it. The transformation process, if unknown, can be very difficult to reverse. When all the CD players are gone, how do you play a CD? It is not impossible; you can always reinvent the CD player. But without that great effort, all the CDs are practically useless.

Of course, with the Internet, we have a fourth axis of permanence: permanence of the content source. This has always been a difficult thing. Checking the sources can be frustrating, and you might not be able to trace back very far before the trail goes cold. But you get a sense of how reliable a particular piece of content is by tracing its sources.

Now imagine a totalitarian state where all media with only two axes of permanence are banned. So: no books, no films, no vinyl. All content must be online.

If you are the Minister of Truth, you will have a very easy job if you use a tool like ChatGPT well: all content, all the time, is generated from approved training data. Now all you have is a self-contained, no-source world of content produced by an LLM. Generate a world that is 100% approved by the state ideology of the day. When the ideology changes, modify the training data, and you have an updated world, again 100% approved by the modified state ideology.

I mean, “check the sources” will have no meaning whatsoever.

If we get AI hallucination even without this kind of training, it is much darker when the training data is intentionally biased. We know AI can fabricate “facts”, quite easily, it seems. Can AI generate truth from half-truths and lies? If not, we have just found a way to make facts harder to check, on an Internet that is full of content generated by AI.
