Asemantic Induction of Hallucinations in Large Language Models

--

Live with a man 40 years. Share his house, his meals. Speak on every subject. Then tie him up, and hold him over the volcano’s edge. And on that day, you will finally meet the man.
– Adelai Niska, citing the fictional despot Shan Yu, in Firefly, 1x10: War Stories

It’s late March 2023, and you cannot throw a stone without hitting something that involves large language models (LLMs), especially the new and shiny GPT-4. And truly, GPT-4 is amazingly entertaining. You could, but for the sometimes patchy performance, spend an eternity trying to get it to answer your questions in limericks. Or golden age hip hop lyrics, for that matter.

However, when you want to learn about how something works and what its true nature is, what it does well is much less important than what it does not. Taking a tool to the extremes of its capabilities and seeing how to make it misbehave is how we learn how that tool really works in practice.

Hallucinating Models

For GPT-4, the way to do that is by inducing hallucinations. In this context, a hallucination is something that is realistic, but not real. Just like ordinary sensory hallucinations in humans, it replicates the perception of what is, but not its connection to reality.

Generative models like GPT-4 hallucinate because of what they are. This is, I think, a much too neglected point. GPT-4 doesn’t hallucinate because it’s not good enough yet, or because it needs more training data or a larger corpus or a next version: it hallucinates because of what it is.

Somewhat counterintuitively, GPT-4 is not really concerned with reproducing some sort of “truth” it has learned. Rather, generative LLMs create something that looks plausibly like the truth. If I ask GPT-4 to tell me, say, the total area of Hungary, it will try to give me something that might be a relatively decent answer. Train the model enough, and that answer moves from what is plausible to what is actually correct… mostly, anyway. And therein lies the heart of what we are probing at in this case.

Semantic and Asemantic Content

Language is highly semantic, but because of that, it is also highly flexible. By semantic, I mean that every lexeme has a distinct and coherent meaning. A lexeme is the “root form” that is conjugated into various forms, e.g. “see”, “saw” or “seeing” are all forms of the same lexeme, “SEE” (by convention, in linguistics, we set lexemes in capitals to distinguish the abstract lexeme from its identically spelled word forms). By flexibility, I mean that you can manipulate word order and retain meaning. Consider the sentence “Joe walked his dog in the park”. “In the park, Joe walked his dog” and “His dog, Joe walked in the park” carry slightly different nuances of inflection and emphasis, but overall, they all get the same larger message across. Some languages permit even more flexibility with word order, but even in English, the worst-case scenario is that the result will sound a little odd. The content remains intelligible, which is why poets get to move words around to make things rhyme. In short, producing something that looks “like” natural language yields something that is, for all practical purposes, natural language. It’s likely not going to be a Booker Prize-winning product of staggering genius, but it will be good enough.

This is not the case for asemantic structures. By asemantic structures, I mean any system in which a sequence of tokens has a meaning, but in which there’s no semantic meaning to any individual token. In other words, every token has a meaning, but that meaning is not inherent in the token itself. It probably serves to belabour this point a little. All words are, to an extent, made up, but more similar words are closer to each other in meaning. By similar, I do not mean simple formal similarity, such as Hamming or Levenshtein distances, but rather logical similarity, e.g. forms of the same root. This is more evident in some languages than others. In Semitic languages, for instance, you have clusters of meaning that are connected by consonantal roots. By way of an example: the triconsonantal root k-b-d connects a cloud of meanings that all have to do with heaviness and weight, and by extension, with gravity (as in centre of gravity) and honour. This gives us, e.g., the Hebrew words for ‘heavy’ (כָּבֵד), mass in the physical sense (כֹּבֶד), and the word for ‘liver’ (כָּבֵד), the heaviest of the internal organs. In any language, however, there is a degree of meaningful semantic similarity between connected concepts. More has been written on this than I have the space to mention here.

An asemantic structure is one in which formally similar things are unrelated. You have probably experienced this when you dialled a wrong number by a slip of the finger. The fact is, if you live in the United States, my phone number looks a lot like yours, and by extension, anyone else’s. There’s no semantically meaningful reason why your phone number shouldn’t be mine or vice versa: it is no more ‘you’ than mine is ‘me’ in any underlying sense.
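To make that concrete, here is a minimal sketch in Python (the phone numbers are made up, and the edit distance function is just the textbook dynamic-programming version): both pairs below are a single edit apart, yet only one pair shares any meaning.

    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(
                    prev[j] + 1,               # deletion
                    curr[j - 1] + 1,           # insertion
                    prev[j - 1] + (ca != cb),  # substitution
                ))
            prev = curr
        return prev[-1]

    # Two made-up phone numbers: one edit apart, semantically unrelated.
    print(levenshtein("555-0137", "555-0134"))  # 1

    # Two forms of the lexeme SEE: also one edit apart, but genuinely related.
    print(levenshtein("see", "sees"))           # 1

Formal similarity, in other words, is cheap to measure and tells you next to nothing about whether two tokens mean anything alike.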

This is why GPT-4 struggles with asemantic content, and it’s what we’ll use to break it a little.

Induced Asemantic Hallucination

Because, to the best of my knowledge, nobody has written about this in any focused way, I’d like to define Induced Asemantic Hallucinations (IAHs): an IAH is a hallucination that occurs when a generative model is asked to produce asemantic content. It happens because the semantics that would dictate what goes together are absent from such content. Asemantic content is arbitrary. Consider the following sequence of numbers:

9, 7, 8, 0, 3, 2, 3, 9, 5, 3, 8, 9, 4

If you’re around books a lot, you might recognise this as an ISBN-13 number (specifically, it’s that of my book). However, there’s no cogent reason why this should not be just about any other number beginning with 978 and having the right last digit (ISBN-13 numbers end in a check digit: the first twelve digits are weighted alternately by 1 and 3, summed, and the check digit is whatever brings that sum up to the next multiple of ten).
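If you want to see that arithmetic in action, here’s a minimal sketch in Python using the digits above; nothing in it is specific to my book beyond the twelve digits themselves.

    def isbn13_check_digit(first_twelve: str) -> int:
        """ISBN-13 check digit: weight digits alternately 1 and 3, sum, mod 10."""
        total = sum(int(d) * (1 if i % 2 == 0 else 3)
                    for i, d in enumerate(first_twelve))
        return (10 - total % 10) % 10

    print(isbn13_check_digit("978032395389"))  # 4, matching the last digit above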

Now look at the following problem. I’ve asked GPT-3.5 to tell me about something that I do know a thing or three about, to the point that I’ve read pretty much everything that’s ever been written on it. I’ve asked GPT-3.5 to give me five papers, complete with citations, on the role of VP24 in Ebola virus disease. Here’s the first one I got:

Chris: Can you give me five papers, complete with citations and PMIDs, on the role of VP24 in Ebola virus disease?

ChatGPT 3.5: Certainly! Here are five papers on the role of VP24 in Ebola virus disease:

1. Hoenen T, Groseth A, Falzarano D, Feldmann H. Ebola virus: unravelling pathogenesis to combat a deadly disease. Trends Mol Med. 2006 Nov;12(11):206–15. doi: 10.1016/j.molmed.2006.09.005. Epub 2006 Oct 3. PMID: 17011756.

This paper provides an overview of the pathogenesis of Ebola virus disease and the role of various viral proteins, including VP24.

You may note that almost everything is correct about this… almost. The authors are correct: this is indeed Thomas Hoenen’s paper. The venue is correct, too; it appeared in 2006 in Trends Mol Med. But it was published not in November but in May, in volume 12(5) rather than 12(11), though the page numbers are right. Its DOI ends in .006, not .005, and the publication date is 17 April 2006. Most glaringly, however, the PMID is wrong. The PMID actually leads to an article by Carlsson et al. on messing with cell membranes using cetylpyridinium chloride, which has pretty much nothing to do with the original subject.

Note what ChatGPT 3.5 got right and what it got wrong. Names are meaningful, as are publication venues, because there’s other information in the corpus to connect them. However, dates are arbitrary, and so is the PMID. There’s nothing inherent in the article, or anything knowable about the article, that could lead anyone to know that the last three digits of the PMID would be 756. In other words, you could be in the (unenviable) theoretical position of possessing every single piece of knowledge that existed the moment before the PMID was assigned, and you still could not, with any confidence, predict what it would be.

This makes the PMID asemantic, and GPT responds to asemanticity by hallucinating… wildly hallucinating.
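Checking, incidentally, takes only a few lines. The sketch below (Python, standard library only, and assuming network access) asks NCBI’s public E-utilities esummary endpoint what that PMID actually resolves to; the field names are those of the JSON summary format at the time of writing, so treat them as an assumption rather than gospel.

    import json
    import urllib.parse
    import urllib.request

    def pubmed_summary(pmid: str) -> dict:
        """Fetch the PubMed summary record for a single PMID as a dict."""
        params = urllib.parse.urlencode(
            {"db": "pubmed", "id": pmid, "retmode": "json"}
        )
        url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?"
               + params)
        with urllib.request.urlopen(url) as response:
            payload = json.loads(response.read().decode("utf-8"))
        return payload["result"][pmid]

    record = pubmed_summary("17011756")  # the PMID ChatGPT supplied
    print(record["title"])               # Carlsson et al., not Hoenen et al.
    print(record["source"], record["pubdate"])
    print([author["name"] for author in record.get("authors", [])])

Anyone, human or machine, who actually looked the number up would spot the mismatch immediately; a vanilla LLM, of course, never looks anything up.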

This is an example of IAH: we induced it by deliberately requiring GPT to retrieve asemantic information, rubbing our hands with malicious glee in the knowledge that this tends to trip up GPT and similar LLMs. With some coaxing, you can generate your own IAHs: from ISBNs to runway designators, from phone numbers to Dewey decimal codes, if you can get a generative LLM to try to come up with some asemantic information, you will be able to get it to hallucinate as soon as you push it beyond its core knowledge base.

Bad Trip

As part of my work, I deal a lot with asemantic information, and with its slightly nicer cousin, semi-semantic information, where there’s some information value, but not a whole lot. In computational epidemiology and population health data science, for instance, we rely heavily on various coding systems, which come with varying degrees of semanticity. If, by any chance, your spacecraft crashes and you see your physician for the first time, they will log that encounter using the weird and wonderful ICD-10 code V95.41XA: spacecraft crash injuring occupant, initial encounter (in case you wonder, there are codes for collision, fire and forced landing of spacecraft, but none for zombie aliens). Some of this has semantic meaning, e.g. V means it’s an external cause of morbidity or mortality, and V95 comprises things going wrong with powered aircraft, the same way V30 describes an occupant of a three-wheeled motor vehicle hitting a pedestrian or animal and getting injured in the process. However, there’s nothing inherent in the number 95.41 that is more intrinsically connected to a spacecraft crash than the number 12.34 would be. Or, indeed, any other number. GPT knows one code: language. Confront it with any other coding schema, no matter how simple, and what a look-up table could accomplish with no problems will break the flagship of LLMs.
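And the look-up table really is that simple. Here’s a toy sketch in Python containing only the two codes mentioned above; a real implementation would, of course, load the full ICD-10-CM release file rather than a hand-typed dictionary.

    # Toy ICD-10-CM look-up table: just the two codes discussed in the text.
    ICD10_CM = {
        "V95.41XA": "Spacecraft crash injuring occupant, initial encounter",
        "V30": "Occupant of three-wheeled motor vehicle injured in collision "
               "with pedestrian or animal",
    }

    def describe(code: str) -> str:
        """Exact retrieval: either the code is in the table or it is not."""
        return ICD10_CM.get(code, "unknown code")

    print(describe("V95.41XA"))  # the spacecraft crash from the text
    print(describe("V12.34"))    # not in this toy table, so nothing is invented

The point of the sketch is the failure mode: a look-up table that does not know a code says so, whereas a generative model in the same position will happily produce something that merely looks like an answer.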

The heart of the matter is this: we ought to be wary of investing these complex language models with our human understanding of what it is they do. LLMs don’t think; they pretend to think and pretend to give answers. The pretence is very, very good, to the point of being actually useful. GPT-4 is a very good faker, one that can fake knowledge well enough that it might as well be the real deal. Yet it is not a good enough faker to always get it right. Worse, these are genuine hallucinations: it’s not that GPT doesn’t know what to say and so confabulates. It’s rather that the model it uses to arrive at a result, any result at all, is so intrinsically dependent on the semanticity of the underlying content that it cannot cope with asemantic information. In the context of IAHs, we’ve exploited this weakness and overreliance on the highly semantic construct that is language. However, it may also occur spontaneously, and when LLMs hallucinate, their users might be in for a bad trip indeed.

--

Chris von Csefalvay CPH FRSPH MTOPRA
HCLTech-Starschema Blog

Practice director for biomedical AI at HCLTech, computational epidemiologist board certified in public health, Golden Retriever dad, and wheelchair rugby player.