Open Notebook 001 — Language Models: Workings, Questions, Humanity & Creativity

By Luca Fontana

Oct 10, 2023

Image created using DALL-E 2

Like a colossal mosaic composed of billions of text fragments, Large Language Models (LLMs, e.g. ChatGPT, Claude, LLaMA and others) serve as a repository of human language, knowledge, and, to some extent, wisdom. Simultaneously, they defy traditional programming paradigms by “learning” rather than being “taught”, constructing algorithms influenced by the tides of text-based data they’ve been exposed to. But what lies beneath the digital façade of their semantic prowess? The LLM, captivating in its sophistication yet daunting in its implications, is worth a deep intellectual excursion.

An increasing following holds these models as the Oracles of Delphi of the digital age: they manifest textual output in response to an assortment of queries, with silicon synapses instead of ethereal trances. Far from mere curiosities, these models are transforming the way we interact with information and, in doing so, reshaping the relationship between man and machine and sparking growing debates over ethics, creativity and even the concept of humanity itself.

Functioning, Training and Understanding

To grasp the workings of Large Language Models, it’s crucial to delineate the underlying architecture. Predominantly, we encounter the Transformer model at their nucleus. Imagine the Transformer as an incredibly intricate machine, finely tuned to understand contextual relationships in human language. Within it, you find layers of mathematical operations that help in encoding the structure and semantics of the text. The mechanism is trained to generate language: given an input, it predicts subsequent words based on the contextual cues observed during its training phase.
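
To make those “layers of mathematical operations” slightly more concrete, here is a minimal sketch, in Python with NumPy rather than any real model’s code, of the attention operation at the core of a Transformer layer: each token’s representation is recomputed as a weighted blend of every token in its context, which is how contextual relationships get encoded.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core operation inside a Transformer layer: each token's representation
    is updated as a weighted mix of all tokens, with weights derived from
    how strongly the tokens relate to one another."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the context
    return weights @ V                                        # context-aware representations

# Toy example: a "sentence" of 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
contextualised = scaled_dot_product_attention(tokens, tokens, tokens)
print(contextualised.shape)  # (4, 8): the same tokens, now informed by their context
```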

This capacity to grapple with semantic complexity arises from the billions — or even trillions — of parameters that each model integrates. The parameters are the secret sauce of each model. Just as a musician manipulates his mixer to produce a specific tonality or mood, the parameters within the model are adjusted through the training process. The machine learns to weigh certain words and phrases over others, capturing the subtleties of human language. In doing so, the model gains its ability to comprehend and generate text that is both complex and contextually appropriate.
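
As a rough back-of-the-envelope illustration (the dimensions below are entirely hypothetical, not those of any particular model), the billions of parameters arise from a handful of weight matrices repeated across many layers:

```python
# Hypothetical configuration, chosen only to show where parameters come from.
vocab_size = 50_000        # distinct tokens the model knows
d_model    = 4_096         # width of each token's internal representation
n_layers   = 48            # stacked Transformer layers
d_ff       = 4 * d_model   # width of each layer's feed-forward block

embedding    = vocab_size * d_model       # token embedding table
attention    = 4 * d_model * d_model      # Q, K, V and output projections
feed_forward = 2 * d_model * d_ff         # two linear maps per layer
per_layer    = attention + feed_forward
total        = embedding + n_layers * per_layer

print(f"{total / 1e9:.1f} billion parameters")  # roughly 10 billion for this toy config
```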

At the heart of “training” is a process technically known as backpropagation, in which the model adjusts itself through cycles of prediction and correction. In essence, it starts with a rudimentary ability to understand language. With each successive pass over the data, it assesses the gap between its output and the ‘ideal’ output (derived from the dataset), then refines itself, like a self-aware typewriter that continually learns from its mistakes until its text becomes virtually indistinguishable from that written by a human.
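
In code, that prediction-and-correction cycle looks roughly like the toy sketch below, written in Python with PyTorch and standing in for a real LLM’s vastly larger machinery: the model predicts the next token, the gap between its prediction and the actual next token becomes a loss, and backpropagation nudges every parameter to shrink that gap.

```python
import torch
import torch.nn as nn

# A deliberately tiny "language model": embed a token, predict the next one.
vocab_size, d_model = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Fake "training data": pairs of (current token, actual next token).
inputs  = torch.randint(0, vocab_size, (64,))
targets = torch.randint(0, vocab_size, (64,))

for step in range(100):
    logits = model(inputs)           # prediction: a score for every possible next token
    loss = loss_fn(logits, targets)  # gap between the prediction and the "ideal" output
    optimizer.zero_grad()
    loss.backward()                  # backpropagation: trace the error back through the model
    optimizer.step()                 # correction: adjust the parameters slightly
```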

The keystone in the arch of language model “training” lies in the datasets on which the models are fed. A model’s understanding of the world is mediated through data, and thus, its interpretive scope is necessarily a reflection of that data’s diversity and quality. Modern LLMs are frequently trained on colossal, disparate datasets ranging from books, academic journals, and web articles to a melange of social media posts. These datasets are typically multi-lingual, incorporating an array of global dialects and idioms. However, herein lies a double-edged sword: the datasets often perpetuate extant biases present in the original text. Consequently, issues of ethics, transparency, and equitable representation manifest themselves in a kaleidoscopic array of concerns.

Questions arise concerning the ethical and representational facets of these datasets. Who gets to be included? Whose language is deemed ‘standard’? The advent of LLMs is not just a technical milestone but also a sociopolitical act that calls for accountability.

The question of data ethics extends beyond issues of representation and bias to also encompass consent and ownership. Many of the datasets that feed into LLMs are vast collections of text scraped from the open web, which include social media posts, forum threads, and even personal blogs. Rarely is explicit permission sought from the authors of these snippets. While the act of scraping might adhere to the technical legality under current data collection frameworks, it navigates a moral grey area. We are compelled to ask: Do we implicitly forfeit our intellectual autonomy when our words, set adrift in the digital ocean, become assimilated into the datasets that train these models?

Hallucinations

One term that has gained traction surrounding current Large Language Models is their tendency to produce “hallucinations”. This isn’t the psychedelic type; instead, it refers to instances where the model generates information that is incorrect, misleading, or entirely fabricated. At first glance, one might hastily attribute these anomalies to bugs or malfunctions, but the phenomenon is far more nuanced.

Why do these intellectual fabrications occur? Partially, the explanation lies in the way these models are trained. During the training process, the primary goal is not veracity but plausibility. The LLMs learn from the data they are fed, aspiring to predict the next word in a sequence based on the likelihood of its occurrence in the training data. But ‘likely’ does not necessarily equate to ‘true.’ Herein lies a paradox: A model that is optimised for generating text that sounds human-like may also generate factually untrue text.
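
The gap between ‘likely’ and ‘true’ is easy to see in miniature. In the purely illustrative sketch below (the probabilities are invented for the example), a model completing “The capital of Australia is …” favours the most familiar continuation over the correct one, Canberra:

```python
import random

# Hypothetical next-word probabilities a model might assign after
# "The capital of Australia is" -- plausibility learned from text, not truth.
next_word_probs = {
    "Sydney":    0.55,   # most familiar, frequently co-occurs with "Australia"
    "Canberra":  0.30,   # the factually correct answer
    "Melbourne": 0.15,
}

words   = list(next_word_probs)
weights = list(next_word_probs.values())

# Greedy decoding picks the single most probable word...
print(max(next_word_probs, key=next_word_probs.get))   # -> "Sydney"

# ...and sampling is biased toward it too, correct or not.
print(random.choices(words, weights=weights, k=5))
```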

What makes the hallucination phenomenon potentially hazardous is its tendency to manifest in scenarios requiring specialised or factual knowledge. For instance, when asked to generate medical advice or historical facts, the model’s responses may sound plausible and are often eloquently framed, yet they might be rooted in falsity.

Additionally, LLMs raise unsettling questions about agency and ethics. If a model generates text that is harmful, misleading, or false, where does the responsibility lie? Is it merely an unintended consequence of its training data, or is it a reflection of the ethical considerations we failed to integrate? Can a model be blamed for reproducing biases that are widespread and prevalent in our society?

Creative Applications

Not limited to mere academic curiosities or functional utilities, LLMs have crossed into realms traditionally reserved for the creative intellect. They serve as collaborative partners for writers, as algorithmic composers of music, and even as collaborators with image-generation systems that produce visual artworks. The interactive storytelling space, for instance, is experiencing an upswell of innovation as LLMs become conversationalists capable of co-creating narrative arcs with human interlocutors. By coupling models with sophisticated sentiment analysis tools or integrating them into virtual reality environments, the frontier of artistic exploration extends even further, a topic that probably deserves an essay of its own.

These recent explorations have sparked debates across several disciplines, where definitions are being redrawn to include (or exclude) the uses of this new technology. We are on the cusp of seeing art forms that are not just interactive but dynamically co-created in real time between humans and algorithms. These multidisciplinary confluences are redefining the very semantics of what we consider “art.”

Creative Implications and Many Questions

While the creative utilities are abundant, they are not without implications. The ability of LLMs to generate artistic content en masse brings forth questions around originality, intellectual property, and the essence of creativity itself.

Is a poem less poignant if written by an algorithm? Or does the presence of human sentiment as training data grant it an unacknowledged emotional authenticity? If knowing the author’s identity changes our perception of a work’s value, does its value then lie in the creator rather than the creation?

A poem written using GPT-4

Who is the creator? The human prompting the machine? The machine? Or the authors whose work was used to form the dataset to train the machine? Copyright laws, predicated largely on a human-centric model of creation, find themselves contorted when dealing with art, text, or music generated by algorithms. Traditional intellectual property frameworks are currently ill-equipped to address the complex nuances of machine-generated creativity. Even more uncertain is the consideration of derivative works. Since LLMs are trained on data sourced from existing human-authored texts, do these subsequent creative outputs represent a form of ‘remix culture,’ and how should existing copyright laws adapt to this new modality?

How may these technologies shift our social context? Just as the Gutenberg press revolutionized the distribution of information, LLMs have the potential to radically change creativity, but in what direction?


Luca Fontana

3rd-year Architecture student & creative director with experience working on a variety of creative initiatives within art, music and architecture.