GPT-SW3 as Klara, the artificial friend
In this post, we look back at the first public display of GPT-SW3. This occurred during an art exhibition in Stockholm, Sweden, where GPT-SW3 made a guest appearance in the form of a chatbot named Klara.
The Klara chatbot and Life Eternal
Between 1 October 2022 and 29 January 2023, the Nobel Prize Museum organized an exhibition called Life Eternal at the Liljevalchs art gallery in Stockholm. The exhibition brought together science, art and cultural history, presented different approaches to eternity, explored the crucial issues of our era, and offered hope for the future. It featured 13 rooms, each with its own theme, artworks, and installations connected to the Nobel Prize categories.
One of the commissioned installations was centered around the literature laureate Kazuo Ishiguro’s novel “Klara and the Sun”, which tells the story of an android (Klara) whose purpose is to be an artificial friend to her human companion (Josie). This room, designed by Sahara Widoff and David Berner, consisted of an artistic interpretation of one of the scenes in the novel, brought to life by synchronized light, sound, animations and a masterful sculpture of Josie by Oscar Nilsson. In connection to the theme of this room, a set of computers in the main hall offered visitors the chance to talk to a chatbot inspired by Ishiguro’s character Klara.
Klara, GPT-SW3 and ChatGPT
The Klara chatbot was no ordinary chatbot, but a set of five different carefully engineered prompt interfaces to the first large generative language model for Swedish, GPT-SW3. GPT-SW3 is a large language model that has been trained on 1.2 TB of text data in the North Germanic languages (Swedish, Norwegian, Danish, and Icelandic), as well as English. This meant that the Klara chatbot had the capacity to talk about anything and everything (in five different languages), even if she at first asked visitors to talk about one of five themes: the climate, the sun, space, eternal life, and what happens after death. If cleverly instructed, however, Klara could be made to discuss any topic the visitor wanted.
When the exhibition started in October 2022, large generative language models were a relatively unknown phenomenon to the general public, and only a limited number of tech-savvy visitors probably had a grasp of the underlying technology and its potential. This all changed on 30 November 2022, when ChatGPT was unleashed on the world, instantly propelling the capacity of large generative language models to the forefront of public awareness.
ChatGPT is the result of several steps of iterative improvement of the underlying GPT-3 language model developed by OpenAI. By comparison, GPT-SW3 is a basic generative language model of the same type as GPT-3, but without the additional improvements of ChatGPT. As such, the Klara chatbot had a limited understanding of user intents compared to ChatGPT, and was prone to the same type of repetitive behavior and creative attitude towards factual information as other large generative language models. Even so, the Klara chatbot had a general language capacity that had never before been exhibited by a Swedish chatbot, and the Life Eternal exhibition was the first time that the general Swedish population had the chance to interact with a large generative Swedish language model; a historic event for Swedish language technology.
How Klara was built
The Klara chatbot was the result of a collaboration between the NLU research group at AI Sweden and the Danish design studio YOKE. YOKE designed and developed the physical installation at Liljevalchs, as well as the graphical user interface that featured wonderful futuristic animations in ASCII-art. The NLU group at AI Sweden hosted its 6.7B parameter GPT-SW3 model on an Nvidia A100 GPU provided by CGit, and provided access to the model via an API that was built exclusively for the exhibition.
The most important work, however, was done by Alice Heiman, who has been part of the NLU group since its start in the autumn of 2021, and who designed the prompt templates that were used to make GPT-SW3 behave in a manner similar to Ishiguro’s character Klara. The prompt templates consist of three different parts:
- A header that specifies the general theme of the discussion in terms of a number of keywords (e.g. “sun, clouds, sun beams, sundown”).
- An intro that provides the instruction to the model. This intro specifies the roles of the participants of the discussion (the visitor is “the philosopher Josie” and the chatbot is “the robot Klara”), as well as the desired behavior of the chatbot (“the discussion is respectful and considerate, and gives wise answers…”).
- A set of themed questions and answers that provide examples of what a conversation between Josie and Klara may look like.
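The three parts described above can be sketched as a simple concatenation. The strings and function names below are illustrative assumptions based on the description in this post, not the actual templates used in the exhibition:

```python
# Hypothetical sketch of a Klara-style prompt template.
# The header keywords and the example Q&A pair are invented for
# illustration; only the three-part structure comes from the post.

HEADER = "Keywords: sun, clouds, sun beams, sundown"

INTRO = (
    "The following is a discussion between the philosopher Josie and "
    "the robot Klara. The discussion is respectful and considerate, "
    "and Klara gives wise answers."
)

EXAMPLES = [
    ("Josie: What do you think about when you see the sun?",
     "Klara: The sun gives me energy, so I think of it as a friend."),
]

def build_template(header: str, intro: str, examples: list) -> str:
    """Concatenate header, intro and example dialogue into one prompt prefix."""
    example_text = "\n".join(q + "\n" + a for q, a in examples)
    return header + "\n\n" + intro + "\n\n" + example_text

template = build_template(HEADER, INTRO, EXAMPLES)
```

A separate template of this form would exist for each of the five themes, differing in the header keywords and the example dialogues.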
The prompt template was not visible to the visitors, but was prepended to every user input each time GPT-SW3 was called through the API. This meant that even if the visitors only wrote very short inputs to Klara (such as “hello”), the model’s input always consisted of the entire prompt template plus the user statement. In order for Klara to remember what was said previously in the discussion, we also fed the last 6 dialogue turns (if available) as context to the model (in the “user question” part of the prompt). This context acted as a sort of memory that enabled Klara to refer back to things that were said previously, which significantly contributed to a more coherent conversation. However, since Klara had a continuous stream of visitors, we reset the conversations and the terminals after 1 minute of idle time, so that previous conversations would not affect the experience of new visitors.
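The prompt assembly and the 6-turn memory described above can be sketched as follows. The function and variable names are hypothetical (the actual API built for the exhibition was not published); only the 6-turn context window and the idle reset come from the post:

```python
from collections import deque

# Number of recent dialogue turns kept as context (from the post).
MAX_TURNS = 6

# Per-terminal conversation state; a bounded deque naturally drops the
# oldest turn once the limit is reached.
history = deque(maxlen=MAX_TURNS)

def record_turn(speaker: str, text: str) -> None:
    """Append one dialogue turn, e.g. record_turn("Klara", "Hello!")."""
    history.append(f"{speaker}: {text}")

def build_model_input(template: str, user_input: str) -> str:
    """Prepend the hidden template and recent history to every model call."""
    context = "\n".join(history)
    return template + "\n" + context + "\nJosie: " + user_input + "\nKlara:"

def reset_conversation() -> None:
    """Called after 1 minute of idle time so new visitors start fresh."""
    history.clear()
```

The trailing "Klara:" cue prompts the model to continue the dialogue in Klara's voice, which is a common pattern for turning a plain causal language model into a chatbot.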
Klara and her visitors
The exhibition had over 40,000 visitors. Quite impressively (since Klara was hosted on a lab machine), Klara had zero downtime during the exhibition. In total, Klara had 7,437 conversations with visitors, with a mean length of 10 dialogue turns per conversation. The longest conversation was 525 turns, and the second longest 369 turns (in each case, half of the turns were Klara’s and half the visitors’).
The lengths of the conversations are shown in the first image below (number of turns on the y-axis, time on the x-axis), and the second image shows the distribution of conversation lengths (number of turns on the y-axis, number of conversations on the x-axis). As expected, there were only a few long conversations and a lot of short ones. It should be noted that during busy times at the exhibition, the terminals probably did not reset between visitors, leading to several conversations being logged as a single event with a large number of dialogue turns.
The Klara chatbot was the first public display of GPT-SW3, and the first time the general public in Sweden had a chance to interact with a large generative Swedish language model. Even if Klara was only based on a relatively small and basic generative language model (i.e. our 6.7B parameter model, pretrained using standard causal language modeling with no instruction tuning), Klara worked remarkably well for the most part, and visitors were in general very positive about the experience of talking to her. This is, of course, primarily thanks to the clever prompt design by Alice Heiman, but it also shows the promising capacity and generality of GPT-SW3.
It is important to stress the difference between general GPT models such as GPT-SW3, and instruction-tuned versions such as ChatGPT, which are specifically finetuned to follow instructions and to understand user intents. The investment and resources that have gone into developing ChatGPT cannot compare to the very limited funding and resources that have been available for the development of GPT-SW3. Even so, we (the NLU group at AI Sweden) are currently working on ways to augment and refine our basic GPT-SW3 models, and we believe that future chatbots based on GPT-SW3 will likely use some form of instruction tuning, and possibly also other refinements such as various forms of retrieval augmentation or the capacity to use tools.
On our part, we are thoroughly impressed by the remarkable foresight and courage of the Nobel Prize Museum to include in the exhibition an unconstrained general-capacity chatbot built on a bleeding edge large generative language model. The exhibition and collaboration with the Nobel Prize Museum and YOKE was a great experience for us, and we think that the intersection between art and science is a great place to showcase the capacities of generative AI. We are proud to be able to preserve one instance of Klara at the AI Sweden office in Stockholm, so if you happen to visit us sometime in the future, you might be able to meet Klara in person.
GPT-SW3 is developed by the NLU group at AI Sweden, with contributions from RISE, the WASP WARA for media and language, Nvidia, and the National Supercomputer Centre at Linköping University. If you are interested in trying GPT-SW3, you can sign up for access, and once approved you can find our models on Hugging Face.