Deep Fictions — Experiments in artificial intelligence and reimagining fiction
Teaching an AI to generate portraits of characters from famous novels based on their textual descriptions.
Our culture, environments, and biases all help to shape the images we see in our mind’s eye as we read texts — of novel protagonists, for example. We need look no further than Hollywood’s whitewashing of cinema to see often problematic depictions of beloved characters. I was curious whether machine learning algorithms can serve as substantive interventions in this text-to-image translation.
As AI improves, could generative storytelling help to stretch our imaginations of who we envision in popular narratives?
To explore this question, I created a project called Deep Fictions. The project utilizes neural networks and natural language processing to generate portraits of characters from famous novels based on their textual descriptions. Through this research, I wanted to explore whether it’s possible to diversify homogeneous datasets through generative neural network algorithms, and to examine how such algorithms might reflect our own imaginations of characters from popular culture.
Data & Algorithm Pipeline
This project required two types of datasets: a corpus of novels and a large collection of tagged facial images. For the text data, I used books from Project Gutenberg, which is composed of books in the public domain. For the facial data, I used the CelebA dataset, which contains over 200,000 celebrity face images, each tagged with 40 binary attributes. It’s important to note that the CelebA dataset contains predominantly white celebrities, which is addressed later in developing the neural network. Additionally, the labeled attributes lack nuance — for example, assigning a binary male or female gender label to every image.
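Because each CelebA image carries 40 binary attributes, selecting faces that match a description amounts to a boolean filter over the attribute table. The sketch below illustrates the idea with a tiny, invented attribute matrix (the attribute names and values are stand-ins, not the actual dataset):

```python
import numpy as np

# Toy stand-in for the CelebA attribute table: rows are images, columns are
# binary attributes (the real table has ~200,000 rows and 40 columns).
attribute_names = ["Black_Hair", "Male", "Young", "Smiling"]
attributes = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
])

def filter_by_attributes(attrs, names, wanted):
    """Return row indices of images that have every attribute in `wanted`."""
    cols = [names.index(w) for w in wanted]
    mask = attrs[:, cols].all(axis=1)
    return np.where(mask)[0]

print(filter_by_attributes(attributes, attribute_names, ["Black_Hair", "Male"]))
# rows 0 and 3 carry both attributes
```

The same mask-based filtering scales directly to the full dataset, since the attribute table is just a 2-D binary array.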
There were two main techniques involved in developing this project: Word2Vec and Generative Adversarial Networks (GANs). Word2Vec learns vector representations of words, which makes it possible to identify words that appear in similar contexts across a large corpus of text. This made it useful for finding adjectives that closely describe characters in novels. For instance, in Bram Stoker’s Dracula, two words that describe Count Dracula are “horrid” and “criminal.”
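The “contextually similar” lookup boils down to cosine similarity between word vectors. Here is a minimal sketch using invented 3-dimensional vectors (real Word2Vec embeddings are typically 100–300 dimensions learned from the corpus; these numbers are purely illustrative):

```python
import numpy as np

# Toy word vectors, invented for illustration only.
vectors = {
    "dracula":  np.array([0.9, 0.1, 0.8]),
    "horrid":   np.array([0.8, 0.2, 0.7]),
    "criminal": np.array([0.7, 0.0, 0.9]),
    "cheerful": np.array([0.1, 0.9, 0.1]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(word, candidates):
    """Rank candidate adjectives by cosine similarity to `word`."""
    return sorted(candidates,
                  key=lambda c: cosine(vectors[word], vectors[c]),
                  reverse=True)

print(nearest("dracula", ["horrid", "criminal", "cheerful"]))
# ['horrid', 'criminal', 'cheerful']
```

In the actual pipeline, the candidate set is the adjective vocabulary of the whole Gutenberg subset rather than a hand-picked list.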
A Generative Adversarial Network (GAN) is an algorithm that’s composed of two neural networks engaged in a perpetual war of attrition: a generator and a discriminator. We can think of the generator as an art forger and the discriminator as an art historian. The discriminator in this case is trained on a dataset of art images. The generator creates images using noise as an input and will try to fool the discriminator into believing the image is a real work of art. Initially, the discriminator will reject the image as a fake, allowing the generator to better learn what a real image looks like. Over time, the generator will create an image that is realistic enough to fool the discriminator.
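The forger-versus-historian dynamic can be shown in miniature. The sketch below trains a deliberately tiny GAN on 1-D data standing in for images: both networks are single linear units with hand-derived gradients, which is nothing like a production face GAN but follows the same alternating update scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: a toy 1-d distribution standing in for face images.
real = rng.normal(4.0, 1.0, size=64)

wg, bg = 1.0, 0.0   # generator parameters: fake = wg * z + bg
wd, bd = 0.1, 0.0   # discriminator parameters: p = sigmoid(wd * x + bd)
lr = 0.01

for step in range(200):
    z = rng.normal(size=64)
    fake = wg * z + bg

    # Discriminator step: push real samples toward 1, fakes toward 0
    # (gradients of binary cross-entropy, derived by hand for this toy model).
    p_real, p_fake = sigmoid(wd * real + bd), sigmoid(wd * fake + bd)
    wd -= lr * (np.mean((p_real - 1) * real) + np.mean(p_fake * fake))
    bd -= lr * (np.mean(p_real - 1) + np.mean(p_fake))

    # Generator step: adjust wg, bg so the discriminator scores fakes as real.
    p_fake = sigmoid(wd * (wg * z + bg) + bd)
    wg -= lr * np.mean((p_fake - 1) * wd * z)
    bg -= lr * np.mean((p_fake - 1) * wd)

print(f"generated mean after training: {np.mean(wg * rng.normal(size=1000) + bg):.2f}")
```

Real image GANs replace the two linear units with deep convolutional networks and rely on automatic differentiation, but the alternating loop is the same.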
For this project, I utilized an InfoGAN, a type of GAN that inputs both an embedded label and noise into the generator in order to guide the generator to create images with specific attributes. In this case, these attributes were the labels associated with each image in the CelebA dataset. If you’re familiar with image-to-text generation, for instance generating the text “yellow” and “fruit” given a picture of a banana, this algorithm works in the opposite direction (text-to-image), generating a picture of a piece of fruit given a set of adjectives.
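Mechanically, the conditioning means the generator’s input is a label vector concatenated with the noise vector. A minimal sketch, assuming a hypothetical four-attribute subset and a 100-dimensional noise vector (both sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical subset of binary attribute labels used for conditioning.
labels = ["Black_Hair", "Male", "Old", "Serious"]

def generator_input(wanted, noise_dim=100):
    """Concatenate a binary attribute vector with a noise vector, mirroring
    how a label-conditioned generator receives its input."""
    condition = np.array([1.0 if l in wanted else 0.0 for l in labels])
    z = rng.normal(size=noise_dim)
    return np.concatenate([condition, z])

x = generator_input(["Black_Hair", "Old"])
print(x.shape)  # (104,): 4 label dimensions + 100 noise dimensions
```

Keeping the labels fixed while resampling the noise yields many different faces that all share the requested attributes, which is what makes the later per-character variation possible.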
For this project, I focused on two characters from two separate novels: Dracula from Bram Stoker’s eponymous novel, and Elizabeth Bennet from Jane Austen’s Pride and Prejudice. I chose these novels because they were listed as two of the most popular in Project Gutenberg, are widely known, and have been adapted into films.
The first step was to find words that describe these characters from the Gutenberg corpus. I trained a Word2Vec model on a subset of 4,000 books and found the adjectives closest to each character’s name using cosine similarity. For example, the closest adjectives to Dracula were: “criminal,” “hypnotic,” “mysterious,” “horrid,” and “humble.”
The next step involved using these adjectives to find the most similar face labels, again using cosine similarity, from the CelebA dataset. For instance, the previous Dracula adjectives translated to: “black hair,” “old,” “male,” “dark eyes,” and “serious.” Then, I generated an averaged face from these filtered images as a baseline. The resulting image only reflected the images in the CelebA dataset and lacked specificity, so the next iteration aimed to generate an entirely new face using an InfoGAN.
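The baseline “averaged face” is simply a pixel-wise mean over the filtered images. A sketch with tiny random arrays standing in for real photos (CelebA images are 178×218 RGB, but the operation is identical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins for the filtered CelebA images: five 8x8 grayscale "faces"
# with pixel values in [0, 1].
faces = rng.random((5, 8, 8))

# Pixel-wise mean across the filtered set gives the baseline average face.
average_face = faces.mean(axis=0)

print(average_face.shape)  # (8, 8): one image, same size as the inputs
```

Averaging blurs away individual features, which is why the baseline lacked specificity and motivated switching to a generative model for the next iteration.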
After a considerable amount of expensive processing time, the InfoGAN produced portraits for each character.
The GAN generator network can be tweaked to create more photorealistic images. However, I opted to prioritize new possibilities — choosing unexpected, somewhat abstracted portraits of Dracula and Elizabeth Bennet, over expected, photorealistic ones.
Neither of these final images is present in the CelebA training dataset — both are influenced by the training data via the GAN network but are altered by different parameters into entirely “dreamt” portraits. I was gratified and surprised that the output images were much more diverse and ambiguous than the training data. By altering the GAN’s input, I was able to generate a variety of different faces for each character.
While these generated portraits might not reflect the images we had envisioned, they remain faithful to the faces the algorithm has seen and to the descriptions of the characters in their respective novels. Perhaps it’s worth pondering why they differ so dramatically from the faces we had envisioned.
Much well-deserved criticism of machine learning focuses on how models generalize across populations and equate raw data with fact. As more sectors rely on data-driven decision-making, and such models become critical infrastructure, these issues are becoming ever more prominent. Deep Fictions speculates whether our biased imaginations can be modeled, and explores the blind spots this process might help us reveal.