Visually Capturing the Essence of a Written Work
Creating a machine learning algorithm to generate barcode-like images capturing the emotional quality of a script
The discussion around machine creativity has led to a boom in projects merging the humanities with computer science. With machine learning increasingly used in traditionally human-dominated fields, this project explores its possible use in qualitatively capturing the essence of a written work in a visual format, with special interest in cinematography. The project has the potential to become a new type of idea generation tool for the visual media industry.
Artificial intelligence (AI) and machine learning (ML) algorithms have attracted great interest and attention in recent years. Every startup seems to be trying to take advantage of this mysterious black box that can perform tasks previously impossible to achieve, using huge amounts of data and a series of neural networks that resemble the human thought process. Its use transcends the tech industry, bleeding into art and music, areas often deemed to be reserved for the human creative mind.
Machine learning (and the neural network) is not a new invention; the concept of using a series of individual scalar factors to break data down into a network of equations existed in the 1980s. At its inception, however, there was not enough processing power or data storage to justify and run such a complex method of translating data. With the increase in processing power and essentially unlimited data storage now available, the neural network is gaining traction and moving into new fields.
Creativity and Machine Learning
Whether creativity is a uniquely human trait is an interesting question, one that many computer scientists and artists have tried to challenge by merging the logical with the creative. Daniel Susskind explores a series of projects involving visual, musical, and literary work produced by machines in his book A World Without Work: Technology, Automation, and How We Should Respond. Most of the projects focus on the computer’s ability to produce original work: coming up with new images or music, or new interpretations of the observed world.
While I think the discussion is worthwhile, I think the debate is headed in the wrong direction. The notion of creating something from nothing is inherently paradoxical; it would be like creating motion from an absence of energy. Humans do not create something from nothing. We observe the world and interpret it into a different form. We take in inspiration and translate the world around us. In a sense, the word original refers to the uniqueness of the translation rather than the end product itself, and creativity is the ability to come up with a unique process of reinterpretation.
Human creativity demands a lot from the world. For centuries, we have observed the world around us and translated the visual medium into words: holistic representations of the visual space, essentially a summary or capture of the world in a new format. In terms of data, it is a transfer of the visual medium (a series of pixels and their respective locations and qualities) into a well-governed, rule-based assembly of basic components (words and sentences). Because creativity in this sense is a translation from one data format into another, I think machines are capable of it.
Visual Representation of Written Work
The most familiar form connecting words and images has been movies. Cinematography takes the utmost care in creating a visual setting that captures the essence of the written work in order to implicitly (or explicitly) portray the emotions and intentions embedded in the words. Cinematographers and producers use forms and colors to explore the range of human emotion, and with the surge of animated films developed by studios such as DreamWorks and Pixar, the intentions have become even more explicit and important.
The process of visualizing a written work has been a task reserved for humans. With advances in natural language processing (NLP), however, it is now possible to break written work down into a form that computers can handle. This means that computers are now capable of visualizing written work, much as in data representation projects such as the work of Stefanie Posavec. This presents a new and exciting opportunity to explore machine creativity in films, creating holistic visual representations of movie scripts without going into production. Current projects that visualize movies require the finished visual medium: projects such as Movie Barcode, Cinemetrics, and A Viz of Ice and Fire use color as the primary means of summarizing and characterizing movies and shows.
Natural Language Processing
To work with written text, I first explored methods of translating it into manageable datasets. Initially I considered using the NRC Valence, Arousal, and Dominance (NRC-VAD) Lexicon by Saif Mohammad, which would characterize sentences along three dimensions: valence, arousal, and dominance. However, due to the more casual nature of the written work, I opted for VADER sentiment analysis by C. J. Hutto and Eric Gilbert, which was originally created to analyze social media posts. Each sentence was placed on a time scale by its position in the script and assigned a sentiment score. Thus, a written work was translated into a dataset of sentence positions and emotional scores.
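As a rough illustration of that translation step, the sketch below turns a block of text into rows of (sentence position, sentiment score). It is not the code used in this project: the sentence splitter is deliberately naive, and `TOY_LEXICON`, `toy_score`, `split_sentences`, and `script_to_rows` are hypothetical names. The toy scorer only stands in for VADER so that the example runs without extra packages.

```python
import re

# Hypothetical toy lexicon, for illustration only. VADER's real lexicon
# and rules are far richer than this handful of words.
TOY_LEXICON = {
    "love": 0.6, "happy": 0.5, "great": 0.5,
    "hate": -0.6, "sad": -0.5, "terrible": -0.6,
}

def toy_score(sentence):
    """Average the toy valence of known words, clamped to [-1, 1]."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not hits:
        return 0.0
    return max(-1.0, min(1.0, sum(hits) / len(hits)))

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def script_to_rows(text, score=toy_score):
    """Return (position, score) rows, with position scaled to [0, 1]."""
    sentences = split_sentences(text)
    last = max(len(sentences) - 1, 1)  # avoid dividing by zero
    return [(i / last, score(s)) for i, s in enumerate(sentences)]

rows = script_to_rows("I love this town. The storm was terrible! We left.")
for position, value in rows:
    print(f"{position:.2f}\t{value:+.2f}")
```

Because the scorer is passed in as a function, the toy version could be swapped for VADER. Assuming the vaderSentiment package is installed, passing `lambda s: SentimentIntensityAnalyzer().polarity_scores(s)["compound"]` as `score` would give the compound score in the same -1 to 1 range.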
Using the model from Generative Adversarial Text to Image Synthesis by Reed et al. as a basis, I associated the emotional quality dataset derived from each movie with the movie barcode generated from that movie. I then fed in new movie scripts that have not gone into production yet (taken from students in a screenwriting class), producing the following images that capture the emotional qualities of each work:
Discussion and Future Directions
In its current state, this project is more of a proof of concept showing that machines can be used to create visual representations of original written work.
The network seems to be capable of detecting the significance of the order of the barcode lines. Since the barcodes are created scene by scene, there is an embedded sense of progression from left to right. The emotional qualities toward the beginning of the script need to be related to the left side of the produced images. There are moments where the lines are not clearly divided, but this is expected since the source images have interruptions as well. The resulting images tend to be more limited in color palette, which may be due to the lack of fidelity in the emotional qualities captured. Using a sentiment analysis score ranging from -1 to 1 may have limited the dimensions of complexity required to correctly capture the emotional qualities of the scripts.
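To make the last two points concrete, here is a small hand-written mapping from scores to stripes. It is not the network used in this project, only a sketch of why a single scalar per sentence constrains the palette. The blue-to-red ramp and the names `score_to_rgb` and `scores_to_barcode` are arbitrary assumptions.

```python
def score_to_rgb(score):
    """Map a score in [-1, 1] onto a blue-to-red ramp (an arbitrary choice)."""
    s = max(-1.0, min(1.0, score))
    t = (s + 1.0) / 2.0  # 0.0 = most negative, 1.0 = most positive
    return (round(255 * t), 0, round(255 * (1.0 - t)))

def scores_to_barcode(scores, stripe_width=4, height=32):
    """Build an image as rows of RGB tuples, one vertical stripe per score.

    Stripes keep the order of `scores`, so the start of the script
    ends up on the left edge of the image.
    """
    row = [score_to_rgb(s) for s in scores for _ in range(stripe_width)]
    return [list(row) for _ in range(height)]

image = scores_to_barcode([-1.0, -0.2, 0.4, 1.0], stripe_width=2, height=3)
print(len(image), "rows x", len(image[0]), "pixels")
print("leftmost:", image[0][0], "rightmost:", image[0][-1])
```

Every color this mapping can produce lies on one line between blue and red, so the green channel never varies. A learned model is not bound to a fixed ramp like this, but with only one number per sentence as input it has similarly little to distinguish, say, fear from anger. A multi-dimensional representation such as valence, arousal, and dominance would give it more to work with.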
Future projects can use this as a basis for creating more detailed and intuitive visualizations, perhaps in the form of 3D models or character associations. Creating a character quality map similar to Andrew DeGraff’s Movie Maps to show how characters emotionally interact with each other could be an important tool for the production crew as well. It is also possible to curate datasets specific to a cinematographer or a writer in order to capture their style and impose it onto other works.
Associating images with text is not limited to films; such a method can be used to generate ideas for set designs for plays, geography for shows, posters for films, book covers, and so on.
The film industry has been employing cutting-edge technology on the production side. It has recently been using improved CGI, motion capture, and camera equipment, yet it relies solely on the human mind to plan films. With the involvement of machine learning, the production crew can have more sources of inspiration and possible prototypes of the film without having to dedicate a huge amount of time and resources. It is my hope that this project becomes the foundation of a new prototyping tool for the visual media industry.