Whose Imagination is it Anyway?
An examination of shifting agencies in photographic production in the age of machine learning and artificial intelligence.
Photography then and now
Photography has enabled us to capture and represent our imaginations — from scientific documentation to fine art. Photographs are probably the closest representations of what we see physically, and by nature they therefore become evidential — an objective slice of time and space. Before photography, paintings were understood as products of the human hand, and their evidential nature was therefore always questionable. Photography, on the other hand, somewhat escapes the human hand. Beyond a certain point [subject, framing, setting etc.], it becomes a purely industrial process of arresting light onto a substrate. Thus all photographic representations hold within themselves an objectivity of the medium, and the imaginations they carry are tied to a believable referentiality.
However, this objectivity of the medium has always been played with in various ways. In the pre-digital era, techniques like double exposure, airbrushing and combination printing were used to manipulate and alter photographs. These manipulation techniques were complex and arduous, and creating a seamless image took expertise. The rise of the digital image gave us the freedom to play with digital information in many more ways. Digital photographs, by their inherent nature, can be reshaped more easily through pixel manipulations. They extend our imaginations beyond just the depiction of reality, while maintaining a sense of believable referentiality since they are still photographs.
Today, we live in a world of doctored digital photographs, where the desire to closely represent imaginations outlives questions of authenticity, truth and the relationship between the photographic image and the real object it represents.
Section I — Imaginations are more {photo}real
The selfie cameras of today have ended the age-old search for the Fountain of Youth. They come embedded with de-aging and face-beautification features: skin-smoothing algorithms that bring back the youthful self to a certain extent. Besides going back in time, the desire to know the future has given rise to various social media applications that let one see their future self through ageing algorithms. Though the representation is only an estimated approximation of our younger or future self, its photorealism supersedes its authenticity. One can not only realise the desire to be young or old, but can also store this imagination of oneself as a digital photograph.
Similarly, the idea of shapeshifting, or the imagination of being someone/something else, has been an integral part of folklore, myths and literature all over the world since time immemorial. The werewolves and vampires of yesteryear have been replaced with cat and dog filters. These filters seem harmless and cute until we think about adding another technological layer to them. By integrating the same filters with machine learning and artificial intelligence, it is possible to create deepfakes, which synthetically replace people in existing images and videos to create potentially deceptive content. Never has it been easier to imagine oneself as something or someone else and produce photorealistic evidence of it.
Though nascent, these examples illustrate a shift of agency in cultural production. Today anyone can portray themselves as someone or something else, or access their own Fountain of Youth. These imaginations are no longer restricted to the knowledge powerhouses in charge of cultural production; instead they are more equitable and decentralised. Current imaging technologies have made it possible for the common person to represent their imaginations while retaining believable referentiality.
But do these photorealistic images really represent our imaginations?
Section II — Programmer’s Imagination
The digital camera is, in essence, a combination of an analogue camera and a fairly advanced computer. Light is captured on a sensor instead of film, and the sensor values are then processed by the computer to render a digital image. Various algorithms are at play while the raw sensor data is rendered as an image. This is where it becomes interesting to think about the creators of these algorithms and how they see the world.
Whether a camera or a computer, a black box is a device with an input and an output. If you feed data into a black box, it will be output as information. Significantly, the kind of information that the black box outputs depends, not on the kind of data that it is being fed, but on the kind of invisible processing that is taking place inside it. In the case of the digital camera, for instance, it is an entirely arbitrary decision that the data that is placed within the camera is being output as a picture that has a visual resemblance to the object in front of the lens. What the camera outputs is determined, not by the object that is being photographed, but by the authors of the code that instructs the algorithms how to process the input data. [1]
While the megapixel race continues to further camera hardware, there have been significant improvements in camera capabilities through image processing and computational photography. Image processing adds many more algorithmic black boxes within the camera apparatus, and each of these algorithms brings the imagination of the authors of that particular code into the output image. Just as early colour film carried a racial bias, rendering a person of colour without much detail except for the whites of their eyes and teeth, digital photo pipelines bring the biases of their programmers into the photographic representation through algorithms. For example, the “Vivid” colour setting differs across manufacturers of digital cameras, which suggests that the programmers of the code see Kodak’s Vivid film differently. Similarly, in the selfie camera with its de-aging and skin-smoothing algorithms, the idea of being young is probably represented through the eyes of the programmers and their understanding of youth, and the shapeshifting filters are limited by the imaginations of their programmers.
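Rubinstein’s point that the output is determined by the code rather than by the object can be sketched in a few lines. The gain values below are invented purely for illustration and correspond to no real manufacturer: the same raw sensor data, pushed through two different rendering decisions, yields two different photographs of an identical scene.

```python
import numpy as np

# The same "raw" sensor data rendered through two hypothetical vendors'
# white-balance gains (values are invented for illustration): the object
# in front of the lens is identical; the resulting image is decided by code.
raw = np.full((4, 4, 3), 100.0)  # a flat grey patch straight off the sensor

vendor_a_gains = np.array([1.20, 1.00, 0.85])  # warmer rendering (assumed)
vendor_b_gains = np.array([0.95, 1.00, 1.10])  # cooler rendering (assumed)

image_a = np.clip(raw * vendor_a_gains, 0, 255)
image_b = np.clip(raw * vendor_b_gains, 0, 255)

# Identical input, different "photograph":
print(image_a[0, 0], image_b[0, 0])
```

Two black boxes fed the same data thus emit two different pictures, which is the whole of the argument in miniature.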
Section III — Into machine imaginations — a classification
What did you dream? It’s alright we told you what to dream.
“Welcome to the Machine” by Pink Floyd from the album Wish You Were Here (1975)
In the world of computing, it is sometimes very difficult for programmers to program all the ifs and elses needed to produce an accurate simulation of the real world. Instead, accuracy is improved by training algorithms on varied conditions and larger datasets. These algorithms train machines on large amounts of data so that they develop a certain intelligence for generating accurate real-world simulations. In photography too, a lot of machine intelligence is being trained around the idea of the image: what is a good photograph, what is good lighting, what is a good portrait, what is a good sky, and so on. These intelligent systems make many decisions on behalf of their masters to create technically competent images. The neural networks, in the moment we press the button to capture an image, add algorithmic layers of their imaginations. Knowingly or unknowingly, we have crossed into the realm of machine imaginations, where in addition to the imaginations of the authors of the code, the imaginations of the machines are also making their way into photographs.
“AI is everywhere. It’s not that big, scary thing in the future. AI is here with us.” Fei-Fei Li
To understand the extent to which AIs are being integrated into photographic production, we need to look at them more deeply and understand where, and at what stage, they currently are. Let us assume the following classification to dive deeper into the scope of AI-made images and their implications.
Class I machine imaginations — machines that have narrow-scope AI and can perform a specific task autonomously. These machines cannot do much beyond what they are programmed to do, and thus have a very limited or narrow range of competencies. These imaginations lean towards creating a technically superior image.
Most class I machine imaginations are trained to achieve very specific tasks. These imaginations play out as assisting features that mitigate tedious processes requiring limited intelligence (colour balance and corrections, light balance in highlights/shadows, selecting the best photograph from a burst, etc.). For example, the low-light mode merges up to 10 images to amplify light and then de-noise the image. The High Dynamic Range (HDR) mode, which is mostly on by default, can merge up to 20 input frames to retrieve the highlights and shadows of a scene. Similarly, the portrait mode lets one create a fictional, non-existent depth-of-field effect, where the subject stays in focus and the background blurs: the machine creates a simulation of a reality that mimics an actual lens blur. It does so by building a depth map using machine learning, which helps identify the subject and create a lens-blur mask.
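The burst-merging idea behind these low-light modes can be sketched minimally. Averaging N aligned frames of the same scene reduces random sensor noise by roughly the square root of N; a real pipeline would also align the frames and reject motion, which this sketch, built on simulated data, omits.

```python
import numpy as np

def merge_burst(frames):
    """Average a burst of aligned frames.

    Averaging N frames reduces random per-pixel sensor noise by roughly
    sqrt(N); real night modes additionally align frames and reject motion
    between them, which this sketch omits.
    """
    return np.stack([f.astype(np.float64) for f in frames]).mean(axis=0)

# Simulate a burst: one static scene plus fresh sensor noise per frame.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 255.0, size=(64, 64))
burst = [scene + rng.normal(0.0, 20.0, size=scene.shape) for _ in range(10)]

merged = merge_burst(burst)
# Residual noise in the merge is roughly 1/sqrt(10) of a single frame's noise.
print(np.std(burst[0] - scene), np.std(merged - scene))
```

The same principle underlies HDR merging, except that there the frames also differ in exposure, so the merge recovers highlights and shadows as well as suppressing noise.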
Quite apart from technical enhancements, cameras powered by neural networks have even taken over the Decisive Moment; today it is the camera that decides which photograph from a burst best represents its subject matter. Another technique, called image inpainting, uses machine intelligence to create alternative content for the reconstruction of missing or deteriorated parts of an image. This is how camera apps allow us to capture more detail, programmatically increasing the resolution and adding information to generate images of up to 32 megapixels from a 12-megapixel sensor. Similar technology is used in forensic facial reconstruction, where one can reconstruct the faces of our ancestors to depict and imagine them as individuals rather than specimens.
Class II machine imaginations — machines that could alter representations and therefore memories. These machines would be able to better understand the entities they interact with by discerning their needs, emotions, beliefs and thought processes. They would be able to understand context and generate representations that manipulate narratives to create memories and remembrances.
AI-generated art has existed since the 1950s, but now there also exist image-generating AIs that can produce photorealistic portraits and landscapes which do not exist. These image-generating AIs use Generative Adversarial Networks (GANs), in which two neural networks are pitted against each other to generate new synthetic instances of data that can pass for real data. Class II machine imaginations venture into the creation of photorealistic synthetic media and are widely used in image, video and voice generation.
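The adversarial game can be sketched with a deliberately tiny, hypothetical example: a two-parameter generator tries to mimic a one-dimensional "real" data distribution, while a logistic discriminator tries to tell real from fake. Every number below is illustrative; a real GAN replaces both players with deep networks and the scalars with images.

```python
import numpy as np

# A toy GAN: "real" data is Gaussian around 3.0, the generator is the map
# g(z) = w*z + b, and the discriminator is D(x) = sigmoid(u*x + c).
rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

w, b = 1.0, 0.0          # generator parameters
u, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.05, 256

for _ in range(5000):
    real = rng.normal(3.0, 0.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = w * z + b

    # Discriminator ascent on log D(real) + log(1 - D(fake)), with a
    # little weight decay to damp the oscillation these two-player
    # dynamics are known for.
    d_real, d_fake = sigmoid(u * real + c), sigmoid(u * fake + c)
    u += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)
    u *= 0.99
    c *= 0.99

    # Generator ascent on log D(fake) (the non-saturating GAN loss).
    d_fake = sigmoid(u * fake + c)
    w += lr * np.mean((1 - d_fake) * u * z)
    b += lr * np.mean((1 - d_fake) * u)

print(b)  # the generator's mean output has been pulled toward the real mean of 3.0
```

The generator never sees the real data directly; it only learns what fools the discriminator, which is why GAN output is plausible rather than true.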
Websites like generated.photos and thispersondoesnotexist.com have built proprietary datasets of images of thousands of people, which are used to generate portraits of people who do not exist. Similarly, NVIDIA’s GauGAN and GauGAN2 can generate photorealistic landscapes that are hard to identify as synthetic unless examined closely.
When the Image Capturing Devices (ICDs) of the future are integrated with the above-mentioned neural networks, they could change the subject matter drastically. They could change the faces of the subjects in a photograph, or alter a subject’s expression in a portrait. They could even decide on our behalf that a particular scenic background looks better with some mountains and a tree, and simply render such an image. All these possibilities are far nearer than distant.
Rick : Everybody just relax for a second. There’s no such thing as an “Uncle Steve.” That is an alien parasite.
Jerry : But I’ve known him my whole life!
Rick : No, you haven’t Jerry. These telepathic little bastards, they embed themselves in memories, and th-then they use those to multiply and spread out, take over planets. It’s — It’s disgusting.
Morty: Steve wasn’t real?
The Rick and Morty episode “Total Rickall” introduces a species of alien parasites that implant people’s minds with false memories of invented characters such as friends and family members. Once they have multiplied enough, they use this to take over the planet and enslave its inhabitants. The image capturing devices (ICDs) of the future might not take over the planet and enslave our race, but they would definitely affect our memories: what we remember and how we remember. ICDs able to generate and change subject faces might be able to get rid of all the photo-bombers, or even replace people we don’t like with machine-imagined faces. However, on revisiting these machine-imagined images later, like with those parasitic aliens, our brains would try to remember these non-existent people and construct memories that are perhaps completely fictional. Since our minds care more about crafting a good narrative than staying close to the truth, our memories always feel true, despite being extremely vulnerable to errant suggestions, clever manipulations and seamless alterations [2]. In a way, these machine imaginations could plant memories.
Morty : I figured it out, Rick! The parasites can only create pleasant memories. I know you’re real because I have a ton of bad memories with you!
Once Morty figures out that the parasites can only create happy memories, the protagonists eliminate all of them and save the day. Similarly, ICDs capable of changing human expressions in images would make us appear happy, smiling, satisfied or content. Phrases like “Say cheese” might become outmoded, since the ICD would automatically make us smile. And not just smile: they may always try to present us as pleasant, socially more appropriate and acceptable, irrespective of how we feel. In a way, our representations in the future could be machine imagined without us having much agency. When these images are later seen as documentation of the past, they could present a very different narrative: a narrative woven by machine imaginations.
Today, there still remains a choice over whether to engage with these technologies to generate a photorealistic image, but soon there will come a time when they are pre-built into image capturing devices. In such a world, there will be a constant engagement with machine-imagined synthetic media. The cycle of production, circulation and consumption of these machine-imagined representations would govern what we see, how we think and what we remember.
Class III machine imaginations — machines that can create new ways of seeing. These machines could create imaginations that are well beyond our current understanding of media.
It is very difficult to imagine what exactly these imaginations would be. We can see a starting point of what class III machine imaginations could be in the variations that OpenAI’s DALL·E 2 generates. With DALL·E 2 there is a new way to create images without having to execute them with paint, cameras or code. The input is a simple line of text, which the machine interprets to create unseen images of bizarre compositions of subject matter. In essence, DALL·E 2 has opened up a new way to think about images through words.
If we extend this idea and look at the movie The Matrix Revolutions (2003): by the end of the movie, the protagonist Neo is blinded but obtains a “third eye” through which he can perceive the true nature of things, instead of being fooled by their external appearances. To understand the machine, Neo has to see in a new way that transcends what is seen by human eyes and seeps into the world of the machines. It is this true sight that guides him through the climax of the series and allows him to defeat Agent Smith (a machine). Now able to perceive beyond the five senses, Neo can see the conflict from the machines’ perspective, and thus sacrifices himself to end the unsolvable crisis on which The Matrix series is based.
Taking a cue from this, class III machine imaginations could push us to see in newer ways. DALL·E 2 pushes us to think verbally in order to see graphically, and the machines in The Matrix push Neo to see, and therefore understand, the machines. Similarly, class III machine imaginations might create a new, primordial way of understanding and imagining. This new understanding may reside in the liminal spaces of information bits: the land where these machines dream.
But the dream of machines is still a far-distant thought, and these imaginations remain equally far away.
Section IV — Towards a new imagination
If all creative and knowledge work will become the domain of AI, what will be left for humans? What will be the purpose of our existence? Watching endless films created by AI, listening to AI-generated music, and being driven in driverless cars around AI-generated cities?
Many modern thinkers and artists have envisioned a future where humans, liberated by machines from mechanical and boring work, will be engaging only in play and art (e.g., Constant’s New Babylon). But if automation of cultural production by AI continues, eventually it will be these AI playing and making art — not us. [3]
In both analogue and digital photography, the photographer took away some agency from their subjects, and then the photo developer (analogue) or the photo manipulator (digital) took away some agency from the photographer in how the final photograph was presented. In both cases, however, the shifts of agency happened at the human level. With algorithms now inseparable from photo-creation processes, we see a shift of agency from humans to machines. From noise reduction to age alteration, we already see and use a variety of class I machine imaginations in our day-to-day photographic processes. We see class II machine imaginations in photo applications that curate photos from the photo library and present us with a visual narrative (a photo album) of their own imagining. With class III machine imaginations we are moving towards an epistemic breaking point in how we think about images.
If we were to think about the nature of this breaking point, it would be unlike the move from two to three dimensions, or from the traditional image to the technical one, because those did not fundamentally change our ways of thinking or imagining. Instead, it would be a more fundamental break, like the way photographs, having achieved perfect realism, opened up newer ways for artists to think and imagine: cubist, surrealist and abstract expressionist imaginations. Similarly, machine imaginations could revolutionise the way we think and imagine.
Any technology robs us of a certain agency through the limitations it brings along. A millimetre scale does not let us measure in between millimetres. It trains us to see what it can measure or capture, in turn limiting our thoughts. At the same time it opens up a zone of subjectivity (in between the millimetres), a zone of the unknown where imaginations exist. Photography limited us to a representation of what is seen, a slice of time and space; nonetheless, photographers found ways to counter this objectivity by imbuing the photograph with symbolism and context. Similarly, machine imaginations will probably limit us to thinking how they want us to think. Perhaps they could do all the thinking for us, but at the same time we might gain access to more complex parts of our minds and further develop thinking and imagination as we know them.
Conclusion
I’m an eye. A mechanical eye. I, the machine, show you a world the way only I can see it. I free myself today and forever from human immobility. […] My way leads towards the creation of a fresh perception of the world. Thus I explain in a new way the world unknown to you.
With machine imaginations seeping into photographs, these lines from 1923 by Dziga Vertov find a renewed perspective in our post-truth world. The mechanical eye still captures light in the same physical way (shutter, aperture and a light-sensitive material). But the mechanical eye is not just mechanical anymore. It is attached to a digital brain, which takes the image into the ephemeral zone of 1s and 0s in an attempt to free itself of the human more and more. The power of the mechanical eye and the technical image has increased manifold [4]. Living in a world of misinformation and memes, we see that these generated, synthetic, hyperreal visuals govern our imaginations through various algorithmic filters. The cameras of today are transforming into something that knows us, understands us and shows us what we want. Perhaps it controls us as well. It tells us what to see and what to remember, creating a perspective of the world as machines imagine it. Most photographs today are tainted with computational processes, and when they become memories in the future, they will present us with a very different past: a documentation of what never was, or of what was entirely machine imagined.
Footnotes
[1] Rubinstein, Daniel. “The New Paradigm.” Fragmentation of the Photographic Image in the Digital Age. New York: Routledge, Taylor & Francis Group, 2019. Routledge History of Photography series.
[2] Lehrer, Jonah. “Memory Is Fiction.” Wired, Condé Nast, 4 June 2010, https://www.wired.com/2010/06/memory-is-fiction/.
[3] Manovich, Lev. “AI and Genre Conventions.” AI Aesthetics. Moscow: Strelka Press, 2018.
[4] “The difference between traditional and technical images, then, would be this: the first are observations of objects, the second computation of concepts. The first arises through depiction, the second through a peculiar hallucinatory power that has lost its faith in rules.” Flusser, Vilém, et al. “To Abstract.” Into the Universe of Technical Images. Minneapolis: University of Minnesota Press, 2011.