The Visual Language of Reality


The history of human culture is told and explored through story. In the beginning the medium was the voice, as stories were told around a fire and passed on through oral tradition. Later the medium changed to text on stone or paper, and still the story was recited through an actor’s voice. With the advent of photography and cinema a new visual language began to be explored. The narrative techniques for staged live performance were slowly replaced with newer more powerful techniques like editing, framing and compositing of imagery, enabling the artist to manipulate time and visual reality in order to tell the story in a whole new way.

The traditional stage methods of lighting and props were still used in films, but with the ability to slice a segment of film into parts and include totally different shots in sequences, the medium took an exponential leap in its ability to entertain and tell stories. These techniques have been further enhanced by the use of digital editing, compositing and virtual model making. We have come to the point where any fiction or reality can appear real: time travel, space exploration, death and rebirth are all within the realm of possibility when viewed through the eye of the cinematographer.

Now there is a new form of digital media, brought about by advances in digital graphics technology and cinematography, called virtual reality. Virtual reality and its hybrid sibling augmented reality offer new, more immersive ways to tell stories. In many ways they should be the end of the evolution of story building, having come full circle to the point where the viewer can finally be the actor in the story if they choose to be. There is, however, a problem. The language of cinematography, which enhanced film narrative to the level of art, relies very heavily on the two-dimensional nature of film. The frame of the shot and the ability to cut instantly from one view to another without breaking the viewer’s immersion are key to the powerful effect good cinematography has on story. Even theater was framed by the stage, with the viewer’s perspective constrained to the lit view of the set and actors. The problem for virtual and augmented reality is that the medium surrounds the viewer and, in the case of augmented reality, overlaps the world around them.

How do you frame a shot when the story occurs in the world around you? How do you control a stage when the actors may be unknown individuals, on their way to work, walking between you and the focus of the story as it unfolds in the environment you both share? Is it possible to take a story intended for a specific place and allow it to occur anywhere? These are the questions that must be considered when thinking about narrative and visual story telling as it applies to digital realities.

Some may argue this is a problem that is already solved. Games are virtual realities, and they have stories, sometimes multi-narrative stories, applied to them. So far, however, games and narrative games have mostly been played on a framed screen. Some have been attempted in virtual reality, which is easier to control from a director’s perspective. In a discussion with most people who have made these types of digital realities you will often hear the word “tradeoff” used. The interaction might be constrained to an “on-rails” approach, or the interactor is standing on a platform which just happens to fit the narrative of the story. Some more “exploratory” examples might constrain the user to a single room with one character and some props for exploration. There is a reason for the limitations applied to most virtual reality experiences. It is very hard to create an immersive, compelling narrative when the view is controlled by the viewer. Imagine a movie which showed the director’s view behind the camera with no ability to cut from one shot to another and you begin to see the problem.

By exploring the theory and evolution of visual story telling solutions for theater and film it may be possible to develop a new visual language which allows the new medium to evolve and enables even more powerful forms of narrative. Since the advent of story, narrative and location have been locked together. With the advent of augmented and virtual reality it should be possible, given the right visual tool set, to allow any narrative to be retold specifically related to any location. Obviously, this magnifies the problems of traditional cinematography when applied to augmented reality, since the stage or set changes depending on the viewer’s location. For that matter who is the viewer in the narrative? Is she the main character, a third person viewer like a traditional film, or an extra in the movie? This further complicates the problem. However, given the powerful techniques of modern film and digital media, these problems are not insurmountable. They simply require a new way of thinking about visual story telling.

Narrative and Performance Studies

When film was first shown in cinemas the event was very similar to the production of a theatrical play. There was a stage and a screen on which the film was projected. And so the notion of how films could tell stories took on the methods of live theater. Some of the masterpieces of the time employ staged sets very much like a theatrical production would.

‘Le Voyage dans la Lune’ illustrates the obvious theatrical staging influences in early cinema.

Eventually film evolved its own language in cinematography and montage. Now it was theater’s turn to borrow from cinema. The “epic theatre” of Bertolt Brecht, with its Verfremdungseffekt (translated as “defamiliarization effect”), uses several concepts, such as a montage technique of fragmentation, interruptions in action, breaking the fourth wall, and contrast and contradiction, to allow the audience to feel detached from the play. The goal of these techniques was to promote rational self-reflection in the viewer. [‘Understanding Brecht’, Walter Benjamin. 1983.]

These types of effects might also be used within digital realities to allow the viewer to focus on the intended focus of the narrative. Elements within a scene could be slightly different or paused in action compared to the world around them in order to guide the user to the next sequence of events in a narrative. The thing that interrupted the viewer to create self-reflection for Brecht might serve well to grab their attention within an environment with no other frame of reference.

Perhaps one of the best visual examples of these techniques, as they might be applied to a virtual world, is embodied in a scene from The Matrix. The scene involves the teacher Morpheus taking the student Neo into the Matrix to explain how the system works. On this tour of the Matrix, Morpheus moves through a very crowded street when a woman in a red dress, the only thing red on the street, walks by and Neo is distracted by her. Then Morpheus stops the action of the scene, allowing Neo to adjust his focus to the lady in question, only to realize with a shock that she is an enemy [The Matrix, The Wachowski Brothers, 1999]. Though the scene has many Brechtian elements about it, it serves as an excellent example of how those same techniques can be used to focus attention in a chaotic immersive world.

The woman in red stands out from the crowd in The Matrix.

The concept of defamiliarization can be explained as taking something common or taken for granted and presenting it in an unfamiliar or strange way. The Russian literary critic Victor Shklovsky declared this to be the essence of all art [The Theory of Prose, translated from “Art as Device”, Shklovsky 1991]. The concept can serve well in the artifice of visual narrative too, as witnessed in The Matrix.

If the red dress scene from The Matrix were a virtual reality, and you were in it, who would you be from a narrative point of view? You could answer Neo; it is the obvious choice since he does not talk and is following Morpheus, who is giving the speech. You obviously are not the lady in red since you notice her. There is another choice though: you could fill the role of the observer, which is usually the role given to us in film and theater. Virtual reality transports us inside the fourth wall to the confines of the boundary, as Janet Murray describes it in her book Hamlet on the Holodeck. The problem with being a third person viewer in a virtual reality is the lack of interaction. You are like the viewer who has wandered up on stage, immersed, but somehow in the way unless the author writes the narrative with you in mind. You could take on the role of the extra or supporting actor, but is this an interesting role that takes advantage of all that virtual reality affords?

In a recent interview Ed Catmull, co-founder of Pixar Animation Studios, warned that virtual reality technology is not storytelling. He stated that while virtual reality may make interesting contributions to gaming, he believed it was inappropriate for storytelling. He said, “Linear narrative is an artfully-directed telling of a story, where the lighting and sound is all for a very clear purpose. You’re not just wandering around in the world.” [Interview with the Guardian. Stuart Dredge] This gets at the crux of his view, which has some merit but fails to consider the history of storytelling. He is comparing the art of cinematography to a new medium which has not discovered its own grammar. Virtual reality may try to mimic films and cinematography as a narrative medium, but eventually it will discover its own narrative tool set. In addition, what Catmull says is true; an artful story is well crafted and has a very specific, designed and linear narrative. The concept of roaming through the story world seems to trample the concept of authorial intent, but does it really? In a poorly designed virtual reality it would, but what universal law of narrative prevents the viewer’s freedom to explore a story world from coexisting with the author’s intent for how the story unfolds? One “law” would seem to be pacing. If in the middle of a car chase scene a user stops the car and opens the door to look at a dead animal on the side of the road, the author’s desire to create a tense and heart-stopping action scene falls dead itself. Truthfully though, the concept that the viewer had to experience that particular scene the way the author created it is an illusion. Many people might step out of the theater to refill their popcorn in the middle of the film; how does this differ from a person walking around in virtual reality?
Once the concept of authorial intent is seen for what it is, an illusion, then the concept that virtual reality is somehow inferior to film as a storytelling medium seems more ridiculous. In reality, the affordances offered by allowing the user to discover an author’s intent by exploring a scene from multiple viewpoints would be beneficial for both the author and the viewer. It is true that film’s language seems optimized and extremely specialized for telling stories, but to imagine that the language of film existed from the advent of photography is a fallacy. It had to be discovered over time. In fact the language of cinematography is still evolving as anyone who works at Pixar should readily admit.

Consider your viewpoint if the red dress scene were an augmented reality. In other words, what if the red dress scene occurred in the real world around you? How is narrative viewpoint different in VR and AR? There is a subtle but powerful difference between a completely virtual environment and one which overlaps the real world. The real world has chaos, real objects to composite virtual objects against, real sensory experiences and many unexpected events. There are a million little things that create an impenetrable boundary wall within augmented reality narrative. This can be problematic for narrative but it provides affordances as well. One important affordance is that the scene does not have to rely completely on computer graphics for its environment; it can use the environment around it. The scene would not need hundreds of CG avatars and AI agents (programmed agents, not Agent Smith) to animate them. We could take the role of any character in the scene as long as we played the part. There are real perceptual issues with forcing alternative viewpoints upon a viewer in virtual reality, which can cause physical discomfort and loss of immersion. These issues do not occur in augmented reality since most of the view is reality as the user’s body experiences it. Augmented reality narrative appears to dismantle the concept of an “artfully-directed” story, but does it really? Is an artfully directed story completely locked to the author’s intent to the point it can only occur in one time and place? This is the crux of the discussion for a visual language for digital realities. Many stories are intended for a specific place and time, but only because those stories were limited by their author’s intent and the medium for which they are created. Consider a dinosaur in a book, walking along talking to another character. A film director could add more to the scene in order to fill in details or require extra lighting if the event took place at night.
The question is: does the extra lighting affect the author’s intent? What affordances would drive the virtual reality director to create some other twist for the scene, allowing them to artfully direct the story for their own medium?

‘Gertie’ vs Pixar’s ‘The Good Dinosaur’

Film Theory

Is the “Subject” of a narrative an immutable object which cannot be represented in multiple ways? For that matter what is the “Subject”? In his essay “Film Studies and Grand Theory”, David Bordwell discusses the commonalities between subject-position theory and culturalism as they relate to film. Bordwell discusses theories of subjectivity, specifically how both systems conflate the category of subject with that of the individual. The subject in subject-position theory should be a category that enables knowledge, experience and identity to occur within signifying practices. However, most adherents of “1975 Film Theory” use it to refer to the individual ego and the imaginary and symbolic domains equally. Subject-as-individual permeates culturalists’ theory as well. Bordwell describes the proclivity to turn subjects into conscious individuals who can assume roles, thus reaffirming the social agent’s freedom [Introduction to Post-Theory, David Bordwell 1996]. To answer the question on the immutability of subject from the author’s perspective we will assume that the “Subject” can be both the individual within the story and the category of knowledge represented within the story. We can investigate the question of “subject” as it relates to the author’s intent and the perspective of the individual to see how this is affected by the medium of digital realities. Using the example of the dinosaur story mentioned earlier, we can imagine trying to take that story and swap the environment out with a modern metropolis. Let’s assume the author’s intent with the story was to illustrate the nature of friendship between two very different species. How does the transposition of place and time affect the original “Idea”? The dinosaur might befriend a businessman rather than a caveman. The more creative author might take this opportunity to illustrate the similarities between going out in hunt of work on a daily basis and the way cavemen hunted for their food.
There would be other inconsistencies to fix as well, but if you look at most movies today we tend to relate them to our modern viewpoint in an almost automatic way. Filmmakers tend to automate this process for us by anthropomorphizing non-human characters and creating cultural references and lingo within the film to further connect with the intended audience. The observant reader may have noticed that I intentionally use the word “idea” instead of story at this point. What is a story but a group of ideas told to illustrate more ideas? Obviously changing the place and time for a narrative changes the mechanical structure of that narrative, but does it change the ideas? Culturalists might argue that it does. Subject-position critics might argue that it doesn’t. Either way, the point of the exercise is to realize that story, like all of reality, has parts that are important to it and parts that are relative and changeable. The exercise begins to look like the revelations of directors at the beginning of the 20th century with regard to film and its ability to change perceived reality as it was recorded through the use of cinematography and effects.

New Media Theory

In his book The Language of New Media, Lev Manovich discusses the essential elements of digital media as being databases and algorithms. He compares this to early works of film such as Dziga Vertov’s “Man with a Movie Camera”, in which Vertov cut footage and laid it out in large grid-like structures in order to plan and edit his media into a new narrative. In addition, Vertov used elements of montage and “kino eye” to alter the image of the original footage, creating a visual language which allowed the seemingly random database of shots to tell a story. Manovich points out that the linear structure of narrative and the non-linear structure of databases would seem contradictory, but that in effect linear narrative is really just one path through the database of a narrative’s fabula (everything within the narrative world). To illustrate his points Manovich enlists Ferdinand de Saussure’s concept of syntagm and paradigm as they were further expanded by Roland Barthes [Elements of Semiology, Roland Barthes 1967]. Manovich describes the linear narrative as the syntagm (a linear path through the world) of the author’s paradigm (all the imagined paths the author might have described) [The Language of New Media, Lev Manovich 2001]. This, better than any other concept, can describe the viewer’s choices in a digitally constructed reality. The virtual reality must therefore be the author’s fabula or paradigm, while the path of the user becomes the narrative or syntagm, explored through the interaction of the viewer and the author’s skill in guiding the viewer on the path she desires for the main character in her plots. Whether the viewer follows the author’s direction or trots off to explore some other element of the story becomes irrelevant. The truth remains: a narrative exists, a more interactive narrative world exists, and the viewer may read into it what they will. This is not a new concept.
It has almost always been the case that the reader/viewer was the driver and the author was simply a helpful guide giving them a map with which to explore the world. Some authors are more capable than others, which gives the illusion that they direct the viewer completely. Some readers are less inclined to diverge from the path of the story created by the author, but still they have their own interpretation of the story to follow. New media simply brings about a swap in the status quo: the author creates a paradigm and the reader picks the syntagmatic path of their choosing. Where the database story world becomes the main tool for the author creating paradigmatic rules for her fantasy world, the algorithmic deciphering of rules for that world carried out by the viewer becomes the artfully directed path of the viewer.
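
The relationship between paradigm and syntagm can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than anything taken from Manovich: the story world is modeled as a dictionary that maps each event to the events reachable from it, a narrative is any path a viewer walks through that world, and the event names are invented.

```python
from typing import Dict, List

# Hypothetical story world (the author's paradigm): each event maps to
# the events a viewer may move to next. All event names are invented.
STORY_WORLD: Dict[str, List[str]] = {
    "arrival": ["market", "alley"],
    "market": ["chase", "alley"],
    "alley": ["chase"],
    "chase": ["ending"],
    "ending": [],
}


def all_syntagms(world: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """Enumerate every path from start to end (depth-first, no revisits)."""
    paths: List[List[str]] = []

    def walk(event: str, path: List[str]) -> None:
        path = path + [event]
        if event == end:
            paths.append(path)
            return
        for nxt in world.get(event, []):
            if nxt not in path:  # skip events already visited on this path
                walk(nxt, path)

    walk(start, [])
    return paths


def is_valid_syntagm(world: Dict[str, List[str]], path: List[str]) -> bool:
    """Check that a viewer's path only uses transitions the author allowed."""
    return all(b in world.get(a, []) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    for path in all_syntagms(STORY_WORLD, "arrival", "ending"):
        print(" -> ".join(path))
```

In this toy model the author writes only the transitions; which of the possible paths becomes the narrative is decided by the viewer, and a guided experience amounts to nudging the viewer toward one of them.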


What are the narrative affordances associated with the database and algorithmic nature of digital media as well as virtual and augmented reality? In the beginning of the 20th century, filmmakers began to create their own visual language and techniques which enabled them to change the nature of the stories they could tell. For example, a linear series of events could be cut and reordered to create a completely different perception of the story and event order. Suppose that everything about a specific time and place could be recorded as a series of events and media elements. This would enable the author to rearrange that database into any order and output various stories based on each new sequence of events. This is the advantage and essence of film as it relates to narrative. However, film does have its limits, in that it must be processed and can only be shot from certain angles. The framing of a film cuts off potential views of the environmental elements of a story, thus a master of cinematography becomes an artist known for his ability to create a masterpiece with every frame. That is the dogma of film and cinema: there can be no other view but the view framed and chosen by the cinematographer. This of course is a lie, considering the countless hours of footage left on the cutting room floor which find their way back into the “better” “movie as it was intended” director’s cut. Obviously, the immersive nature of virtual reality allows the viewer to become the cinematographer, who with some guidance and visual trickery could be directed by the professional using elements of interface and visual language to allow the viewer to discover the best viewpoint for themselves. This brings up the interesting question of whether the artful direction of the cinematographer was the best possible choice for everyone or whether it was a mandate passed down from the dictator trying to impose his world view on each viewer.
With VR/AR the power is handed back to the individual with director/author serving the role of masterful guide in their story world.

VR/AR gives the individual even more narrative control than they already possessed while giving the cinematographer the opportunity to free themselves of the frame and attempt to see the world as a more artful reality. This of course comes with some very weighty pitfalls and problems for the medium, which can literally make viewers sick given its persuasive inputs to their brain. For example, a cinematographer is free to establish shots at a distance and fly the character to the close-up perspective they intend for a scene. They may even choose a locked shot hovering thirty stories off the ground looking through a window. These artful methods are steeped in signs and symbols for the director and the audience. However, these same techniques become problematic within digital realities because they may break the viewer’s sense of immersion and physical engagement with the scene. The eagle-eyed view in film becomes the unintended vertigo-inducing effect in reality. There seems to be no way to alter this without the use of props like platforms or avatar bodies which allow flight. These workarounds begin to weigh down the potential world rules with mechanical “filters” which tie the user’s imagination down and limit the author’s designs. This, like the framed view of film, becomes the advantage/disadvantage of digital realities, at least until we cross over into the Matrix and believe like Neo that we can jump across the fall and that flight is natural. Language is bound by rules and limitations; it is what allows understanding and a sense of wonder when the rules are broken and the limitations are forgotten. It is the artful manipulation of those limits which sets the cinematographer apart from the home videographer.
It may be for these reasons that the visual language of digital realities will rely more on effects like Vertov’s “kino eye” and alternative forms of view manipulation like those used by magicians on stage and modern theater in order to artfully guide a user through a scene.

How will a viewer be directed from an establishing shot to the desired focal point of a scene in virtual and augmented reality? Obviously in virtual reality there is the “on-rails” approach and many VR narratives already employ this technique. This is one approach but is severely limiting in a medium which should seek to remove limits on the viewer. In addition this approach will not work for augmented reality since the user cannot be constrained in the real world. How then would you lead a viewer to the focal point for a scene in augmented reality? If this particular problem could be solved then the framework for a digital reality visual language can be set. Solutions include directing the viewer with a visual overlay or path, not unlike the subtle dodging and burning used in film to guide the viewer’s eye to the desired focal point of the frame. There is also the possibility of 3D sound, which could enable a prop or character within the scene to call to the viewer in order to get their attention. In addition, the compositing of the scene could be controlled in such a way that the element of focus could be the only element of the scene in color and the rest of the scene could be black and white until the viewer moved to the perimeter of the shot. Now the methods for exploration start to look more artful, similar to the fade from black of film, the iris close on the main character at the end of a scene used in film or the spotlight used to highlight the speaking characters on a stage. It could be said that the frame of the 2D film becomes the perimeter of the 3D scene. In this case then, the art of framing the scene for cinematography becomes the art of establishing the perimeter or possible perimeters for virtual and augmented realities.
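
The color-isolation idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any existing VR/AR engine: positions are plain coordinate tuples, the perimeter is a circle of radius `perimeter_radius` around the focal element, `fade_width` is a hypothetical ramp added so the change is gradual, and colors are RGB triples in the range 0.0 to 1.0. The gray value uses the standard Rec. 601 luma weights.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]  # channels in the range 0.0 to 1.0


def desaturate(color: RGB, saturation: float) -> RGB:
    """Blend a color toward its gray value (0.0 = gray, 1.0 = unchanged)."""
    r, g, b = color
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights
    return (
        gray + saturation * (r - gray),
        gray + saturation * (g - gray),
        gray + saturation * (b - gray),
    )


def background_saturation(
    viewer: Sequence[float],
    focus: Sequence[float],
    perimeter_radius: float,
    fade_width: float,
) -> float:
    """Return 1.0 inside the perimeter, 0.0 far outside, a linear ramp between."""
    distance = math.dist(viewer, focus)
    if distance <= perimeter_radius:
        return 1.0
    if distance >= perimeter_radius + fade_width:
        return 0.0
    return 1.0 - (distance - perimeter_radius) / fade_width


def shade(
    color: RGB,
    is_focus: bool,
    viewer: Sequence[float],
    focus: Sequence[float],
    perimeter_radius: float = 3.0,  # hypothetical default, in scene units
    fade_width: float = 2.0,        # hypothetical default, in scene units
) -> RGB:
    """Keep the focal element in full color; gray out the rest until the viewer arrives."""
    if is_focus:
        return color
    amount = background_saturation(viewer, focus, perimeter_radius, fade_width)
    return desaturate(color, amount)
```

Called once per scene element each frame, `shade` leaves the focal element untouched and returns the rest of the scene in gray until the viewer approaches the perimeter, at which point color returns across the ramp.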

What about the other affordances of the database for digital realities? Film offers the affordance of recording ‘framed’ time. While reality has space, the time associated with it only moves forward. The history of a location in reality is buried or forgotten depending upon the viewer’s knowledge. Film’s linear cause and effect can be viewed forward and backward. Could there be a rewind for reality? Yes, and perhaps much more. Human culture is beginning to establish databases of time and space for large chunks of reality. Google Earth, Google Street View and YouTube combined with cameras like GoPro are excellent examples of our human desire to record and catalog reality in as much detail as possible. One highly underused affordance of virtual reality, and augmented reality in particular, is the ability to tap into the ever-growing database of reality and use it to construct narratives. Let us imagine that the digital media revolution began when the film camera was created and that we have access to all the media created since that point in the form of a spatial database associated with a map of the world, similar to Google Street View. The virtual auteur could take an artificial intelligence algorithm capable of parsing that record of space and time and use it to construct a narrative for an individual based on their current location. Obviously there is a lot of work that goes into the creation of this type of tool, but the building blocks are already there and the potential for narrative and visual language is enormous. This is one of the reasons I believe that virtual reality and augmented reality will eventually merge into one medium which is capable of both types of outputs. There are too many compelling uses for the database-driven augmented reality narrative for it to play second fiddle to the less useful but more robust power of virtual reality.
You could compare it to the difference between film in a theater and the combined viewing of Netflix and YouTube, cut into viewing segments based on sequences of actions and effects, then re-sequenced into any combinative story imaginable.

The affordance of massive re-combinative media is extremely useful from a content perspective, but what does it mean for the proposed visual language of digital realities? These types of inputs become the foundation for rules that drive the compositing effects of digital realities. Modern film effects creation relies heavily upon 3D data. The same goes for digital reality compositing. In order to insert objects into a world scene, the compositor needs a 3D frame of reference. Taking geospatial positioning and databases of 3D data, such as buildings from Google Earth, it is possible to create in augmented reality, in real time, the same types of compositing that occur in film. In addition, using positioning, mapping, artificial intelligence and computer vision tools it is possible to find and track locations within the real world which map well to a director’s intended location for a scene. For example, imagine that a scene requires the corner of a building and a side street in order to work well visually, as imagined by some AR director, human or AI. The nearest best match for that requirement can be tracked and located through algorithms and databases, enabling other elements of the “visual language” to then guide the user to that location in order to lay the scene out around them and continue on with the story. Alternatively, the scene could be constructed to fit the surrounding environment or constructed completely in the case of virtual reality.
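
The “nearest best match” step can be sketched as a simple query. This is a minimal sketch under loud assumptions: the `Place` records, the feature tags such as "building_corner" and "side_street", and the coordinates are all hypothetical stand-ins for what a real spatial database and computer vision pipeline would supply. Distance is the great-circle distance from the haversine formula.

```python
import math
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Place:
    """A candidate real-world location; the feature tags are hypothetical."""
    name: str
    lat: float
    lon: float
    features: FrozenSet[str]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in km."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_match(
    viewer: Tuple[float, float],
    required: AbstractSet[str],
    places: Iterable[Place],
) -> Optional[Place]:
    """Return the closest place that has every required feature, or None."""
    candidates = [p for p in places if required <= p.features]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: haversine_km(viewer[0], viewer[1], p.lat, p.lon),
    )


if __name__ == "__main__":
    # Invented sample data for illustration only.
    places = [
        Place("plaza", 0.0, 0.001, frozenset({"building_corner"})),
        Place("corner_a", 0.0, 0.01, frozenset({"building_corner", "side_street"})),
        Place("corner_b", 0.0, 0.05, frozenset({"building_corner", "side_street"})),
    ]
    match = nearest_match((0.0, 0.0), {"building_corner", "side_street"}, places)
    print(match.name if match else "no suitable location nearby")
```

When no place satisfies the scene, the function returns None, which corresponds to the alternative mentioned above: constructing the scene to fit the surrounding environment instead of guiding the viewer elsewhere.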


It should be noted that this essay makes many assumptions and chooses a very specific path to explore the perspective of a narrative within augmented reality. The reason for this choice is to push the questions and the answers for more complicated scenarios. The assumption being that if it is possible in augmented reality, it is easily possible in virtual reality since the author has much more control over the story world, and can craft it as specifically as a film director chooses sets and shots. The goal for both mediums is to consider the language of film and the language of new media in order to consider what hybrid child they might make.

There should be no question as to whether virtual reality and augmented reality are a powerful narrative medium. There are valid concerns and considerations for the medium’s approach to storytelling, but these are no different from the initial learning steps which other mediums like film and video games have had to experience before they discovered their own grammar. The technologies discussed within this essay, specifically AI and its ability to drive this new visual language, have far-reaching implications for our culture as a whole. As has often occurred in the past, new mediums come along and alter the way a culture consumes art and entertainment, which alters the way culture creates art and entertainment, which then goes on to create entirely new forms of media and new mediums to explore. It is easy from this viewpoint to imagine a future where a person can walk through a city experiencing a documentary about the buried history of the streets they walk on, not simply a linear narrative hard coded to tell the same story every time, but a narrative made different with every step and turn the viewer takes. A walk that starts as a trip down memory lane may end up as a dinner murder mystery or a walk through the park and a Shakespearian play like A Midsummer Night’s Dream. We would not see them the way we experience these separate events today, but as one continuous flow of reality driven by the user and all the content at their disposal.

Writing about the hypothetical death of the author, Roland Barthes wrote, “We know the text does not consist of a line of words, releasing a single “theological” meaning (the message of the Author-God), but is a space of many dimensions, in which are wedded and contested various kinds of writing, no one of which is original: the text is a tissue of citations, resulting from the thousand sources of culture.” [The Death of the Author, Roland Barthes 1977]. The truth of these words can be seen across mediums and time as books become plays, which then become movies transposed upon the modern world, which then become musicals or games, only to start the cycle anew. Creativity and imagination, along with a standard vocabulary for visual storytelling appropriate to the medium, are the only limiting factors preventing artful narrative within the realm of digital realities.