Published in Stories in code

Why Are We Still Using 2d Cinema Cameras To Make Films?

In the world of responsive moving image storytelling, the 2d cinema camera is an expensive anachronism blocking the road

[Image: 3d rendering of a man walking along a train track into the 3d world of a phone]

A man went to see a certain Greta Garbo film more than a hundred times. When asked why he went so often, he explained that in one scene, Garbo undresses in front of a window, but right at the crucial moment, a train rushes by, obscuring the desired body from the viewer. “I figure,” the man explained, “that one time that train just has to be late.” (Gunning, 2014)

Of course, in conventional cinema that train will never run late because film is a static (and immutable) medium. In the 1985 film The Purple Rose of Cairo, director Woody Allen plays with the idea that a film exists simultaneously in film time and in the real time of the audience. What would happen if the characters in the film could become aware of the audience and cross the screen divide? The film’s main character, Tom Baxter (Jeff Daniels), notices a young woman, Cecilia (Mia Farrow), who has spent the whole day in the cinema watching the film. After a brief conversation he surprises and shocks those on both sides of the screen by leaving the film and accompanying Cecilia on an adventure in the “real world”.

The idea that film was a step on the path to greater levels of expression and interaction, not an end in itself, was expressed within the first few decades of the movie camera’s emergence. Jean Cocteau stated as early as 1934, “A film is a petrified fountain of thought” (Cocteau, 1994).

The mind has already started to weary of a succession of flat images continuously unreeling on the old magic lantern screen and displaying the cross-section of a ghostly world, which, with the discovery of depth in noise and sound (minus, I might add, the element of shock), has been deprived of much of its charm (Cocteau, 1994).

Fourteen years later, discussing the avant-garde, Astruc wrote of the camera-stylo, translated as the ‘camera as pen’.

It must be understood that up to now the cinema has been nothing more than a show. This is due to the basic fact that all films are projected in an auditorium. But with the development of 16mm and television, the day is not far off when everyone will possess a projector, will go to the local bookstore and hire films written on any subject, of any form, from literary criticism and novels to mathematics, history, and general science. From that moment on, it will no longer be possible to speak of the cinema. There will be several cinemas just as today there are several literatures, for the cinema, like literature, is not so much a particular art as a language which can express any sphere of thought. (Astruc, 1948)

If Astruc were writing today, rather than referring to several cinemas, he would be referencing several platforms: multiplatform storytelling, linked data, and real-time technologies such as game engines, holodecks and metaverses. The notion of the camera-stylo could extend to a code-stylo, or ‘code as pen’, writing images and sound to a multitude of mediums, much as we write software.

Thinking about moving image storytelling in this way quickly reveals the limitations inherent in the aging technologies that currently produce video, film, and cinema. The prime suspect is, of course, the cinema camera.

Why? Because modern platforms require semantically rich, structured, and complete data, and the cinema camera is incapable of generating it. Cameras produce rushes that, although digitised for post-production applications, end up as media assets in a format that is both unstructured (lacking a predefined, semantically or narratively meaningful data model or schema) and incomplete. This gives rise to two significant problems with the data we call film or video.
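To make the distinction concrete, here is a purely illustrative sketch; the schema, field names, and values below are all invented for this example, since no camera today emits anything like them. Structured shot data of this kind would be trivially machine-readable:

```python
import json

# A hypothetical shot description. Cameras emit nothing like this;
# they produce opaque frames of pixels with no narrative data model.
shot = {
    "scene": "railway platform, dusk",
    "characters": [{"name": "Cecilia", "action": "boards train"}],
    "camera": {"position_m": [0.0, 1.6, -3.0], "lens_mm": 35},
    "duration_s": 8.5,
}

# Because the data is structured, it round-trips through a standard
# serialisation and can be indexed, queried, and translated.
decoded = json.loads(json.dumps(shot))
print(decoded["characters"][0]["action"])  # boards train
```

A search engine, screen reader, or translation pipeline can work with data like this directly; it can only guess at the pixels in a frame.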

The first is that we are drowning in video data. Every minute, more video is published on YouTube and TikTok, or generated by CCTV cameras, dashcams and the like, than any person could hope to watch in a lifetime. Filmmakers, similarly, contribute a large amount of footage to this stockpile.

The problem with unstructured camera data is that computers can’t read footage. The output from cameras is therefore not indexable or searchable, and it is inaccessible to people with disabilities such as blindness, to people who don’t speak the language the film was created in, and to people who don’t have access to the device or medium for which it was authored. It can’t be used for any purpose except as static video, and it can’t be delivered or viewed on anything except a media player.

One of the first exercises for postgraduate computer science students in computational photography is algorithmically detecting editorial cuts, transitions and the like in film. Finding a hard cut is relatively straightforward; detecting a scene change through, for example, a fade is still highly unreliable, as this requires a level of semantic or contextual understanding not present in the data. A lot of work has been done over the last few years using deep learning (convolutional neural networks) to detect and “read” film scenes. A large driver for this is the need to process CCTV footage, as too much is generated for humans to monitor. Some of the technology being developed raises concerns; you are no doubt aware of the controversy over facial recognition… The same approach also underpins research in autonomous vehicles, medical diagnosis (the success story so far), robotics, etc.
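The “straightforward” part can be sketched in a few lines. This is a minimal illustration, not any production algorithm: a hard cut shows up as a sharp jump in the intensity histogram between consecutive frames, while a slow fade changes gradually and slips under the same threshold, which is exactly why fades are the harder problem.

```python
import numpy as np

def detect_cuts(frames, threshold=0.5):
    """Flag frame indices where the grey-level histogram jumps sharply."""
    cuts = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=32, range=(0, 255))
        hist = hist / hist.sum()  # normalise to a distribution
        if prev_hist is not None:
            # L1 distance between consecutive histograms, in [0, 2]
            if np.abs(hist - prev_hist).sum() > threshold:
                cuts.append(i)
        prev_hist = hist
    return cuts

# Synthetic "footage": five dark frames, then five bright frames.
dark = [np.full((64, 64), 30, dtype=np.uint8)] * 5
bright = [np.full((64, 64), 220, dtype=np.uint8)] * 5
print(detect_cuts(dark + bright))  # the cut is at frame 5
```

Nothing in this code knows what a scene *means*; it only measures how pixel statistics change, which is why the semantic problems described above remain open.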

Film and video files, however, are not just sequences of images; they are ordered frames of meaning. The search for meaning is evidenced not only by our own encounters with cinema but also by the substantial body of scholarship that has arisen around it. David Deamer, for example, writing about Deleuze’s works on cinema, coined the term Cineosis:

Cineosis = cinematic semiosis: exploring Gilles Deleuze’s cinema-sign taxonomy, attempting to fill some silences and resolve some contradictions. Where the cinematic image is not just an image encountered in a cinema, but images that move in time on screen. Where the semiotic system is universal, but has an infinite number of possible outcomes, is a process, a semiosis. Necessarily, cineosis is but one way to encounter movement-images and time-images. (Deamer, 2018)

To suppose that we can come up with algorithms capable of reading images and collections of frames at this level and in this way currently feels out of reach.

As algorithms struggle to read film, they are also struggling to assist in the fight to save our rapidly decaying film archives. The decay of film stock is a major problem for the industry, which must archive, preserve, and restore old films even as we lose them daily to natural deterioration. The British Film Institute (BFI), as one example, is fighting a monumental battle to save films. Working with the resources of Weta Digital, Peter Jackson restored and colourised footage selected from WW1 archives and newsreels to make the 2018 documentary “They Shall Not Grow Old”. While the documentary is impressive, it restored only a tiny fraction of the archive, and the process was still very labour intensive.

The plethora of digital outputs and formats compounds this problem. How much have we lost in obsolete formats such as VHS and Betamax, and in tape-based, deprecated codecs? One innovative approach to this problem has been to explore encoding text, music, and images in DNA (Ailenberg and Rotstein, 2009). The assumption is that we will always need technologies to read (and write) human DNA, so data encoded in DNA will never fall foul of obsolescent technologies. But this requires digitising data to meet the specific requirements of the process, and the technology is so new that it has yet to leave the lab.

Cinema cameras (referred to as RGB cameras in computational photography) are essentially the same today as they were at the turn of the 20th century. They have been extended and their lenses improved, but the fundamentals remain unchanged. By this I mean the physics and maths that determine how a camera captures and stores light. A comment from one of my lecturers in computational photography has stayed with me: he observed that the mathematical model, referred to as the “pinhole model” (Forsyth and Ponce, 2015), is no longer adequate for addressing developments in 3d photography or photogrammetry. The pinhole model is unable to capture depth information, and it certainly can’t solve for occlusion (i.e., it can’t capture what it can’t see; the spatial dataset is always incomplete). There is a vast field of research trying to compensate for this deficiency, including light field photography, stereo pairs, 360 filming, etc. But none of these resolves either problem, because they all still rely on the pinhole model.
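The depth loss is visible in the model’s own maths. In the pinhole model, a 3d point (X, Y, Z) in camera coordinates projects to the image point (fX/Z, fY/Z), so every point along a ray through the pinhole lands on the same pixel. A minimal sketch, with invented example points:

```python
# Minimal pinhole projection: a 3d point (X, Y, Z) in camera
# coordinates maps to image coordinates (f*X/Z, f*Y/Z).
def project(point, f=1.0):
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

# Two different points on the same ray through the pinhole...
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # twice as far along the same ray

# ...land on exactly the same pixel: depth is irreversibly lost.
print(project(near), project(far))  # both (0.25, 0.5)
```

Once Z has been divided out, no algorithm can recover it from a single frame without extra assumptions, which is what the compensating research fields above are really about.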

Newer mobile phone cameras (Samsung Galaxy, iPhone X and later) contain infrared depth cameras that work in conjunction with the RGB cameras to enable 3d scanning, depth mapping, augmented reality, etc. There are interesting possibilities here, though we still run into the limitations of the pinhole model. One intriguing idea is to go beyond the visible spectrum to gather the assets for storytelling. An example from 2008 is Radiohead’s “House of Cards” music video, which was “filmed” using lidar.

Capturing data with lidar is a form of 3d scanning, which can record far more of a scene than an RGB camera. This leads to the second problem with RGB cameras: incompleteness.

Incomplete means that a camera can only capture what it can see: it collects data only for what lands on the sensor from a fixed point, and nothing else from the surrounding scene. We are left with a framed, fixed 2d perspective on the scene, along with fixed light and shadows (e.g., time of day). This often results in a high error rate in post-production, where cuts don’t match, light doesn’t match, and the joins between visual effects and footage are noticeable. It also means that stories struggle to be anything but linear, precluding opportunities to explore and develop story forms for non-linear, immersive, and responsive storytelling.
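By contrast, a 3d capture such as a lidar scan stores explicit point positions, so new viewpoints can be rendered after the fact. A toy sketch (the points and angle are invented for illustration), reusing a unit-focal-length pinhole projection purely for display:

```python
import numpy as np

# A toy "scan": explicit 3d points, as a lidar records them. Unlike a
# 2d frame, this data can be re-rendered from any viewpoint later.
points = np.array([[0.0, 0.0, 5.0],
                   [1.0, 0.5, 6.0],
                   [-1.0, 0.2, 4.0]])

def view_from(points, yaw_deg):
    """Rotate the scene about the vertical axis, then project it
    with a unit-focal-length pinhole model for display."""
    t = np.radians(yaw_deg)
    R = np.array([[np.cos(t), 0.0, np.sin(t)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(t), 0.0, np.cos(t)]])
    p = points @ R.T
    return p[:, :2] / p[:, 2:3]  # perspective divide

# The same capture yields different, mutually consistent 2d views,
# something a single RGB frame can never provide.
front = view_from(points, 0)
side = view_from(points, 30)
```

A 2d frame bakes one of these views in forever; the 3d dataset lets the “camera” become a decision made at viewing time rather than at capture time.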

So, given

  • developments in computer graphics, 3d, computing power to render “photo real” as well as other stylistic outputs,
  • the ongoing innovation and development of existing and new platforms and resulting mediums (VR/AR, new forms of live performance etc) and the inability of the output from cameras to be responsive to these,
  • the problems with indexing, discoverability, accessibility, archiving… arising from the closed and static nature of the film frame, shot, sequence,

why are we still using RGB cameras for story production? Part of the answer, of course, involves the costs of switching: the cost and availability of equipment, the training and retraining of people, and the mediums for distribution.

However, the cost of not switching will include an increasing loss of our history and memory through the erosion of our film archives; continued accessibility problems for those who use assistive technologies or don’t speak the original language of a story; an inability to discover and enjoy stories as they disappear into an exponentially expanding sea of unstructured, unsearchable data; and exclusion from the creative possibilities that the code-stylo offers. And we haven’t even started talking about how camera-based production processes contribute unnecessarily to climate change…

Stories in code is made for sharing. If you enjoyed this article, please show your support by giving it a clap, following me and sharing with others who you think might be interested.


Ailenberg, M. and Rotstein, O.D. (2009). An improved Huffman coding method for archiving text, images, and music characters in DNA. BioTechniques. Future Science. Vol. 47, No. 3, pp. 747–754 [online].

Astruc, A. (1948). La Camera Stylo — Alexandre Astruc [online]. Available from: [Accessed 8 April 2021].

Cocteau, J. (1994). The Art of Cinema (trans., R. Buss). New edition. Bernard, A. and Gauteur, C. (eds.). London: Marion Boyars Publishers Ltd.

Deamer, D. (2018). Cineosis [online]. Available from: [Accessed 3 January 2022].

Forsyth, D.A. and Ponce, J. (2015). Computer Vision International Edition PDF eBook: A Modern Approach. 2nd edition. Pearson.

Gunning, T. (2014). Animation and Alienation: Bergson’s Critique of the Cinématographe and the Paradox of Mechanical Motion. The Moving Image: The Journal of the Association of Moving Image Archivists. University of Minnesota Press. Vol. 14, No. 1, pp. 1–9 [online].




F Bavinton


Storyteller and technologist. Revelling in the heady mix of algorithms, film and game engines. I love telling stories with and about code.
