What We Don’t See

Published in Artists + Machine Intelligence, Apr 16, 2018

Look out the window. The day is gray and damp or bright, morning, late afternoon, Fall or Friday or Christmas. We see trawlers, skiffs, shells, and tankers. Sailors go by wearing loud foul-weather gear, companions on deck. A technicolor houseboat is pulled to a new mooring, while a worn highway, being dismantled somewhere in the city, comes down the cut in pieces day by day.

Tour boats, duck boats, rent-by-the-hour boats. And there’s Google to your right…and there’s Google to your right…and there’s Google to your right…(wave, wave, wave). Lumbering flats slide through an extended view, crossing three panes of lookout, stretching slowly. A quickness at the end reveals tugs coercing dirt mountains downstream. To the left is the city; to the sea is everything else.

The boats have names: Knot in Kansas, Scorpio, Weekender, Love Hertz, Sea Life, Tyme Pilot. Call-and-response foghorn blasts precede each drawbridge passage, for naval carriers that fully eclipse the view and for sailboats that dart quickly through and on. We see a girl craning her neck, twisting toward the sun. This lasts a long time: 20 minutes or more. When she finally rises to walk away, relief fills the room with scattered deskside applause for a selfie so hard-won.

Window washers dangle outside. A flock of ducks floats aimlessly, takes flight. Some days the light shifts quickly, even from one frame to the next. The mundane persistence of this view doesn’t stop us from watching for quiet narratives, from seeing what will happen.

Google hired Andy Rogers, a photojournalist; Christopher Woon-Chen, a filmmaker; and me, a fine-art photographer, to teach an intelligent camera about photography. It was the first on-device AI of its kind, and we filled it with directives: this is good, that is not. Yes, no, yes, no. Now, not then. We attempted to imbue this camera with the ability to anticipate “good” shots. To recognize familiar faces and pets. To trigger on smiles, gestures, good lighting, eye contact, and the rule of thirds. To know when to start looking and for how long.

We were given a corner desk overlooking a canal. As we watched boats float down the waterway outside our office window, each vessel took on singular interest for the brief span before, during, and after it cruised through our view. Using a faux television screen, I started recording and publishing these passages. With this, Yacht TV was born: a theatrical complement to the daily task of training an AI. We were devoted watchers, waiting for serendipitous boat blips to become unintentional characters, related to each other in a frame-bound story of perpetual forward motion.

The unbroken gaze can be a luxurious kind of looking, or a test of will. In his film Sleep, first screened in 1964, Andy Warhol kept his camera trained on a single subject for the entirety of one activity, depicting more than five hours of his lover at rest. At an early screening in 1964, only 50 of the initial 500 moviegoers remained in the theater at the end of the film. Others threatened violence. A few years later, filmmaker Jean-Luc Godard’s single-shot sequence in Weekend followed a traffic jam for seven minutes. As discrete dramas played out, the camera tracked steadily along. It blurred the lines of filmic fiction, giving viewers the impression they were watching life as it unfolded in time.

In 1968, the photographer Paul Fusco took hundreds of photographs from inside a train, looking out. In The Train, we see people assembling along the tracks as the funeral train carrying the body of Robert F. Kennedy traveled south along the Eastern Seaboard from New York to Washington, D.C. The images bear witness to its progress down the coast and to the onlookers saying goodbye by watching. People in fields, at stations, on street corners, and on hillsides by the train trestles look on. They gather in diverse groups specific to each passing locale to witness, to watch, to see.

These ’60s-era conceptual experiments in duration resurfaced in the late aughts as endurance entertainment, branded “Slow Television” by Norway’s national broadcaster, NRK. The show began in 2009 as a way to commemorate 100 years of the Bergen rail line, depicting the 7-hour train journey from Bergen to Oslo. Its surprising popularity quickly spawned many “minutt for minutt” offshoots, including cruises, the 8-hour knitting of a full sweater, a fire followed from the chopping of the wood to the burning of the logs, and various fixed views of wildlife.

Yacht TV is in the slow TV tradition, yet rather than recording ad infinitum, it selects one discrete subject after another, yacht by yacht. Thanks to a collaboration with Google engineer Larry Lindsey and an AIY Vision Kit, what began as a manual capture is now produced by a machine that recognizes “boat.” Riffing on a movement that includes slow travel, film, food, radio, worship, and other savoring rituals, Yacht TV is television as an activated medium, optimized for object detection.

Original Yacht TV video still, @toilntrubbl, Fall 2017

To automate the detection of a desirable image is to consider the nature of observation itself. What happens when we are freed from the randomness of glancing and instead are given a constant archive from which we choose moments to distinguish with our thoughtful attention? This automation of the gaze, built on expert human instruction, is a kind of collaborative perception.

Many years after Fusco’s RFK train series, Dutch photographer Rein Jelle Terpstra sought out the onlookers who had themselves held cameras that day. He requested their images, which had lain buried for 40 years in albums, boxes, and envelopes of negatives. Some spectators photographed only the train, while others documented their friends and family waiting alongside the track or pointing to things just outside the frame.

On one side of this two-part series is a vehicular gaze on an unfolding event: quick punctuations of figures from various points of view create what curator Clément Chéroux describes as a “long, sad human chain that formed along the tracks.” On the other side is the inverse of these myriad gazes, looking back at the machinery and documenting it. Seen together, these co-images create a true mirror, weaving a collective event and narrative of time, movement, and witness.

To ride and watch, to go and look, to stand and witness. The act of observation is innately human; it begins and ends in the mind. And yet, our bodies and minds are not without a history of technological augmentation, including the recent iteration of an intelligent eye. Seeing is, alongside love, a supremely human endeavor, so much so that it serves as a metaphor for being deeply understood by another. The AIY automation of Yacht TV extends our view from an office window. What else might we do with personalized, creative AI that can augment our senses?

Engineer Larry Lindsey automating Yacht TV with an AIY Vision Kit and MobileNet model

There is a beautiful simplicity in the regular cadence of observing the everyday, allowing mental narratives to unravel against a steady view. In the words of the writer Robert Walser, whose practice depended heavily on long walks alone through the landscape: “We don’t need to see anything out of the ordinary. We already see so much.” While giving lessons to an intelligent camera in the hope of creating something that resembled instinctive perception, we simultaneously enacted our own, learning about the mental structures we had built around seeing as we taught the camera to replicate them.

In his 2014 New Yorker article heralding the slow TV age, Nathan Heller writes: “Slow TV is high-definition in its visual information, yet it gets its meaning from viewers’ imaginative consciousness. As entertainment, it is backward; it appears to do its job by casting viewers into their own minds.” As humans, we instinctively engage with the natural world and are destined to continue extending our bodies through technology. When we amplify ourselves in the boundlessness of watching, while simultaneously excusing ourselves from the mechanics of capture, we perceive alongside AI, not as separate beings but as dual remote prostheses, reflecting and extending each other. As we become interconnected with technology, we are not simply evolving and adapting, nor are we being replaced, dominated, or destroyed. We are being seen.

Written by Christiana Caro