Escaping the Box

The future of visual storytelling on the web

This article originally appeared in the April issue of Inside the Story Magazine, a quarterly digital publication for people who take their storytelling seriously.


The numbers are almost impossible to grasp if you think about them for long enough. According to YouTube’s official count, 72 hours of video are uploaded to its servers every minute of every day. That’s the equivalent of 33 movies. Every minute.

Add that statistic to this photograph published by NBC in March 2013 and you start to wonder whether the human race’s favourite activity in the 21st century is filming other humans and putting the footage on the web. The viewing figures for online video are huge too, leading many to see video as the zenith of what can be done on the web.

But video has a problem. It’s a problem inherent in its form, and at the same time it’s infecting how we approach visual storytelling on the web. Video’s problem is that it’s a prisoner inside the rectangular box of the video player. Videos are large, self-contained elements, which means they can’t be changed, they can’t respond dynamically to users and they can’t be searched by Google spiders. This has led to video being described as the “black hole” of the internet.

“Video is the gorilla in the room,” argues Cody Brown, founder of ScrollKit, a new platform which wants to make the web “more cinematic”. “If you are compelling on video, you will rule pop culture. What is pop culture and what is web culture is getting more and more unclear. There are a zillion ways to make video more web-native and people have been pushing that forward ever since they had the bandwidth.”

Brown is one of a number of entrepreneurs, filmmakers, designers, developers and journalists trying to change visual storytelling on the web. ScrollKit, a visual editor he has created, lets users build a web page in the same WYSIWYG way one might make something in InDesign. This, he hopes, will let us approach the web more visually, and perhaps even make things more quickly. He says he’s been able to use ScrollKit to build a replica of The New York Times’ lauded Snow Fall piece in a matter of hours.

ScrollKit’s replica of The New York Times’ Snow Fall project

“Less is wrong about how we are visually consuming the web and a lot more is wrong with how we are making it,” Brown tells me. “And the problem with how we are making it is that we aren’t making it in a visual way. The overwhelming majority of people fly blind when they make content on the web. They fill out two forms, a header and a footer, then preview it in WordPress when they’re finished. Imagine being a painter and not being able to see what happens as you wet your brush and slide it across your canvas. This is the kind of fundamental problem we have now.”

Black gold boom

New approaches like this bring up difficult questions about how we think about video and other visual elements on the web. Rather than being something new and revolutionary, as so many would have it, web video is worryingly old-fashioned. We are making films (and even calling them films, often) in the same way a cinema or television director would, except with less money. The only difference is that we upload our file to YouTube instead of projecting or broadcasting it. The product is still the same.

In other words, web video is still made like television, and it isn’t new at all. This is a problem because there’s no reason television ought to work on the web: it’s simply not built for that.

So where does thinking differently about video on the web begin? According to Jesse Shapins, one of three founders of the interactive documentary platform Zeega, it starts right at the definition of what video is.

“I think what’s important is to distinguish the difference between video and moving image. I think motion is a really important part of creating compelling interactive experiences…” he says. “Shorter, more animation-like video is the type of video experience that maybe we should be thinking more and more about. Vine, for example, is a really powerful and interesting medium. It gives you a very focused constraint around what is effectively a video but it’s limited to seven seconds, it’s a looping animation.

“The reality is that long-form video is a mode of consumption that is slightly at odds with the trend of how people are increasingly consuming media, which is on their mobile phones. So I think that we’re at an interesting juncture here, and I think it’s about asking ‘What is the context you imagine people consuming your work?’”

Black Gold Boom on Zeega

This bottom-up approach is clear in the platform he’s building with Kara Oehler and James Burns. Founded last year, Zeega lets users create interactive experiences using simple online tools. These tools make the most of the screen, and some of the more ambitious projects, such as this webdoc about North Dakota’s oil industry, give us a vivid glimpse into the potential of the webdoc form.

A manifesto for the future

What is a webdoc, anyway?

In October 2012, a group of filmmakers, developers and journalists - including the founders of Zeega - met at the Mozilla Webmakers conference in London and attempted to figure that out. The result was the Webdoc Manifesto, a living, breathing document, hosted on Google Docs and open for anyone to contribute to. It’s raw and messy, but some of the ideas on its pages start to give us a glimpse into the future of factual storytelling on the web.

“We hold that web documentaries are interactive,” it begins, “though the interaction does not necessarily have to occur between the user and the documentary; rather, the documentary can interact with reality using the networked technology of the web.”

It describes documentary on the web as being a process, rather than a finished film, and “something that is used as much as it is watched.” Perhaps freeing video from the rectangle also creates the chance to build a much less passive experience for our audiences.

Lost and found

At the same time, video isn’t the only way to create an engaging audio-visual experience on the web. Photofilms have been popular for years, and now these too are being freed to live dynamically on the web.

An early breakthrough in this concept came from a team at NPR in Washington DC. Reporter Claire O’Neill collaborated with developer Wes Lindamood in building Lost and Found, a multimedia package about the discovery of a once-lost photo archive. The piece is effectively an audio documentary, complemented with dozens of recently discovered photographs from the archive of Charles Cushman. O’Neill trawled through thousands of photographs and selected the best.

Lost and Found by NPR

“From the beginning we knew this would be an audio-driven story and we thought a lot about how visual material could augment and complement that story,” says Lindamood.

But this is no ordinary audio slideshow. The web page is powered by Mozilla’s Popcorn JavaScript library alongside jPlayer, another lightweight library for playing media on web pages. Using these open source libraries, Lindamood was able to trigger animations on the page as the audio file played. The audio slideshow is no longer trapped inside a video player: every one of the pictures in Lost and Found is a dynamic image which can be updated or downloaded, and it all sits on a web page that is viewable on any kind of device.
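To give a flavour of the approach, here is a minimal sketch of the general Popcorn pattern described above: Popcorn wraps an ordinary HTML audio element and fires cues as playback reaches particular moments, and each cue simply changes the page. The element IDs, file names, timestamps and CSS class in this sketch are hypothetical illustrations, not NPR’s actual code.

```html
<!-- Hypothetical markup: an audio narration and an ordinary image on the page. -->
<audio id="narration" src="documentary.mp3" controls></audio>
<img id="photo-1" class="hidden" src="archive-photo.jpg" alt="A photograph from the archive">

<script src="popcorn.js"></script>
<script>
  // Wrap the audio element so Popcorn can watch its timeline.
  var story = Popcorn("#narration");

  // When the narration reaches the 12-second mark, reveal the photograph
  // by toggling a CSS class on a normal DOM element.
  story.cue(12, function () {
    document.getElementById("photo-1").classList.remove("hidden");
  });

  // Because the photograph is just part of the page, it can be linked to,
  // downloaded, restyled or replaced without re-editing any video file.
</script>
```

The same cue-driven idea works whichever player handles playback, as long as the underlying audio element is there for Popcorn to observe.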

“For me video can only be that single object, and you can share that object in different ways, but it can never be disaggregated,” says Lindamood. “Breaking out of a video file allows us to explore and create something more collaborative and that’s something that we’re definitely interested in. Also you can disaggregate and share individual pieces of it…We can actually recombine elements in different ways, so just having that flexibility of having all these malleable pieces I think is a real advantage of this kind of approach.”

This flexibility could mean stories can be dynamically updated even after publication and shared much more easily. The data in the documentary is available to Google’s spiders, and it even creates a level of interactivity in the story.

O’Neill agrees that making the moving image more dynamic is a step forward. “The biggest challenge was the most exciting, which was trying to think how to do this differently…just trying to break the mould a little bit of the video template. Clearly we didn’t deviate that far from it but I dunno, we succeeded by not having to build it in Final Cut, which was exciting to me.”

Rethinking the web

These projects are baby steps into a new way of thinking about how to tell stories visually, embracing the native abilities of the web. And it may be a long time before more of us are able to wrap our heads around the idea that visual doesn’t have to mean video and video doesn’t have to mean a 16:9 embed on a webpage.

The pioneers, asking the difficult questions early on, are optimistic about the future. “My hope for the next 10 years is that they are as exciting as the past 10 years,” says Cody Brown of ScrollKit, a sentiment echoed by the others.

“Video’s a really passive experience: you click play, you watch it for however long it is, and then you’re done and you go on to the next thing,” says NPR’s Claire O’Neill. “What interests me in storytelling is engaging the audience and the user, because we have these devices that encourage us to be tactile and I think that territory is really exciting and that’s where I’m trying to push the envelope.”

And from Zeega’s Jesse Shapins, a recognition that there’s a long road ahead. “I think we’re still very early in terms of what are quality storytelling experiences that involve interaction. My hope in 10 years is that we see a legacy already established of some genres and formats that have real value for people…One of the beauties of film is that it’s something that millions and millions of people have experienced. I don’t think we are yet there with the web as a format, where millions of people have experienced really quality stories, and I’m really excited about that potential for us to think about the web again as inherently an interactive audio-visual medium.”


Issue 3 of Inside the Story Magazine will be published in July 2013.


How to make anything fascinating

A demonstration of how careful storytelling can make education, journalism and information more compelling.

Storytelling, once all the pretentiousness has been stripped away, is little more than information dispensing.

A story is a vehicle for dispensing units of information from one mind to another, and the storyteller’s job is simply to choose what units are dispensed when.

Which makes it all sound rather mundane, as I think it ought to be.

In journalism, documentary and education, we are not very sophisticated in our dispensing of information. The standard practice is to give all the information to the audience quickly and clearly, ordered from most important to least, or to start at the beginning and work your way forward.

And yet there is enormous potential to make even the most mundane and complex of ideas fascinating and engrossing simply by being more sophisticated in the dispensing of information.


Analysis of a single scene

By way of example, take a look at the first 35 seconds of this film I recently produced with Fusion.

The first 35 seconds make up effectively a single scene with a single idea. If we were to summarise all the information given in the scene at once, we could say:

“On the 6th of August 1945 a Japanese man called Tsutomu Yamaguchi was walking to the Mitsubishi plant in Hiroshima when he was hit by the first atomic bomb. A series of events made him unexpectedly late for work, so he was away from the epicentre.”

Saying these words out loud this way would cut the length of the scene down from 35 seconds to about 15, but blurting out all the information in a single breath is not good storytelling.

I want to show you how it is possible to squeeze emotion, suspense, surprise and audience engagement out of a story, simply by being smart about how you dispense your information.

Let’s start with the obvious but overlooked singularity of the audio-visual medium: it uses both audio and visuals to dispense information. The relationship between words and pictures is a book in itself, but for now, just notice how some information is given by words and some by the pictures, and some by both.

The words do most of the heavy lifting, as this scene requires. Here’s the first sentence in the script:

“There’s this famous story — you might have heard of it actually — about a Japanese man called Tsutomu Yamaguchi.”

Rather than dispensing all the units of information at once, it dispenses just three pieces: a famous story, a Japanese man, and his name. The ‘you might have heard of it actually’ line might seem superfluous, but actually it is setting up a thematic pay-off in the last 30 seconds of the film.

Next, we reveal two more pieces of information:

“He was working for the Mitsubishi company and on this particular morning he was on business in the city…of Hiroshima.”

We are now 17 seconds in and we have been told very little. Think about all the information the audience has not been told by this point. On top of this, the visuals — of an unknown city seen from the air — do not even seem to match what we are being told.

Towards the more journalistic end of the spectrum, this is considered bad practice, as there is a risk of confusing, and then losing, your audience. You’ll notice how other video makers, from Buzzfeed to the BBC, prefer to use literal images which simply mirror what the narration is saying.

But what do people do when they’re given a few pieces of seemingly unrelated information?

They guess!

Human brains are connecting machines and can’t help it. Good storytelling exploits this in order to draw the audience into the story. By handing some of the narrative responsibility to the audience, you are asking them to participate in the storytelling.

I can make a safe assumption that when I say the name ‘Hiroshima’ most people will immediately think ‘atom bomb’. I know I only need to dispense this one unit of information for the audience to be able to guess what is going to happen.

As this unit of information is more important than some of the others, it is delivered simultaneously by both words and pictures. This is a technique I call ‘double-punching’ and it’s a useful way to relay a hierarchy of information to the audience.

And with another sentence we have two more pieces of story information.

“It was the 6th of August and he was running late for work.”

Having launched your audience’s guessing machines into action, it is important not to give the rest of it away too quickly. The guessing is the fun part for the audience, so you can really engage them by refusing to confirm or deny. So I tell you the date, but not the year, which would give too much away. History nerds will know for certain, but others will still wonder if their guess was correct or not.

At this point, the pictures, which have so far remained slightly mysterious, begin to pay off. We hear a mechanical noise, and then we appear to start falling to earth.

It is not immediately obvious what is going on. This forces the audience to try and solve a puzzle once again. Our brains look at the clues we’ve been given…Hiroshima…famous story…August…falling….

Working together, the pictures and words relay the information, not obviously, but through piecemeal clues delivered sparingly.

The final words of the script go like this:

“He’d left something important and had to run home to pick it up. Then the owners of the boarding house he was staying in offered him some tea, and so out of politeness he sat down with them, making him even later. Then he got on a bus and then a streetcar, and he was still walking to the office when…”

Alfred Hitchcock once gave a great definition of the difference between suspense and surprise: a bomb going off under a table without warning gives the audience a moment of surprise, but letting them know the bomb is there, while the characters carry on talking obliviously, gives them minutes of suspense.

The falling bomb in the Hiroshima sequence exploits Hitchcock’s idea. We have figured out an atomic bomb is about to explode. But instead of letting it happen, I slow down time and begin to talk about the mundane details of Yamaguchi’s day. Does it matter that he got a bus and then a streetcar? Of course not, but delaying the inevitable racks up suspense.

The images and the words work together in harmony to squeeze tension and emotion out of the sequence. Note that this emotion does not exist naturally in the information; you have to work to create it.

This particular arrangement of story information, dispensed carefully, raises the inevitable question in the audience’s minds: if Tsutomu Yamaguchi is about to be hit by an atomic bomb, will he survive?

Which is the most important question any storyteller wants their audience to be asking: what is going to happen next?


In 35 seconds of screen time I have delivered a short paragraph of information. But I have injected suspense, surprise, mystery and emotion into the delivery of the information so that it becomes much more than the sum of its parts.

More importantly, the audience themselves have participated intellectually and emotionally in the telling of the story.


Update August 2016: a video version of this breakdown is now available to supporters of my Patreon page.

I am thinking about doing more breakdowns like these, to share the techniques of story design that I have learned over the last few years. Please hit recommend if you want more!

Adam Westbrook is an independent video producer. You can find out more about him on his website.
