Making An Interactive Music Video With WebGL

Or how After Effects is for chumps

Update: You can view the full source code for the project, including instructions for setting it up locally, over on GitHub.

A couple of years ago, my band Brightly wrote a record called Beginnings & Endings. Glen Maddern and I, along with a few of our friends, built an interactive tweet-powered music video for the single Preflight Nerves, which we affectionately called Tweetflight.

It searched for tweets that matched the lyrics of the song in real time, and animated them against a backdrop of public domain and found footage.

It did pretty great, and we were pretty chuffed.

Tweetflight, a real-time Twitter-powered music video

Since then, I’ve been a bit obsessed with the concept of interactive music videos — inspired in part by projects like Arcade Fire’s The Wilderness Downtown and Rome’s 3 Dreams Of Black — and I’ve since put out a couple of smaller scale clips, like True.

When I moved to the UK and started to work on an EP, I had a vague idea for the design, and I wanted to step away from the muted colours that have accompanied our previous releases. I knew I wanted to put out something bold and bright to accompany the first single, I Will Never Let You Go, about the lengths we go to in order to save a failing relationship, a sinking ship.

I’d collected a series of watercolours, and I liked the idea of animating them into frame using a simple ink blot mask. I pulled together a quick example in After Effects, felt pretty great about myself, and then realised I had absolutely no idea what to do with it.

After Effects and an ink blot mask does not a film clip make.

I told Glen about it while we were in a car, pre-dawn, driving to Golden Plains, a music festival in Australia. I wanted to communicate the feeling of a rapidly declining relationship, that final break up over coffee, and the emotionally manipulative tug of war that accompanies it. The masks we wear and the way they deteriorate. He mentioned he’d been looking at WebGL, I mentioned the ink blot, and we started planning.

The initial concept was to build a realtime mask over any number of videos, so they could be interchanged without needing pre-rendering. We weren’t sure if it was technically possible, so we initially started playing with short square videos, overlaid on one another.

The more we talked it through, the more concerned we were about trying to synchronise multiple streams, so we decided to stitch them together into one giant one. We wrote a simple WebGL shader to display one area of the video at a time, and to do the After-Effects-style blending. It turns out that approach works really well, apart from a couple of sizing limits you should keep in mind (which ultimately depend on the graphics card), and it doesn’t need a particularly powerful GPU to run.
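To make the idea concrete, here is a minimal sketch, assuming the clips are stitched side by side into one horizontal strip. The tile maths, uniform names and blend are illustrative only, not the actual shader:

```javascript
// Sketch only: map a clip index to the UV window the shader should sample,
// assuming clips are stitched side by side into one horizontal strip.
function tileWindow(index, tileCount) {
  return { offset: index / tileCount, scale: 1 / tileCount };
}

// A fragment shader in the same spirit: sample one tile of the strip,
// and let an ink-blot mask texture drive the blend.
const fragmentShader = `
  uniform sampler2D strip;   // the stitched video texture
  uniform sampler2D mask;    // the animated ink blot
  uniform float offset;      // left edge of the current tile (0..1)
  uniform float scale;       // tile width as a fraction of the strip
  varying vec2 vUv;

  void main() {
    vec2 uv = vec2(offset + vUv.x * scale, vUv.y);
    vec4 frame = texture2D(strip, uv);
    float blot = texture2D(mask, vUv).r;
    gl_FragColor = vec4(frame.rgb, blot);
  }
`;
```

Because the shader only ever samples one tile-sized window, swapping clips is just a matter of changing two floats per frame, which is why it runs happily on modest hardware.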

Exploring the frustration of singing in reverse by making a series of facial expressions that make me look like an idiot.

At the same time, I was experimenting with the video itself, creating short clips of myself singing the lyrics emphatically at the camera, trying to get across the core concept behind the track. It was awkward when Glen’s mate, who was staying with him at the time, walked into the apartment as I was midway through tearing at my hair, aggressively mouthing lyrics like some kind of confused wildebeest trying to eat its own reflection.

Initially I’d planned to mime the song backwards, in much the same vein as those bastions of culture and style, Coldplay, in their video for The Scientist. However, I learnt something deeply valuable while spending four hours in front of the camera. It is really, really hard to mime something backwards. Do not do it. Save yourself. Save. Yourself. Do not say I didn’t warn you.

So it was back to the drawing board. Inspired by the debauchery of Spender’s Never Again, I covered myself in make up, and over the course of the song proceeded to super-duper ruin it. I have a newfound respect for any and all who wear make up more than once in a blue moon, because mascara is hellish to take off.

At the same time I continued refining the user interface, eyes stinging, while playing with shaders via Three.js. Without exaggeration, shaders offer a phenomenal level of control over the output of WebGL. They run entirely on your GPU, which means you get a pretty ridiculous amount of grunt out of a browser. And that is not a sentence I ever thought I’d type back when I was trying to get pages to render properly in IE6.

Writing GLSL shaders is pretty metal. Well, as metal as writing code into a text editor can possibly be.

When you fire up I Will Never Let You Go, a bunch of things happen. The browser requests the video, which includes two separate streams and the ink blot mask, and optionally asks for access to the webcam. These get passed into a scene, along with an image that creates the texture and colour, and two integers: one defining which video frame is being displayed, and one flagging whether or not the webcam is currently visible. It’s a relatively chunky amount of information to flick through, but WebGL seems to handle it pretty gracefully.
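For a sense of what that hand-off looks like, here is a hypothetical shape for the data passed to the shader. The names are mine, not the project’s:

```javascript
// Hypothetical uniform layout for the scene; the real names may differ.
const uniforms = {
  video:      { value: null }, // texture from the stitched video streams
  mask:       { value: null }, // the ink blot texture
  webcam:     { value: null }, // live webcam texture, when access is granted
  frame:      { value: 0 },    // integer: which video frame/region to show
  showWebcam: { value: 0 },    // integer: 1 while the webcam layer is visible
};
```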

Looking serious while my shadow also looks pretty serious. Serious shadow is serious.

The webcam video gets treated slightly differently by the shader. It is turned to greyscale to try and achieve a similar visual aesthetic to the pre-recorded video, which is in turn masked in real-time using the ink blot.
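The greyscale step is most likely a standard luma conversion; a sketch, assuming the common Rec. 601 weights:

```javascript
// Rec. 601 luma weights; in GLSL this is a single dot product:
//   float grey = dot(colour.rgb, vec3(0.299, 0.587, 0.114));
function toGrey(r, g, b) {
  return 0.299 * r + 0.587 * g + 0.114 * b;
}
```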

I used React for the interface, which manages the views and UI while passing states to the Three.js scene, which, as a component, is agnostic. It’s a pretty comfortable arrangement, especially when using Flux to dispatch events between each separate element. I managed the browser assets with jspm, and the build with Gulp, both of which I absolutely recommend.
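The Flux side of that arrangement boils down to a dispatcher that registers callbacks and fans actions out to them. A toy version, not the app’s actual code:

```javascript
// A minimal Flux-style dispatcher (a sketch, not the project's code).
function createDispatcher() {
  const handlers = [];
  return {
    register(handler) { handlers.push(handler); },
    dispatch(action) { handlers.forEach((handler) => handler(action)); },
  };
}

// In this arrangement the Three.js scene component registers for playback
// actions, and the React UI dispatches them, so neither knows the other.
```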

The 3840 x 720 pixel video, which essentially powers the whole beast.

I focused on creating a sense of rhythm by timing the switching of the streams to the beats of the song — first in time with the pulses, then with the drum pattern. Alternating this against the webcam footage finally felt like everything coming together — like being there, in that moment, witnessing that scene.
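Mapping playback time to a clip is simple arithmetic once you know the tempo; a sketch with made-up numbers:

```javascript
// Which clip to show at a given playback time, assuming a fixed BPM and
// clips that alternate once per beat (illustrative values only).
function clipForTime(seconds, bpm, clipCount) {
  const beat = Math.floor((seconds * bpm) / 60);
  return beat % clipCount;
}
```

Switching on the drum pattern instead just means feeding the function a subdivided or offset beat count rather than the raw pulse.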

But what’s the point? Me, in a different order? That’s not interactive. It’s a technical demo, at best. I sent it to a few friends to check out, and Will emailed me back.

“Oh wow that’s so confronting!!! :o
Is this something that could mash together all the people using the app at once into a mega-everyone thing? or just me and you (baby)?
You’re so obnoxiously talented.”

(I 100% didn’t need to leave the last line in, but Will is one of my absolute heroes, and I was so chuffed. Also, I apparently have zero shame. Go figure.)

But he had a point.

I’d considered the idea of doing something more communal, like sharing video between the people watching, but with the app designed, built, deployed and CDN’d, I’d kind of had enough. I’d written the EP at Christmas, the first single still wasn’t out, and it was nearly September.

But the joy of refactoring is that it gives you a new enthusiasm for any project. The process of pulling together each thread to get the video live meant I suddenly had a finished thing. And a finished thing is a lot more fun to mess with.

I brought it up with Glen, and he mentioned he’d used Imgur for his typographic React-based project Typeslab (which is awesome). He linked me to some of the open-sourced code, and I started patching it together on a new branch.

The first test, my hand covering the camera because my hair was terrible.

I set it to surreptitiously capture an image from the webcam while the viewer watched, and added the option for them to approve it to be uploaded at the end of the video. After wrestling with the Imgur API for a little while, I had a functional prototype, and I marvelled at how awkward I can manage to look from any angle via a webcam.
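The upload itself comes down to a single authenticated POST to Imgur’s image endpoint. A sketch, assuming a registered client ID and with error handling elided:

```javascript
// Strip the "data:image/png;base64," prefix from a canvas data URL.
function dataUrlToBase64(dataUrl) {
  return dataUrl.split(',')[1];
}

// Sketch of an anonymous upload via the Imgur v3 API (clientId is an
// application credential you register with Imgur; names here are mine).
async function uploadGhost(dataUrl, clientId) {
  const res = await fetch('https://api.imgur.com/3/image', {
    method: 'POST',
    headers: { Authorization: `Client-ID ${clientId}` },
    body: new URLSearchParams({
      image: dataUrlToBase64(dataUrl),
      type: 'base64',
    }),
  });
  const json = await res.json();
  return json.data.link; // URL of the stored image
}
```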

And Glen made a really good point: people don’t move a lot while they watch a video. By capturing three images in sequence and stitching them together in canvas, I could recreate the idea of a moving image without the hassle of WebRTC or live streaming video or rebuilding Chatroulette from scratch.
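The stitching is just three drawImage calls at increasing horizontal offsets. A browser-side sketch, with function names that are mine rather than the project’s:

```javascript
// Where each still lands on the combined canvas.
function frameOffsets(shots, width) {
  return Array.from({ length: shots }, (_, i) => i * width);
}

// Lay `shots` webcam stills side by side on one canvas (browser-only sketch).
function stitchGhost(videoEl, width, height, shots = 3) {
  const canvas = document.createElement('canvas');
  canvas.width = width * shots;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  // Call once per captured moment, spaced out over the song's runtime.
  return (shot) => {
    ctx.drawImage(videoEl, frameOffsets(shots, width)[shot], 0, width, height);
    return canvas;
  };
}
```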

So I did, and waited with bated breath for the result.

Upon reflection, it was not the best photo.
(And it was late. And I was in bed. And I don’t wear those glasses outside.)

Using the Imgur API I was able to store and display the images as they were captured. I rewrote the shaders in WebGL to manage the new element, marvelled at how effortless JavaScript feels in ES2015, and momentarily pondered how many dick pics I’d have to moderate if 4Chan got involved.

And in a homage to my love of Mario Kart and Patrick Swayze, I called them ghosts, because I guess they are. Little fragments of the people watching, flickering over and over again, like background noise.

It was a trial and error experience, each iteration building on the last to make something, hopefully, greater than the sum of the parts. It is a testament to the power of the web that these kinds of experiences, once the bastard children of the Flash plugin, can be recreated natively with nothing more than JavaScript and a sense of adventure.

With the exception of iOS and its truly terrible WebGL video support.

I encourage you to try out Three.js, and to continue exploring the possibilities of the web. Oh, and to watch the video.


Watch the video

Thank you a million times over to Glen Maddern for all his help, and for being a pretty constant source of self doubt when I place myself next to his absurd genius. Thanks to Andrei Eremin for mixing, mastering and being a superhero. Thanks to Samara Clifford for taking photos that cover up how awkward I really am. And thank you for taking the time to check it out.

Brightly I Will Never Let You Go
Watch it here.
Buy it here.
Find out more about Brightly here.

The artwork for the single, which you can buy on Bandcamp.

If you want to get in touch, you 100% should, by email or tweet.
Thanks for listening.