Plastic ocean debris, virtual reality, and art: How to create a VR experience with A-Frame
By Dylan Freedman
Deep within Stanford’s unassuming McClatchy Hall lies a voluminous double-story multipurpose room. The auditorium, which hosts everything from video interviewing workshops to class presentations, was used on an overcast February day in 2018 for a talk by British photographer and environmentalist extraordinaire Mandy Barker. The room, which had just been rebuilt to great fanfare, filled to capacity with professors and students. Among them was Geri Migielicz, the former photo director for the San Jose Mercury News and now a professor with the Stanford Journalism Program, who strived to push the boundaries of media journalism. She had convinced most of her students in “Virtual Reality in the Public Sphere,” a nascent class focused on applying three-dimensional experiences to telling stories, to come as well. As Barker flashed montages of ocean debris she had captured from coasts worldwide on a large projector screen, an energy lit up the full room. Vibrant renditions of barely decomposed sea plastics contrasted against a space-black background. Migielicz turned to her students with a spark in her eye, signaling with characteristically excited nods: Are you thinking what I’m thinking?
After the presentation, Migielicz elevator-pitched an off-the-cuff project idea to Barker to be carried out by a group of students in her class, myself included. We would attempt to recreate Barker’s artistic style in a virtual reality experience. The idea was to submerge the viewer in a 3D world, bringing to life Barker’s hauntingly beautiful marine plastic debris with floating animations and didactic narrations. We would collectively create a new kind of immersive environmental story — a story born from a serendipitous moment in a basement room when an internationally acclaimed visual artist agreed to work with overcommitted master’s students in the spirit of innovation.
From vision to reality
Our idea for the core experience was relatively simple. Barker’s artwork featured pictures of plastic debris overlaid at different sizes on a black background. To make this three-dimensional, we wanted to center the viewer in a solar system of debris soaring like orbiting planets. But where to begin? Building interactive experiences in virtual reality, or VR, is anything but simple, especially if you do not have experience with 3D graphics programming. The medium is nascent enough that there is no industry-standard tool for creating VR experiences, and there is practically nothing geared towards journalists, who often have limited technical resources.
As the engineering lead for this project, I wanted to create this experience so that it could be enjoyed across many different devices. People commonly associate VR with goggle-like headsets strapped around one’s head — like those made by Oculus, HTC and Samsung — but computers and cell phones are more than capable of showcasing VR content as well. By dragging around with your mouse or physically moving your phone around, you can explore a 3D scene. With these considerations in mind, I ultimately chose to work with a web framework created by Mozilla called A-Frame. Unlike leading alternatives, like Unity, A-Frame is built for the web on top of open standards. Rather than download the end result in an app store, an experience coded in A-Frame is as accessible as an online news article: simply navigate to the experience’s website on any modern device — laptops, cell phones, and yes, VR headsets.
The question now was how to get Barker’s 2D debris images into a VR experience. Barker provided individual image assets of each piece of debris used in four different scenes:
- Hong Kong Soup: 1826 — Poon Choi: a collection of debris from beaches in Hong Kong featuring many children’s toys
- EVERY… Snowflake is Different: a wide variety of all-white plastic debris collected from a nature reserve in England
- WHERE… Am I Going: a colorful assortment of balloon fragments and ribbons
- Soup: Bird’s Nest: old fishing line that has become entangled with ocean currents into creature-like balls
These scenes presented a diverse range of objects, from solid geometric shapes to complex balls of string. We planned to convert the 2D images of these objects into 3D so they would look seamless in VR. Initially we considered obtaining the original objects from Barker and scanning them using the latest in 3D tech, but we ultimately rejected that idea because of its difficulty. We then experimented with a now-defunct service called Volume.gl, which used AI to 3D-ify images. Its online demo suggested that this would work, and the result looked pretty convincing when we dropped in an image of fishing line. I also experimented with free tools and found a YouTube tutorial on generating 3D objects from photos using open source software called Blender. This method was much more involved and required navigating arcane menu systems and hitting just the right options in the right order, but it worked! Even though the Blender technique was just mapping color data to depth, this rudimentary method produced very convincing results, likely because Barker’s images were of such high quality and so detailed in light and shadow to begin with.
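To make "mapping color data to depth" concrete, here is a minimal sketch of the general idea in plain Python. It is not the Blender workflow itself: the function names, the `max_depth` parameter, and the choice of Rec. 709 luma weights for brightness are all assumptions made for illustration. Each pixel becomes a vertex, and brighter pixels are pushed further out.

```python
def luminance(rgb):
    """Perceived brightness of an (r, g, b) pixel (channels 0..255), scaled to 0..1.

    Uses Rec. 709 luma weights; this is an illustrative choice, not
    necessarily what the Blender technique in the article used.
    """
    r, g, b = rgb
    return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0


def image_to_heightfield(pixels, max_depth=0.2):
    """Turn a 2D grid of RGB pixels into a flat list of (x, y, z) vertices.

    x and y span 0..1 across the image (y points up), and z is the pixel's
    brightness scaled by the hypothetical max_depth parameter.
    """
    rows = len(pixels)
    cols = len(pixels[0])
    vertices = []
    for row, line in enumerate(pixels):
        for col, rgb in enumerate(line):
            x = col / (cols - 1) if cols > 1 else 0.0
            y = 1.0 - (row / (rows - 1) if rows > 1 else 0.0)
            z = luminance(rgb) * max_depth
            vertices.append((x, y, z))
    return vertices


# A tiny 2x2 "image": black and white pixels in a checkerboard.
tiny_image = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 255, 255), (0, 0, 0)],
]
for vertex in image_to_heightfield(tiny_image):
    print(vertex)
```

In a real pipeline the vertices would then be connected into a mesh and textured with the original photo; the sketch stops at the displacement step because that is the part the text describes.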
To automate the complicated process of making an image 3D in Blender, I dug into online manuals and learned to write a program using Blender’s built-in Python scripting interface. The program takes in a source image and produces a 3D model with a few tweakable parameters. We then organized an in-class workshop where I taught my classmates how to go through the process. Start with one of Barker’s images of a plastic object. Then, preprocess the image in Photoshop, removing the black background and splitting up objects if there are multiple in a single image. Finally, open Blender and use my script to generate the 3D model. We used Dropbox to store the shared asset files and employed a custom file-naming convention to keep track of the work. By splitting up the workload among eight people, we were able to create over 100 3D models in the span of about two hours.
Designing the experience
Barker’s artwork is layered with pieces of debris at different sizes, manually positioned such that no piece of debris touches another. The crucial task: how do we arrange these individual pieces of debris in a 3D space, mimicking Barker’s style? Unfortunately, we can’t just use Barker’s 2D image arrangements and wrap them around in 3D. VR compositions take place inside a sphere with the viewer at center, like looking at a globe from the inside or being in a perfectly round planetarium. Taking Barker’s artwork and wrapping it around the globe would have the following problems: 1) it is not designed to repeat at the edges so there would be abrupt changes, 2) it would distort the piece like the Mercator projection of the world map, with the vertical edges stretched out unnaturally, and 3) we found the objects appeared oversized when this tactic was used. So what can we do now? One option is to manually place each piece of debris using 3D software, taking artistic liberties but trying to emulate Barker’s style. The other option is to take a programmatic approach.
Barker’s compositions evoke the cosmos because they resemble the chaotic beauty of stars in outer space: multilayered, with a natural mix of dense and sparse regions. The visual effects and gaming industries have developed algorithms for generating patterns that look random yet aesthetically pleasing, and they rely on the same principles. Take a noise function, a mathematical construct that generates what looks like a freeze frame of television static. Blend the output of noise functions at different scales, like a fractal, and something emerges that resembles a granite rock, a world with oceans and islands, or galaxies in the sky, depending on the parameters used.
With this fractal noise technique, we can produce a natural-looking heatmap that describes how densely to pack debris objects, and then randomly sample points to place the debris objects, favoring the denser regions. This would look pretty good out of the box, but there’s still a crucial problem: we need to prevent debris objects from overlapping. What we want is a way to place objects seemingly randomly but without collisions or visible patterns, which is actually a hard problem in the computing world. One elegant solution is Poisson-disc sampling, in which candidate positions are generated at random and kept only if they lie at least some minimum distance away from every object placed so far. In common implementations, each candidate is also drawn within some maximum distance of an existing object so that the space fills in evenly. The algorithm continues until no random placement satisfies the constraints for some number of iterations.
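Here is a minimal sketch of that sampling loop in Python, using the simplest "dart-throwing" form in a unit square. It enforces only the minimum-distance rule and compares each candidate against every accepted point, which is fine for a few dozen points but slow for thousands. The function name and the `max_failures` and `seed` parameters are assumptions for illustration.

```python
import math
import random


def poisson_disc_darts(min_dist, max_failures=200, seed=1):
    """Scatter points in the unit square so no two are closer than min_dist.

    Candidates are thrown at random; one is kept only if it is far enough
    from every point accepted so far. The loop gives up after max_failures
    rejected candidates in a row, on the assumption that the square is full.
    """
    rng = random.Random(seed)
    points = []
    failures = 0
    while failures < max_failures:
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
            failures = 0  # a success resets the give-up counter
        else:
            failures += 1
    return points


samples = poisson_disc_darts(min_dist=0.1)
print(len(samples), "points placed, none closer than 0.1 apart")
```

The result looks random at a glance but has no clumps, which is the property that keeps debris objects from touching.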
Now we have the tools needed to pack debris in a way that evokes the heavens: layered noise functions and a good sampling method. We just need to combine the two. To gloss over the implementation details, I ended up using a multidimensional noise function called Simplex noise to produce a spherical heatmap, modifying the Poisson-disc function to work in 3D space at different densities, and checking for collisions with a recursive 3D data structure called an octree (the 3D counterpart of the 2D quadtree). I also made objects in denser regions smaller. You can check out visuals of my implementation in more depth here (2D) and here (3D).
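The combination can be sketched roughly as follows. This is a simplified illustration, not the project's code: `stand_in_density` is a made-up smooth function standing in for the Simplex-noise heatmap, collisions are checked by brute force rather than with an octree, and the radius limits are arbitrary. Each candidate on the unit sphere gets a size based on the local density (denser means smaller), and it is kept only if it does not overlap anything already placed.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly distributed random point on the surface of the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(theta), r * math.sin(theta), z)


def stand_in_density(point):
    """Hypothetical heatmap in [0, 1]; a real version would sample fractal noise."""
    x, y, z = point
    return 0.5 + 0.5 * math.sin(3.0 * x) * math.cos(2.0 * y + z)


def place_on_sphere(density, min_radius=0.05, max_radius=0.2, max_failures=300, seed=7):
    """Place non-overlapping discs on the unit sphere, smaller where density is high."""
    rng = random.Random(seed)
    placed = []  # list of (point, radius) pairs
    failures = 0
    while failures < max_failures:
        point = random_unit_vector(rng)
        d = min(1.0, max(0.0, density(point)))
        # Dense regions get small objects, which also lets them pack closer together.
        radius = max_radius - (max_radius - min_radius) * d
        # Brute-force overlap test; an octree would speed this up for large scenes.
        if all(math.dist(point, other) >= radius + other_radius
               for other, other_radius in placed):
            placed.append((point, radius))
            failures = 0
        else:
            failures += 1
    return placed


debris = place_on_sphere(stand_in_density)
print(len(debris), "objects placed around the viewer")
```

Changing `seed` on every run gives a different arrangement each time while keeping the same overall character.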
In the end, this method worked reliably to produce pleasing arrangements of debris objects. The end product uses this method on-the-fly each time, such that you will never see the exact same scene twice.
Bringing the experience to life
With the crucial ingredients of the experience in place, the months that followed were rife with trial and error. By this point, I had already graduated from my journalism program. My classmates were remotely fleshing out the narrative of the experience: storyboards, scripts and high-quality voice narrations. I was struggling to make the experience smooth. It turned out that placing hundreds of 3D models in VR space was too performance-intensive. It lagged on every device. And the quickest way to break the spell of immersion is lag.
My first technique to reduce the number of objects needed was to composite the background ahead of time. For any given scene, I ran the aforementioned debris-placing algorithm several times, tuned to output really small debris objects. I took three-dimensional screenshots of these arrangements and then layered the snapshots on top of one another, allowing objects to collide since they were so small as to be unnoticeable. For each scene, the output of this process was a single three-dimensional image resembling stars in the sky. This image was used to paint the globe the viewer sees from the inside, called the “sky” in VR parlance. Rather than needing to keep track of thousands of tiny objects for the background, the experience just needs to render a single image for the background, making it much faster.
For the foreground layer, I rendered objects much larger, and thus needed fewer of them to fill the space. Unfortunately, the experience was still lagging despite only having around 100 3D models rendered. I achieved incremental performance gains by reducing the complexity of the 3D models involved, using Blender to simplify their geometry. But it still wasn’t enough. In the end, I removed the 3D aspect of the debris objects entirely, replacing them with 2D images that always face the viewer so that their lack of depth is never revealed. Surprisingly, this turned out to look just as convincing, likely because of how evocative Barker’s art is in the first place. This was my first foray into the rich blend of ocular psychology and algorithmic hacks that make rendering complicated 3D scenes possible in real time in the visual effects industry.
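The "always face the viewer" trick is often called billboarding, and the math behind it is compact. The sketch below is an illustration under stated assumptions rather than the project's A-Frame code: it assumes a flat image whose front faces the +Z axis, rotated first by a yaw about the vertical Y axis and then by a pitch about its own X axis, with angles in radians. The function names are hypothetical.

```python
import math


def billboard_angles(obj_pos, cam_pos):
    """Return (pitch, yaw) in radians that turn a +Z-facing image toward the camera."""
    dx = cam_pos[0] - obj_pos[0]
    dy = cam_pos[1] - obj_pos[1]
    dz = cam_pos[2] - obj_pos[2]
    yaw = math.atan2(dx, dz)                      # swing left or right
    pitch = math.atan2(-dy, math.hypot(dx, dz))   # tilt up or down
    return pitch, yaw


def facing_direction(pitch, yaw):
    """Direction the image's front points after applying pitch, then yaw."""
    return (
        math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


# An image floating off to the right and ahead, with the viewer's eyes at 1.6 m.
image_position = (3.0, 1.0, -4.0)
camera_position = (0.0, 1.6, 0.0)
pitch, yaw = billboard_angles(image_position, camera_position)
print("pitch (deg):", math.degrees(pitch), "yaw (deg):", math.degrees(yaw))
print("image now faces:", facing_direction(pitch, yaw))
```

Recomputing these two angles whenever the camera or the image moves keeps the flat image square to the viewer, so its missing depth is never seen edge-on.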
With just images in the foreground on an image background, the experience was nice and smooth. I implemented orbiting by rotating the background at a slow speed. It turns out that too much movement in a VR experience, especially with hundreds of objects, can not only look unpleasant but also cause physical nausea. What I thought was slow orbiting was never slow enough. In the end, Professor Jeremy Bailenson of the Virtual Human Interaction Lab at Stanford gave great advice about motion in VR space, and I was convinced to make the orbit take over 10 minutes to complete a revolution. Interestingly, what appears to be orbiting depends on the device you use. On a computer, the background appears to be moving since the viewpoint doesn’t move on its own. In a VR headset, the foreground appears to be moving, an illusion akin to thinking your train car is moving when the neighboring train starts to move.
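To put a number on how slow that is, here is a small worked example in Python. It assumes a period of exactly 600 seconds, the lower bound implied by "over 10 minutes"; the function name and default are illustrative.

```python
def orbit_angle_degrees(elapsed_seconds, period_seconds=600.0):
    """Background rotation angle after a given time, wrapping at a full turn."""
    return (elapsed_seconds / period_seconds * 360.0) % 360.0


# At one revolution per 10 minutes the background turns 0.6 degrees per second.
print("degrees per second:", 360.0 / 600.0)
print("angle after 2.5 minutes:", orbit_angle_degrees(150.0))
```

At 60 frames per second that works out to a hundredth of a degree per frame, small enough that the motion registers as drift rather than spin.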
At this point the experience truly began to take shape. My classmates recorded high-quality narrations according to the script we wrote together. We had an introduction, four scenes, a world map visual highlighting where the debris came from, an outro sequence, and credits. I wanted to add music to the scene, so as an amateur piano player, I recorded something on my digital keyboard. To reflect the layered style emblematic of the experience, I composited the audio of my piano piece with a copy of itself slowed down by a factor of eight. The effect is a surreal atmosphere that matches the visuals.
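The layering can be illustrated with plain lists of audio samples. This sketch assumes the simplest kind of slowdown, stretching the waveform by resampling, which also lowers the pitch (a factor of eight drops it three octaves). Whether the original mix preserved pitch is not stated, and the function names and gain values are assumptions.

```python
def slow_down(samples, factor):
    """Stretch audio by an integer factor using linear interpolation between samples."""
    if len(samples) < 2:
        return list(samples)
    out_len = (len(samples) - 1) * factor + 1
    out = []
    for i in range(out_len):
        pos = i / factor              # fractional position in the original
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def mix(a, b, gain_a=0.5, gain_b=0.5):
    """Sum two tracks sample by sample, padding the shorter one with silence."""
    length = max(len(a), len(b))
    padded_a = list(a) + [0.0] * (length - len(a))
    padded_b = list(b) + [0.0] * (length - len(b))
    return [gain_a * x + gain_b * y for x, y in zip(padded_a, padded_b)]


melody = [0.0, 1.0, 0.0, -1.0, 0.0]   # a toy waveform standing in for the piano track
layered = mix(melody, slow_down(melody, 8))
print(len(melody), "samples in the original,", len(layered), "in the layered mix")
```

Because the slowed copy is eight times longer, the original melody plays over only its opening stretch unless it is looped or repeated.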
The experience was about 90% there at this point. But the remaining 10% took months to tidy up. I cleaned up animations, hooking in code to wait until images finish loading before fading them in. The world map scene was lagging a lot, since I was layering lots of images on top of each other to highlight countries in sync with the narration. I ended up recreating the entire scene in Adobe After Effects and rendering it as video, mitigating the performance issue. The credits text was made using A-Frame’s text component, but it was hard to arrange well, lacked symbols like the copyright sign, and also carried a performance penalty. I ended up replacing all instances of text with images of text, which worked perfectly. One bug drove me crazy for hours: the experience had glitchy animations, but only on repeated viewings. It turned out that I was adding event handlers to the same elements multiple times, causing the same transition to be called multiple times simultaneously. Finally, for users with slow internet connections, the experience was timing out trying to load assets. I ended up configuring everything to be preloaded before the experience began and added a progress bar so that those users would know how much was left to load.
A-Frame is a remarkable software framework but is not without its hurdles. The main takeaway for me is that VR experiences should be optimized as much as possible. If something can be pre-rendered as an image, it should be. With the right techniques, you can still give the illusion of 3D without being fully 3D. Make full use of A-Frame’s asset loading system, and preload everything if you can. Though the framework is still very do-it-yourself and favors those who have 3D experience, it can be tamed with patience, browsing other people’s examples, and endless iteration.
Full source code for the experience can be found here: https://github.com/freedmand/rippleplastic