What does it mean to tell a story in augmented reality?

McClatchy New Ventures Lab
10 min read · Jul 11, 2019


By Eric Howard and Stan Okumura

Hello! We are Stan and Eric, and we were Storytellers in Residence at McClatchy’s New Ventures Lab in Sacramento, where we produced serialized nonfiction stories using 3D AR for mobile.

Eric Howard, center, and Stan Okumura, right, walk in St. Paul’s Little Africa alongside community leader Gene Gelgelu.

In American Food, our AR experience available on Actual Reality for Android, foodies can explore the cuisine and culture of four ethnic enclaves across the United States: Little Saigon in Sacramento, Little Arabia in Orange County, Little Africa in St. Paul and Little Haiti in Miami. During our travels, we unpacked the question: What does a nonfiction story in AR look like? In this post, you’ll see how we iterated through the lenses of design, technology and journalism to answer that question.

Episode 1: Little Saigon

We wanted to create a format that could be used repeatedly to tell many types of stories centered on 3D and 360 video. Another goal of our residency was to build on the work of the New Ventures Lab tech team to develop workflows that would let journalists do run-and-gun AR in the field with quick turnaround times and clear production methods.

In our first episode, we tackled two challenges:

How do we design a user interface that’s easy to follow in this new medium? And could we capture 3D models of delicious-looking food in the field, quickly and easily?

User interface

After brainstorming and speaking with test users, we decided to create an interface similar to Instagram Stories, but with 3D objects in AR in place of photos and video. Since AR and VR interaction is so new, we wanted to make the interface as easy as possible to use, so the user could move forward through the story at their own pace.

The first iteration of the UI for American Food is on the left. The second is on the right.

In prototyping, we had initially tried to present our 3D objects and text on separate screens: showing the object, then adding interstitial 2D screens where you could read a couple of paragraphs about what you were looking at. The feedback from internal testing was overwhelmingly negative.

So we cut way back on the text and laid out a simple design in screen space that provided just enough information to move a story forward while keeping the 3D content on the screen at the same time. For each new “slide,” that design consisted of a headline, a subhead and room for a small, one- to two-sentence nugget near the bottom of the screen.
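To make that format concrete, here is a minimal sketch of how such a slide and episode could be modeled. The names and fields are our own illustration, not the actual American Food code.

```kotlin
// Hypothetical data model for the slide format described above.
// Field names are illustrative only; they are not taken from the American Food app.
data class StorySlide(
    val headline: String,   // short title at the top of the screen
    val subhead: String,    // one-line framing under the headline
    val nugget: String,     // the 1-2 sentence text near the bottom
    val modelUri: String    // the 3D asset (e.g. a photogrammetry scan) shown in AR
)

// An episode is an ordered list of slides the user taps through,
// much like Instagram Stories.
data class StoryEpisode(
    val title: String,
    val slides: List<StorySlide>
)
```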

We felt the text in the design was sufficient for framing the story, but it turned out that our biggest narrative win came in the field. At first, we were recording interviews simply to inform our written content, but it quickly became clear that we should use those sound bites to thread a narrative. While we’re aware of the oft-quoted stat that 85 percent of Facebook video is watched without sound, we weren’t afraid to use voiceover. The consumption of AR and VR is still quite different from cruising social media, and in our onboarding we explicitly ask the user to turn their sound up. (If we had more time, though, we would have loved to subtitle all the episodes.)

Capturing food in 3D

The centerpiece of our story was food: specialties served in 3D on the table in front of the user. So getting high-quality captures of the dishes was essential. Given our mission, we needed to be quick both in the field and in post-production.

We decided to use photogrammetry, a method of building a 3D model of an object from a series of photos taken from many angles. Through several rounds of practice, we got the capture process down to about three minutes per dish in the field. Our photogrammetry “dance” involved taking 200–400 photos from a vantage point with clear space above and below the object, then moving in closer for areas we might have missed. We used natural light, so no lighting setup was needed.

After bringing the photos into Reality Capture, software that reconstructs a 3D object from the photos, we realized that this approach wouldn’t work for every dish. Food plated on shiny, white or transparent surfaces confused the software, leaving holes in our 3D models. Drinks in clear vessels were nearly impossible to capture. (In later episodes, we sought out lower lighting to reduce shine and experimented with a 3D recreation of a glass dish.)

But for this episode, avoiding food plated on tough-to-capture surfaces was a constraint we were happy to live with, since we could include many signature dishes from Little Saigon and still tell a great story. We show broken rice from the restaurant Bốn Mùa, 360 video orients the viewer and gives them a window into the look and feel of Stockton Blvd., and audio bites let us hear about the incredible journey of our subject Tido Hoang as he left post-war Vietnam.

Episode 2: Little Arabia

Making and serving the food, and the traditions practiced alongside dining (like smoking hookah), were key parts of our story. Right now, capturing action in 3D is expensive and best done in a studio, not in the field. We used 360 video to go behind the scenes in the kitchen, but we were looking for another solution. How could we easily capture and display action in our AR story?

An answer to that question came almost by accident: While shooting in Little Arabia in Anaheim, Calif., we struggled to explain to our subjects exactly what AR storytelling is. The owner of a restaurant called Koftegi kept asking us to shoot video of his different cooking techniques. So we did what was asked of us and shot the video, assuming we’d never use it.

Once we were in post-production, the utility of the footage became clear. Though we had decided early on not to use video, feeling it would be a crutch for our 3D experience, playful GIFs made from just a few seconds of footage felt quite in line with the social media-inspired design of our format and gave us an opportunity to show the real-life motion of our characters. (We’d later try another approach to capturing action. Check out Little Haiti below.)

Where would those GIFs go in the experience?

We had already received feedback asking for more information about our food assets. So our solution to both problems was to add 3D text and GIFs triggered by a gaze detector. When you aim your camera at a certain part of the dish, you get a delightful bit of text or a moving image that gives you more information. It’s a subtle way of rewarding the user for exploring, and it’s a passive interaction you can learn without a tutorial.

It’s also an optional interaction that reveals secondary content: you can spend more time to learn more, or you can skip ahead.
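For readers curious how a gaze trigger like this can work under the hood, here is a rough sketch (our own illustration, not the code behind American Food): compare the camera’s forward direction with the direction to an anchored annotation, and reveal the pop-up once the angle between them is small enough.

```kotlin
import kotlin.math.acos
import kotlin.math.sqrt

// Minimal 3D vector helper for this sketch.
data class Vec3(val x: Float, val y: Float, val z: Float) {
    operator fun minus(o: Vec3) = Vec3(x - o.x, y - o.y, z - o.z)
    fun dot(o: Vec3) = x * o.x + y * o.y + z * o.z
    fun normalized(): Vec3 {
        val len = sqrt(dot(this))
        return Vec3(x / len, y / len, z / len)
    }
}

/**
 * Returns true when the annotation sits within [maxAngleDegrees] of the
 * center of the camera's view. In a real AR session the camera pose and the
 * annotation's world position would come from the tracking framework; here
 * they are plain vectors so the idea stands on its own.
 */
fun isGazedAt(
    cameraPosition: Vec3,
    cameraForward: Vec3,       // unit vector pointing out through the screen
    annotationPosition: Vec3,  // world position of the 3D text or GIF
    maxAngleDegrees: Float = 10f
): Boolean {
    val toAnnotation = (annotationPosition - cameraPosition).normalized()
    val cosAngle = cameraForward.normalized().dot(toAnnotation).coerceIn(-1f, 1f)
    val angleDegrees = Math.toDegrees(acos(cosAngle).toDouble())
    return angleDegrees <= maxAngleDegrees
}
```

Run once per frame for each annotation in the scene, a check like this lets you fade pop-ups in when the user lingers and fade them out when they look away.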

With the inclusion of our new pop-ups, we were able to tell the story of a diverse community unique to America. While the 3D food ranges from meats to pastries to dumplings, the pop-ups reveal many shared and overlapping histories. For instance, what you couldn’t know just by looking is that the drink served next to the Syrian kibbeh is actually Turkish coffee, now clearly marked by a 3D pop-up.

We found that mixing of cultures to be a large part of Little Arabia’s charm. Within a few blocks you can try food from Lebanon, Syria, Turkey, Palestine, Egypt and other countries. As activist Marwa Balkar says, “There’s a kind of unity in Little Arabia that doesn’t exist in the actual Middle East. …That’s the beauty of it.”

Episode 3: Little Africa

Explore the drone-captured city blocks from Little Africa on Sketchfab.

We had longed for a 3D equivalent of a film standard, the establishing shot. We had captured some facades using photogrammetry in earlier episodes, but they lacked life and context, as many were attached to much larger shopping-center buildings we had cropped out of frame. In pre-production, we saw that Little Africa in St. Paul is made up of old brick buildings rich with character. So we brought in our trusty drone pilot, Jayson Chesler, to fly overhead and capture entire blocks, a few dozen photos per building, to combine with our ground-level photogrammetry.

The result is an opening scene featuring three different buildings placed like dollhouses on the table before you. Instantly, the viewer gets a window into what it’s actually like to arrive in Little Africa.

We also added tracks from local musicians. With that music and the buildings together, the scene offers a deeper experience than simply hearing the people and seeing the food.

In a way, our goal of scene-setting complements our main subject Gene Gelgelu’s mission of “creative placemaking.” Before his organization created the business district “Little Africa,” there was no destination; there were just a few markets and several Ethiopian, Eritrean and Somali restaurants within a mile or so of each other. Similarly, our new establishing shot connects what could otherwise feel like a random assortment of 3D dishes.

Episode 4: Little Haiti

We love getting feedback on our work, but it’s sad to hear notes you know you can’t act on. And a regular note for the first three episodes went something like this: “Why don’t the people move? It’s awkward that they just stand there like statues.”

We had grown comfortable asking our subjects to remain totally still while we took several hundred photos of them to make 3D models using photogrammetry, and getting video for the GIFs was a regular part of our production process. But we had incorrectly assumed both that there was a hard limit on how much time and effort we could ask an interview subject to give and that there was no practical way to capture their movement in 3D.

We had considered a couple of technologies for capturing 3D motion:

The first option was Depthkit, software that combines video and depth data to create volumetric 3D video. But it didn’t match the aesthetic of our series. It also requires calibrating a sensor and setting up (and lighting) a green screen in the field, and since “the field” typically meant a small, busy restaurant, we didn’t find Depthkit feasible.

The other option was to use a Microsoft Kinect motion sensor with iPi motion capture software. All it requires is a small bit of open space without a ton of natural light.

After continuing to hear negative feedback about our static scans of people, we ran a couple of motion capture tests in the lab. We realized we could capture the motion data in almost any interior with about 10 minutes of setup and 5 minutes of recording time.

And, to our surprise, all four of our subjects in Little Haiti, Miami, were completely open and willing to participate. To get their motion data, subjects had to start in a T-pose and then act out a few basic movements: standing in place, waving, gesturing as if in conversation and anything else that suited their personality. Getting that buy-in was not a challenge, which shows significant promise for continued AR journalism.

Back in the lab, we combined the motion data with photogrammetric models of the people. In one case, it turned out that we didn’t get what we needed in the field, so we used motion data from an animation library, Mixamo, to add life to one of our models.

Almost every scene in the Little Haiti episode includes a human subject in 3D, in motion. They add energy to what could otherwise be stilted, static AR. We’re happy about this because that energy absolutely reflects what we encountered in Little Haiti. For instance, Jean Cidelca, the tour bus driver there, is constantly playful and welcoming. We think it’s a little more honest to show him inviting you to hop on his bus rather than just hearing him talk about it.

Did we accomplish our goals?

Re: story.

As we mentioned, the series was designed to “explore the cuisine and culture of four ethnic enclaves across the US.” Like any storytellers covering travel, we wanted to provide a vicarious experience so you could enjoy these places as if you were there yourself.

In our first episode, we struggled to bring you there. We didn’t know how to breathe the same life and vibrance into our story that we felt during our visit. Over time, though, we think we got closer to that goal. By the time we shot Little Haiti, we were confident in our production methods and we think it shows. But only you, the consumer, can be the judge of that.

Re: everything else.

We’re confident we created something that is accessible to people who have never used AR. We started to carve pathways for AR journalists on a fixed timeline and limited budget. And we made a framework accessible to the technologist whose job is to support the process.

And an added plus? This experience is one more instance where the ever-critical tech consumer can ask, “Why AR?” For us, seeing the food right on your own dining room table is a unique experience, and being able to navigate around it by moving your phone and your body is something special. You can’t investigate an object that way in a web browser or on a console, nor would you likely reach for a headset to experience a several-minute news story like this one.

Please let us know how we could have done better or if you’d like to know more about how we did this.

-Stan and Eric

Download Actual Reality for Android to experience the entire American Food series.
