A Glimpse into a Screenless, Audio AR World

Lessons from building the world’s first VPS-powered Audio AR experience in Dubai

Sheng Huang
Sturfee
6 min read · Jun 22, 2018

“I’m scanning”

That’s the cue my AR headset gives when it’s checking out the world in front of me. After a brief moment, the machine voice responds:

This massive structure you are looking at with the huge images of Dubai’s rulers is the Gate Building. On the right, it is the late Sheikh Zayed who is considered as the father of the nation.

I take a look at some of the other buildings nearby and press a button on the device to get similar info.

Are you seeing that giant blue banner showing Masafi mineral water? It is on the tall yellow building. Masafi is a village up north of the UAE.

The camera is able to determine that you’re facing a particular building to deliver recommendations. You can see the full video here.

This entire time, I’m completely engaged with my surroundings and never once look at my phone for info. To be honest, it felt a little weird at first because I’ve been trained by my mobile device to reach into my pocket, unlock the screen with my thumb, find and fire up Foursquare, type in a search query, scroll through the results, and tap the map to see where it is.

That’s at least six steps.

What’s striking about the AR Audio user flow is its simplicity. Once I had the device on, I only performed two actions to get the information I needed: look at things that pique my interest and tap a button. Having information on demand so immediately while remaining connected to my environment gave me a taste of how augmentation technology allows us to engage with and learn about the world, completely unrestricted by the current screen-based paradigm.

What we did in Dubai

Dubai is the 4th most visited city in the world and home to the world’s busiest airport. It will also host Expo 2020 — a 6-month, $7B event showcasing the best cultural and technological highlights from over 150 countries. We were invited by the Dubai Future Foundation and Etisalat, the UAE’s largest telecom company, with the challenge of transforming how 20 million visitors each year engage with the city through Augmented Reality.

Our time in Dubai began with intense brainstorming sessions around providing a frictionless visitor experience. This is what 10+ people locked in a room for a few days looks like ;-)

We wanted an immersive, hands-free, screenless experience, but also knew that today’s HMDs and AR glasses just wouldn’t meet the expectations of the consumer tech market, at least on the visual display front.

But what if AR wasn’t just about visual overlays and we could derive just as much value from hearing the same information?

After all, it’s not that much of a stretch from our current daily behaviors. AirPods and wireless headphones are widespread, and we’re already listening to podcasts, following driving directions, and conversing with our seemingly ubiquitous AI assistants.

We’re already relying on audio for everything from music to podcasts to navigation directions. Audio AR kicks it into hyperdrive

The key was to first find a camera device that roughly faces in the same direction as your gaze. This naturally made existing AR glasses good candidates given the camera literally sits on your face. Second, we had to make the camera “smart” enough to determine your location (called “localization”) and understand what you’re looking at.

After experimenting with several AR wearable devices, we finally went with the Vuzix M100* for its simplicity and ability to be clipped on existing eyewear, like Anil’s sunglasses. A close second would have been Vyoocam or Orcam, but they weren’t production ready at the time of our demo.

The Vuzix M100 headset attached to many existing sunglasses and looked pretty sharp to boot

For the AI part, we used our in-house secret sauce: the Sturfee VPS (short for “Visual Positioning Service”). You can read more about the technology here, but it involves generating camera-recognizable signatures that let us localize your AR wearable device and measure its gaze direction using visual cues. These signatures are the backbone of how VPS localizes cameras. They are generated by a deep learning network that takes in data about the location, including satellite imagery, building models, road networks, and terrain details.

VPS is like a visual GPS, but rather than a chip that receives satellite data, it’s an AI cloud service that uses visual data inputs to let cameras provide accurate positioning and contextual information. In other words, VPS lets our smartphones and camera devices “see” the world to determine their location and provide real-time 3D measurements of their surroundings.

Our VPS allowed AR devices to: 1) get a precise read on a user’s location using computer vision (especially compared to GPS), and 2) comprehend the full geometry of the buildings in view and pull relevant point-of-interest information, such as historical trivia, hotel bookings, event listings, and whether your buddies have checked in there before and left reviews.
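To make the “what am I looking at?” step a bit more concrete, here is a minimal sketch of how an app might pick a point of interest once it already has a position and a gaze heading. It is not how Sturfee VPS works internally: it assumes the position and heading are simply handed to us, reduces each building to a single latitude/longitude point, and uses made-up coordinates. The names `Poi`, `bearing_deg`, `angle_diff_deg`, and `poi_in_view` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Poi:
    """A point of interest reduced to one coordinate (a simplification)."""
    name: str
    lat: float
    lon: float
    blurb: str  # text a speech engine could read out


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def angle_diff_deg(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def poi_in_view(lat, lon, heading_deg, pois, half_fov_deg=15.0):
    """Return the POI closest to the center of the gaze, or None if nothing is within the cone.

    lat, lon, heading_deg are assumed to come from some positioning service;
    half_fov_deg is an arbitrary illustrative tolerance.
    """
    best, best_off = None, half_fov_deg
    for poi in pois:
        off = angle_diff_deg(bearing_deg(lat, lon, poi.lat, poi.lon), heading_deg)
        if off <= best_off:
            best, best_off = poi, off
    return best


# Made-up coordinates for illustration only.
pois = [
    Poi("north_building", 25.2110, 55.2800, "A building to the north."),
    Poi("east_building", 25.2100, 55.2810, "A building to the east."),
]
hit = poi_in_view(25.2100, 55.2800, heading_deg=92.0, pois=pois)
print(hit.name if hit else "nothing in view")
```

A real system would reason about full building geometry and occlusion rather than a single point per building, which is exactly the part VPS is meant to handle.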

New ways to tell stories tied to places

Dubai was a great learning opportunity for us in the nuances and challenges of making compelling Audio AR experiences, such as the power of content and conveying the right information to the user in a natural, human-sounding manner.

This is where creative developers and location-based content producers can use VPS and audio AR to unlock useful new ways for people to connect directly with the world in front of them without looking down at a flat 2D map.

If you’re a local expert, it offers a whole new way to tell stories from your vantage point. Use AR flags to mark waypoints, and when users arrive, highlight entire parts of the street to direct their gaze. When they point the camera at the scene, your voice narration can kick in to tell your story about the place.
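As a rough illustration of the waypoint idea, the sketch below checks whether a user has arrived within a small radius of a story waypoint and returns the narration clips that should start playing. It is only a sketch under assumptions: the waypoint table, clip names, coordinates, and the 25 m radius are all invented, the function names `haversine_m` and `narrations_to_play` are hypothetical, and it skips the “point the camera at the scene” check described above.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def narrations_to_play(lat, lon, waypoints, already_played, radius_m=25.0):
    """Return clips for waypoints the user has just reached.

    waypoints maps an id to (lat, lon, clip_name). already_played is a set of
    ids that is updated in place so each story plays only once.
    """
    due = []
    for wp_id, (wp_lat, wp_lon, clip) in waypoints.items():
        if wp_id in already_played:
            continue
        if haversine_m(lat, lon, wp_lat, wp_lon) <= radius_m:
            due.append(clip)
            already_played.add(wp_id)
    return due


# Made-up waypoints for illustration only.
waypoints = {
    "stop_a": (25.2100, 55.2800, "stop_a_story.mp3"),
    "stop_b": (25.2200, 55.2800, "stop_b_story.mp3"),
}
played = set()
print(narrations_to_play(25.21001, 55.2800, waypoints, played))
```

Tracking what has already played keeps the narration from restarting every time the position updates while the user lingers at a stop.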

This is just the beginning of the many possibilities for AR Audio. For starters, we can replace the one-tap trigger with a voice command, similar to how we currently order Alexa around at home. Further down the line, we may even eliminate that step altogether as our AI assistants get smarter and predict what would interest us based on our behavior and context.

AR Audio on Smartphones

And what if you don’t want to use $1000 AR glasses? The smartphone in your pocket is the next best thing and adds a visual component to the Audio AR experience. Imagine having serendipitous adventures by walking down Manhattan alleys and tapping on interesting shops in front of you to hear their Foursquare reviews. Or saving a “3 days in Seattle” itinerary from TripAdvisor and opening the app to see all the attractions highlighted in real time AR.

This is an AR Audio demo we made for smartphones. The user taps on any building on screen to hear more interesting info

The power of VPS is its ability to deeply immerse us in our surroundings. We believe in a world where we no longer have to stare down at our screens while walking, and can instead listen to interesting stories about places just by looking around.

If you’re a developer or company that creates or curates local content, let’s connect because we can’t build this world without you. If you haven’t already, check out the rest of the Dubai Audio AR demo here.

*Vuzix sent us their Blade AR glasses for future development, which are a great fit given their form factor and Alexa integration.

Sheng Huang
Sturfee

Head of Biz Ops @ Sturfee. Ex-Niantic Labs + Google. Learn, plan, execute. Reflect and repeat.