Engineering at Detour

originally published August 30th, 2014

Almost every app is built around the same part of the iPhone: its screen. We wanted Detour to be deeply immersive, however, and felt that every glance at the screen would take away from that. The decision to de-emphasize the screen resulted in a number of fun and unusual technical challenges. We are taking advantage of many other aspects of the iPhone so the listener can keep her phone in her pocket, giving her the feeling of being the protagonist in a movie, or the main character of a video game set in the real world but transformed through music, sound effects, and narration.

To pull it off, our engineers work hand in hand with our designers and writers. Over the coming months, we’ll blog about some of the engineering challenges we face, including the following:

Audio that maps to where you are

Detours automatically link audio to the listener’s location, so everything triggers at the perfect time and place. A typical hour-long Detour might have fifty location sync points. They might be a street’s width apart, or indoors, or up a flight of stairs, or in wide open places where you are free to explore at your own pace. The more precisely we can locate the listener, the more magical the experience.
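To make the idea of a location sync point concrete, here is a minimal sketch in Python of firing sync points in order as the listener walks. It is an illustration under assumptions, not Detour's implementation: the `SyncPoint` type, the `radius_m` trigger radius, and the `next_trigger` helper are all hypothetical names.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class SyncPoint:
    """A hypothetical location sync point with a circular trigger zone."""
    name: str
    lat: float
    lon: float
    radius_m: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon pairs (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def next_trigger(points, next_index, lat, lon):
    """Return next_index + 1 if the listener is inside the next sync point's
    trigger zone, otherwise return next_index unchanged."""
    if next_index >= len(points):
        return next_index  # tour finished, nothing left to fire
    p = points[next_index]
    if haversine_m(lat, lon, p.lat, p.lon) <= p.radius_m:
        return next_index + 1
    return next_index
```

Firing points strictly in order is one simple design choice. It keeps a noisy position fix from jumping the listener ahead, at the cost of needing separate handling for someone who wanders off the path.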

GPS is a good start, but in certain locations, it can be off by more than a city block. To compensate, we’re looking at other signals — iBeacons for added precision, the phone’s accelerometer to detect movements and steps, the magnetometer to know which direction you’re facing — and we fuse them together based on data we’ve collected for every user on every tour. That means our location precision is constantly improving, even as GPS remains unreliable. In a future post, we’ll share the frameworks, data, and tools we’re using to build our algorithms.
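As a rough illustration of what fusing signals can mean, the sketch below combines several position estimates by inverse-variance weighting, a textbook technique in which more certain sources count for more. This is an assumption for illustration only; the post does not say which fusion method Detour uses, and the `fuse` function and its tuple layout are hypothetical.

```python
def fuse(estimates):
    """Combine position estimates by inverse-variance weighting.

    estimates: list of (x_m, y_m, sigma_m) tuples, where x_m and y_m are
    local coordinates in meters and sigma_m is the estimated error (one
    standard deviation) of that source. Returns (x_m, y_m, sigma_m) for the
    fused estimate.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / (sigma ** 2) for _, _, sigma in estimates]
    total = sum(weights)
    x = sum(w * e[0] for w, e in zip(weights, estimates)) / total
    y = sum(w * e[1] for w, e in zip(weights, estimates)) / total
    fused_sigma = (1.0 / total) ** 0.5
    return x, y, fused_sigma


# A GPS fix with 30 m error and a beacon-derived fix with 3 m error:
# the fused position lands very close to the beacon's estimate.
gps = (0.0, 0.0, 30.0)
beacon = (9.0, 0.0, 3.0)
fused_x, fused_y, fused_sigma = fuse([gps, beacon])
```

The fused error is never larger than the best single source, which is why adding even a rough extra signal can help when GPS is unreliable.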

Taking a Detour with your friends

We want Detour to be a great social activity. The worst thing about museum tours is that they’re socially isolating. Everyone listens to their own audio stream at their own pace, which means friends end up in completely separate areas, never interacting with each other; it’s the opposite of a shared experience.

We’ve solved that by building group audio sync, which turns Detour into something that’s extraordinarily fun to do in small groups. Everyone hears the punchline to a joke or gets startled by the sound of sea otters behind them at the exact same moment. Any user can pause the Detour for the entire group, or trigger new narration by walking through the next trigger point.

This seemingly straightforward feature was tricky to implement. We use Bluetooth Low Energy, peer-to-peer Wi-Fi, distributed triggering, a redundant server relay, and some audio processing magic to keep the phones in sync. None of these channels is 100% reliable thanks to noisy radio environments, bad line of sight, crappy cell connections, reasonable limits to battery usage, roaming, etc., so we have to use multiple approaches in order to adapt to missed messages. Perhaps the coolest trick is imperceptibly adjusting playback rates for group members’ audio streams until their audio is perfectly re-aligned. The result is nobody misses a single word, and the tech stays completely invisible. Here’s a quick demo, with more detail coming in another post.
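The playback-rate trick can be sketched in a few lines of Python. This is a simplified model under assumptions, not the actual sync code: the `correction_rate` function, the 10-second correction window, and the 3% cap on rate changes are all hypothetical values chosen for illustration.

```python
def correction_rate(local_pos_s, reference_pos_s, window_s=10.0, max_adjust=0.03):
    """Playback rate that would close the gap to the group's reference
    position over roughly window_s seconds of wall-clock time.

    If we play at rate r for window_s seconds while the reference plays at
    rate 1.0, the gap closes when r * window_s == window_s + drift, so
    r = 1 + drift / window_s. The result is clamped so that the change
    stays small (the cap here is an assumed value, not a measured one).
    """
    drift_s = reference_pos_s - local_pos_s  # positive: this phone is behind
    rate = 1.0 + drift_s / window_s
    return max(1.0 - max_adjust, min(1.0 + max_adjust, rate))


# A phone 0.2 s behind the group speeds up by about 2%.
rate_behind = correction_rate(100.0, 100.2)
# A phone a full second ahead is limited to the 3% slowdown cap.
rate_ahead = correction_rate(51.0, 50.0)
```

Recomputing the rate every time a new reference position arrives makes the drift shrink gradually rather than snapping, which is what keeps the adjustment from being audible in this model.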

What if I’m a fast or slow walker?

The audio engine we’re building for Detour has separate narration, music, ambience, and sound effect tracks, each of which can be triggered independently or aligned at just the right times. This lets us create rich, immersive soundscapes that seamlessly transition as you move along the path, while also triggering the narrator’s words at exactly the right place on the path. It feels like the narrator knows when you’re turning a corner or crossing a street. We’ll be able to trigger additional content when a listener is lingering at a stop, or skip optional sections if they’re moving fast. And we’ll have alternate audio that triggers for Detour-takers after hours, when certain stops might be closed. If someone takes a break to grab that amazing slice of pizza the narrator just mentioned, we detect the anomaly and then tell the user how to get back on track once they’re full. Of course, we also use old-fashioned tools, like subtle audio cues such as the sound of footsteps, to give the user confidence that they’re in the right place.
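One way to picture skipping optional sections for fast walkers is as a time budget: estimate how long the listener has before the next trigger point, then include optional segments only if they fit. The Python sketch below is a guess at the general shape, not the engine's real logic; `Segment`, `plan_segments`, and the rule that a stationary listener hears everything are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical piece of narration with a known duration."""
    name: str
    duration_s: float
    optional: bool = False


def plan_segments(segments, distance_to_next_m, speed_mps):
    """Pick which segments to play before the next trigger point.

    Required segments are always kept. Optional ones are added in script
    order while they still fit in the time the listener is expected to take
    to reach the next point. A listener who is not moving gets everything.
    """
    if speed_mps <= 0:
        return list(segments)  # lingering: no time pressure
    budget_s = distance_to_next_m / speed_mps
    required_s = sum(s.duration_s for s in segments if not s.optional)
    spare_s = budget_s - required_s
    plan = []
    for s in segments:
        if not s.optional:
            plan.append(s)
        elif s.duration_s <= spare_s:
            plan.append(s)
            spare_s -= s.duration_s
    return plan


script = [
    Segment("intro", 20.0),
    Segment("aside about the bakery", 15.0, optional=True),
    Segment("directions", 10.0),
]
slow_plan = [s.name for s in plan_segments(script, 60.0, 1.2)]  # 50 s budget
fast_plan = [s.name for s in plan_segments(script, 60.0, 2.0)]  # 30 s budget
```

With these numbers the slow walker hears all three segments, while the fast walker hears only the intro and the directions.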

I want to make my own Detour

In addition to the Detour app, we’re building a powerful tool for creators to build their own Detours. We want to make creating a great Detour as easy as writing a blog post.

That’s an early screenshot of Descript, our Detour editor. It feels a lot like a word processor, but it does a lot more. The creator can drop inline events into the script that instruct the Detour engine to trigger certain actions at precise moments: fade in a piece of music, let the listener know they should start walking, show an image on the phone, play a sound effect, etc. The narration audio is synced to the transcript, which in turn is synced to the path the user should be walking. When a user arrives at the specified point, all the appropriate events are triggered. Delete a sentence from the script, and it’s automatically removed from the audio stream. Move one paragraph to a subsequent chapter and all the audio, images, sound effects, and path move with it. There’s no need to work with audio waveforms.
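Editing audio by editing text works when every word in the transcript knows where it lives in the recording. The Python sketch below shows that idea under assumptions: the `Word` type, its timestamps, and the `keep_ranges` helper are hypothetical and are not taken from the editor itself.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcript word aligned to a span of the narration recording."""
    text: str
    start_s: float
    end_s: float


def keep_ranges(words, deleted_indices):
    """Return the (start_s, end_s) audio ranges that remain after the words
    at deleted_indices are removed, merging runs of adjacent kept words."""
    ranges = []
    previous_kept = False
    for i, word in enumerate(words):
        if i in deleted_indices:
            previous_kept = False
            continue
        if previous_kept:
            # Extend the current range to cover this word as well.
            ranges[-1] = (ranges[-1][0], word.end_s)
        else:
            ranges.append((word.start_s, word.end_s))
        previous_kept = True
    return ranges


words = [
    Word("Turn", 0.0, 0.4),
    Word("left", 0.4, 0.9),
    Word("here", 0.9, 1.3),
    Word("now", 1.3, 1.8),
]
# Deleting "here" from the script yields two audio ranges to splice together.
cuts = keep_ranges(words, {2})
```

A player or exporter would then play only the returned ranges back to back, so the creator never has to touch a waveform.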

Tuning our tours

How do we know if a segment is too long or short? How closely do people listen to the narrator’s instructions? Where do people take photos or pause the tours? Those are vital questions the engineering team needs to answer. But since the Detour experience isn’t about touching the screen or clicking through the app, off-the-shelf analytics tools can’t answer them. So we built custom analytics to collect low-level sensor data, process it into higher-level measures, and visualize everything that occurred on a Detour. That was the only way to know where our listeners are, what they’re hearing, and how they’re reacting to their environment. We’re collecting lots of data, and using it to make better Detours.
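As an example of turning low-level data into a higher-level measure, the Python sketch below finds the places where a listener stopped, given timestamped positions. It is illustrative only: the sample format, the speed threshold, and the minimum stop duration are assumed values, and `find_stops` is a hypothetical name.

```python
import math


def find_stops(samples, max_speed_mps=0.3, min_duration_s=20.0):
    """Find intervals where the listener was effectively standing still.

    samples: list of (t_s, x_m, y_m) tuples sorted by time, with positions in
    local meters. Returns a list of (start_s, end_s) intervals during which
    the speed between consecutive samples stayed at or below max_speed_mps
    for at least min_duration_s.
    """
    stops = []
    stop_start = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed <= max_speed_mps:
            if stop_start is None:
                stop_start = t0
        else:
            if stop_start is not None and t0 - stop_start >= min_duration_s:
                stops.append((stop_start, t0))
            stop_start = None
    # Close a stop that is still open when the recording ends.
    if stop_start is not None and samples[-1][0] - stop_start >= min_duration_s:
        stops.append((stop_start, samples[-1][0]))
    return stops


walk = [
    (0, 0.0, 0.0), (10, 12.0, 0.0), (20, 24.0, 0.0),   # walking
    (30, 24.0, 0.0), (40, 24.5, 0.0), (50, 24.5, 0.0),  # lingering
    (60, 36.0, 0.0),                                     # walking again
]
stops = find_stops(walk)
```

Aggregating intervals like these across many listeners is one way to see where people linger, which speaks to whether a segment runs too long or too short.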

We think of ourselves as building the platform for a new medium. Which means we get to tackle (and sometimes bang our heads against) problems that nobody else is trying to solve. Our challenge is to create experiences that feel immersive to listeners and intuitive to creators, through technology that disappears. It needs to seem like magic. Deciding we wanted people to use Detour with their phones in their pockets, not distracted by their screens, immersed in the place, put some serious technical obstacles in our path. We’ve enjoyed solving a bunch and are looking forward to confronting many more.

Andrew Mason