AR-First Mobile second
The Product Design challenges of doing AR on Smartphones
With the launch of ARKit, we are going to see AR apps become available for ~500 million iPhones in the next 12 months and at least triple that in the following 12 months (as we can now include the numbers of ARCore supporting devices from Google). This has attracted the broader developer community to AR and we’ll see many many many experiments as developers figure out that AR is an entirely new medium. The 8th Mass Medium. In fact it could be even more profound than that, as through all of history we have consumed visual content through a rectangle (from stone tablets, to cinema, to smartphones etc) and AR is the first medium that is completely unbound. It’s a new medium the way the web was different to print, different in kind, not in scale. In the same way that most of the first commercial websites were “brochure-ware” where an existing print brochure was uploaded onto a website, the first AR apps will be mobile apps copied over to AR. They’ll be just as terrible as brochure-ware was, though they will be novel!
When I worked at Openwave (who invented the mobile phone web browser) in the early 2000’s we used to talk about “mobile native” experiences, as distinct from “small screen websites”. In 2010 Fred Wilson described this concept far better than I could in his widely cited mobile-first post. Hopefully this post will be an echo of his, but for AR.
Apple also recently published some Human Interface Guidelines for ARKit which look great. I couldn’t find anything similar from Google for ARCore, but will happily add a link here if someone can point me to one. This post goes into a fair bit more detail on the same types of topics based on our real-world learnings.
So what makes an app “AR Native”?
I’m going to dig into a bunch of factors that are “special” to Smartphone AR, like you’d build with ARKit or ARCore. I am specifcally avoiding AR apps for Head Mount Displays, as HMDs open a whole new range of possibilities for products that aren’t possible on a Smartphone (hands-free apps, longer session times, and much deeper immersion potential for starters…). So here’s a list in no particular order of things to think about as you consider concepts for your new ARKit app. I’m sure there’s more to be discovered, my friend Helen Papagiannis’ book Augmented Human goes even broader & deeper than I do here, but this list should be pretty comprehensive for now, considering it’s the result of over 50 years of combined AR smartphone app development experience.
When we are pitched at SuperVentures on an AR App (which is pretty much constantly) it’s easy to tell if the developer hasn’t built an AR-Native product, as the simple question “Why do this in AR, wouldn’t a regular app be better for the user?” is often enough to cause a rethink of the entire premise.
Think outside the phone
One of the biggest mental leaps to make is that as a designer/developer you now need to start thinking outside the phone. You also need to rethink the basics of how a user exercises intent with smart devices in general. The structure of the real world, the types of motion that the phone might make, the other people nearby, the types of objects or sounds nearby all now play a part in AR product design. Everything happens in 3D (even if your content is only 2D).
Various interactions/transitions/animations/updates that happen within the app are only one context to consider, as there could be a lot going on “outside the app” to take into account. Once you get your head around this the rest becomes easier to grasp.
Biomechanics of wrist vs head (vs torso)
The fact we hold our phones in our hands is a huge design constraint & opportunity. Our wrists + arms can move in ways that our heads just can’t. An app that you can use with one hand can allow movements that an app that needs 2 hands can’t (as you have to rotate your torso + have very limited translation without walking).
This means AR apps that don’t require you to be always looking through the display (as your wrist might be pointing away from your eyes) can work well. Two great examples are the “Tape Measure” app and AR apps that let you 3D scan a small object.
A real-time fast-twitch shooter game could be difficult to make work with Mobile AR as if you are moving the phone around quickly (if not, then why even do it in AR) then it might be hard to see what’s going on.
Why get your phone out of your pocket ?
AR Developers often forget that we usually have our phones in our pocket/bag when we are moving about. Our imagination of the user’s experience of our app begins when the user taps our App icon (or even after they are in AR-View looking through the camera). It should start much earlier, as one of the big unique benefits of AR is that it can provide real-time context around our location. If the user starts your app 30 seconds too late, then there may be no benefit as they’ve walked past the location the app needs. Facebook or SMS notifications don’t have this problem of being place sensitive.
So there’s a very real & difficult problem in getting a user to get their phone out while they are in the best place to use your App. Notifications could come via traditional push messages, or the user might think to use the app by seeing something in the real world that they want more information on, and they already know your app can help with this (eg Google WorldLens to translate languages on menus). Otherwise your app just needs to work anywhere, either through using unstructured content, or being able to tap into physical content that is very very common. This problem is the #1 challenge for all the “AR Graffitti” type apps that let people drop notes for others to find. It’s almost impossible for users to be aware that there’s content to find. FYI — this is just another version of the same problem that all the “Beacon” hardware companies have, getting the shopper to pull out their phones and then discover beneficial content.
This challenge also manifests in a different way when AR App developers think about “competition”. As AR is so wide-open it’s easy to think that the market opportunity is there to win. However AR is always competing with the minutes in our day. We already spend time doing something, so how is my AR app going to compete against that?
AR as a feature e.g. Yelp with Monocle, Google Lens image search
When we start thinking about ARKit apps, we often immediately think of the entire app experience taking place in “AR Mode” while looking at the real-world. This just doesn’t make sense, as many actions will work better if they were a regular smartphone app eg entering a password. 2D maps often are a better way to show location data due to the user being able to zoom in/out and having a 360degree view. Think hard about what parts of the App work natively for AR, and consider building a hybrid-app where AR-Mode is a feature. Yelp did this with their Monocle feature, as an example.
Interact with the world
At the simplest level, what makes a native AR app, is that it enables a degree of interaction between digital content and the physical world (or a physical person). If there’s no digital/physical interaction, then it should be a regular app. Putting this even more strongly, as smartphone apps are our “default” UX, then your AR App should only exist if it does something that can only be done in AR. Interaction with the world can come in 3 broad ways:
- 3D geometry: This means the digital content can be occluded by and physically react to the real world structure. Lots of good Tango demos of this (16:50 in this one), our Dekko r/c jumping off physical ramps, the Magic Leap little robot under the desk are all examples.
- Other people sharing your view & physical space: If your app needs other people physically near you in order to give benefit, then its probably AR native. Good examples of this are an architect showing a client a 3D model on the coffee table (1:39 here for something similar), or Star-Wars Holochess
- Controller motion: This means the content can be static but you physically move your phone around, or the phone could act as a “controller” like a Wii. This one is borderline IMO. If the content just sits on a tabletop and you can view the content from different points of view by moving your phone around, then personally I don’t see this as AR Native as the only benefit is Novelty value. A Smartphone app will probably get better use by using pinch/zoom etc. Some people argue that the sense of physical scale the user gets by moving around makes it AR Native… The question that you need to ask is that is there any benefit to this interaction beyond Novelty value. One way in which a controller (phone) can be used in a Native way is in a similar way to a Wii controller, where specific movements of the controller cause action in the content. One example would be a virtual balloon hovering over a desk, and you could “push” it around by physically pushing the controller against the balloon. Lots of other ideas here which have barely been explored.
Think about whether your use-case needs these types of interactions to give the user benefit you want. If not, then accept this is a novelty (not a bad thing in itself, just set expectations of re-use appropriately around zero) or build a regular app….
Location re-enables scarcity
Back in 2011 at the AWE event organized by my SuperVentures partners (Ori & Tom) Jaron Lanier gave a talk where he raised one of the most fascinating ideas that I’d heard in years wrt AR. Just as the Internet “broke” the concept of scarcity (ie infinte copies easily made of digital goods; it doesn’t matter where I physically work from etc) then AR has the ability to recreate scarcity. Many many AR experiences will be tied to the location in which you experience it. Even if the experience can be re-played in a different location, it should always be best in the place it was designed for. It’s the digital equivalent of a live concert vs a MP3. This has huge potential for new business models for digital businesses, especially when you start mixing in the potential of blockchain & decentralized businesses. Creative businesses in particular could benefit from this as they have been most disrupted digitally by the lack of scarcity, and at least in the music industry, it’s the live events that are generating the most income.
Pokemon Go is starting to take advantage of this concept, with the “Raids” feature. For example, in Japan a Pokemon Go raid was setup — it was the only location and time in the world you could have acquired the most valuable Pokemon, Mewtwo.
Another example could be an “AR MMO” where you have to travel to Africa to visit a specific smith. I expect some very interesting companies will be built around the recreation of digial scarcity.
Don’t get seduced by your (genuinely good) idea.
AR is seductive. The more you think about it, the bigger you realize it is going to be. The more great ideas you have that are completely new. We regularly see startups pitch us with a great idea. Truly they have figured out a problem that the world needs solving. It can truly become a huge platform, the next Google etc. The problem is that the tech or UX probably doesn’t support it quite yet or for the next couple of years. The entrepreneur believes in the idea and wants to get started, but sadly the market or tech doesn’t arrive before they run out of their 18 months of runway.
It can be very hard to put a good idea you believe in on the shelf. But you need to, unless you know for a fact (not just really really believe) that your idea can (a) be built as you imagine it today ; and (b) you know someone real who will buy it (ie you can give me their name & phone number).
If you can’t do the 2 things above, then you should ask yourself whether your great idea is blinding you to today’s reality.
As a footnote here… a “good” idea doesn’t just mean a cool concept. It means essential. It means an experience that provides a compelling reason to displace a more efficient, socially understood & natural version of that feature in non-AR space.
Field of view is a big constraint
One obvious difference between a smartphone AR app and a HMD AR app is the Field Of View. HMD’s with a large field of view (when they arrive) will feel very immersive, however Smartphone AR will always have two challenges:
- You only have a small “window” into the universe your AR content (a “magic mirror”). Your friends have an even smaller window, should they want to also see what you do. This can be mitigated somewhat with good tracking (and also a little with hardware eg the new LG v30 has a wide FOV lens that helps), but it often ends up feeling like you’re looking at the world through a straw.
- The screen on the phone displays what the camera sees, not what you see. The Camera orientation, offset to the display, and field of view are different to our eye. If you drew a line from the center of your eyeball onto the world, then held the camera exactly on that line pointing the same way, and the camera field of view was close to what your eye would be, then the camera can appear kind of see-through, like a piece of glass. In reality it never does at all, but people don’t mind and they just look through the screen & ignore what’s not on the screen (As a note there has been some research done which does a pretty good job of adjusting the video feed to do a much better job of looking “clear”, but that probably won’t end up in mass-market phones…. It’s worth a look & maybe some experiments for the curious)
So what can be done to help with immersion when you’re looking through a tiny window at content that isn’t where your eye would naturally expect to see it?
- First accept that users will generally suspend disbelief and accept the AR experience as being digital content in the world
- Think about scale. If your content is physically small it can fit entirely inside the smartphone glass + camera field of view and avoid clipping at the edge of the screen. For example a life sized digital cat will probably give a better experience than a life size T-Rex. We can apply some learnings from VR here in that playing with scale inversions can create a sense of immersion & feel magical e.g a tiny elf that lives behind a pot plant. If your content is going to be larger than the field of view, then think about some affordances to help with the fact that users can’t see everything. One idea we found worked was to use some fog to cover the digital scene, only what you were looking at would be clear and the edges of the screen appeared foggy. This hinted to the user that there wasn’t anything interesting outside the screen area anyway, but they could pan around and disperse the fog if they wanted to explore a larger area. Anything that’s offscreen should really have some sort of indicator as to where it physically is. The user shouldn’t have to pan around wildly searching for some asset by guessing.
- This sense of immersion can be quite difficult to achieve in practice. Getting your product used beyond a one-time novelty experiment is hard & means re-thinking many UX aspects. User testing becomes somuch more important as you are starting from the basics. In AR, you have to repeatedly test with the SAME users over time, to get rid of the novelty bias and get real data. It’s a big jump from something that looks impressive on YouTube, to something actually feels good to use.
Accept you have no control over the scene
One of the biggest differences between an AR app and any other kind of software, is that as the developer you have no control over where in the real-world the user decides to start your app. Your tabletop game might need a 2x2 foot playing area, but the user opens the app on a bus, or pointing at a tiny table, or a table covered with plates & glasses. The app is effectively unplayable (or you just use it as if its not an AR app). You end up with two broad viable options:
- give the user specifc instructions re where to use the app. Expect them to go to another room, or clear the table, or just wait until they get home. Expect them to have a poor experience the first time they use it in the wrong place. It doesn’t take much to imagine what this will do to your usage metrics… Another way to solve this is to make an app for a very specific use-case (eg fit furniture in the living room) so that users won’t even think of opening it somewhere it won’t work.
- the other viable approach is to design the app with content that is “unstructured” ie it can work anywhere. We eventually figured this out at Dekko, where we started off trying to build a game with structured levels and difficulty, but ended up with a toy car that could just drive around anywhere. If you are thinking of building a game for AR, my advice is to build digital toys that can be played with anywhere (virtual r/c car, virtual skateboard, virtual ball & stick etc). Many friends have told me that the Hololens team were key drivers of Microsoft’s purchase of Minecraft, as it is a perfect fit as a Native AR app.
- the third and theoretically best solution, which isn’t viable as its not technically possible yet, but this is where things will end up, is to have a 3D understanding of the scene, then automatically & intelligently place your content in the scene in context to what’s already there. This needs 2 hard problems solved 1) real-time 3D scene reconstruction (almost solved) & semantics; and 2) procedural content layout tools. We’re still a few years away from having all these solved for developers… so you’ll need to use one of the first two viable approaches above for now.
The good news is that having no control & giving maximum agency to the user is where the magic happens! It’s the defining aspect of the medium. Charlie did a great talk on this topic.
In addition to the above, you are given an arbitrary coordinate system for your app, that is unique to this session. Your app only knows it’s relative position within this coordinate system, the app starts at xyz 0,0,0. There’s no persistence, no correlation with “absolute coordinates” (lat/long/altitude), and no ability to share your “space” with someone else. This will certainly change in future (12–18 months) updates to ARCore and ARKit, and there’s a few startups working on overlay APIs, but for now, you need to take this into account in your product design.
How we naturally hold our phones
In the early days of Mobile AR, we optimisitically thought people would happily hold their phones up in their line of sight, to get the incredible experience of an AR app (we called it the Hands Up Display). We vastly overestimated how willing people would be to change behavior. We vastly overestimated how accepting other people would be of you doing this behavior (remember Glassholes). Your app will not be the app that changes this social contract. You’ll need your app to work based on how people already hold their phones:
- hands-up for very short periods of time, about as long as it takes to take a photo. This means AR becomes a way to visually answer a question, rather than a way to pursue long session engagement.
- 13 degree angle pointing at the ground, like we already do most of the time. Google, working with Speck design figured out that 13 degrees is the natural angle we feel most comfortable with. Not 12 or 14 degrees! This points to apps with smaller scale content (eg 1ft or 2ft high assets) where the action takes place right in front of you.
There are a number of ways that the phone sensors can be used to switch modes depending on how the user holds the phone. e.g. your app is a regular 2D map when held horizontally, but switches to AR mode & triggers a visual search instantly when it’s held up. Nokia’s CityLens app managed this UI in a nice way back in 2012. These are good ideas to explore…
Input is dis-intermediated by the glass
As you test your app you’ll realize that one of the first thing users do when they see your virtual content appear in the real world, is that they reach around the phone and try to touch it, then look confused when they can’t! This is a good sign that you’re building something cool, but it shows the challenge of input & interactions being mediated by a piece of glass that is physically in your hands, while the content may be several feet away.
You’re essentially asking someone to abstract their direct 3D understanding of the world, back into 2D gestures on a smartphone screen. This adds cognitive load and also breaks immersion. But big spatial gestures or depending on the presence of others can be equally problematic.
Working out what UI affordances make sense to users that let them select & interact with content that is “over there” can be challenging. Our Dekko r/c car solved this by having the phone be the r/c controller, which has a direct analog in real-life, so it made sense to people. With our monkey, and at Samsung where we were interacting with Samsung SmartHome products, it was harder to come up with universal solutions that were immediately intuitive.
Note that just being able to physically “touch” the content doesn’t solve the problem either. If you’ve had a chance to try to Meta HMD or Leap Motion, then you’ll notice that the lack of Haptic feedback just creates another dissociation for the user to work around. IMO input for AR is one of the biggest problems to really solve before we can hope for mass-market regular daily use of AR.
Opportunity to manipulate video frames of the real world (vs see-through displays)
One characteristic of Smartphone AR as opposed to see-through HMD’s is that the “real-world” the user sees is actually a video feed of the real world. This gives opportunities to manipulate the video feed to improve aspects of your app. Will Steptoe did some great research back in 20113 before he joined Facebook, where he showed that by subtly stylizing the video feed of the real world, it became harder to tell virtual and real content apart, thus increasing immersion. This is a really interesting idea, and could be applied in innovative ways to AR phone apps.
“App” concept for context switching vs native AR UI
Smartphone AR apps, by nature of the OS desktop being what the phone boots into, rely on the smartphone App metaphor to open/close “apps” to switch contexts (eg switching between my messaging app vs. a game. I probably don’t want both of those contexts running simultaneously in one app and the app having to figure out what I really mean to do). You will dip into & back out of AR-View (looking through the camera) constantly, switching contexts each time, driven by the user. This really simplifies things as you don’t have to worry too much about dynamic context switching (eg when I point my phone at a building, do I want AR to display the address, the yelp review, trulia apartment prices, the architectural history, which of my friends are inside etc). It’s a seductive idea to try & be the umbrella app with lots of different use-cases supported (“I’m a platform!”) but understanding what the user wants and showing just that is an incredibly challenging problem to solve. It’s probably too hard to solve for anyone who is learning anything from this article…
Skeumorphism is good
AR is a brand new medium, and users just don’t know how to use it. There are lots of really clever ways to design interactions, but users just won’t be able to grasp your cleverness. By replicating the “real” thing you are digitizing then users will instinctively understand how to use your app. At Dekko we built this amazing little monkey character that people loved, but they had no idea what to do with a little monkey on their table. When we switched to a r/c car that looked like an r/c car, we didn’t have to explain anything. At Samsung we prototyped receiving a phone call in AR. There are so many cool ways to do this (eg a hogwarts owl could fly over and land in front of you with a message) but the idea that user tested the best was pretty much copying the look and feel of your iPhone phone app, a rectangular box with green circles to answer, red to ignore etc.
It’s important to note that it won’t be 2D skeumorphism — it’s likely a new 3D symbolic language will evolve that uses spatial clues (shadows, shading, transparency etc) but without necessarily needing to be photo-realistically faithful to the real world objects.
The proof of this comes only by user testing your app with users who have never used AR before. I guarantee that you will be shocked at how much you need to simplify and make things literal in order to avoid explaining what to do.
Affordances for users who won’t believe what they see
Something we observed which was hard for us to accept being immersed in AR all the time, was that users often did not believe what they saw. They couldn’t believe that the content was actually “in the 3D world” vs on the screen.
Make liberal use of affordances and subtle signals to show some connection to the real world. It will really help users understand what is going on. I can’t over-state the importance of at least a drop-shadow, if not dynamic shadows & light sources. Putting grid lines or other means of “painting” the world, separately from your interactive content also were very effective at helping users understand.
Another affordance that is difficult but hugely effective is that animated responses of the virtual object to collisions, occlusions, phone motion and control input are critical to having the object appear “real”. Stable registration (without drift) is only the first part of this. Study virtual characters “walking” in AR and you’ll often pickup that their feet “slide” a little on each step, which feels wrong & breaks presence. Same as if the character “bumps into” a physical structure and doesn’t respond as you would expect. Cues like eye-contact with the camera really feel good. A big part of getting this right is choosing the right type of character to put into your AR scene. The more realistic your character, the more subconscious expectations the user will have re what is “right”. The uncanny-valley is significantly more of a problem for AR than for animated movies, because we have the context of the real-world all around. We found that consciously selecting characters that are not “real” and more cartoony or fantastical ensured people’s expectations of their behavior in the real world were not pre-set, and flaws in the animation were less off-putting. Think Roger Rabbit, not Polar Express.
One big issue that affordances highlight is content overlap and information density. Most AR demos show a happily laid out space with labels all neatly separated and readable. As exhibition designers and architects know, designing labels and signs in 3D space to be readable and understandable while moving — is really hard.
Character & Asset Design
When thinking about what assets and characters to use in your app, we discovered some pretty workable approaches which seem to be spreading across the industry. Silka did a lot of research & user testing of various ideas for our Dekko Monkey back in 2010. She found that “pixar for adults” was the perfect way to think about sophisticated, lo-poly-count models, that didn’t try to cross the uncanny valley, yet with some affordances really looked good in the real world. This style was accessible to both women and men, and children & adults and could work globally culturally. She felt that digital versions of KidRobot collectable vinyl toys were perfect and she reached out to KidRobot’s top designer, Huck Gee, who designed our Monkey and has since become a great friend (seriously, hire him to design your AR characters, he’s amazing — hit me up for his contacts).
A few years later Natasha Jen from Pentagram independently came to almost identical conclusions for a Magic Leap design project, including the KidRobot inspiration. This video discusses her thinking from around the 30 min mark.
The latest demos for ARCore show a similar style, especially the Wizard of Oz characters….
Remote Teams present unique challenges
If your team is spread out geographically, you might want to rethink that. Trying to describe the AR problem or goal you are trying to address is almost impossible without literally standing beside your coworker and watching what they do. Apart from the challenges of describing what you want “I need the lion to look brighter against that wall when I move my hands like this”, the remote person will be using their instance of your AR App against a completely different scene. Your ground plane will be different. Your tracking quality may be different. Your lighting will be different. One of you might have moving objects/shadows in your scene. The app will behave differently even when both designers/developers are doing the same thing. Unless you go to the hassle of building some sort of identical sound stage in every office, you are going to lose many product iteration cycles on the back & forth. Your speed of learning will drop. Your frustrations with each other will increase. Your app quality will drop. Really try & put everyone in the same room while you’re all learning.
I’m hoping that sharing these learnings will reduce a couple of product iteration cycles of learning for everyone, and that better products will come to market sooner, that users engage with. I’ve been seeing YouTube demos of great concepts for almost 10 years, and all I want is some great apps that I want to use. There’s a big difference between an amazing video & an essential UX, and it’s a larger chasm to cross than many people realize.
If I could end this whole post with one lesson to take away, it would be… put a very experienced AR, UX or Industrial Designer on your team in a senior role and user test, user test, user test every little thing
@mattmiesnieks on twitter or email: firstname.lastname@example.org
Thank you to…
This post would not have been possible with the help of these people who have a far better understanding of the AR UX than I ever will, and have been working on solving real AR product UX problems for at least 5 years each.
- Charlie Sutton who led design on our AR team at Samsung & previously led a team at Nokia designing AR products (little known fact… Nokia was arguably the world leader in AR in the late 2000s & still have one of the richest AR patent portfolios), and is now working at Facebook.
- Paul Reynolds who is building Torch to make creating native AR/VR apps simple for developers & designers. Paul previously led the SDK team at Magic Leap figuring out the intersection of apps & platform for AR
- Mark Billinghurst is one of the worlds leading augmented reality researchers & is one of the few industry legends when it comes to AR interactions. Mark has been researching this domain for over 20 years and his work is almost certainly cited in whatever books or papers you learned AR/VR from. Mark is currently working at the Empathic Computing Lab in Adelaide and previously at HIT Lab NZ and most of his work is published at these sites. He is also a good friend & one of my partners at SuperVentures.
- Dr. Jeffrey Lin, PhD is a world expert on how visual perception is represented in the brain. He took those learnings to design products at Valve & Riot Games, before joining Magic Leap as Design Director. His medium posts are a wealth of AR design knowledge.
- Ryan Hickman was the first employee & Program Manager on the Google Tango team and contributed to everything from how the hardware was designed, to helping developers understand what apps to build. He is now CEO of TickTock AI, Inc. figuring out how to blend the physical and digital worlds.
- lastly my wife Silka Miesnieks who has worked designing AR products since 2009, co-founded Dekko with me and solved AR UX problems that many AR people still aren’t aware of. She now leads The Design Lab at Adobe helping design the future of immersive creation tools for people who aren’t engineers.
Additionally, the following designers have contributed greatly to my education around AR and their influence has shaped the thinking that went into this piece. Cliff Warren, Avi Bar-Zeev, Josh Anon, tipatat, John Sietsma, Kars Alfrink, Alysha Naples and BERG
All the errors in here are mine and result from me not translating the advice I received into the post above. You should follow all of them on social media, ask them for help, take their advice, and try to poach them away from their current jobs if you can!