New Challenges AR Will Pose for the Web

WebAR will have to move beyond the traditional browser and redefine it.

(This article states my personal opinions; I do not represent Meta’s views, which may differ.)

WebVR (Virtual Reality on the web) has taken strong strides thanks to the efforts of Google and Mozilla, and the standards are evolving to be quite solid. The current focus for WebVR is delivering low-latency performance on an HMD (Head-Mounted Display). The WebVR specs are a good starting point for AR too, but they are not sufficient to represent AR.

The fundamental differences in AR introduce a lot of challenges for the web. Within AR, identifying and tracking the environment accurately is critical, and it is a hard problem (read about SLAM if you want to understand this). Within VR, I don’t think SLAM is of immediate concern; its purpose there will be to blend experiences and transition from the AR world to the VR world and vice versa. A web implementation of SLAM, with benchmarks, might not be a bad idea.

Let us talk about the browser a little bit. In the context of VR, a browser-based experience makes a lot of sense. With WebGL it is possible to create amazing virtual experiences within the browser, as Mozilla and Google have shown. The beauty of WebVR lies in its ability to democratize VR among web developers. There are 273 repositories on GitHub today that have ‘webvr’ in their name or description. With WebVR, developers can build things with their existing JavaScript skills, and, combined with the examples Google and Mozilla have seeded, that has created momentum. There are enough use cases for developers to keep chipping away while the WebVR technology stack keeps improving latency and tooling. The focus for WebVR is rendering and head tracking, and I don’t see that changing for a while.
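To make “rendering and head tracking” concrete: the head-tracking half of a WebVR frame amounts to turning the reported pose (an orientation quaternion plus a position) into a view matrix for the renderer. Below is a minimal sketch in plain JavaScript; the helper names are mine, not part of the WebVR spec, where the pose would arrive each frame via `VRFrameData`.

```javascript
// Sketch: the head-tracking core of a WebVR frame, reduced to pure math.
// In a real page the pose would come from VRFrameData (WebVR 1.1); the
// helper names here (quatToRotation, poseToViewMatrix) are illustrative.

// Convert an orientation quaternion [x, y, z, w] into a 3x3 rotation
// matrix (row-major), using the standard quaternion-to-matrix formula.
function quatToRotation([x, y, z, w]) {
  return [
    [1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w)],
    [2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
    [2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y)],
  ];
}

// Build a 4x4 view matrix from a head pose: rotate the world by the
// inverse head orientation, then translate by the negated head position.
function poseToViewMatrix(orientation, position) {
  const r = quatToRotation(orientation);
  // The inverse of a rotation matrix is its transpose.
  const rt = [0, 1, 2].map(i => [0, 1, 2].map(j => r[j][i]));
  const t = rt.map(row =>
    -(row[0] * position[0] + row[1] * position[1] + row[2] * position[2]));
  return [
    [...rt[0], t[0]],
    [...rt[1], t[1]],
    [...rt[2], t[2]],
    [0, 0, 0, 1],
  ];
}
```

The browser’s job is to deliver that pose at display refresh rate with minimal latency; everything else in the loop is ordinary WebGL rendering.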

AR has different needs. It attempts to bring people together in ways different from VR. Watch the recent AWE demos and the TED talks by Kipman and Gribetz. I believe all of these talks fall into one of the following themes (obviously with different underlying nuances):

  1. Make everyone you collaborate with part of the same environment you are in. By making them part of the same environment, the workflow becomes seamless and personal. (People participate via holograms; holographic representations of real people interact with the real people in the room.)
  2. Make everything you are working on part of the same environment you are in. Everything in the environment follows the same rules of interaction. (I can pick up objects, interact with them, hand them over to collaborators, etc.)

VR is predominantly exploratory in nature. By that I mean you can design experiences that let people explore without having to interact in the traditional point-and-click manner; even those interactions can be modeled via gaze or eye tracking. For the use cases VR currently targets (visualization, gaming, cinema), this is perfect.

What is the AR web, and what will be needed to power it?

  1. For one, I think it is much more than a browser. Generally, people equate the web with a browser and vice versa. In most cases, the blurred line between the browser, which is only one representation of the web, and the web itself does not pose a problem. But in the context of AR, equating the browser with the web is not only inaccurate but limiting. (The biggest difference will be that we won’t consume web content via a browser in an AR setting.)
  2. A software layer that unifies all networked AR nodes. More on this in (3).
  3. Powered by hardware with networking capabilities. Today, all AR hardware is either an HMD or a mobile phone (enabled via AR apps). The form factor of AR hardware will change. In a few years, we might see cars with AR-enabled windshields, general-purpose projection screens on the backs of aircraft seats, or even smart walls. It will happen; it is just a matter of when. But before all that, we need to define hardware standards. For any hardware intended for AR (irrespective of form factor), a networking (or OS-like) kernel packaged as part of the hardware could provide hooks for interacting with the underlying hardware. Think of them as APIs into the device. But these APIs have to be standardized. That way, healthy competition can still exist between makers of the physical hardware platforms, and different platform makers will emerge for different use cases. But the unifying software layer will be the AR web operating system.
  4. Open source. For AR to truly live up to its potential, its core operating system will have to be co-created. It is still early days for the AR industry. Every dedicated AR device today is built as an HMD, with different capabilities enabled both by its choice of hardware components (sensors, lenses, field of view, etc.) and by software algorithms providing features like object tracking and SLAM. The good part is that most AR companies offer a software development kit (SDK), which provides hooks into their hardware. The quality and openness of the SDK will determine the quality of the AR apps. Theoretically, companies can control what gets built on their platform by structuring the SDK and curating the APIs. But for the sake of the industry, I hope companies choose to make their SDKs entirely open, giving developers full access to their hardware. With such a shared mindset, we can start standardizing the SDKs and make progress towards defining standards that will lead us to a world connected via AR nodes.
  5. Interface driven. Within an AR layer, our interactions with content will be different: we will use much more natural ways of interacting. For instance, at Meta, we are very focused on designing our interfaces along the neural path of least resistance. It is important to design interfaces that feel natural, and these UX factors will be critical. Check out Josh Carpenter’s great talk about UX for WebVR. WebAR will have more to deal with from a UX perspective because of its tighter relationship with the viewer’s environment.
  6. Natural interactions. Will we use traditional keyboards, or a virtual representation of a keyboard? Can we navigate by voice? Can we gaze at objects? This will be one of the harder questions to answer, and it will involve a lot of user studies. For one, I think we will start typing a lot less. An economy will emerge within the AR ecosystem for people who simplify content interactions.
  7. Peer-to-peer. The real power of WebAR will be information sharing. AR provides a natural platform for blending digital content into the real world. When everything becomes part of a single environment, informational transactions feel natural. I should be able to seamlessly hand off a digital object (anything digital) to someone in my social or professional network.
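The standardized device API imagined in point (3) is easier to see in code. Here is a minimal sketch in plain JavaScript, where every name and field is hypothetical (no such standard exists yet): each vendor describes its hardware in common terms, and the shared AR layer discovers nodes by capability rather than by brand.

```javascript
// Hypothetical sketch of a standardized AR device registry. None of these
// names come from an existing spec; the point is the shape of the contract.

const registry = [];

// A vendor registers its hardware by describing it in common terms.
function registerNode(node) {
  for (const key of ["id", "formFactor", "capabilities"]) {
    if (!(key in node)) throw new Error(`missing field: ${key}`);
  }
  registry.push(node);
}

// The AR layer asks for nodes by what they can do, not what they are.
function nodesWith(capability) {
  return registry.filter(n => n.capabilities.includes(capability));
}

// Two very different form factors exposing the same interface:
registerNode({
  id: "hmd-01",
  formFactor: "head-mounted",
  capabilities: ["slam", "hand-tracking", "stereo-render"],
});
registerNode({
  id: "windshield-7",
  formFactor: "vehicle-windshield",
  capabilities: ["slam", "projection"],
});
```

Under a contract like this, the windshield and the HMD are interchangeable anywhere only “slam” is required, which is exactly what lets healthy hardware competition coexist with one unifying software layer.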
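The peer-to-peer handoff in point (7) can also be sketched. Two in-memory nodes stand in for real networked peers here; in a browser the transport could be something like a WebRTC data channel, and every name below is illustrative rather than a real API.

```javascript
// Illustrative sketch of handing a digital object from one AR node to
// another: serialize it, transfer it, and reconstruct it on the peer.

class ARNode {
  constructor(id) {
    this.id = id;
    this.inventory = new Map(); // digital objects this node currently holds
  }
  addObject(obj) {
    this.inventory.set(obj.id, obj);
  }
  // Hand an object to a peer: serialize, remove locally, deliver.
  handOff(objectId, peer) {
    const obj = this.inventory.get(objectId);
    if (!obj) throw new Error(`no such object: ${objectId}`);
    const payload = JSON.stringify(obj); // stand-in for the wire format
    this.inventory.delete(objectId);     // the object now lives with the peer
    peer.receive(payload, this.id);
  }
  receive(payload, fromId) {
    const obj = JSON.parse(payload);
    obj.receivedFrom = fromId; // keep provenance of the handoff
    this.inventory.set(obj.id, obj);
  }
}
```

The handoff removes the object locally and reconstructs it on the peer with provenance attached, which is the kind of “single environment” transaction the list describes.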

The web is very important for AR. In fact, AR is a far better medium than a traditional desktop or mobile browser for consuming the web and bringing it to its full glory. In a few years, AR will be all about how people connect the world by being information nodes themselves. To enable that, we need to define standards for platform-agnostic AR software and take steps toward implementing them.

If you like the article, consider recommending it.

Thanks to Tony Parisi and Jason Harmon for looking at the early drafts and providing helpful suggestions.