WebAR with WebXR API, Part 1

Wood Neck
Published in NAVER FE Platform
Nov 30, 2020

Part 2: https://medium.com/naver-fe-platform/webar-with-webxr-api-part-2-dc76b20767fb

WebXR API and Other WebAR Approaches

The WebXR API was added in Chrome 79, and the “immersive-ar” session has been supported since version 81, enabling full-fledged mobile AR on the web.
This article discusses how to implement augmented reality (AR) on the web using the WebXR API and explains how to work around its current limitations.

The article consists of two main parts: a theoretical section that describes the WebXR API and the things to consider when applying it, and a practical section that walks through actual code using the WebXR API.

This is what I learned while working on the open-source 3D model viewer library @egjs/view3d, so check it out if you’re interested :)

Web AR Approaches

Augmented reality (AR) can be classified into sensor-based and vision-based methods.

Sensor-based AR

Sensor-based AR is implemented using sensors such as GPS, accelerometers, and magnetometers.
Unlike vision-based methods, this approach has the advantage of requiring relatively little computation. However, its accuracy depends on each device’s sensor performance, and since it does not use features like floor detection, a 3D model cannot be positioned precisely relative to the floor or a wall, so only limited types of applications can be created. Data from multiple sensors can be combined to achieve more accurate results.
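As an illustration of the inputs involved, a sensor-based setup might combine the orientation and geolocation sensors like this (updateCameraRotation and updateCameraPosition are hypothetical helpers, not part of any library mentioned here):

```js
// Device rotation from the orientation sensor (degrees around three axes).
// updateCameraRotation / updateCameraPosition are hypothetical helpers.
window.addEventListener("deviceorientation", (event) => {
  updateCameraRotation(event.alpha, event.beta, event.gamma);
});

// Device position from GPS, updated as the user moves around.
navigator.geolocation.watchPosition((position) => {
  updateCameraPosition(position.coords.latitude, position.coords.longitude);
});
```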

A representative sensor-based AR library is GeoAR.js, which performs location-based AR using GPS information. The library is now included in AR.js.

Vision-based AR

Vision-based AR can be divided into marker-based AR and markerless AR.

Marker-based AR

A marker-based method is a type of vision-based AR that can be divided further into fiducial and natural feature tracking methods.

The fiducial method recognizes space through images with specific patterns called fiducial markers. It has the advantage of high accuracy and robustness to environmental changes, but the disadvantage that it can only be applied in limited applications, since users have to prepare the marker themselves.

Marker generated with the AR.js marker generator

The natural feature tracking method recognizes 2D images such as pictures or 3D objects such as the human face, and uses them to perform pose estimation. It can be applied to a wider variety of AR applications than the fiducial method, but requires higher recognition capability and correspondingly more computational power.

AR.js is a representative library for marker-based methods.
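To get a feel for the marker-based approach, here is a minimal AR.js + A-Frame sketch using the built-in “hiro” fiducial marker (script URLs and versions are illustrative; check the AR.js documentation for current ones):

```html
<!-- Load A-Frame and the AR.js build for A-Frame (versions are illustrative). -->
<script src="https://aframe.io/releases/1.0.4/aframe.min.js"></script>
<script src="https://raw.githack.com/AR-js-org/AR.js/master/aframe/build/aframe-ar.js"></script>

<!-- Render a red box on top of the "hiro" marker whenever it is detected. -->
<a-scene embedded arjs>
  <a-marker preset="hiro">
    <a-box position="0 0.5 0" material="color: red;"></a-box>
  </a-marker>
  <a-entity camera></a-entity>
</a-scene>
```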

Markerless AR

A markerless method recognizes environmental information such as walls and floors, and tracks the device’s location using sensor data, without any information about the surroundings being given in advance.

Analysis of images from the camera feed is typically performed using a technique called Simultaneous Localization and Mapping (SLAM). On mobile devices this is specifically called monocular visual SLAM, because it is usually performed with a single camera.

This has a broader range of applications than the previous two methods, but because of the high computational demands, mobile devices often use a hybrid method that combines SLAM with sensor data.

Examples of the markerless method are 8th Wall, which uses its own SLAM engine, and ARCore/ARKit, which are the basis for the WebXR API described below.

WebXR API

In late 2017, Google announced ARCore, an AR framework for Android; WebAROnARCore, a Web AR prototype that exposes it to the web; and WebAROnARKit, its iOS counterpart based on Apple’s AR framework ARKit.
These were experimental and never standardized, but they served as an intermediate step toward the WebXR API.
Both WebAROnARCore and WebAROnARKit implemented AR-related functions by extending the WebVR 1.1 API that was already available on the web. Similarly, the WebXR API deprecates the existing WebVR API and extends it to include AR functions.

The meaning of the “X” in WebXR is described in the WebXR Explainer as a variable that can stand for “V” or “A”, so that it covers both WebVR and WebAR.

WebXR Features

At this point, Android Chrome is the only browser that properly supports the AR features of the WebXR API. Other Chromium-based browsers are expected to adopt them gradually; Samsung Internet, for example, currently supports only VR sessions, but AR session support is on its roadmap. On iOS, the Firefox Reality browser supports the WebXR API, but it is excluded here because it is rarely used.

For that reason, you can say the WebXR API is implemented by leveraging the capabilities of Google’s ARCore, based on the Android Chrome browser. When a WebXR session is requested on a device without ARCore installed, the user is redirected to its installation page.

ARCore install page from Google Play store

ARCore performs pose detection with a hybrid of SLAM and sensor data. By default it provides 6DoF (six degrees of freedom) motion tracking, covering both the movement and the rotation of the device. Additional features can be requested via XRSessionInit when requesting a session.
These features include “hit-test” and “dom-overlay”.
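A minimal sketch of such a session request (the #overlay element and the error handling are assumptions for illustration):

```js
// Request an AR session, requiring hit testing and optionally a DOM overlay.
// "#overlay" is an assumed element that will hold the AR session's GUI.
async function startARSession() {
  if (!navigator.xr || !(await navigator.xr.isSessionSupported("immersive-ar"))) {
    throw new Error("immersive-ar sessions are not supported here");
  }

  return navigator.xr.requestSession("immersive-ar", {
    requiredFeatures: ["hit-test"],
    optionalFeatures: ["dom-overlay"],
    domOverlay: { root: document.querySelector("#overlay") },
  });
}
```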

The “hit-test” feature casts a ray from the device’s camera in the viewing direction and returns the point where it intersects the surrounding environment recognized by SLAM. This enables us to detect surfaces such as floors, walls, and ceilings, and to display 3D objects on them.

WebXR hit-test, from https://codelabs.developers.google.com/codelabs/ar-with-webxr/#4
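In code, hit testing roughly looks like this inside the frame loop (a sketch assuming a session created as above, run inside an async setup function):

```js
// Create a hit test source that follows the viewer's pose (the camera ray),
// then query it every frame for intersections with detected surfaces.
const refSpace = await session.requestReferenceSpace("local");
const viewerSpace = await session.requestReferenceSpace("viewer");
const hitTestSource = await session.requestHitTestSource({ space: viewerSpace });

session.requestAnimationFrame(function onFrame(time, frame) {
  const hits = frame.getHitTestResults(hitTestSource);
  if (hits.length > 0) {
    // Pose of the nearest hit (e.g. a point on the floor) in refSpace;
    // its transform can be used to place a 3D model on that surface.
    const hitPose = hits[0].getPose(refSpace);
    console.log(hitPose.transform.position);
  }
  frame.session.requestAnimationFrame(onFrame);
});
```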

“dom-overlay” is a feature that renders DOM elements on top of the WebGL layer, and it is one of the great benefits of WebAR. With it, the GUI of an AR session can be implemented simply with CSS and JS event handlers, without touching WebGL. Elements such as videos or images can also be displayed over the real environment using CSS3 3D transforms.
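For example, a plain DOM button can sit on top of the AR view and use ordinary event handling (a sketch; session and #overlay come from the earlier request):

```js
// The overlay root passed as domOverlay.root behaves like a normal DOM tree:
// style it with CSS and attach plain JS event handlers.
const overlay = document.querySelector("#overlay");
const exitButton = document.createElement("button");
exitButton.textContent = "Exit AR";
overlay.appendChild(exitButton);

exitButton.addEventListener("click", () => session.end());
```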

WebXR Limitations and How to Overcome Them

Browser & Device Coverage

The biggest obstacle to applying the WebXR API today is browser coverage. Currently, only Android Chrome supports it, which makes it difficult to serve WebXR content to all users.
Of course, there are many Chromium-based browsers, and AR functionality will likely be added to them gradually, but for Firefox and the iOS browsers we don’t know when the WebXR API will arrive, so for now we just have to wait.

WebXR Device API browser coverage from caniuse.com

On the other hand, on Android devices, meeting the browser version requirement alone is not enough to use WebXR. Because ARCore, on which Chrome’s WebXR implementation is based, can only be installed on a fixed set of devices, it is impossible for all devices to use WebXR.
Devices are filtered to guarantee users a good-quality AR experience, based on conditions such as high-quality sensors and cameras and a CPU with sufficient processing power.
Details can be found on the ARCore supported devices page.
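Because of this, whether AR is actually available (browser support plus ARCore device support) can only be determined at runtime; a minimal check might look like this:

```js
// True only when the browser exposes WebXR and the device reports
// support for immersive-ar sessions (i.e. it is ARCore-capable).
async function canUseWebXRForAR() {
  if (!("xr" in navigator)) return false;
  try {
    return await navigator.xr.isSessionSupported("immersive-ar");
  } catch {
    return false; // e.g. blocked by a permissions or feature policy
  }
}
```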

Fallbacks

In the AR example of Google’s <model-viewer>, if WebXR is not available, it can fall back to either SceneViewer, a 3D model viewer embedded in the Google app on Android, or AR Quick Look, Apple’s 3D model viewer built into iOS. These two methods are limited to floor-recognition-based AR, and they differ in the UI and functionality they provide. But SceneViewer supports every Android device on which ARCore can be installed, regardless of the browser, and AR Quick Look supports all iOS devices running iOS 11+.

To use these two fallbacks, you must prepare the 3D model both in glTF format and in .usdz or .reality format. They are relatively simple to use, and both are triggered through anchor (<a>) tags: SceneViewer is opened using an intent URL, and AR Quick Look is invoked natively by setting rel="ar".

SceneViewer example; you can check the available options on this page.
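A sketch of such an intent URL (the model URL and fallback URL are placeholders; see the SceneViewer documentation for the full parameter list):

```html
<!-- Opens SceneViewer via an Android intent URL; file= points to the glTF model. -->
<a href="intent://arvr.google.com/scene-viewer/1.0?file=https://example.com/model.gltf&mode=ar_only#Intent;scheme=https;package=com.google.android.googlequicksearchbox;action=android.intent.action.VIEW;S.browser_fallback_url=https://example.com/fallback;end;">
  View in your space
</a>
```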
AR Quick Look example; you should provide one image element inside the anchor element.
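And a sketch for AR Quick Look (file names are placeholders):

```html
<!-- rel="ar" plus a child <img> makes iOS Safari open the model in AR Quick Look. -->
<a rel="ar" href="https://example.com/model.usdz">
  <img src="https://example.com/model-preview.jpg" alt="3D model preview">
</a>
```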

While it is true that SceneViewer and AR Quick Look cover a far wider range of devices than WebXR, WebXR has a clear advantage of its own: compared to SceneViewer/AR Quick Look, which simply display the model, custom WebGL effects and interactions are only possible with WebXR.

Polyfill

As for a WebXR polyfill, there is the webxr-polyfill GitHub repository from immersive-web, which is a migration of the previous webvr-polyfill to the WebXR API. Basically, it supports only the “immersive-vr” session, using a Google Cardboard.

Other Things to Consider When Providing Web AR

The file size of a 3D model depends on how detailed it is, but it is generally much larger than other content displayed on the web, such as images. Because of this, the user has to wait until the 3D model has fully loaded over the network, so 3D viewers normally fill the viewer area with a thumbnail image of the model until it finishes loading.

Since time-to-interactive can only be reduced by substantially shrinking the model, @egjs/view3d provides a level of detail (LOD) feature and a mesh simplifier to greatly reduce 3D model file size, and also uses Google’s Draco mesh compression to reduce the overall size of the glTF model.

LOD greatly reduces the time before the model is first displayed: a version of the model with smaller texture images and a simplified mesh is prepared in advance and loaded first. In typical 3D graphics, LOD means changing an object’s level of detail at runtime based on camera distance; we use the same term because both share the idea of lowering detail.
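The idea, sketched with three.js’s GLTFLoader (this is not @egjs/view3d’s actual implementation, and the URLs are placeholders):

```js
import { GLTFLoader } from "three/examples/jsm/loaders/GLTFLoader.js";

const loader = new GLTFLoader();

// Show the small, reduced model as soon as it arrives, then swap in the
// full-detail model once it finishes downloading in the background.
async function showWithLOD(scene, lowDetailURL, fullDetailURL) {
  const low = (await loader.loadAsync(lowDetailURL)).scene;
  scene.add(low);

  const full = (await loader.loadAsync(fullDetailURL)).scene;
  scene.remove(low);
  scene.add(full);
}
```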

Model Simplification

Draco is a 3D model compression technique developed by Google, with the advantage that it can be applied directly to the glTF format as an extension.
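On the loading side, a Draco-compressed glTF can be decoded transparently, for example with three.js (a sketch; the decoder path and file name are placeholders):

```js
import { GLTFLoader } from "three/examples/jsm/loaders/GLTFLoader.js";
import { DRACOLoader } from "three/examples/jsm/loaders/DRACOLoader.js";

// GLTFLoader delegates Draco-compressed geometry to DRACOLoader,
// which needs the path to the JS/WASM decoder files.
const dracoLoader = new DRACOLoader();
dracoLoader.setDecoderPath("https://www.gstatic.com/draco/versioned/decoders/1.4.1/");

const loader = new GLTFLoader();
loader.setDRACOLoader(dracoLoader);

// "scene" is assumed to be an existing three.js Scene instance.
loader.load("model-draco.glb", (gltf) => scene.add(gltf.scene));
```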

By combining the above techniques, file size can be greatly reduced. For example, in a test with the “Frescos at Medieval Convent, São Cucufate, Port.” model, the file size dropped from 244 MB to 4.2 MB (-98.28%).

Original(244MB, left) Mesh-Simplified + Draco(4.2MB, right)

Other Tips

Converting & Generating 3D Model

glTF models exist in two formats, glTF and glb, and both can be significantly reduced in file size by applying Google’s Draco compression. The tool for this is gltf-pipeline, which can convert glTF to glb and back, and apply Draco compression at the same time.
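Typical gltf-pipeline invocations look like this (file names are placeholders):

```sh
# Install the CLI
npm install -g gltf-pipeline

# Convert glTF to glb
gltf-pipeline -i model.gltf -o model.glb

# Apply Draco compression while writing the output
gltf-pipeline -i model.gltf -o model-draco.glb -d
```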

In the case of usdz models, a usda-to-usdz conversion tool is provided with Pixar’s USD toolset. If you already have a glTF model, you can use Google’s usd_from_gltf, or other converters such as gltf2usd and Reality Converter.
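For reference, usd_from_gltf takes an input and an output path (assuming the tool is built and on your PATH as described in its README):

```sh
# Convert a glTF model to usdz for AR Quick Look
usd_from_gltf model.gltf model.usdz
```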

See part 2 for actual WebXR API examples :)
