Web-Powered Augmented Reality: a Hands-On Tutorial

A Guided Journey Into the Magical Worlds of ARCore, A-Frame, 3D Programming, and More!

There’s been a lot of cool stuff happening lately around Augmented Reality (AR), and since I love exploring and having fun with new technologies, I thought I would see what I could do with AR and the Web — and it turns out I was able to do quite a lot!

Most AR demos feature static objects, such as a cool model displayed on a table, but AR really begins to shine when you start adding animations!

With animated AR, your models come to life, and you can then start telling a story with them.

But before we dive into code (and there will be lots of code, I promise!), let me tell you a little bit more about how I got into AR in the first place.

Early Fun with Augmented Reality

My first dip into AR was in 2012, when Tom Teman and I built SoundTracker, an experience where the participants would move inside a room, and the music would change based on their positions within the room. For position tracking, we used QR-code-like markers along with 8 tablets that constantly filmed the room, and we used Qualcomm’s Vuforia AR Platform to figure out the coordinates. It wasn’t AR per se, but it was built on top of AR marker-based positioning technology.

Then, in 2015, I got a Project Tango tablet, Google’s experimental technology for building Augmented Reality experiences. The ability to place virtual objects in real space, and have them stick in place even when you move around, felt like diving into the uncanny valley, where the boundaries between the physical world and the game were beginning to blur. This was the first time I experienced AR without the need for markers or special props: it just worked out of the box, everywhere.

And yet, the magic quickly began to wear off: even though Project Tango had a lot of potential for building amazing experiences, the tablet was not a very common product. In fact, there were only 3,000 units sold worldwide during 2015. This meant that even if I did build something out of the ordinary, I would be able to share it with almost nobody.

Fast forward a few years, and Google announced (just a few weeks ago) that Project Tango was deprecated and would be shut down shortly. The little bit of sadness I felt quickly changed to joy — as I heard the reason was that the same technology was coming to Android under the name “ARCore.” It’ll be available on flagship Android phones starting with the Samsung Galaxy S8 and the Google Pixel family.

In Project Tango, the hardware had special infrared and fisheye-lens cameras, which were used to assist with depth perception and motion tracking. However, with the new ARCore technology, these special cameras are no longer needed: advancements in deep learning and tracking technologies made it possible to achieve a good enough level of motion tracking and depth perception using just a single RGB camera.

To make it even better, unlike Tango, where you had to use Unity, Java or C for building the experience, with ARCore you can actually build using Web technologies, which is exactly what we are going to do here today!

AR on the Web

AR is just starting to come to the web. The first time I had an AR-like Web experience was when Konrad Dzwinel created his JavaScript Face Tracking Demo, which allowed you to set your head on fire:

My first AR-Experience, back in August 2014, was burning hot!

This was fun!

Then, in early 2017, Jerome Etienne released AR.js, which brought marker-based AR to the web. After printing your own marker, you could detect its position in the camera’s video stream and lay your own 3D models and creatures on top of it. It had decent performance, and it was possible to achieve 60 FPS on most modern smartphones. It was built entirely on top of standard Web technologies and ran right in your browser, requiring no plugins whatsoever, which is just what got me so excited about Web Bluetooth.

A few months later, AR.js added support for Project Tango, but you could only use it with a very specific build of Chrome (and of course, you’d need to have a Tango device). It was the first time, to the best of my knowledge, that you could build a marker-free AR experience on the web. Finally, things were starting to get very interesting.

Nowadays, Google is working on bringing ARCore to the Web, using an experimental web browser called WebARonARCore (pardon the tongue-twisting name). There is also an iOS version, similarly called WebARonARKit. Mozilla has also recently joined the party, and has just released the WebXR Viewer iOS app. We’ll get to them soon.

All of this gets very exciting and fun when you combine it with a little technology called A-Frame.

A-Frame

A-Frame is an emerging technology from Mozilla, which allows you to create 3D scenes and Virtual Reality experiences with just a few HTML tags. It’s built on top of WebGL, Three.js and Custom Elements, a part of the emerging Web Components standard.

I first heard about A-Frame from Madlaina Kalunder, who taught me a lot about modeling 3D experiences for the Web, and I immediately got hooked by its simplicity. This is all the code it takes to display some 3D objects on your screen:

Yes, just a few lines of HTML code; that’s it: not even a single line of JavaScript! Now compare it with programming WebGL directly. Can you see how big the difference is?

Hello, A-Frame World!

A-Frame has a growing ecosystem of plugins, which make it even more powerful while keeping the syntax simple (I even created one). One of these plugins adds WebAR support, as we will see next.

Building Our AR Experience

This is where the real fun begins.

If you’d like to follow along, you will need a supported phone, as well as WebARonARCore (for Android users) or WebARonARKit (for iOS users). You will also need to run a web server and serve your HTML files to the mobile device.

For development, I personally use live-server as a simple HTTP server with built-in live reloading. I connect my Android device through a USB cable and expose my server to the device via chrome://inspect, where I enable the Port Forwarding option for port 8080. Then, on my Android device, I simply open the WebARonARCore browser and go to http://localhost:8080.

Adding WebAR support is very easy: we need to load three.ar.js and aframe-ar, and add the ar attribute to our scene. Here is what the AR-enabled A-Frame “Hello world” program looks like now:

Apart from adding the two script tags at the top, and the ar attribute to our a-scene element (line 5), not much has changed: just some tinkering with the positions and dimensions of the objects, and removing the a-sky tag which provided the background (if we used it here, it would hide the camera feed, rendering AR pretty useless).

And we get the same exhibition as before (a box, a cylinder and a sphere), but this time on top of the real world:

A-Frame AR Hello-World, right on my Desktop

You may have to walk around a bit with the phone and rotate it to various angles to find the virtual objects, but once you have found them, they will stay more or less fixed at the same position, and you will be able to look at them from all different angles.

So we can place objects in the real world, walk around them — but this gets boring really fast. How about adding some interactivity?

Animating and Interacting with the Augmented Web

First things first: let’s replace the Hello World objects with something less static.

We’ll use an animated model called “Cesium Man”, which is readily available from the glTF sample repository. glTF is a proposed standard format for 3D models, based on JSON. It can be easily loaded into A-Frame:

You can download the Cesium Man glTF model from here, or just put the link to the file directly in place of CelsiumMan.glb in the sample above (line 2).

The Cesium Man

Once you’ve added it to the scene, you will probably notice the model is still static. This is because we haven’t told it to move yet!

This can be easily solved using the aframe-animation-mixer, which lets us activate and control the animations in our models. We just need to add the appropriate script tag at the top, and the animation-mixer attribute to our model’s a-entity tag (line 4)…

…and the small guy should start moving, just like in the GIF below.

Is Cesium Man walking on the screen, or “in the world?” That’s a part of the magic of AR — it’s hard to tell! (Though it is “in the world,” in case you were wondering :-)

Now, having a 3D model walking in AR is quite an achievement already, but wouldn’t it be more fun if we could tell him where to go instead of having him walk in the same spot all day?

Finally we’ve reached the place where we can’t get away with just HTML anymore and need to add some JavaScript logic. This is where our code starts to become aware of the surroundings and their shape.

First, we will add code to display a cursor that indicates to the user whether we recognized a surface at the point they are looking at. We will start by gaining control of the camera:

<a-camera ar-raycaster raycaster cursor="fuse:false"> </a-camera>

and a small ring that will serve as a cursor:

<a-ring id="cursor" radius-inner="0.1" radius-outer="0.2"  
color="teal" position="0 -0.8 -5" rotation="-90 0 0">
</a-ring>

At this stage, the ring should appear just below the Cesium Man:

Next, we want to display the ring where the user is looking, to indicate the target point where our Cesium Man will go. For that, we will use a raycaster.

You may have been wondering what the raycaster attribute that we used in the camera above was all about. Basically, a raycaster sends a virtual beam from a point on the screen and checks if it hits any object. In this case, we are using an AR raycaster, which will shoot a single beam from the center of your screen and see if it hits any plane in the real world (as far as the phone can tell). If it does, we will get the exact coordinates of the point where the beam hit the plane in the virtual 3D world, which we will use as the position for the ring (and eventually as a target for the Cesium Man to walk to, but first things first…).

To rephrase: our raycaster will find where the center of the screen falls in the real world, and return it in the virtual coordinate system used by A-Frame, so that when we place an object there, it will appear as if it is in the exact same location as the center of the screen. This may not sound impressive, but you’ll start to appreciate the magic when you move the phone around and realize that the object you just placed stays at the same spot in the real world, just where you put it.
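If you are curious what “finding where the beam hits a plane” boils down to, here is a tiny geometric sketch. It is not how ARCore or the AR raycaster are implemented (they have to detect real-world planes from the camera feed first); it only intersects a ray with an idealized horizontal plane, and the function and parameter names are made up for this illustration.

```javascript
// Intersects the ray `origin + t * direction` (t >= 0) with the horizontal
// plane y = planeY. All values are in meters, in a Y-up coordinate system.
// Hypothetical helper for illustration only, not part of A-Frame or ARCore.
function intersectHorizontalPlane(origin, direction, planeY) {
  // A ray that runs parallel to the plane never hits it.
  if (direction.y === 0) return null;

  const t = (planeY - origin.y) / direction.y;

  // A negative t means the plane is behind the ray's starting point.
  if (t < 0) return null;

  return {
    x: origin.x + t * direction.x,
    y: planeY,
    z: origin.z + t * direction.z,
  };
}

// A phone held 1.5 m above the floor, looking forward and down at 45 degrees:
// intersectHorizontalPlane({ x: 0, y: 1.5, z: 0 }, { x: 0, y: -1, z: -1 }, 0)
// → { x: 0, y: 0, z: -1.5 }
```

The returned point plays the same role as the hit coordinates we get from the raycaster: a position in the virtual coordinate system where we can place an object.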

Basically, the code waits for A-Frame to load and set up the scene, and then the interesting part happens at line 5: we listen for raycaster-intersection events, and whenever the virtual beam from the center of the phone screen hits a plane in the real world, we update the position of the cursor to the position of the hit (line 6).
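As a rough sketch of that idea (not the exact listing, so the line numbers mentioned above won’t match), the listener could look something like this. The helper name trackCursor and the returned state object are my own inventions for the example; the raycaster-intersection event and its detail.intersections array come from A-Frame’s raycaster component.

```javascript
// Hypothetical helper: keeps a cursor entity glued to the latest raycaster hit.
// `raycasterEl` and `cursorEl` are assumed to be A-Frame entities (anything
// with addEventListener / setAttribute will do, which keeps this testable).
function trackCursor(raycasterEl, cursorEl) {
  const state = { lastHit: null };

  raycasterEl.addEventListener('raycaster-intersection', (event) => {
    // Each intersection carries a `point` with x, y, z in scene coordinates.
    const intersection = event.detail.intersections[0];
    if (!intersection) return;

    state.lastHit = {
      x: intersection.point.x,
      y: intersection.point.y,
      z: intersection.point.z,
    };
    cursorEl.setAttribute('position', state.lastHit);
  });

  // Callers can read state.lastHit later, e.g. when the user taps the screen.
  return state;
}

// In the browser, once the scene has loaded (selectors match the tags above):
// const raycaster = document.querySelector('[ar-raycaster]');
// const cursor = document.querySelector('#cursor');
// trackCursor(raycaster, cursor);
```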

When you run this code, start walking around, and you should see how the ring sticks to planes — floor, tables, etc.

The ring will be quite large: we set the outer radius to 0.2, and A-Frame uses meters as its units. Now that the position of the ring is synchronized with the physical world, it should have a radius of about 20cm (so a diameter of 40cm), no matter where you put it.

The 3D engine will take care of scaling it to keep it in proportion with the real world objects. So on my desk, this is what it looks like:

Next step: let’s move the Cesium Man to where the cursor is whenever we tap the screen. For that, we’ll add a click listener to the raycaster and update our model’s position to the last intersection point whenever the user taps the screen:

Now the Cesium man appears where the ring is (and becomes enormous) as soon as you tap the screen:

Before we go on, let’s make the Cesium Man smaller. If you are using the development environment I recommended above, your phone is probably cabled to your laptop, so it’s much easier to have the objects small enough to fit on a desk. Change the Cesium Man’s scale to 0.15 0.15 0.15 and the radius sizes for the ring to 0.03 and 0.02 for the outer and inner radius, respectively. This should make the model around 20cm tall: still impressive, but small enough to move around your desktop.

So far we have an animated 3D-model, and we can teleport it around between different points in the real world. While I think teleportation is definitely cool, this model has a walking animation — so why not make it walk to the target position?

To achieve that, we will use A-Frame’s animation system, which lets us animate any attribute of our model. In this case, we will simply animate the position:

Basically, we use the firstTime variable to keep track of whether it’s the first tap or not. For the first tap, we do as before: just set the position of the model to the raycaster’s intersection point (line 9). For any subsequent tap, we create an a-animation element, asking it to animate the position attribute to the target position (line 14). We have to call A-Frame’s stringify method, as the raycaster gives us the position as a JavaScript object with {x, y, z} properties, and a-animation only accepts strings.

Finally, on line 15 we set the duration of the animation to 5000ms (5 seconds), and then the easing function to linear (so that the walking speed doesn’t change throughout the animation).
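Here is a minimal sketch of that tap logic, written from the description above rather than copied from the original listing (so, again, line numbers won’t line up). The createTapHandler name and the injected createElement parameter are assumptions I made so the snippet doesn’t depend on a global document; in a real page you would pass a function that calls document.createElement.

```javascript
// Builds a tap handler for the walking model. `walkerEl` is the model's
// entity; `createElement` is injected (hypothetical parameter) so the sketch
// can run outside a browser.
function createTapHandler(walkerEl, createElement) {
  let firstTime = true;

  return function onTap(target) {
    if (firstTime) {
      // First tap: simply drop the model at the intersection point.
      walkerEl.setAttribute('position', target);
      firstTime = false;
      return null;
    }

    // Later taps: animate the position attribute toward the new target.
    const animation = createElement('a-animation');
    animation.setAttribute('attribute', 'position');
    // a-animation expects an "x y z" string; in a real scene
    // AFRAME.utils.coordinates.stringify(target) does the same job.
    animation.setAttribute('to', `${target.x} ${target.y} ${target.z}`);
    animation.setAttribute('dur', 5000); // 5 seconds
    animation.setAttribute('easing', 'linear'); // constant walking speed
    walkerEl.appendChild(animation);
    return animation;
  };
}

// Browser usage (assumes a `walker` entity and a known `target` point):
// const onTap = createTapHandler(walker, (tag) => document.createElement(tag));
// onTap(target);
```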

When you try this code, you will notice that the walking speed actually depends on the distance — because we told the model to always move for a duration of 5 seconds, regardless of the distance. This can be improved by calculating the distance between the current position and the target position, and then setting the animation duration based on this value:

const currentPosition = walker.object3D.position;
const distance = currentPosition.distanceTo(target);
animation.setAttribute('dur', distance * 7000);
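To get a feel for the numbers: at 7000 ms per meter, the model covers roughly 14cm per second, so a half-meter stroll across the desk takes about 3.5 seconds. The standalone sketch below does the same calculation without Three.js; walkDuration and MS_PER_METER are names I picked for the example.

```javascript
// Same factor as in the snippet above: 7000 ms of walking per meter.
const MS_PER_METER = 7000;

// Straight-line distance between two {x, y, z} points, in meters.
function distanceBetween(a, b) {
  return Math.hypot(b.x - a.x, b.y - a.y, b.z - a.z);
}

// Animation duration in whole milliseconds for a walk from `from` to `to`.
function walkDuration(from, to, msPerMeter = MS_PER_METER) {
  return Math.round(distanceBetween(from, to) * msPerMeter);
}

// walkDuration({ x: 0, y: 0, z: 0 }, { x: 0.3, y: 0, z: 0.4 }) → 3500
```

Because the duration now scales with the distance, the walking speed stays constant no matter how far away you tap.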

Now the animation finally starts looking somewhat realistic:

But uh-oh, when you go back, Cesium Man is suddenly doing the moonwalk! (It’s not a bug, it’s a feature! I swear!)

This can be easily solved by telling the model to look at the target position before trying to go there:

walker.object3D.lookAt(target);

However, you will notice that now the model is always off by 90 degrees. That’s due to the orientation in which the model was saved. An easy fix would be to rotate the model by 90 degrees clockwise around the Y-axis. However, this rotation will be lost as soon as we call the lookAt method.

Instead, what I ended up doing is wrapping the walker inside another a-entity tag, so the walker is rotated by 90 degrees, and then the parent a-entity tag is the one that is actually animated and rotated to look at the target:

This technique is very common in A-Frame: if we want several objects to move together, or to apply a new transformation to an object without affecting the initial transform (as is the case here: we don’t want to override the initial rotation, we want lookAt to rotate the object on top of the initial rotation), we simply wrap it in an a-entity and then manipulate the wrapper instead.
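To see why the wrapper trick works, it helps to look only at the rotation around the Y-axis. In Three.js, lookAt turns a regular object so that its local +Z axis points at the target; the sketch below computes that heading by hand and then adds the child’s fixed offset on top. Both function names are hypothetical, the sketch ignores any tilt caused by height differences, and the 90 in the usage comment is just the offset mentioned above (the sign depends on how your model was exported).

```javascript
// Yaw in degrees (rotation around the Y axis) that points an object's local
// +Z axis from `from` toward `target`, ignoring height differences.
function headingDegrees(from, target) {
  const dx = target.x - from.x;
  const dz = target.z - from.z;
  return (Math.atan2(dx, dz) * 180) / Math.PI;
}

// The wrapper carries the heading and the child keeps its own fixed offset,
// so the model's effective yaw is the sum of the two rotations.
function effectiveYaw(wrapperYaw, modelOffset) {
  return wrapperYaw + modelOffset;
}

// Walking toward +X from the origin gives a heading of about 90 degrees:
// const yaw = headingDegrees({ x: 0, y: 0, z: 0 }, { x: 1, y: 0, z: 0 });
// effectiveYaw(yaw, 90); // the wrapper's heading plus the model's own offset
```

Only the wrapper’s heading changes from tap to tap; the offset on the inner entity is set once and never touched again, which is exactly what wrapping buys us.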

Pretty neat, huh?

Before we finish with this example, I want to add one more thing so that our virtual object looks somewhat more realistic: a shadow. For that to work, we need to add a plane that will receive the shadow, and we also want that plane to always be below the model, and be transparent (so you don’t actually see the plane, just the shadow that is cast on it).

Let’s start by adding the plane, and telling the model to cast a shadow, and the plane to receive it:

Since we’ve already wrapped the model with another a-entity above (to make our cesium man walk in a sensible direction), we can now add the new shadow plane as a sibling of the model, so they will move and rotate together as one unit.

At this point, you will see the shadow, but also, a big white plane:

But no worries! We’ve got another trick up our sleeves that will solve it.

A-Frame can’t do transparent planes with visible shadows out of the box, but luckily, A-Frame is built on top of a library called Three.js, and Three.js can do this for us. All we need to do is to extend A-Frame with a few lines of code and add the new functionality:

Make sure you put this code before your <body> tag, but after you load A-Frame. What it does, essentially, is register a new shadow-material attribute (called a Component in A-Frame’s jargon), which applies the Three.js ShadowMaterial to whatever element you put it on.
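A component along these lines might look like the sketch below. Treat it as an approximation written from the description above, not the exact code: wrapping the registration in a registerShadowMaterial function that receives AFRAME and THREE as parameters is my own choice (it keeps the snippet free of globals), while registerComponent, getOrCreateObject3D and THREE.ShadowMaterial are the real A-Frame and Three.js APIs.

```javascript
// Registers a `shadow-material` component that swaps the entity's mesh
// material for a Three.js ShadowMaterial (transparent except where a shadow
// falls). `AFRAME` and `THREE` are passed in explicitly for this sketch.
function registerShadowMaterial(AFRAME, THREE) {
  AFRAME.registerComponent('shadow-material', {
    init() {
      this.material = new THREE.ShadowMaterial();
      // Reuse the entity's mesh if it exists, or create one if it doesn't.
      const mesh = this.el.getOrCreateObject3D('mesh', THREE.Mesh);
      mesh.material = this.material;
    },
  });
}

// In the page, after the A-Frame script has loaded:
// registerShadowMaterial(AFRAME, THREE);
```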

In order to use this newly defined material in our scene, we’ll just add the shadow-material attribute to our plane, as follows:

<a-plane width="0.5" height="0.5" position="0 0 0" 
rotation="-90 0 0" color="white" shadow="receive: true"
shadow-material>
</a-plane>

And this seals the deal: we have real-time shadows too (albeit a little pixelated, for performance reasons)!

Casting its shadow on the Raspberry Pi

You can find the final version of the code on my GitHub, and you can also check it out online (using a WebAR-capable browser on a supported device, otherwise you will just get a blank webpage). I personally think it is amazing that you can build such things with just 75 lines of code!

Bringing the Fun with You… Everywhere!

After building this in one weekend, I decided to create an improved version with new characters. I purchased animated Fox and Diplodocus (a type of dinosaur) models and converted them to glTF. That was a challenge, too (but one for another post), but they are cute, so it was totally worth it.

I even took them for a nice walk in the park:

Out and about town:

And even to the beach!

Closing Thoughts

I found designing a user interface for AR pretty challenging: it is so different from the standard web!

While shooting the movie, it took me some time to get used to the move-and-tap interface for moving the characters around, and also, as I didn’t want to clutter the screen with various UI elements, I decided to use voice control to select which character to control, as well as activate different animations. Luckily, we have a Web Speech API that was up to the task.

I invite you to take a look at the source code for the Fox + Dino demo. You will find it resembles what we built here — one major improvement being that the walking animation is only running when the animals actually move around, saving us some idle animation. You can also try it for yourself, if you have a supported device (and browser, as explained above).

Two things I wish to see happening soon in WebAR are the availability of more high quality, Web-ready animated 3D models, and more widespread support of the technology.

There is good news on both fronts: SketchFab, which hosts a repository of third-party 3D models, has recently added a glTF export option that works pretty well. In addition, they have just announced a new marketplace, meaning that in addition to the free content that was available up until now, there’s much more content you can get if you are willing to spend a few bucks. To save you time, here is a direct link to the search result page for animated, low-poly 3D models.

On the availability side, Google has announced they are working on making ARCore, the technology that powers WebAR on Android, available on more than 100 million devices, so soon it won’t be limited to just the Pixel and Samsung Galaxy S8 devices. And if we, as Web developers, start building amazing experiences with WebAR, I’m sure we can make the case for including it in Firefox and Chrome sooner rather than later!