Server Rendering for Augmented Reality

Cloud Rendering with Web Standards

Recently Chrome got a new feature which allows us to capture a MediaStream from a canvas, video or audio element. Not a lot of noise has been made about this, but I still think it matters a lot!

Why does it matter for augmented reality on the phone? Because it lets us offload the rendering from the phone to a server. Your phone will display 3d which has been rendered on servers. And servers are much more powerful than phones, so we could vastly increase the quality of the rendering.

This post describes an idea. I am unlikely to be the first to come up with it. I have neither implemented nor tested any of this. Maybe it works, maybe it doesn’t :) I am just recording and sharing the idea. Maybe somebody will find it useful and try to do it.

Works on Any Augmented Reality System

Server rendering works the same way on any AR device

It is interesting to know that this concept is not limited to AR.js. It could be done on other augmented reality systems, e.g. Tango or HoloLens.

Cloud Rendering on any Augmented Reality System

Process to Render Augmented Reality on Server

Here is the workflow of this idea.

  • Step 1: The phone does positional tracking using the camera and keeps sending its location to the server (maybe via a WebRTC data channel)
  • Step 2: The server runs a browser, and renders the 3d using the phone location. Presumably high quality 3d.
  • Step 3: The server canvas is captured via the new captureStream() API into a video and streamed back to the phone.
  • Step 4: The phone just displays the webcam video, and on top of it, the 3d rendering video from the server.
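
To make steps 1 to 3 a bit more concrete, here is a minimal sketch. It is untested against a real phone or server. The `Pose` shape, its 36-byte layout and the function names are assumptions made up for this example. The `CanvasLike` and `PeerLike` interfaces are small stand-ins so the code runs outside a browser; in a page, a canvas element offers `captureStream()` and an `RTCPeerConnection` offers `addTrack()`.

```typescript
// Hypothetical pose message: names and byte layout are assumptions
// for this sketch, not part of AR.js or any other library.
interface Pose {
  timestampMs: number;                         // when the phone measured the pose
  position: [number, number, number];          // x, y, z
  rotation: [number, number, number, number];  // quaternion x, y, z, w
}

const POSE_BYTES = 8 + 7 * 4; // one float64 timestamp + seven float32 values

// Step 1 (phone): pack the pose into a small binary message,
// suitable for sending over a data channel.
function encodePose(pose: Pose): ArrayBuffer {
  const buffer = new ArrayBuffer(POSE_BYTES);
  const view = new DataView(buffer);
  view.setFloat64(0, pose.timestampMs);
  const values: number[] = [...pose.position, ...pose.rotation];
  values.forEach((value, i) => view.setFloat32(8 + i * 4, value));
  return buffer;
}

// Step 2 (server): unpack the message before rendering with it.
function decodePose(buffer: ArrayBuffer): Pose {
  const view = new DataView(buffer);
  const f = (i: number): number => view.getFloat32(8 + i * 4);
  return {
    timestampMs: view.getFloat64(0),
    position: [f(0), f(1), f(2)],
    rotation: [f(3), f(4), f(5), f(6)],
  };
}

// Minimal stand-ins for the browser objects used in step 3.
interface TrackLike { kind: string }
interface StreamLike { getTracks(): TrackLike[] }
interface CanvasLike { captureStream(frameRate?: number): StreamLike }
interface PeerLike { addTrack(track: TrackLike, stream: StreamLike): unknown }

// Step 3 (server): capture the rendered canvas as a stream and hand
// its tracks to the connection that goes back to the phone.
function streamCanvas(canvas: CanvasLike, peer: PeerLike, fps = 30): StreamLike {
  const stream = canvas.captureStream(fps);
  for (const track of stream.getTracks()) {
    peer.addTrack(track, stream);
  }
  return stream;
}
```

On the phone, the buffer returned by `encodePose()` would be what gets sent to the server for every tracked frame. Step 4 needs no special code: it is a video element displayed on top of the webcam video.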

TA DA! We got cloud rendering on any Augmented Reality system!


Ok so we got the general concept… Now let’s look at the details :)

A Different Load Balance

The phone load is constant, and independent of 3d complexity.

If we render something big, only the server will be affected. And we can put a GTX 1080 in the server if we want. So we can vastly improve the 3d. The phone cpu/gpu/ram load will be constant no matter what. So it will work even on old phones.

Better Rendering For More Immersive AR

Photo realistic rendering increases immersion.

One area where we could gain a lot from this is 3d realism. Obviously, if we render photo realistic 3d, the augmented reality will be that much more convincing. See below: does this sofa look real? Well, it is in augmented reality.

Nice Use Case: IKEA

This idea would fit nicely with a business like IKEA. They sell furniture, lots of it. So the price tag can be high in some cases. Wild guessing: a whole kitchen could cost 5k USD, or you can refurnish your whole living room for 8k USD. This is significant money, so better be sure.

People would like to see the furniture in augmented reality directly in their home. That way they know how it looks before actually buying. They would feel a lot more comfortable buying, so presumably they may buy more furniture.

I can totally see a loving mother picking out furniture for the children’s room. Picking between various colors and models. Taking pictures to share with friends and family.

Influence of Server Cost

Server rendering implies having a server with a GPU, and those are not cheap. Say you have 100 simultaneous users and can run 5 users per server: you will need to run 20 servers. So what is the influence of server cost?

  • Case 1: Obviously if you are IKEA, the server cost is considered negligible because the customer is expected to buy more furniture. Furniture price tags are going to easily cover the server cost.
  • Case 2: Another good case is the artistic installation, where you show augmented reality to visitors. Here, the server can simply be a PC in a corner. And if you already have a PC, it is free.
  • Case 3: On the other hand, if you are doing a game, it usually has a very low margin per user. It is about having thousands of users and making a few cents per user. But you would need enough servers to serve all those users. The servers will eat your margin pretty fast. It doesn’t fit due to the low margin.
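
The arithmetic behind the server count can be written down in a few lines. This is only a sketch: the function names are made up for this example, and `serverCostPerHour` is a hypothetical input, since this post does not give any real price.

```typescript
// Number of GPU servers needed for a given number of simultaneous users.
function serversNeeded(simultaneousUsers: number, usersPerServer: number): number {
  if (usersPerServer <= 0) {
    throw new RangeError("usersPerServer must be positive");
  }
  // A partially filled server still has to run, hence the rounding up.
  return Math.ceil(simultaneousUsers / usersPerServer);
}

// Server cost carried by each simultaneous user, per hour.
// serverCostPerHour is a placeholder: plug in your own hosting price.
function hourlyCostPerUser(
  simultaneousUsers: number,
  usersPerServer: number,
  serverCostPerHour: number
): number {
  if (simultaneousUsers <= 0) {
    return 0;
  }
  const servers = serversNeeded(simultaneousUsers, usersPerServer);
  return (servers * serverCostPerHour) / simultaneousUsers;
}

console.log(serversNeeded(100, 5)); // 20, the figure used above
```

Comparing `hourlyCostPerUser()` with what one user brings in per hour is a quick way to see which of the three cases you are in.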

Server Rendering and Network Latency

With local rendering, we can start rendering as soon as the position has been detected. With remote rendering, once the position is detected, it is sent over the network to the server, which renders the scene, and streams it back over the network to the phone. The data exchanged over the network increases the latency of the augmented reality.

I have no miracle fix for this, just some directions in which we can look.

Motion Prediction: We can reduce the perceived latency with motion prediction.
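
One simple form of motion prediction is linear extrapolation: assume the phone keeps moving at the speed seen between its last two tracked positions, and render for where it should be once the frame arrives. The sketch below only handles position, not rotation. The `Sample` shape and `predictPosition()` are names invented for this example, and a real system may well need a better predictor than this.

```typescript
type Vec3 = [number, number, number];

// One tracked position with the time at which it was measured.
interface Sample {
  timeMs: number;
  position: Vec3;
}

// Extrapolate the position latencyMs after the latest sample,
// assuming constant velocity between the two samples.
function predictPosition(previous: Sample, latest: Sample, latencyMs: number): Vec3 {
  const dt = latest.timeMs - previous.timeMs;
  if (dt <= 0) {
    // No usable velocity estimate: fall back to the latest position.
    return [latest.position[0], latest.position[1], latest.position[2]];
  }
  const scale = latencyMs / dt;
  return [
    latest.position[0] + (latest.position[0] - previous.position[0]) * scale,
    latest.position[1] + (latest.position[1] - previous.position[1]) * scale,
    latest.position[2] + (latest.position[2] - previous.position[2]) * scale,
  ];
}
```

For example, if the phone moved from `[0, 0, 0]` to `[1, 2, 0]` in 100 ms and the round trip is expected to take 50 ms, the server would render for `[1.5, 3, 0]`.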

Dynamic Switching: The latency is most visible when the phone is moving a lot. So in this case, use local rendering with low latency. Otherwise, use remote rendering with higher latency but photo realistic 3d.

Reduce Network Distance: Put the server as close as possible to the phone, so the network path between the two is shorter. Put them on the same wifi if your use case allows it.

Additionally, you can test the latency produced by the video encoding on this page by the excellent Sam Dutton.

Local and Server Rendering: Best of Both Worlds?

So server rendering provides better 3d but higher latency, while local rendering provides lower-quality 3d but lower latency. Maybe we can find a tradeoff that keeps the best of both worlds.

  • When the phone is moving a lot, this is when the latency is the most observable, so we do the local rendering
  • When the phone is stable, this is when the latency is the least observable, so we do the server rendering
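
The two rules above could be sketched as a small decision function. Everything here is an assumption for illustration: the names, the idea of measuring motion as a single speed value, and the use of two thresholds instead of one. The second threshold is my addition, so the mode does not flicker back and forth when the speed hovers around a single limit.

```typescript
type RenderMode = "local" | "server";

// Hypothetical thresholds, in whatever speed unit the tracker reports.
// They would need tuning on real devices.
interface SwitchConfig {
  fastSpeed: number; // at or above this, the phone is "moving a lot"
  slowSpeed: number; // at or below this, the phone is "stable"
}

function chooseRenderMode(
  current: RenderMode,
  speed: number,
  config: SwitchConfig
): RenderMode {
  if (speed >= config.fastSpeed) {
    return "local"; // latency is most observable: render on the phone
  }
  if (speed <= config.slowSpeed) {
    return "server"; // latency is least observable: render on the server
  }
  return current; // in between: keep whatever mode is already active
}
```

The function would be called on every tracked frame with the mode currently in use, and the renderer switched whenever the returned mode differs.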


Blog post done, I am crazy about the idea. I am not sure how feasible it is. If you play with the idea, please share. I would love to hear about it :)

Contact me anytime @jerome_etienne