Real-Time Streaming within Augmented and Virtual Reality

Hermes · Published in Agora.io · 14 min read · Jun 13, 2019

In a recent post, I referenced real-time streamed AR experiences as something that would happen in the future. That future isn’t waiting until tomorrow (blame Moore’s law); it’s coming later this afternoon.

Augmented Reality has matured significantly over the last few years, and the AR landscape has recently experienced some tectonic shifts¹ that make clear the demand for realtime shared experiences within immersive virtual realities.

In this post I will look beyond just AR, more broadly at all immersive technologies, and at how realtime virtual experiences will change the way we interact with each other in the physical world.

What is XR?

Now is a good time for us to agree on one key term: in this post, XR refers to Extended Reality. Think of Extended Reality as the blanket term that covers all immersive technologies (AR/VR/MR), regardless of the level of interaction with the physical world. There are many articles (Forbes, Medium) that break down the intricacies of each (AR/VR/MR).

image source: https://www.forbes.com/sites/quora/2018/02/02/the-difference-between-virtual-reality-augmented-reality-and-mixed-reality/#529111c62d07

What is a Shared Experience?

One more key concept we need to align on: what is a shared experience? A shared experience is when multiple devices (read: users) view and interact with the same XR experience, either co-located or remotely.

What does that even mean? Shared XR means two or more users can view and interact with (share) the same virtual content at the same time, and each sees the updates in realtime. Think of it as playing a multiplayer game or working in a shared Google Doc.

Given the immersiveness of XR, when applied to collaborative environments it can make miles feel like millimeters.

State of the (XR) Union

Let’s take a moment to reflect on how far we’ve come and get a lay of the landscape. It’s hard to believe that ARKit has reached its 3rd iteration, with built-in body/pose tracking, occlusion, and support for AR headsets. ARCore is supported by more and more Android devices every day, and Google’s Cloud Anchors have made the cross-platform jump (available on iOS and Android). HoloLens and Google Glass are both launching their second iterations. Magic Leap continues to invest in its ecosystem, with its investment fund and recent acquisitions. Lastly, we can’t forget the inroads Web-based AR has made with its recent breakthroughs into the commercial mainstream.

image source: https://www.bbc.com/news/technology-47350884

Let’s start with the wearables. HoloLens 2 recently launched with improved specs, a wider field of view, and a focus on enterprise customers. Following in Microsoft’s footsteps, the new Google Glass 2.0 has also been announced, and it too is for enterprise customers only. Both major platforms are leaving bedroom developers and aspiring entrepreneurs to look elsewhere for their mixed-reality headset needs. Don’t worry, because Nreal and Epson have both launched sub-$500 mixed-reality headsets.

Moving on: last year Apple announced .USDZ, which could be embedded into web pages to display 3D content in an AR viewer, but this year they were pretty silent. I would venture to say that may change soon, given the recently announced AR updates coming to Google’s search platform. Google is using the open-source .glTF format to embed 3D assets into its search results, and is even offering users the ability to view them in AR. This works similarly to Apple’s AR Quick Look within Safari, but it uses open standards (which ties into the XRCloud limitations discussed later).

As part of this initiative, Google has partnered with NASA, New Balance, Samsung, Target, Visible Body, Volvo, and Wayfair to help expand its 3D content. Keep an eye out for these the next time you Google something.
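If you want the same behavior on your own pages, Google’s open-source &lt;model-viewer&gt; web component renders glTF inline and hands the model off to the device’s AR viewer. Here’s a minimal sketch; the asset URLs are placeholders, and I’m assuming the @google/model-viewer npm package:

```typescript
// Registers the <model-viewer> custom element in the page.
import '@google/model-viewer';

// Placeholder asset URLs: a glTF binary for the inline viewer plus a
// USDZ fallback so iOS can hand the model to AR Quick Look.
const viewer = document.createElement('model-viewer');
viewer.setAttribute('src', 'https://example.com/assets/sneaker.glb');
viewer.setAttribute('ios-src', 'https://example.com/assets/sneaker.usdz');
viewer.setAttribute('ar', '');              // shows a "view in AR" button on supported devices
viewer.setAttribute('camera-controls', ''); // orbit/zoom controls for the inline 3D view
viewer.setAttribute('alt', 'A 3D model of a sneaker');
document.body.appendChild(viewer);
```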

The real breakthroughs aren’t coming just from the Goliaths of the tech world. 8th Wall’s Web-based AR provides a robust SDK/API that supports industry-leading browser-based SLAM and recently added support for Image Targets (read: marker images). 8th Wall should watch its back, though; Zappar recently started dropping seeds about its WebAR solution (in beta), which, once launched, will make ZapWorks Studio the only platform that allows developers to build AR experiences across both native and web platforms.

Magic Leap recently acquired Mimesys (one of the startups from within its ecosystem). This acquisition shows Magic Leap not only understands the importance of collaborative virtual realities, but is actively investing in them. Mimesys specializes in building solutions that enable co-presence within XR experiences. Unlike most other solutions, which place 3D avatars in the virtual space, Mimesys uses volumetric scans of remote participants, allowing users to feel co-located during collaborative activities.

They are making holographic communication a reality; think of R2-D2’s Princess Leia holographic recording, but in real time and with people you know.

Last, and most importantly (to this post), we need to mention Google’s recent announcement of Stadia. What better indicator that the future digital world will be streamed? For those who have been using DuckDuckGo or living under a rock, Stadia is a new, device-agnostic video-gaming platform that works across web, native mobile, and TV (via Chromecast).

image source: https://finance.yahoo.com/news/google-details-stadia-ahead-apple-202300547.html

The premise for the entire platform is that internet speeds have gotten so good (in certain countries) that Google believes it can launch an entire video-game ecosystem based on completely streamed games (think Steam, but without the downloads).

The gaming industry is one of the front-runners in leveraging and disseminating new technologies (maybe second to porn). But seriously, Stadia is going to disrupt the entire video-gaming industry, so keep an eye out for it this Fall (when it launches).

The #XRCloud

Most XR apps (think FB/Insta and Snapchat) consist of single-device experiences, where many people can interact with the experience but they need to share a single device. These solo XR apps come in two forms: pre-loaded or cloud-based.

When shipping an app with all the assets and content for the experience pre-loaded, you run into bloated file sizes, which lead to long download times. Not to mention, every time you want to update the experience, users have to download a new version of the app. App size matters (don’t let anyone tell you different), and when size and flexibility matter (get your head out of the gutter), developers turn to some sort of cloud-based asset delivery.

So we have these virtual experiences accessible via cloud delivery. Sounds familiar; does that mean there’s an XR cloud? Yes, and the XRCloud is the same as any other “cloud,” in that it consists of many smaller clouds (micro-services) existing in the form of many layers: Network (Agora.io), Processing (Google Vision), Storage (CMS: Zappar/Scape.io), etc.

Cloud Delivery Pain Points

Most major XR platforms use a cloud-based CMS that downloads and launches the experience based on some entry point (link, marker, face, QR code, etc.). This trigger prompts the app to make a request to the backend system for the assets and code. The problem with this implementation is that the client app has to download the entire experience prior to launching it.

For this reason, platforms limit the size of the experience to keep things loading quickly. At Blippar we delivered a variety of experiences, sometimes with thousands of assets. Projects ranged from playing recipe videos on a pasta box, to time-trial races up 3D replicas of mountains, all the way to full-on apps; campaigns ranged from under 1 MB to upwards of 85 MB, so we had to get creative with loading.
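To make “creative loading” concrete, here is a minimal sketch of the pattern in TypeScript. The manifest format, asset priorities, and the startExperience/attachToScene hooks are hypothetical stand-ins for whatever your engine exposes; the point is the difference between blocking on the full download and launching as soon as the critical assets land:

```typescript
// Hypothetical experience manifest: asset URLs tagged by priority.
interface AssetEntry {
  url: string;
  priority: 'critical' | 'deferred'; // critical: needed before first render
}

// Naive approach: block until every asset is fetched, then launch.
async function launchBlocking(manifest: AssetEntry[]): Promise<void> {
  await Promise.all(manifest.map((a) => fetch(a.url).then((r) => r.arrayBuffer())));
  startExperience(); // the user has stared at a spinner this whole time
}

// Progressive approach: fetch only the critical assets up front,
// launch, then stream the rest in the background.
async function launchProgressive(manifest: AssetEntry[]): Promise<void> {
  const critical = manifest.filter((a) => a.priority === 'critical');
  const deferred = manifest.filter((a) => a.priority === 'deferred');

  await Promise.all(critical.map((a) => fetch(a.url).then((r) => r.arrayBuffer())));
  startExperience(); // the user is interacting within seconds

  // Background loading; each asset attaches to the scene as it arrives.
  for (const asset of deferred) {
    fetch(asset.url)
      .then((r) => r.arrayBuffer())
      .then((buf) => attachToScene(asset.url, buf));
  }
}

declare function startExperience(): void;                             // placeholder hooks
declare function attachToScene(url: string, buf: ArrayBuffer): void;  // into your engine
```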

Why are download times such a big deal? As it turns out, there are some pretty impactful numbers behind why size matters, and getting the balance right is crucial to the success of your AR experience. During my time at Blippar, we saw a ~50% drop-off for any experience that took over 30 seconds to load (download and initialize), and, of the users who stuck it out, another ~75% churned after the initial interaction.

Let’s unpack that for a moment. It means a perceived long download time can lead to losing up to 90% of your audience, leaving roughly 10% of users who will re-engage. So beyond somehow getting someone to download an app, your app also needs to load fast just to keep them?!

It sounds like such an onerous task, so why even try? Turns out, if you get it right, the experience can see up to 3x the engagements per user, with 2x the dwell time. This entire process screams (or more like begs) for realtime streaming.

With this next argument I need to be conscious of walking a fine line, because what I’m about to say can quickly go down the rabbit hole, so to speak.

As the XR industry matures we will see the XRCloud expand and grow, but currently we are missing the key infrastructure and standards needed to support and facilitate adoption at the level of the general internet.

One major thing to note: the current internet infrastructure shares common standards for its content markup (HTML/CSS/JS), but within AR no such standards exist. Each platform has its own SDK, APIs, and in some cases even its own model formats.²

Let’s think about this for a moment and put it in a context we are all familiar with: web pages and web browsers. Assume each web browser could only understand web pages written specifically for it. Wait, what?!

Yeah, native cloud delivery in the XR world is like that. Each company that produces an SDK/API for building in AR has its own function names and its own file formats; everything is proprietary and different. Not to mention, what happens when one of these startups gets bought, or worse, falls into administration?

For example, when Blippar fell into administration, every customer went on a mad dash to figure out how to pull their content off the Blippar CMS, only for the company to later come back from beyond. And don’t think only smaller companies are at risk; even big companies shut down entire products (pointing a finger directly at Alphabet/Google).

Even though that may sound like a fairly standard issue when dealing with cloud environments, keep in mind that because of the lack of standardization, it’s a heavy lift to migrate from one XRCloud provider to another.

All of this has essentially created many siloed virtual worlds, where each “world” is only accessible if the developer implements the specific XR lens. Web-based AR (AR.js/8th Wall/Zappar) is looking to change that by allowing users to use any lens (browser) they want.

To avoid a discussion of infinite parallel virtual worlds/universes and all the what-ifs that come with it, I’ll leave you with one final thought on the subject.

One could strongly argue that the internet/World Wide Web did not truly gain traction until after Tim Berners-Lee created HTML and the web browser, giving everyone a common way to post and interact with information on the internet. Unless the XR community can better standardize³ components of the frontend, backend, and delivery, we will not see healthy diversification within the XRCloud landscape, and it will struggle to realize its potential.

The Need to Stream!

With everything moving to the AR cloud and the release of 5G, the demand for leaner apps with minimal load times will continue to increase, which means a greater reliance on edge computing and multi-user synchronization. For immersive worlds to grow and scale to the size of the physical world, content and assets will need to be streamed.

Take, for example, the XR soccer game (pictured below). Researchers use AI to capture player movement and project it through a HoloLens for the audience.

In order for such experiences to be enjoyable and mainstream, the game has to be streamed. If multiple people are watching the same game, it’s extremely important that they don’t see different latencies, as that would ruin the experience when users are co-located.

For such apps to become mainstream, they need to integrate with a real-time network, such as Agora.io’s SD-RTN and SD-RTM platforms, to distribute the experience at scale with guaranteed latency.

Shared-Experience-Focused Technologies

Earlier I stated that the XR streaming future is coming later this afternoon, but so far I haven’t really highlighted anyone (aside from Mimesys) doing collaborative live XR with a product in market now. This is for good reason: not many companies have moved there yet, which presents a bit of a gap in the market for new companies to fill.

Let’s take a moment to discuss Spatial, one of the (few) major platforms focused on shared XR experiences in realtime.

Spatial has partnered with Microsoft to bring collaborative realtime experiences to the HoloLens.

This mixed-reality/collaboration endeavor is a big push for Microsoft. The company is adding a “Spatial Rooms” tab to its (Slack-like) Teams collaboration app. In Spatial Rooms, people can work on projects, customize the room, and come back to continue the work later. The meetings are also more inclusive now, because Spatial is enabling people to join a meeting via the web or smartphone. — Microsoft and Spatial building collaborative conferences

We also can’t forget to mention PTC’s Vuforia platform, which also supports collaborative AR, though not nearly as robustly as Spatial.

Another interesting app worth noting is Pharos. Pharos offers users a multiplayer experience where they can interact with XR objects simultaneously with friends, leveraging Google’s ARCore Cloud Anchors API. Even though cross-platform features are promised, it’s currently only available on Android.

The last two apps I’d like to highlight are Roomful and VRJam. Both are leveraging real-time networks to provide their users with collaborative immersive experiences.

Roomful is an immersive conferencing solution. The team has built a private 3D social platform where organizers of trade shows, conferences, summits, seminars, and webinars can create their own private networks with custom rooms, rules/membership fees, etc.

One of the most important advantages of Roomful’s video-conference platform is that all participants can re-experience the content on demand. Even after the “event” has ended, users within Roomful can visit the virtual spaces/rooms to re-experience and engage with the content. This solves a huge pain point of both video and IRL conferences.

VRJam is a virtual conferencing solution that blends virtual worlds with real-time motion capture to power simultaneous live and virtual presentations. How does that work, you ask? The VRJam team creates 3D avatars and environments, then live-streams the audio, video, and 3D motion-capture data from the live presentation to recreate the event within a virtual world. No longer will you have to feel bummed about missing a conference due to distance; now you can experience it virtually in an immersive environment.

Tools to Start Building Today

All this talk of collaborative XR and the gap in the market, leads me to my next topic. What are the available solutions for building in XR?

Currently, Apple, Google, and Microsoft all provide native APIs for cloud-based spatial anchors. This means that all the major device manufacturers have exposed APIs for developers to build shared XR experiences. There’s only one major downfall: they all require the developer to follow complicated setup procedures, and while they may do a lot to simplify the process, they still leave a fair amount of the heavy lifting to the developer.

For example, how do these native APIs scale when implemented? TBH, that’s part of the heavy lifting you need to do as a developer. Let’s say you implement Apple’s multi-user AR session: once you’ve established a connection between devices, you have to maintain the session and account for all the intricate logic associated with synchronization at scale.
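To make that concrete, here is a minimal sketch of the kind of state bookkeeping that falls on you once the connection exists. It’s TypeScript with a hypothetical Transport interface and message shape (the native anchor APIs themselves are Swift/Kotlin; this only illustrates the synchronization logic they leave to you):

```typescript
// Hypothetical shared-scene state: every peer must converge on this.
interface ObjectState {
  id: string;
  position: [number, number, number];
  rotation: [number, number, number, number]; // quaternion
  updatedAt: number; // sender's timestamp, used for conflict resolution
}

// Minimal transport abstraction; the anchor APIs give you connectivity,
// but this layer (and its scaling behavior) is entirely on you.
interface Transport {
  send(payload: string): void;
  onMessage(handler: (payload: string) => void): void;
}

class SharedSession {
  private objects = new Map<string, ObjectState>();

  constructor(private transport: Transport) {
    // Apply remote updates, keeping only the newest state per object.
    transport.onMessage((payload) => {
      const incoming: ObjectState = JSON.parse(payload);
      const current = this.objects.get(incoming.id);
      if (!current || incoming.updatedAt > current.updatedAt) {
        this.objects.set(incoming.id, incoming);
        // ...re-render the object in the local AR scene here
      }
    });
  }

  // Broadcast a local change to every connected peer.
  updateObject(state: ObjectState): void {
    this.objects.set(state.id, state);
    this.transport.send(JSON.stringify(state));
  }
}
```

Even this toy version has to pick a conflict-resolution policy (here, last-write-wins on timestamps); a production app adds interpolation, authority rules, and interest management as the peer count grows.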

Have no fear, because there are a number of third-party providers that can help with such solutions.

Want to implement volumetric video similar to Mimesys? Check out DepthKit, a Brooklyn-based software provider that merges image data from DSLR cameras with depth-sensor point clouds to capture volumetric video.

Interested in face filters similar to Snapchat or Facebook/Instagram? Then check out Banuba, a cutting-edge computer-vision lab that specializes in Face AR technologies and provides an AR SDK bringing immersive face filters, 3D masks, facial animation, and AR beauty features to any app or website.

If you are looking to achieve functionality similar to Spatial, there’s Scape.io, which provides SDKs to simplify building geotagged AR experiences.

Update — May 11, 2020: Scape.io was purchased by FB in early 2020.

Unity has also launched a new signaling feature, though it’s only available within Unity. The feature was announced and debuted at GDC this year and sounds like an improvement over their previous product.

Let’s not forget Agora.io’s Realtime Messaging (RTM) SDK, which simplifies the entire networking layer and is available across all platforms. Agora.io’s SDKs make the process easy enough that you don’t feel any of the headaches related to scaling and synchronization.
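As a sketch of how little networking code that leaves you with, here is the RTM Web SDK (the agora-rtm-sdk package) used as the transport for the kind of object updates sketched earlier. The App ID, channel name, and message shape are placeholders, and I’m assuming the v1 Web SDK API:

```typescript
import AgoraRTM from 'agora-rtm-sdk';

async function joinSharedExperience() {
  // Create a client with your Agora project's App ID (placeholder here).
  const client = AgoraRTM.createInstance('<YOUR_AGORA_APP_ID>');
  await client.login({ uid: 'user-' + Math.floor(Math.random() * 10000) });

  // A channel maps naturally to one shared XR session/room.
  const channel = client.createChannel('shared-xr-room');
  await channel.join();

  // Receive every other participant's object updates in realtime.
  channel.on('ChannelMessage', (message, senderId) => {
    if (message.messageType === 'TEXT') {
      const state = JSON.parse(message.text);
      console.log(`object ${state.id} updated by ${senderId}`);
      // ...apply the update to the local AR scene
    }
  });

  // Broadcast a local object update to everyone in the room.
  await channel.sendMessage({
    text: JSON.stringify({
      id: 'cube-1',
      position: [0, 1, -2],
      updatedAt: Date.now(),
    }),
  });
}

joinSharedExperience().catch(console.error);
```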

Considerations

Now that we are all riled up and ready to build our shared XR experience, what do we need to consider? When implementing shared experiences, one of the biggest considerations has to be scaling. I know I’ve mentioned this a few times in this post, but any application that involves more than a single user needs to scale and synchronize gracefully.

The new SD-RTM from Agora allows developers to push any data to all users in realtime with minimal latency.

Another issue when working with realtime applications is packet loss. Google researchers recently published a paper showing how AI can be used to recreate “in-between” frames given a set of start and end frames. This breakthrough will pave the way for solutions to the painful dropped-frame shortcomings of UDP. It will take some time to hit the mainstream, but keep an eye out for this technology.
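Google’s research synthesizes the missing frames with a neural network; the low-tech cousin of the same idea, which realtime apps already use to smooth over lost packets, is interpolating entity state between the updates that did arrive. A minimal sketch in TypeScript (the Snapshot shape is hypothetical):

```typescript
// Remote updates arrive as timestamped snapshots; some get lost in transit.
interface Snapshot {
  position: [number, number, number];
  timestamp: number; // ms
}

// Linearly interpolate between the last two snapshots that DID arrive,
// rendering slightly in the past so there is usually a pair to blend.
class InterpolatedObject {
  private history: Snapshot[] = [];

  onSnapshot(s: Snapshot): void {
    this.history.push(s);
    if (this.history.length > 10) this.history.shift(); // keep a short buffer
  }

  positionAt(renderTime: number): [number, number, number] | null {
    // Find the pair of snapshots straddling renderTime.
    for (let i = this.history.length - 1; i > 0; i--) {
      const a = this.history[i - 1];
      const b = this.history[i];
      if (a.timestamp <= renderTime && renderTime <= b.timestamp) {
        const span = b.timestamp - a.timestamp;
        if (span === 0) return b.position;
        const t = (renderTime - a.timestamp) / span;
        return [
          a.position[0] + (b.position[0] - a.position[0]) * t,
          a.position[1] + (b.position[1] - a.position[1]) * t,
          a.position[2] + (b.position[2] - a.position[2]) * t,
        ];
      }
    }
    return null; // nothing to interpolate against yet
  }
}
```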

But why wait for Google to productize the AI packet-loss technology from their white paper? You can leverage Agora.io’s platform, which already has similar technology: patented, proprietary algorithms for preventing degradation even at 60% packet loss.

Conclusion

Building realtime XR experiences may have become democratized, but the best solutions don’t all exist yet, and there is a lot of room for innovation.

With all the advancements we are seeing in native and emerging web-based capabilities, the foundation and supports for a more complex and amazing ecosystem exist; now it’s time for developers to step up to the plate.

image source: Microsoft Debuts Spatial Anchor AR Cloud for HoloLens 2 — WinBuzzer

About the Author

I am Hermes, a Developer Evangelist for Agora.io and a former engineer at Blippar. In my time at Blippar I had the opportunity to lead the NY development team, working with Blippar’s AR and computer-vision products to create custom solutions for a variety of brands spanning every industry.

Footnotes

  1. ODG filed for bankruptcy, Meta is re-launching, Blippar was bought out of administration, Nreal launched sub-$500 AR glasses, and Google Glass is making a comeback.
  2. The Khronos Group is working to change that with the glTF & glb open-standard formats.
  3. OpenXR is trying to form the standard, but it is still in a provisional state.
