Norkart’s appeal for an open AR-Cloud community

Creating the Open AR-Cloud

By Jan-Erik Vinje, Senior AR Architect and lead developer at Norkart AS and main contact for the Open-ARCloud initiative

Date: 1. March 2018

In this article Norkart AS, a medium sized Norwegian company of 160 employees, providing geospatial solutions and services will present the story behind what motivated us to start the informal Open-ARCloud initiative and hopefully inspire you to join in on the historic endeavor of creating an open AR-Cloud together with us.

Before we begin we would also like to thank professor Terje Midtbø from NTNU and Ori Inbar from Augmented World Expo and Superventures for joing the Open-ARCloud team at this early stage!

What is the AR-Cloud and why is Norkart so excited about it?

The core scaffolding/substrate of a shared AR experience where virtual objects can appear and persist in the same physical place for all users on all AR-platform is often referred to as the “AR-cloud”.

The internet and the web brought us an abstract space for the digital world, sometimes called cyberspace. Needless to say, the internet has transformed our society extensively on multiple levels.

The AR-cloud enables a new form of virtual space attached directly to the physical world around us. It might have the potential to fundamentally change the world.

In contrast to the abstract and dimensionless internet this new virtual space is superimposed or superpositioned over our familiar three-dimensional world but is not restricted by the limits of time and matter. Importantly it is programmable. One might call it the “Superspace”, while the experience could be called “Super Reality” or a shared persistent Augmented/Mixed Reality.

Its inception will enable new ways of interacting with each other, with information, with new forms of content, with our past and our future and the physical world around us. Superspace can be experienced as an integral part of our surroundings, allowing us to interact with it in a way that is more direct, intuitive and familiar to us. Since Superspace is also free of the limitations of time and matter it opens up infinite possibilities.

Reaching escape velocity — as an open community

Creating, maintaining and improving this AR-Cloud & SuperSpace will be a monumental undertaking.

Sure, the tech-titans of Silicon Valley and China such as Google, Alibaba, Facebook or Apple would be able to create their own version of an AR-Cloud and the Superspace. But it will likely be shaped more by their business models and competitive situation rather than the needs and desires of a wider community.

Small actors like ourselves, probably do not have sufficient human and capital resources to create an AR-Cloud in time to challenge the tech-titans. Alone in the darkness, we might lack the fuel to reach escape velocity and could face the prospect of being stuck on the ground.

We risk having to adapt to and getting locked into suboptimal and contrived AR-Cloud solutions provided by a few dominant corporations for many many years. That would not be good for a technology with such powerful transformative possibilities.

Speaking of the power of the AR-Cloud, we believe it is paramount that privacy of users are given priority, given the potential for the extreme invasiveness that such a technology could allow.

For this and other reasons, Norkart believes that the best path towards a great AR-Cloud for all is to create it together as a global community. That is why we started the informal Open-ARCloud initiative and started developing the Open-ARCloud website.

This is an invitation to bring your rocket fuel to make sure this collaborative moonshot can reach escape velocity. The time is now for a free-thinking vibrant Open-ARCloud community basing their work on open research, open standards, open source and open data to show the way and reach for the stars!

Core building blocks of an AR-Cloud system

The core requirements for an AR-Cloud allowing shared persistent AR content in “Superspace” is something like this:

  1. A cloud model: A continuously updated 1:1 virtual representation of the world accessible in the cloud.
  2. Methods for estimating position and orientation (Pose) of devices and virtual objects relative to fixed points in the physical world: There can be a great many methods for locating/registering the estimated position and orientation of AR Devices and virtual objects relative to the physical world. Most of such method will in all likelihood rely to a great extent on the cloud model.
  3. Standard formats: To enable users on different devices to collaborate in real-time they need to be able to share poses, virtual objects, observations and maybe sensor data using standard formats.

As far as we know there could be a multitude of practical ways to create these building blocks, many of whom ought to be tried out in parallel now in the early days. Exploring many options at once and sharing what we learn with each other would enable the community to identify the most promising approaches faster. We hope many people will join in on this effort with different experiences, perspectives, and ideas.

We believe that those who contribute to the endeavor of creating an open AR-Cloud at this point in history are going to be remembered as the pioneers of “Superspace”.

How is Norkart contributing?

As the company that started the informal Open-ARCloud initiative, we feel that we should share some of our practical experience, ideas, and perspectives, and provide some insights into the approach we are currently exploring in collaboration with NTNU (The Norwegian University for Science and Technology).

Early experiments

In 2017 Norkart implemented a groundbreaking R&D prototype app (BorderGO) for the Norwegian Mapping Authority (Kartverket) that used AR on a consumer-grade smartphone to show cadastral boundaries directly on the ground. The source code is available on GitHub:

The results were exciting and we achieved what we believe could be world leading precision of synchronization between the local AR coordinate system and geographical coordinates using only a consumer smartphone.

Arriving at GeoPose

At the core of how this app worked was the process of estimating the geographical position and orientation of the device, what we eventually named the GeoPose. Being able to use GeoPose to show geographical data directly in their real physical location with only a few centimeters of error showed a lot of promise for many interesting use-cases. However, our solution required complex manual calibration which would be a showstopper for most normal users.

Automatic for the people!

What is needed to reach every user is an automatic GeoPose. In Norkart Labs, being inspired by what has been done in the field of Autonomous cars, we started forming some ideas on a possible approach that could achieve this on smartphones.

Too long, didn’t read:

In essense, we think that using AI on the mobile AR-devices (such as smartphones) to detect different classes of objects around the user and then matching those observations with previous observations in a continuously updating cloud model will enable such a cloud service to provide the estimated GeoPose of the device.

The approach is already demonstrated to work

The AI-powered self-driving cars have to continuously sense the surrounding environment monitoring the locations and categories of the objects around them in real time. By matching what the cars “observe” with cloud models often referred to as HD-maps it is already possible to determine automatically with an accuracy of about 20 cm where the car is on the road and where it is heading. We are encouraged by the proven results with this approach but we also feel it is sad that many vital details of the research and the collected data are not shared openly with the world.

Why could this work well for mobile AR-devices?

The two main smartphone platforms both have a similar API for running neural net inference models in real-time on the phone. The latest models have specialized AI-processors. The next generation of Hololens will also get an AI-processor. It is pretty clear that AI on mobile devices is rapidly becoming more powerful.

Using the AI power of current mobile devices for object detection in conjunction with the GPS, compass and the AR coordinate system enables a valuable semantic understanding of the environment around the user but it also compresses the massive raw sensory data into a very sparse semantic description of the world. This is a win-win in terms of reduced mobile data traffic, cloud storage & computational load. The same time it is reducing storage and computational requirements to a small fraction compared to handling the raw sensor data, the semantic description is also many times more useful and valuable per bit.

Why could this scale well in the cloud?

As a thought experiment, we asked ourselves how many bytes would be needed to store each semantic object in such a cloud model.

A minimal description might require 56 bytes for the GeoPose, uncertainty and the semantic class. Adding a bit more metadata, such as timestamps of first and last observation, the number of observations and the 3d-bounding box would increase the space needed to about 144 bytes. In either way, one could maintain a live 3d representation of for example all the 20-million manholes in the USA with something like 1 to 3 terabytes of storage space with no compression. Adding 50–100 or more categories (trash bins, benches, road signs, light poles, etc) should still enable a semantic representation of most of the world in something in the order of magnitude of a petabyte (1000 terabyte). This might sound like a lot, but according to Domo, more than 2 500 petabytes of digital data is generated every single day. You could buy yourself a petabyte of ram for about 6 million USD given a price of about 6 USD per GB. A relatively modest datacenter could in theory host such an AR-Cloud service for the entire world at an insignificant cost per user. We don’t recommend centralizing such a service to one data center and one organization, we are just arguing that a basic semantic model based AR-Cloud solution should be achievable with relatively small resources.

What is our research plan?

At the end of 2017, we came up with the outline for how to set up a continuously running R&D program aimed at creating a system for automatic GeoPose. The program consists of 4 complementary parts that will benefit from the results of the other parts.

  1. Using AI on a mobile AR-device to detect constellations of know types of objects around the device and then send those observations to the cloud.
  2. A continuously improving cloud model aggregating observations from the devices, and providing estimates of the device GeoPose if a match is found.
  3. Real-time 3d visualization of both the observations from the devices and the cloud model along with “ground truth” geometry.
  4. Simulation of synthetic sensor data from a virtual AR-device at any time of day, any season and with the possibility to modify the world geometry at will. This will allow the research effort to test the performance of their most recent version in a wide variety of conditions in a much shorter time. We think a gaming engine like Unity or Unreal could work for this purpose. We found out at the CES 2018 that NVidia has developed something called AutoSim for a similar purpose aimed at R&D for Autonomous Car.

We will soon have people from Norkart and two master students from NTNU working on the first three parts. We currently have no plans for the simulation part but hope to get it started later this year.

As we learn from our research we look forward to sharing our results with this community, and we hope you will too!