Global Relocalization: A Better GPS

Joe Boyle
8 min read · Jun 27, 2018


Since first appearing in a cellular phone nearly twenty years ago, GPS has provided billions of dollars in value annually to mobile device users. With the maturing world of IoT coming online, the importance of GPS is increasingly evident. In the nascent world of Augmented Reality, however, GPS remains foundational to AR Cloud infrastructure but is not sufficient on its own for building AR experiences. As discussed in my previous post on relocalization, emergent positional technologies are among the most exciting AR developments of 2018. By combining legacy GPS with some specific new techniques, we will have a worthy successor to GPS: global relocalization. This post discusses what will be required to reach that point, laying out the relationship between these new, highly accurate SLAM techniques and the global positioning system found in all our phones and tablets.

First, it’s important to understand the two types of positioning under consideration: local and global. Local positioning is a device’s understanding of its position relative to its immediate environment (such as the room it’s in), whereas global positioning is a device’s understanding of its location relative to the entire planet. As mentioned above, GPS by itself misses the mark. Here are the deal-breakers:

  1. It doesn’t work well indoors or in built-up urban environments.
  2. Even where it does work indoors, GPS-based tech has a very poor understanding of device elevation, and most 2D map apps fail to accurately represent multi-story indoor locations.
  3. Even at the best of times, the error in satellite-based GPS is complex and large, measured in meters, not centimeters.
  4. Orientation is not provided, so applications must rely on unreliable magnetometer-based compasses for heading.

However, GPS is still useful to us in several ways. For one, by extending the decimal precision of highly developed coordinate systems (like ECEF or WGS84), we can build upon and extend decades of GIS software development.
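As a concrete example, here is a sketch of the kind of conversion GIS software has refined for decades: geodetic latitude, longitude, and height (WGS84) to Earth-Centered, Earth-Fixed (ECEF) coordinates. Because the inputs are double-precision, the same math already carries the extended decimal precision AR needs:

```swift
import Foundation

// WGS84 ellipsoid constants
let a  = 6378137.0             // semi-major axis (meters)
let f  = 1.0 / 298.257223563   // flattening
let e2 = f * (2.0 - f)         // first eccentricity squared

/// Convert geodetic coordinates (radians, meters) to ECEF coordinates (meters).
func geodeticToECEF(lat: Double, lon: Double, height: Double) -> (x: Double, y: Double, z: Double) {
    // Prime vertical radius of curvature at this latitude.
    let n = a / sqrt(1.0 - e2 * sin(lat) * sin(lat))
    let x = (n + height) * cos(lat) * cos(lon)
    let y = (n + height) * cos(lat) * sin(lon)
    let z = (n * (1.0 - e2) + height) * sin(lat)
    return (x, y, z)
}
```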

Let’s break the journey toward global relocalization into two key techniques: AR feature mapping and georeferencing.

AR Feature Mapping

All contemporary AR apps work by building a 3D model of the surrounding environment. ARKit and ARCore do this using only a single RGB camera, though other systems incorporate beacons, markers, or IR depth sensing. Irrespective of sensor type, the resulting model is built from a few different types of 3D data, including point clouds and meshes. For the purposes of this post, these varied digital representations of an environment are treated interchangeably and boiled down to “AR features” or “feature maps”. The sections below outline how these points are detected and used by AR applications.
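As a rough mental model, a feature map reduces to a bag of 3D geometry expressed in a local coordinate frame. A minimal, hypothetical sketch in Swift (not a type ARKit or ARCore actually exposes):

```swift
import simd

// Hypothetical, simplified representation of a feature map. Real systems
// store richer data (descriptors, meshes, anchors), but conceptually it
// all reduces to geometry expressed in a local coordinate frame.
struct FeatureMap {
    var points: [SIMD3<Float>]   // sparse feature points or mesh vertices
    var origin: simd_float4x4    // the local frame the points are expressed in
}
```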

Tracking Feature Points

Using existing ARKit/ARCore applications involves a fickle initialization process: the device asks you to point the camera at large horizontal surfaces like a tabletop or the floor. After the user has waved the device around for long enough, initialization completes and the app starts up.

This demo video shows how ARKit views AR features in a scene.

The camera is searching for points of reference (called feature points) throughout an environment. High contrast surface transitions work well: a magazine on a coffee table, a knot of wood in the floorboards, where the couch meets the carpet, etc. With enough features, and by incorporating other sensor data, a device can begin to continuously infer its position and orientation with a high degree of accuracy.
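In ARKit, for example, these detected points are exposed on each frame as rawFeaturePoints (ARCore offers an analogous point cloud API). A minimal sketch of inspecting them:

```swift
import ARKit

// Inspect the sparse feature points ARKit is currently tracking.
// Each point is expressed in the session's local world coordinate space.
func inspectFeatures(in frame: ARFrame) {
    guard let cloud = frame.rawFeaturePoints else { return }
    print("Tracking \(cloud.points.count) feature points")
    for point in cloud.points.prefix(5) {
        print("feature at x: \(point.x), y: \(point.y), z: \(point.z)")
    }
}
```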

At this stage of localization, then, the app only knows where it is in relation to where it was when it first started up. Since no two startup sequences are the same, object coordinates differ on every launch, which is why no persistence or multi-user experience is possible in the 2017 launch versions of ARKit or ARCore. The environment is unique to each user, each device, and each session. Without agreement on a shared environment, these and similar applications will remain limited in what they can do with tracked features.

Sharing feature maps = Multi-user AR

The next step, then, is for devices to come to agreement on a shared environment, which requires feature points to be shared between them. Either one device sends its feature points to the other, or they mutually share what they know and try to align the two maps into one coherent view of the world. This alignment process is challenging, to say the least.

Additionally, instead of arbitrarily assigning the origin to wherever the device happened to be during startup, the devices must also come to consensus on a new shared origin and orientation for this shared world.

Of the technologies listed in my previous post, this is the capability provided by ARCore’s Cloud Anchors and ARKit 2.0’s ARWorldMap.
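In ARKit 2.0, for example, the flow looks roughly like this: one device serializes its ARWorldMap, ships the bytes to a peer over whatever transport the app prefers, and the peer relocalizes into the received map. A sketch, with error handling trimmed:

```swift
import ARKit

// Host: capture the current world map and serialize it for transport.
func captureWorldMap(from session: ARSession, completion: @escaping (Data?) -> Void) {
    session.getCurrentWorldMap { worldMap, error in
        guard let map = worldMap else { completion(nil); return }
        let data = try? NSKeyedArchiver.archivedData(withRootObject: map,
                                                     requiringSecureCoding: true)
        completion(data)
    }
}

// Peer: deserialize the received map and run a session against it.
// ARKit then attempts to relocalize into the shared coordinate space.
func join(session: ARSession, worldMapData: Data) {
    guard let map = try? NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                            from: worldMapData) else { return }
    let config = ARWorldTrackingConfiguration()
    config.initialWorldMap = map
    session.run(config, options: [.resetTracking, .removeExistingAnchors])
}
```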

How devices share feature map data also varies by implementation, but the usual options apply: local wireless networking or a cloud service. In the case of Cloud Anchors, Google provides a cloud storage component for sharing feature maps, but deliberately discards the data after just 24 hours. There is some speculation that this was a strategic decision to avoid the highly topical General Data Protection Regulation requirements, though it may have more to do with completing the work involved in the steps outlined below.

Saving feature maps = Persistent AR

Once the industry can get two devices to agree on a local feature map, the next step will be the ability to save and reload feature maps without rescanning a given space. This step holds the most promise: it can make AR apps launch faster, and it makes persistent augmentations possible, whereby virtual characters and objects can be reliably saved and reloaded into the same position.
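Building on the sharing sketch above, persistence is mostly a matter of writing the archived map to disk and handing it back to a new session on a later launch. A sketch reusing captureWorldMap and join from above; the file name is hypothetical:

```swift
import ARKit

// Hypothetical on-disk location for one scanned space.
let mapURL = FileManager.default.urls(for: .documentDirectory,
                                      in: .userDomainMask)[0]
    .appendingPathComponent("living-room.worldmap")

// Persist the current world map for a future session.
func saveForLater(session: ARSession) {
    captureWorldMap(from: session) { data in
        try? data?.write(to: mapURL)
    }
}

// On a later launch, reload and relocalize instead of rescanning.
func restore(into session: ARSession) {
    guard let data = try? Data(contentsOf: mapURL) else { return }
    join(session: session, worldMapData: data)
}
```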

Saving and loading these feature maps introduces new challenges: upon launching a persistent AR app, how will it know which feature map to use?

Also unclear is how sensitive relocalization algorithms are to routine environmental changes such as closed doors, open windows, moving furniture, or even people. Relocalization algorithms would benefit tremendously from being able to ignore commonly “transient” objects using semantic segmentation, but in the meantime, we should expect poor performance in some environments. Ryan Hickman’s badly behaved virtual cat exhibits these challenges well.

The most favorable solution to the problem of loading feature maps would be a system that assesses and determines location entirely on its own, though it’s more likely we will see applications asking for some human intervention here, perhaps prompting users to select from a list of previously scanned locations.

Geo-indexing feature maps

Earlier I noted a cloud service that stores feature maps and makes them available for users to reload later. Although current GPS is not accurate enough to position AR content directly, it will be essential as a means to geospatially index feature maps in databases for ease of access and management. In other words, when creating and saving feature map data in a cloud service, the device’s current GPS fix should be saved with it, allowing AR applications to load only those feature maps created nearby, and not those created on the other side of town.
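As a sketch of the idea, assuming a simple index of saved maps tagged with the GPS fix recorded at scan time (the record type and radius are illustrative, not any vendor’s actual schema):

```swift
import CoreLocation

// Hypothetical index entry: a saved feature map tagged with the GPS fix
// recorded when the map was scanned.
struct FeatureMapRecord {
    let mapID: String
    let savedAt: CLLocation
}

// Return only the maps scanned near the device's current position,
// nearest first, so the app never considers maps from across town.
func candidateMaps(in index: [FeatureMapRecord],
                   near current: CLLocation,
                   withinMeters radius: CLLocationDistance = 100) -> [FeatureMapRecord] {
    return index
        .filter { $0.savedAt.distance(from: current) <= radius }
        .sorted { $0.savedAt.distance(from: current) < $1.savedAt.distance(from: current) }
}
```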

Georeferencing Feature Maps

Turning local coordinates into global coordinates

We can add a great deal of accuracy to our devices by telling them exactly where each of these feature maps is in relation to the world, not by using legacy GPS, but through a process called georeferencing. Georeferencing is the practice of aligning local and global maps so coordinates can be easily translated from local to global or vice versa.

In this georeferencing tutorial provided by WRLD3D (a 3D map provider), we can see how to use QGIS, an open-source tool that supports georeferencing assets onto known coordinates.

WRLD3D headquarters in Scotland

In their example, the building on the left is recognizable in both the architectural drawings and the satellite imagery. By aligning points on both images, a global positional offset and rotation can be determined. This offset can then be used to determine the global coordinates of features visible only on the “local” architectural drawing.

If we georeference a machine-readable AR feature map instead of an architectural drawing, it becomes possible to translate local 3D scene coordinates to global 3D coordinates. (This positional offset can also replace the less accurate GPS-based data in the geospatial index.)
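In code, applying a completed georeference is a single rigid transform. A sketch, assuming for simplicity that both frames are z-up and that georeferencing produced a heading rotation plus a translation (the type and parameter names are illustrative):

```swift
import Foundation
import simd

// Result of a georeferencing session: how the local AR frame sits in a
// global east-north-up (ENU) frame anchored at a surveyed point.
struct Georeference {
    let headingRadians: Double      // rotation about the vertical axis
    let translation: SIMD3<Double>  // local origin's position in the ENU frame
}

// Rigid transform: rotate the local point about the up (z) axis, then translate.
func toGlobal(_ local: SIMD3<Double>, using ref: Georeference) -> SIMD3<Double> {
    let c = cos(ref.headingRadians), s = sin(ref.headingRadians)
    let rotated = SIMD3<Double>(c * local.x - s * local.y,
                                s * local.x + c * local.y,
                                local.z)
    return rotated + ref.translation
}
```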

The accuracy of this approach is bound only by the accuracy of the local positioning system and of the georeferencing process itself. Once this is complete, each device can know its global position with centimeter accuracy.

Scaling Up

The steps outlined above require human intervention (such as selecting stored feature maps and later georeferencing them), though it’s possible we will see automated solutions to both problems in the future.

Autonomous drones map a 3D environment (2010)

Once the friction to collect and use these feature maps is reduced or eliminated, georeferenced feature maps can be used to georeference other feature maps, in a continuous generative process that “fills in the blanks” on unmapped areas of the world. It isn’t clear how long this could take, but given the GPS and camera capabilities already in every smartphone, crowd-sourced 3D mapping is a significant opportunity.

The AR Cloud Foundation has recognized this opportunity and has started outlining a blockchain-based incentive scheme to reward users for scanning and sharing 3D map data.

Naturally, as with any form of crowd-sourced data, the accuracy of the resulting maps should be treated as suspect and subject to constant revalidation.

Looking forward

The first GPS-enabled smartphone appeared in 1999. Today we can hardly imagine a phone without GPS, or without the wealth of services GPS makes possible. It’s easy to take these services for granted, but in retrospect, it has taken almost twenty years to get where we are today. Looking forward, we can be sure of a few things:

  1. Building an improved, accurate 3D version of the global map will surely be an ongoing effort many years in the making.
  2. It will never be “finished”, in exactly the same way today’s 2D maps are never finished; staying current will require continuous scanning and rescanning.
  3. Globally localized mobile AR use cases are possible right now in those spaces where the benefit provided can justify the cost of manual feature map selection and manual georeferencing.
  4. As this kind of map comes online, 3D map based indoor applications will explode in popularity and number.

Just like mobile applications, not every AR application will benefit from or require global relocalization; many amazing AR experiences will be had without bringing in the outside world. However, those experiences stop short of the inherent opportunity in AR: to make the computers in our pockets observing participants that understand where they are even more accurately than we do.

In the coming posts, issues surrounding AR privacy and security (specifically access control and rights management) will be introduced and analyzed.
