Building the ‘AR Cloud’, Part Three: 3D Maps, the Digital Scaffolding of the 21st Century

Why British startup ‘Scape Technologies’ decided to 3D-map over 100 cities worldwide, across 5 different continents

Edward Miller
Apr 17, 2019 · 6 min read

Note: This is part three of a multi-part series on what we believe is THE biggest and most exciting challenge in computing today. Read part one and two of the series on Medium.

Over the last year, our team have been quietly working to build a 3D map of the world. After announcing Scape’s ‘Vision Engine’ and an $8.5m seed round in January, today we can reveal we have succeeded in capturing our own version of Google’s Street View Imagery in over 100 cities around the world, spanning 5 different continents.

In an ambitious challenge, the team captured over 2 billion images in just 12 months, covering enough distance to circle the Earth twice.
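As a rough sanity check, those figures imply a very dense capture. The short calculation below assumes an equatorial circumference of about 40,075 km and treats "over 2 billion" as exactly 2 billion, so the results are back-of-envelope estimates rather than measured values.

```python
# Back-of-envelope estimate; both constants are assumptions, not measured values.
EARTH_CIRCUMFERENCE_KM = 40_075   # approximate equatorial circumference
IMAGES_CAPTURED = 2_000_000_000   # "over 2 billion", taken as a round figure
KM_PER_MILE = 1.609344


def capture_stats(images=IMAGES_CAPTURED, laps=2):
    """Distance covered and average image density for the stated figures."""
    distance_km = laps * EARTH_CIRCUMFERENCE_KM
    return {
        "distance_km": distance_km,
        "distance_miles": distance_km / KM_PER_MILE,
        "images_per_metre": images / (distance_km * 1000),
    }


stats = capture_stats()
print(stats)
```

Under these assumptions the distance comes to roughly 80,000 km (just under 50,000 miles), with on the order of 25 images for every metre travelled.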

In this post, we will dive into why this imagery is important, how it will be used and why it will change the nature of computing forever.

A brief history of maps

Before diving in, it’s important to understand that every map starts with a baseline: an initial measurement that serves as the foundation from which the map is expanded.

For example, around 150 AD, the Greek cartographer Ptolemy used logbooks from naval ships & infantry marches to calculate the scale and shape of European countries. This essentially formed the baseline for constructing the first map of Europe.

In 1784, Scottish cartographer William Roy painstakingly measured the distance between two points in London, to form the baseline for the first comprehensive map of Great Britain.

Fast forward to 1985, when two engineers named Barry Karlin and Galen Collins drove around the Bay Area, using a Dictaphone to describe in detail everything they observed. In the absence of GPS, Karlin & Collins combined these observations with aerial photographs and vehicle sensor data to create a baseline for the first in-car navigation system. The company they formed would later be known as ‘Navteq’, which was sold to Nokia in 2007 for $8 billion. The company remains operational today, under the brand name ‘HERE’.

It is this initial measurement, or baseline, that acts as the point of reference needed to jump-start the scaling of a map.

The first product from Karlin & Collins, Inc (later known as Navteq) was a coin-operated kiosk called ‘DriverGuide’. The kiosk provided turn-by-turn navigation instructions and was sold to car rental agencies and hotels for $12,000.

The status quo of maps today

Nowadays, the world of maps is largely dominated by Google, thanks in large part to their famous Street View project.

Launched in 2007, Google Street View rapidly became a household phenomenon, allowing anyone in the world to digitally roam the streets, one image at a time. However, Street View also served another important purpose: forming the baseline for Google’s own maps, in a secretive project entitled ‘Ground Truth’.

By analysing Street View imagery using computer vision, Google was able to extract relevant details like house numbers and street names, helping them to build their own baseline or ‘Ground Truth’, eliminating their dependency on existing map-data provider ‘TeleAtlas’.

A new ‘baseline’ for 3D maps

Fast forward to 2030 and our cities will be saturated with a new class of devices that need to understand the world around them. For example, self-driving vehicles, augmented reality headsets, delivery robots, and drones, all need to interpret the physical environment with more detail than ever before.

We believe that the opportunity to do this lies within the camera, as we transition from cameras that take photographs, to cameras that can ‘see’.

For these camera devices to operate, we need a new class of map, one which is image-centric and can be used by devices to understand where they are & what’s around them. Consequently, we needed to create a new baseline, which could be used to jump-start the infrastructure that allows devices to see.

100 cities captured in one year, over tens of thousands of miles

Beyond Google, only a handful of companies have ground-level imagery at the quality and density that is required for this new type of infrastructure. Due to the cost of acquiring this imagery, combined with the exponentially-increasing ability to extract insights (and therefore value) from these photos, no company was willing to share theirs with us.

We, therefore, took it upon ourselves to capture our own street-level imagery in 100 cities, within one year.

Some of the cities include London, Manhattan, San Francisco, Rio de Janeiro, Sydney, Paris, Moscow & Tokyo.

The dashboard we use to visualise the imagery we have captured

Unlike Google, who capture the majority of their Street View images using fleets of vehicles, we achieved our mapping efforts by mobilising local teams on the ground, equipped with cameras. This approach has allowed us to capture our own imagery in a fraction of the time and at a fraction of the cost.

How the imagery will be used

A high-level schematic of Scape Technologies’ pipeline

The street-level images will be processed by Scape’s ‘Vision Engine’ — our proprietary world-scale mapping pipeline that builds and references three-dimensional HD maps from ordinary images, in the cloud.

Within mapped areas, any device will be able to determine its location and orientation with cm-level precision, thanks to a set of APIs, powered by computer vision.

The initial maps serve as our own ‘baseline’ for infinitely-scalable camera-based mapping and location services around the world, kick-starting the infrastructure that will enable the devices of tomorrow.

An important benefit

In addition to providing camera-based location services in new cities, the imagery also serves an important secondary purpose.

Our research teams have been able to significantly improve upon state-of-the-art methods for recognising a location using deep learning and computer vision, thanks (in part) to the imagery captured. The imagery allows our algorithms to learn what cities around the world look like and what makes them unique. It means that regardless of time of day, weather, season or location, camera devices can be located more accurately & more robustly than with other methods to date.
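To make the idea of ‘recognising a location’ more concrete, one common formulation is image retrieval: each mapped image is reduced to a descriptor vector, and a query image is matched to the mapped image whose descriptor is most similar. The sketch below illustrates only that matching step, using hand-made toy vectors. It is not Scape’s method, and in practice the descriptors would come from a trained network. The names `cosine_similarity` and `recognise_place`, and the place labels, are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recognise_place(query, database):
    """Return (label, similarity) of the database descriptor closest to the query."""
    return max(
        ((label, cosine_similarity(query, desc)) for label, desc in database.items()),
        key=lambda pair: pair[1],
    )


# Toy 3-D descriptors standing in for learned image embeddings (hypothetical values).
database = {
    "street_corner_a": [0.9, 0.1, 0.0],
    "street_corner_b": [0.1, 0.8, 0.3],
    "riverside_c": [0.0, 0.2, 0.9],
}
query = [0.2, 0.7, 0.4]  # e.g. corner B photographed in different lighting

label, score = recognise_place(query, database)
print(label, round(score, 3))
```

Here the query is matched to `street_corner_b`. The hard part, and where large amounts of varied imagery help, is learning descriptors that stay similar for the same place across time of day, weather and season, while remaining distinct between different places.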

We’re tremendously excited to share some of this research at this year’s ‘CVPR’ conference, the largest conference for computer vision. The research will be presented alongside Scape’s new ‘benchmark dataset’, which we announced earlier this week. The dataset, entitled ‘SILDa’ (Scape-Imperial Localization Dataset), was created in collaboration with Imperial College London and can be used by other companies & academics to measure and assess their own methods for image-based location recognition.
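One simple way a benchmark of this kind can be used is to compare a method’s estimated camera positions against ground truth and report the fraction that fall within a distance threshold. The sketch below is a generic illustration with assumed thresholds and made-up coordinates. It does not describe SILDa’s actual evaluation protocol, it ignores orientation error, and `recall_within` is a hypothetical helper.

```python
import math


def recall_within(estimates, ground_truth, threshold_m):
    """Fraction of estimated (x, y, z) positions within threshold_m of ground truth."""
    if not estimates or len(estimates) != len(ground_truth):
        raise ValueError("need two non-empty lists of the same length")
    hits = sum(
        1
        for est, gt in zip(estimates, ground_truth)
        if math.dist(est, gt) <= threshold_m
    )
    return hits / len(estimates)


# Made-up positions in metres, for illustration only.
ground_truth = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 5.0, 0.0), (30.0, 5.0, 1.0)]
estimates = [(0.02, 0.01, 0.0), (10.3, 0.0, 0.0), (20.0, 5.0, 0.04), (33.0, 5.0, 1.0)]

for threshold in (0.05, 0.5, 5.0):  # assumed thresholds: 5 cm, 50 cm, 5 m
    print(threshold, recall_within(estimates, ground_truth, threshold))
```

Reporting recall at several thresholds, rather than a single average error, makes it easier to see whether a method is precise most of the time but occasionally fails badly.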

In addition to our research papers, we will also be running two workshops, ‘Image Matching: Local Features & Beyond’ and ‘Long-Term Visual Localization under Changing Conditions’ alongside legendary researchers from Google & Microsoft, to help spur efforts into these important areas of research.

If you are keen to learn more about the work we are doing at Scape or what you can do to partner with us, please get in touch.

Additionally, if you are interested in learning more about our research projects, would like to collaborate, or would like to join the team, reach out to

This article is part three of a series on building the AR or ‘Machine Perception’ Cloud. Read part one and two of the series on Medium.

Edward is co-founder & CEO of Scape Technologies, a computer vision startup in London, working to build a digital framework for the physical world.

Follow Edward and the company on Twitter here.

Interested to learn more about Scape Technologies?

We send a newsletter every couple of months, making sense of the AR industry and sharing our progress.

Sign Up to our newsletter.

Thanks to Huub Heijnen and Grace Gimson

Scape Technologies

Scape Technologies is building a cloud-based ‘visual engine’ that allows camera devices to understand their environment, using computer vision.
