Building the ‘AR Cloud’: Part Three —3D Maps, the Digital Scaffolding of the 21st Century
Why British startup ‘Scape Technologies’ decided to 3D-map over 100 cities worldwide, across 5 different continents
Over the last year, our team have been quietly working to build a 3D map of the world. After announcing Scape’s ‘Vision Engine’ and an $8.5m seed round in January, today we can reveal we have succeeded in capturing our own version of Google’s Street View Imagery in over 100 cities around the world, spanning 5 different continents.
In an ambitious challenge, the team has captured over 2 billion images in just 12 months, covering enough distance to circle the Earth twice.
In this post, we will dive into why this imagery is important, how it will be used and why it will change the nature of computing forever.
A brief history of maps
Before diving in, it’s important to understand that every map starts with a baseline: an initial measurement that serves as the foundation from which the map is expanded.
For example, around 150 AD, the Greek cartographer Ptolemy used logbooks from naval ships & infantry marches to calculate the scale and shape of European countries. This essentially formed the baseline for constructing the first map of Europe.
In 1784, Scottish surveyor William Roy painstakingly measured the distance between two points in London to form the baseline for the first comprehensive map of Great Britain.
Fast forward to 1985, when two engineers named Barry Karlin and Galen Collins drove around the Bay Area, using a Dictaphone to describe in detail everything they observed. In the absence of GPS, Karlin & Collins combined these observations with aerial photographs and vehicle sensor data to create a baseline for the first in-car navigation system. The company they formed would later be known as ‘Navteq’, which was sold to Nokia in 2007 for $8 billion. The company remains operational today under the brand name ‘HERE’.
In each case, it is this initial measurement, or baseline, that acts as the point of reference needed to jump-start the scaling of a map.
The status quo of maps today
Nowadays, the world of maps is largely dominated by Google, thanks in large part to their famous Street View project.
Launched in 2007, Google Street View rapidly became a household phenomenon, allowing anyone in the world to digitally roam the streets, one image at a time. However, Street View also served another important purpose: to form the baseline for Google’s own maps, in a secretive project entitled ‘Ground Truth’.
By analysing Street View imagery using computer vision, Google was able to extract relevant details like house numbers and street names. This helped them build their own baseline, or ‘Ground Truth’, and eliminated their dependency on existing map-data provider ‘Tele Atlas’.
A new ‘baseline’ for 3D maps
Fast forward to 2030 and our cities will be saturated with a new class of devices that need to understand the world around them. Self-driving vehicles, augmented reality headsets, delivery robots and drones, for example, all need to interpret the physical environment in more detail than ever before.
We believe that the opportunity to do this lies within the camera, as we transition from cameras that take photographs, to cameras that can ‘see’.
For these camera devices to operate, we need a new class of map, one which is image-centric and can be used by devices to understand where they are & what’s around them. Consequently, we needed to create a new baseline, which could be used to jump-start the infrastructure that allows devices to see.
100 cities captured in one year, over tens of thousands of miles
Beyond Google, only a handful of companies have ground-level imagery at the quality and density required for this new type of infrastructure. Due to the cost of acquiring this imagery, combined with the rapidly increasing ability to extract insights (and therefore value) from these photos, no company was willing to share theirs with us.
We, therefore, took it upon ourselves to capture our own street-level imagery in 100 cities, within one year.
Some of the cities include London, Manhattan, San Francisco, Rio de Janeiro, Sydney, Paris, Moscow & Tokyo.
Unlike Google, who capture the majority of their Street View images using fleets of vehicles, we achieved our mapping efforts by mobilising local teams on the ground, equipped with cameras. This approach has allowed us to capture our own imagery in a fraction of the time and at a fraction of the cost.
How the imagery will be used
The street-level images will be processed by Scape’s ‘Vision Engine’ — our proprietary world-scale mapping pipeline that builds and references three-dimensional HD maps from ordinary images, in the cloud.
Within mapped areas, any device will be able to determine its location and orientation with centimetre-level precision, thanks to a set of APIs powered by computer vision.
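To make "location and orientation" concrete, here is a minimal sketch of how such a result could be represented and used. It is purely illustrative: the `Pose` class, `yaw_rotation` and `device_to_map` are hypothetical names invented for this example, not part of Scape's APIs, and the orientation is simplified to a single heading angle.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical pose: where the device is and which way it faces."""
    position: tuple  # (x, y, z) in metres, in the map's coordinate frame
    rotation: list   # 3x3 rotation matrix, device frame -> map frame


def yaw_rotation(degrees):
    """Rotation about the vertical (z) axis; a full pose would also carry pitch and roll."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def device_to_map(pose, point):
    """Express a point given in the device's frame in map coordinates: R * p + t."""
    return tuple(
        sum(pose.rotation[i][j] * point[j] for j in range(3)) + pose.position[i]
        for i in range(3)
    )


# A device located at (10, 5, 0) in the map, turned 90 degrees about the vertical axis.
pose = Pose(position=(10.0, 5.0, 0.0), rotation=yaw_rotation(90.0))

# Something one metre along the device's own x axis, placed in map coordinates.
anchor = device_to_map(pose, (1.0, 0.0, 0.0))
```

Once a device knows its pose, anything it observes relative to itself, such as an AR object or an obstacle, can be placed in the shared map frame. This is why both position and orientation matter.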
The initial maps serve as our own ‘baseline’ for infinitely-scalable camera-based mapping and location services around the world, kick-starting the infrastructure that will enable the devices of tomorrow.
An important benefit
In addition to providing camera-based location services in new cities, the imagery also serves an important secondary purpose.
Our research teams have been able to significantly improve upon state-of-the-art methods for recognising a location using deep learning and computer vision, thanks (in part) to the imagery captured. The imagery allows our algorithms to learn what cities around the world look like and what makes them unique. It means that, regardless of time of day, weather, season or location, camera devices can be located more accurately & more robustly than with other methods to date.
We’re tremendously excited to share some of this research at this year’s ‘CVPR’ conference, the largest conference for computer vision. The research will be presented alongside Scape’s new ‘benchmark dataset’, which we announced earlier this week. The dataset, entitled ‘SILDa’ (Scape-Imperial Localization Dataset), was created in collaboration with Imperial College London and can be used by other companies & academics to measure and assess their own methods for image-based location recognition.
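As a rough idea of how a method might be measured against a benchmark like this, the sketch below scores estimated camera poses against reference poses. It shows one common style of metric in visual localisation research (position error, orientation error, and the share of queries within a tolerance). It is an assumption for illustration, not a description of how SILDa itself is evaluated, and the function names and thresholds are made up for this example.

```python
import math


def yaw_matrix(degrees):
    """Helper for the example: 3x3 rotation about the vertical axis."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def translation_error(est_position, true_position):
    """Straight-line distance in metres between estimated and reference position."""
    return math.dist(est_position, true_position)


def rotation_error_deg(r_est, r_true):
    """Angle in degrees of the rotation that takes one orientation to the other."""
    # trace(R_est^T * R_true) equals the sum of element-wise products.
    trace = sum(r_est[i][j] * r_true[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_angle))


def recall(errors, max_metres, max_degrees):
    """Fraction of queries whose position AND orientation error are within tolerance."""
    hits = sum(1 for t, r in errors if t <= max_metres and r <= max_degrees)
    return hits / len(errors)


# Three made-up queries: (position error in metres, orientation error in degrees).
errors = [
    (translation_error((0.0, 0.1, 0.0), (0.0, 0.0, 0.0)),
     rotation_error_deg(yaw_matrix(1.0), yaw_matrix(0.0))),
    (translation_error((0.3, 0.0, 0.0), (0.0, 0.0, 0.0)),
     rotation_error_deg(yaw_matrix(1.0), yaw_matrix(0.0))),
    (translation_error((5.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
     rotation_error_deg(yaw_matrix(20.0), yaw_matrix(0.0))),
]

strict = recall(errors, max_metres=0.25, max_degrees=2.0)  # illustrative thresholds
loose = recall(errors, max_metres=0.5, max_degrees=5.0)
```

Reporting recall at several tolerances, rather than a single average error, keeps a handful of badly failed queries from hiding how often a method is accurate enough to be useful.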
In addition to our research papers, we will also be running two workshops, ‘Image Matching: Local Features & Beyond’ and ‘Long-Term Visual Localization under Changing Conditions’ alongside legendary researchers from Google & Microsoft, to help spur efforts into these important areas of research.
If you are keen to learn more about the work we are doing at Scape or what you can do to partner with us, please get in touch.
Additionally, if you are interested in learning more about our research projects, would like to collaborate, or would like to join the team, reach out to firstname.lastname@example.org
Edward is co-founder & CEO of Scape Technologies, a computer vision startup in London, working to build a digital framework for the physical world.