Vassileios Balntas
Apr 2

Over the last year, the Scape Technologies team has been working on several problems related to large-scale visual localization while building Scape’s Vision Engine. Today, and in subsequent posts, we will discuss a few of the exciting things that we have been working on.

We start by describing the Scape-Imperial Localization Dataset (SILDa), a dataset created in collaboration with Imperial College London that is suitable for evaluating several aspects of large-scale visual localization.

Visual Localization

To make intelligent decisions on a global scale, machines need to know their exact position in the world. Scape Technologies has been working on a localization system that, unlike GPS, uses visual information to infer the exact position and orientation of the device that captured an image. Since visual scenes change constantly with weather, season and time of day, such methods must estimate positions and orientations correctly across significant visual variations.

Visual localization enables devices to understand their exact position and orientation by using visual cues.

Benchmarking Visual Localization

One of the most important tools when working on novel and challenging problems is a benchmark for measuring progress. Building one is non-trivial, however, especially when the data are sparse and the evaluation protocols are not well defined.

Recent work by researchers from institutions including ETH Zurich, the Oxford Robotics Institute and the Tokyo Institute of Technology, presented at CVPR 2018, introduced a benchmark for Evaluating Visual Localization in Changing Conditions. However, it focused on high-quality images and was concerned solely with camera pose estimation, whereas our goal is to evaluate several tasks related to visual localization.
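Camera pose estimation is usually scored by comparing an estimated pose with a ground-truth pose. The sketch below assumes each pose is given as a 3×3 rotation matrix plus a camera centre in metres; it measures the rotation error as the angle of the relative rotation and the translation error as the distance between camera centres. The helper names and the example thresholds are hypothetical, not the protocol of any particular benchmark.

```python
import math
from typing import Sequence

Mat3 = Sequence[Sequence[float]]  # row-major 3x3 rotation matrix
Vec3 = Sequence[float]            # camera centre in world coordinates (metres assumed)


def rotation_error_deg(r_est: Mat3, r_gt: Mat3) -> float:
    """Angle in degrees of the relative rotation R_est^T R_gt."""
    # trace(R_est^T R_gt) equals the element-wise sum of R_est * R_gt.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error(c_est: Vec3, c_gt: Vec3) -> float:
    """Euclidean distance between estimated and ground-truth camera centres."""
    return math.dist(c_est, c_gt)


def is_localized(r_est: Mat3, c_est: Vec3, r_gt: Mat3, c_gt: Vec3,
                 max_dist_m: float, max_angle_deg: float) -> bool:
    """A query counts as localized only if both errors fall under the thresholds."""
    return (translation_error(c_est, c_gt) <= max_dist_m
            and rotation_error_deg(r_est, r_gt) <= max_angle_deg)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
RZ_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z

if __name__ == "__main__":
    print(rotation_error_deg(IDENTITY, RZ_90))  # about 90 degrees
    # Hypothetical thresholds: 0.5 m and 5 degrees.
    print(is_localized(IDENTITY, (0.0, 0.0, 0.0), IDENTITY, (0.3, 0.0, 0.0), 0.5, 5.0))
```

Benchmarks typically report the fraction of queries localized under several such threshold pairs, from fine to coarse.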

We collected a large quantity of data with a low-end consumer spherical camera over a period of one year in the area around Imperial College London. The dataset includes images taken in rain, snow, evening, day, night, winter, summer, blue sky and clouds. Well, mostly clouds, since this is London.

The Scape-Imperial Localization Dataset (SILDa) contains images across several conditions, and can be used to evaluate several tasks related to visual localization.

Beyond pose estimation, several other tasks may be useful for large-scale localization, including image retrieval, aerial-to-ground image matching and building classification. Most of them have not been thoroughly explored in the context of localization, so SILDa provides benchmarks for several of these related tasks.

Local Patches

Local features are a key element of the localization pipeline, since they let us find areas of an image that are easy to identify and that remain repeatable under visual changes. Once these areas are detected, a small patch is extracted around each one and encoded into a descriptor that acts as an identifier, similar to a barcode.
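To make the patch-to-barcode step concrete, here is a minimal sketch assuming a grayscale image stored as nested lists and keypoints already detected. The descriptor is a deliberately simple stand-in (mean-subtracted, L2-normalised pixels), not the descriptor used at Scape; real systems use hand-crafted or learned descriptors. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Image = Sequence[Sequence[float]]  # grayscale image, indexed image[row][col]


def extract_patch(image: Image, x: int, y: int, radius: int) -> Optional[List[List[float]]]:
    """Crop a (2*radius+1) x (2*radius+1) patch centred on keypoint (x, y).

    Returns None when the patch would cross the image border.
    """
    height, width = len(image), len(image[0])
    if not (radius <= y < height - radius and radius <= x < width - radius):
        return None
    return [list(image[row][x - radius:x + radius + 1])
            for row in range(y - radius, y + radius + 1)]


def describe_patch(patch: List[List[float]]) -> List[float]:
    """Toy descriptor: flattened pixels, mean-subtracted and L2-normalised.

    Subtracting the mean removes brightness offsets and dividing by the norm
    removes gain (contrast) changes, so both leave the descriptor unchanged.
    """
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    norm = math.sqrt(sum(v * v for v in centred))
    if norm == 0.0:
        return [0.0] * len(centred)  # flat patch: nothing distinctive to encode
    return [v / norm for v in centred]


if __name__ == "__main__":
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    brighter = [[2.0 * v + 10.0 for v in row] for row in image]  # gain and offset change
    a = describe_patch(extract_patch(image, 4, 4, radius=1))
    b = describe_patch(extract_patch(brighter, 4, 4, radius=1))
    print(max(abs(x - y) for x, y in zip(a, b)))  # close to zero
```

Invariance to global brightness is only a small part of the problem; the patches in SILDa also change with weather, season and viewpoint, which a raw-pixel descriptor like this one handles poorly.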

Detecting the same feature point across different conditions and times.

By detecting a large set of such features across many images, we created a patch dataset that is larger than the patch datasets commonly used in academic evaluations and contains significantly more challenging scenarios. It allows researchers to evaluate their methods and to study how the appearance of local image regions changes under different illumination and weather conditions.
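One metric commonly used by earlier patch benchmarks is the false-positive rate at 95% recall (often called FPR95): choose the descriptor-distance threshold that accepts 95% of matching patch pairs, then count how many non-matching pairs also slip under it. The sketch below assumes the distances have already been computed; it is not necessarily the official SILDa protocol.

```python
import math
from typing import Sequence


def fpr_at_recall(pos_dists: Sequence[float], neg_dists: Sequence[float],
                  recall: float = 0.95) -> float:
    """False-positive rate at the threshold that accepts `recall` of matching pairs.

    pos_dists: descriptor distances between patches of the same 3D point.
    neg_dists: descriptor distances between patches of different points.
    Lower is better.
    """
    positives = sorted(pos_dists)
    # Number of matching pairs that must fall under the threshold (at least one).
    k = max(1, math.ceil(recall * len(positives)))
    threshold = positives[k - 1]
    false_positives = sum(1 for d in neg_dists if d <= threshold)
    return false_positives / len(neg_dists)


if __name__ == "__main__":
    pos = [0.1 * i for i in range(1, 11)]
    neg = [0.5, 1.5, 2.0, 0.9]
    print(fpr_at_recall(pos, neg))  # fraction of non-matching pairs accepted
```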

Example of a set of patches extracted around a feature point across different images.

Image Matching

Matching a pair of images using local features allows us to recover the relative camera pose between the two frames. This is one of the most crucial steps in building a large-scale structure-from-motion (SfM) and localization pipeline. We provide a benchmark for testing such methods on a large set of image pairs with significant variations in conditions and difficulty.
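Before a relative pose can be estimated, descriptors from the two images have to be put into correspondence. A common recipe, sketched below, keeps a match only if the two descriptors are mutual nearest neighbours and pass Lowe's ratio test (the nearest neighbour is clearly closer than the second nearest). The 0.8 ratio is a typical but assumed value, and the brute-force search is for illustration only. In a full pipeline, the surviving matches would then feed a robust estimator such as RANSAC around an essential-matrix solver, which is not shown here.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def _two_nearest(query: Descriptor,
                 candidates: Sequence[Descriptor]) -> Tuple[Tuple[float, int], Tuple[float, int]]:
    """Return (distance, index) of the nearest and second-nearest candidates."""
    ranked = sorted((math.dist(query, c), j) for j, c in enumerate(candidates))
    second = ranked[1] if len(ranked) > 1 else (math.inf, -1)
    return ranked[0], second


def match_descriptors(desc_a: Sequence[Descriptor], desc_b: Sequence[Descriptor],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Mutual nearest-neighbour matches that also pass the ratio test."""
    if not desc_a or not desc_b:
        return []
    best_ab = [_two_nearest(d, desc_b) for d in desc_a]
    # For each descriptor in B, the index of its nearest descriptor in A.
    best_ba = [_two_nearest(d, desc_a)[0][1] for d in desc_b]
    matches = []
    for i, ((d1, j), (d2, _)) in enumerate(best_ab):
        passes_ratio = d1 < ratio * d2  # reject ambiguous matches
        if passes_ratio and best_ba[j] == i:  # keep only mutual matches
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    a = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
    b = [(0.1, 0.0), (10.0, 0.2), (20.0, 20.0)]
    print(match_descriptors(a, b))
```

The third descriptor in `a` is dropped because its two closest candidates are almost equally far away, which is exactly the ambiguity the ratio test is meant to catch.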

Example of the challenging image matching pairs that can be found in SILDa.

Building Recognition

Traditional pipelines rely heavily on brute-force feature matching for localization. One unexplored direction is to bring additional cues, such as building recognition, into the localization process, giving it a more contextual and semantic understanding of the scene. To support this, SILDa provides labels for all buildings visible in each image.
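As one hypothetical illustration of how building labels could take part in localization, a system might shortlist reference images that share recognised buildings with the query before running expensive feature matching. The sketch below ranks database images by the Jaccard overlap of their label sets; the function name, the label strings and the ranking rule are all assumptions for illustration, not a method described in the post.

```python
from typing import Dict, List, Set


def shortlist_by_buildings(query_labels: Set[str], database: Dict[str, Set[str]],
                           min_overlap: float = 0.0) -> List[str]:
    """Rank database image ids by Jaccard overlap of building labels with the query.

    Only images whose overlap is strictly greater than `min_overlap` are kept.
    """
    scored = []
    for image_id, labels in database.items():
        union = query_labels | labels
        if not union:
            continue  # neither image has any labelled building
        jaccard = len(query_labels & labels) / len(union)
        if jaccard > min_overlap:
            scored.append((jaccard, image_id))
    # Highest overlap first; ties broken by image id for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [image_id for _, image_id in scored]


if __name__ == "__main__":
    db = {"a": {"library", "museum"}, "b": {"museum"}, "c": {"station"}}
    print(shortlist_by_buildings({"museum"}, db))
```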

Building recognition across seasons and weather.

Aerial to Ground Image Matching

Another unexplored direction is combining aerial images with ground images for more accurate map building and even localization. To that end, we provide pairs of landmark images, each consisting of one ground-level view and one aerial view.
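Because the data come as ground/aerial pairs, a natural way to score a cross-view method is recall@k: for each ground view, rank all aerial views by similarity and check whether its paired aerial view lands in the top k. The sketch assumes a precomputed similarity matrix in which ground view i is paired with aerial view i; how the similarities are produced (for example by a learned embedding) is left open.

```python
from typing import Sequence


def cross_view_recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of ground views whose paired aerial view (same index) ranks in the top k.

    similarity[i][j] is the score between ground view i and aerial view j;
    higher means more similar.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Aerial indices sorted by decreasing similarity to ground view i.
        ranking = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranking[:k]:
            hits += 1
    return hits / len(similarity)


if __name__ == "__main__":
    sim = [[0.9, 0.1, 0.2],
           [0.3, 0.4, 0.8],
           [0.1, 0.2, 0.7]]
    print(cross_view_recall_at_k(sim, 1), cross_view_recall_at_k(sim, 2))
```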

Examples of ground-view and aerial-view image pairs for three landmarks.

Global Image Retrieval

Image retrieval has recently been examined in the context of visual localization, with promising results. In retrieval methods, every image is assigned a “barcode” (a global descriptor), which serves as a reference when searching the database for similar images. Using this “barcode”, we can quickly retrieve the images most similar to any given query.
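The "barcode" lookup can be sketched as a nearest-neighbour search over global descriptors. Below, descriptors are L2-normalised so that the dot product equals cosine similarity, and the top-k database entries are returned. The function names are hypothetical, and the exhaustive scan is for clarity only; at city scale one would typically use an approximate nearest-neighbour index instead.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def retrieve(query: Sequence[float], database: Sequence[Sequence[float]],
             k: int) -> List[Tuple[int, float]]:
    """Return (database index, cosine similarity) for the k most similar entries."""
    q = l2_normalise(query)
    scores = []
    for idx, desc in enumerate(database):
        d = l2_normalise(desc)
        scores.append((idx, sum(a * b for a, b in zip(q, d))))
    scores.sort(key=lambda s: s[1], reverse=True)  # most similar first
    return scores[:k]


if __name__ == "__main__":
    db = [(0.0, 1.0), (2.0, 0.1), (1.0, 1.0)]
    print(retrieve((1.0, 0.0), db, k=2))
```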

We provide a benchmark for evaluating retrieval performance. Its scenes vary significantly in global appearance, which makes it quite challenging.
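A standard way to score a retrieval ranking is average precision (AP), averaged over queries as mean average precision (mAP). The sketch assumes each query has a known set of positive database images (for instance, images taken at the same place); it is a generic definition, not necessarily the exact SILDa protocol. Dividing by the total number of positives means positives that never appear in the ranking lower the score.

```python
from typing import Sequence, Set


def average_precision(ranked_ids: Sequence[int], positives: Set[int]) -> float:
    """Mean of precision@rank over the ranks that hold a positive image."""
    if not positives:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in positives:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(positives)


def mean_average_precision(rankings: Sequence[Sequence[int]],
                           positives_per_query: Sequence[Set[int]]) -> float:
    """Average the per-query AP over all queries."""
    aps = [average_precision(r, p) for r, p in zip(rankings, positives_per_query)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    rankings = [[3, 1, 2, 0], [0, 1]]
    positives = [{3, 2}, {1}]
    print(mean_average_precision(rankings, positives))
```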

A query image (left) and some of its positive retrievals: images from the dataset taken at the same place.

CVPR 2019 Workshops

Two of the tasks defined in SILDa are part of challenges organised across two CVPR 2019 workshops.

The workshops are organised in collaboration with several researchers from academia and industry. Please refer to the individual workshop websites for details with respect to the challenges, invited talks and prizes.

If you are keen to learn more about the work we are doing at Scape or how you can partner with us, please visit our website or email us.

Additionally, if you are interested in learning more about our research projects, would like to collaborate, or would like to join the team, reach out to

Vassileios Balntas leads the research team at Scape Technologies, which focuses on several problems related to large-scale visual localization.

Follow Vassileios and the company on Twitter.

Interested in learning more about Scape Technologies?

We send a newsletter every couple of months, making sense of the AR industry and sharing our progress.

Sign up for our newsletter.

Scape Technologies

Scape Technologies is building a cloud-based ‘visual engine’ that allows camera devices to understand their environment, using computer vision.

Thanks to Edward Miller and Huub Heijnen
