SILDa: A Multi-Task Dataset for Evaluating Visual Localization
Over the last year, the Scape Technologies team has been working on several problems related to large-scale visual localization while building Scape’s Vision Engine. Today, and in subsequent posts, we will discuss a few of the exciting things that we have been working on.
We start by describing the Scape-Imperial Localization Dataset (SILDa), which was created in collaboration with Imperial College London and is suitable for evaluating several aspects of large-scale visual localization.
Visual Localization
To make intelligent decisions on a global scale, machines need to know their exact position in the world. Scape Technologies has been working on a localization system that, unlike GPS, uses visual information to infer the exact position and orientation of the device that captured an image. Since visual scenes are ever-changing due to weather, seasons and time of day, such methods need to estimate positions and orientations correctly across significant visual variations.
Benchmarking Visual Localization
When working on novel and challenging problems, one of the most important tools is a benchmark for analysing progress. However, building one is non-trivial, especially when the data are sparse and the evaluation protocols are not well defined.
Recent work by researchers from institutes including ETH Zurich, the Oxford Robotics Institute and Tokyo Institute of Technology, presented at CVPR 2018, introduced a benchmark for Evaluating Visual Localization in Changing Conditions. However, it focused on high-quality images and was solely concerned with benchmarking the single task of camera pose estimation, whereas our goal is to evaluate several aspects of visual localization.
We collected a large quantity of data with a low-end, consumer spherical camera over a period of one year in the area around Imperial College London. This dataset includes images taken in rain, snow, evening, day, night, winter, summer, blue sky and clouds. Well, mostly clouds, since this is London.
Numerous tasks might be useful for large-scale localization, including image retrieval, aerial-to-ground image matching, and building classification. Most of them have not been thoroughly explored in the context of localization. To that end, in SILDa, we focus on evaluating several such related tasks.
Local Patches
Local features are a very important element in the localization pipeline since they allow us to find areas in images that are easily identifiable and repeatable across visual changes. After the important local areas are detected, small patches are extracted around them to create a unique identifier, similar to a barcode.
By detecting a large set of such features across several images, we are able to create a patch dataset which is larger than the patch datasets commonly used in academic evaluations and contains significantly more challenging scenarios. This allows researchers to evaluate their methods and provides a way to understand how local regions of images appear under different illumination and weather conditions.
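As a rough illustration of the patch extraction step described above, the following sketch crops fixed-size patches around already-detected keypoints. It is not the pipeline used to build SILDa: the function name `extract_patches`, the 32-pixel patch size and the (row, col) keypoint format are assumptions made for this example, and the keypoint detector itself is left out.

```python
import numpy as np


def extract_patches(image, keypoints, patch_size=32):
    """Crop a square patch around each (row, col) keypoint.

    Hypothetical helper for illustration only. Keypoints that lie too
    close to the image border for a full patch are skipped.
    Returns the stacked patches and the keypoints that were kept.
    """
    image = np.asarray(image)
    half = patch_size // 2
    height, width = image.shape[:2]
    patches, kept = [], []
    for row, col in keypoints:
        row, col = int(round(row)), int(round(col))
        top, left = row - half, col - half
        bottom, right = top + patch_size, left + patch_size
        # Keep only patches that fit entirely inside the image.
        if top >= 0 and left >= 0 and bottom <= height and right <= width:
            patches.append(image[top:bottom, left:right])
            kept.append((row, col))
    if not patches:
        empty_shape = (0, patch_size, patch_size) + image.shape[2:]
        return np.empty(empty_shape, dtype=image.dtype), kept
    return np.stack(patches), kept
```

In practice the patches would then be passed to a descriptor (hand-crafted or learned) to obtain the identifier for each local region.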
Image Matching
Matching a pair of images using local features allows us to recover the relative camera pose between the two frames. This is one of the most crucial steps towards building a large-scale structure-from-motion (SfM) and localization pipeline. We provide a benchmark for researchers to test their methods across a large set of image pairs with significant variations in conditions and difficulty.
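To make the matching step more concrete, here is a minimal sketch of one common way to match local feature descriptors between two images: mutual nearest-neighbour matching. This is a generic baseline and not necessarily the matcher used in the SILDa benchmark. The function name and the assumption that descriptors arrive as NumPy arrays with one row per feature are ours. Recovering the relative pose from the resulting matches needs further steps (for example robust essential matrix estimation) that are not shown.

```python
import numpy as np


def mutual_nearest_neighbours(desc_a, desc_b):
    """Match descriptors of image A to image B by mutual nearest neighbour.

    desc_a: (N, D) array, desc_b: (M, D) array.
    Returns a (K, 2) array of index pairs (index in A, index in B).
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    nn_a_to_b = np.argmin(dist, axis=1)  # best match in B for each A
    nn_b_to_a = np.argmin(dist, axis=0)  # best match in A for each B
    idx_a = np.arange(desc_a.shape[0])
    # Keep a pair only if each descriptor is the other's nearest neighbour.
    mutual = nn_b_to_a[nn_a_to_b] == idx_a
    return np.stack([idx_a[mutual], nn_a_to_b[mutual]], axis=1)
```

The mutual check discards many ambiguous matches at little cost, which matters when the two images were taken under very different conditions.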
Building Recognition
Traditional pipelines rely heavily on brute-force feature matching for localization. One unexplored area is the potential to use additional cues, such as building recognition, to bring a more contextual and semantic understanding of the scene into the localization process. To that end, we provide labels for all buildings visible in each of the individual images in SILDa.
Aerial to Ground Image Matching
Another unexplored area is the potential to combine aerial images with ground images, for more accurate map building and even localization. To that end, we provide pairs of landmark images, with each pair consisting of one ground image view and one aerial image view.
Global Image Retrieval
Image retrieval has recently been examined in the context of visual localization, where it showed promising results. In retrieval methods, every image is assigned a “barcode” (global descriptor) that is used as a reference when searching for similar images in the database. Using this “barcode”, we are able to quickly retrieve the most similar images to any given query.
We provide a benchmark for evaluating retrieval performance. It contains significant variations in the global appearance of scenes, which makes it quite challenging.
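The “barcode” lookup can be sketched as a nearest-neighbour search over global descriptors. The example below ranks database images by cosine similarity to a query descriptor. It is only an illustration: the function name `retrieve_top_k`, the choice of cosine similarity and the array layout (one descriptor per row) are assumptions, and how the descriptors are computed is left out entirely.

```python
import numpy as np


def retrieve_top_k(query, database, k=5):
    """Return the indices and scores of the k most similar database images.

    query: (D,) global descriptor of the query image.
    database: (N, D) array with one global descriptor per database image.
    Similarity is the cosine of the angle between descriptors.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    # L2-normalise so that a dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    database = database / np.linalg.norm(database, axis=1, keepdims=True)
    similarity = database @ query
    k = min(k, database.shape[0])
    order = np.argsort(-similarity)[:k]  # highest similarity first
    return order, similarity[order]
```

For large databases an exhaustive search like this is usually replaced by an approximate nearest-neighbour index, but the ranking idea stays the same.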
CVPR 2019 Workshops
Two of the tasks defined in SILDa are part of challenges organised across two CVPR 2019 workshops.
The workshops are organised in collaboration with several researchers from academia and industry. Please refer to the individual workshop websites for details with respect to the challenges, invited talks and prizes.
If you are keen to learn more about the work we are doing at Scape or what you can do to partner with us, please visit our website or email us.
Additionally, if you are interested in learning more about our research projects, would like to collaborate, or would like to join the team, reach out to research@scape.io
Vassileios Balntas leads the research team at Scape Technologies, which focuses on several problems related to large-scale visual localization.
Follow Vassileios and the company on Twitter.