Discover the Power of AI-Based Landmark Detection and Recognition

OpenVINO™ toolkit
Published in OpenVINO-toolkit
6 min read · Jan 26, 2023

Author: Raymond Lo, AI Software Evangelist, OpenVINO™ Toolkit

Now that we’ve introduced some basic AI inference software fundamentals like optical character recognition (OCR) and object detection, it’s time to build on those learnings and extend them to more real-world applications. In this post, we walk through building a landmark retrieval app to showcase some unique challenges on a much larger scale.

Landmark detection and recognition is an AI capability that automatically identifies landmarks and determines which parts of a landscape are landmark structures. The concept is similar to image classification, but it requires training on tens of thousands more image classes, which can be extremely challenging.

The trick is that both “landmark” and “structure” are very generic terms here. The most frequent targets of landmark detection and recognition may be actual historical or artistic landmarks like monuments, buildings, or even paintings. But a landmark can really be any artificial structure, in any condition, from construction cranes to bridges and ruins.

In some cases, there may also be replicas of a landmark elsewhere in the world. For instance, while the Eiffel Tower is in Paris, France, there is a half-scale replica of the Eiffel Tower in Las Vegas, Nevada. This is where the learnings from OCR and object detection come in: to make sure your application identifies the location accurately, you can combine OpenVINO™ pretrained object detection and OCR models for image retrieval, size detection, and sound suppression.
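As a complement to the model-based approach above, and plainly a different technique, a simple distance check can also flag likely replicas. The sketch below assumes two things that are not part of this post: that the photo carries its own GPS position (for example from its metadata), and that the recognizer reports coordinates for the landmark it matched, as the cloud example later in this post does. The function names, the 50 km threshold, and the rounded Las Vegas coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def likely_replica(photo_pos, landmark_pos, max_km=50.0):
    """Return True when the photo was taken far from the matched landmark.

    max_km is an arbitrary illustrative threshold, not a recommended value.
    """
    return haversine_km(*photo_pos, *landmark_pos) > max_km


EIFFEL_TOWER = (48.8584, 2.2943)    # coordinates used later in this post
LAS_VEGAS_STRIP = (36.11, -115.17)  # rounded and approximate, illustration only

# A photo taken in Las Vegas that matches the Eiffel Tower is flagged.
print(likely_replica(LAS_VEGAS_STRIP, EIFFEL_TOWER))
# A photo taken a few hundred meters from the tower is not.
print(likely_replica((48.8616, 2.2892), EIFFEL_TOWER))
```

A check like this only works when location metadata is present and trustworthy, so it is best treated as one signal among several.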

Be sure that your image classes come from careful, unbiased definitions of what a landmark is and what it may look like. A training set containing only western European monuments, for example, may be of limited use to recognize monuments from Africa or Oceania, and the reverse is equally true.

Getting Started with Landmark Detection and Recognition

Despite its challenges, with the right software, landmark detection and recognition can actually be quite easy.

Let’s walk through an example straight from the Google Cloud Vision API documentation. The approach is generic enough that you can easily port it to other landmark detection and recognition libraries and APIs, such as the Azure API.

To use Google landmark detection and recognition services in your code, you must first get the corresponding credentials, a JSON file (called cred.json here), and save it to a location on your computer that your program knows about.
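Google client libraries typically find such a key file through the GOOGLE_APPLICATION_CREDENTIALS environment variable. Here is a minimal sketch of setting it from Python, assuming the file is saved as cred.json; the helper name use_credentials is hypothetical and only serves this example.

```python
import os
from pathlib import Path


def use_credentials(path="cred.json"):
    """Point Google client libraries at a service-account key file.

    `use_credentials` is a hypothetical helper written for this post.
    It fails early if the file is missing, then exports its absolute path.
    """
    key_file = Path(path).expanduser().resolve()
    if not key_file.is_file():
        raise FileNotFoundError(f"Credentials file not found: {key_file}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_file)
    return key_file
```

Calling use_credentials("cred.json") before creating the client keeps the path out of the rest of your code. Exporting the same variable in your shell works just as well.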

The second step, not mandatory but highly recommended, is to work inside a Python virtual environment, which keeps the rest of your system unaltered. The commands to create the right landmark detection and recognition environment for Google services and install the corresponding libraries inside it are as follows:

python3 -m venv google_cv
source google_cv/bin/activate
pip install --upgrade google-cloud-vision

At this point, writing a program that takes an image and prints out the landmarks it contains, and their location, is simpler than you may expect. The first thing to do is to load the “vision” libraries, and define a “client” that will talk with the Google cloud services for computer vision:

from google.cloud import vision
import io
client = vision.ImageAnnotatorClient()

After this initialization, it is possible to load the image to analyze (“tower.jpg” here) into a vision.Image object:

path = "tower.jpg"
with io.open(path, "rb") as image_file:
    content = image_file.read()  # raw image bytes
image = vision.Image(content=content)

The commands necessary to submit that image to the Google servers, download their “response,” and extract from it all the metadata of interest (“landmark_annotations”) are even simpler:

response = client.landmark_detection(image=image)
landmarks = response.landmark_annotations

The “landmarks” variable in the second line of code is a list-like sequence, because Google’s answer contains one entry for each match. Showing those matches to the user takes only two very simple nested loops:

print('Landmarks:')
for landmark in landmarks:
    print(landmark.description)
    for location in landmark.locations:
        lat_lng = location.lat_lng
        print('Latitude {}'.format(lat_lng.latitude))
        print('Longitude {}'.format(lat_lng.longitude))

Figure 1. Eiffel Tower and Jardins du Trocadéro from the Palais de Chaillot, Paris. By NonOmnisMoriar, licensed under CC BY-SA 3.0.

If the file “tower.jpg” is the one in Figure 1, the output will look like this:

Landmarks:
Trocadero Gardens
Latitude 48.861596
Longitude 2.2892
Eiffel Tower
Latitude 48.8584
Longitude 2.2943

Here, there is more than one result because the Eiffel Tower appears behind another famous Paris landmark, the Trocadero Gardens.
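For anything beyond printing, it can help to turn the matches into plain records and to check whether the request failed. The sketch below assumes a response object with the attributes used in this post (landmark_annotations, description, locations, lat_lng) plus error.message and score, which the Cloud Vision response also exposes. The function name landmark_records is hypothetical, and the stand-in objects and their score values are made up so the example runs without network access.

```python
from types import SimpleNamespace


def landmark_records(response):
    """Convert a landmark detection response into a list of plain dicts."""
    # A non-empty error message signals that the request failed.
    if response.error.message:
        raise RuntimeError(
            f"Landmark detection failed: {response.error.message}")

    records = []
    for landmark in response.landmark_annotations:
        for location in landmark.locations:
            records.append({
                "name": landmark.description,
                "score": landmark.score,  # confidence reported for the match
                "latitude": location.lat_lng.latitude,
                "longitude": location.lat_lng.longitude,
            })
    return records


def _fake_landmark(name, score, lat, lng):
    """Build a stand-in object with the same attributes as a real match."""
    lat_lng = SimpleNamespace(latitude=lat, longitude=lng)
    return SimpleNamespace(description=name, score=score,
                           locations=[SimpleNamespace(lat_lng=lat_lng)])


# Stand-in response; the score values are invented for illustration.
fake_response = SimpleNamespace(
    error=SimpleNamespace(message=""),
    landmark_annotations=[
        _fake_landmark("Trocadero Gardens", 0.5, 48.861596, 2.2892),
        _fake_landmark("Eiffel Tower", 0.5, 48.8584, 2.2943),
    ],
)

for record in landmark_records(fake_response):
    print(record)
```

With the real service, you would pass the “response” returned by client.landmark_detection instead of the stand-in. Plain dicts are then easy to sort by score, filter, or save as JSON.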

Landmark Detection and Recognition Applications and Use Cases

Now that you know how to perform landmark detection and recognition, what types of applications can you build? Use cases are extremely diverse and include:

  • Family memories: Landmark detection and recognition software may very quickly add value to billions of family photographs by answering questions like “Where on Earth is this place where Grandma brought me on vacation when I was only two years old?”
  • History studies: Automatic landmark detection and recognition may help historians reconstruct the evolution of a specific building or whole neighborhood by recognizing that landmark in thousands of pictures of the same area, taken from all possible sides and angles over many decades.
  • Landmark discovery: Landmarks already present in millions of pictures may be easy to recognize, but what about hidden or forgotten ones? Human analysis of aerial images has already led to discovering Roman and Mayan ruins. Automating such analyses with landmark detection and recognition (LDR) may lead to many more such discoveries than would ever be possible by hand, with obvious cultural and economic advantages.
  • Navigation and investigation: Software that can understand commands like “find the closest medieval statue” or “find the closest three-story brown building” may make it much easier to interact with any self-driving vehicle, as well as to find all sorts of places for totally different reasons, from buildings that violate building codes to crime scenes known only through victims’ rough descriptions.
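To make the family memories case more concrete, here is a minimal sketch that builds an index from photo file names to the landmarks found in each photo. The names index_photos and IMAGE_SUFFIXES are hypothetical, and the detector is passed in as a function so that the same loop works with a cloud API or a local model.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # illustrative, extend as needed


def index_photos(folder, detect):
    """Map each image file in `folder` to the landmark names found in it.

    `detect` is any callable that takes image bytes and returns a list of
    landmark names. It is injected so this sketch stays backend-agnostic.
    """
    index = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            index[path.name] = detect(path.read_bytes())
    return index
```

With the cloud example from this post, “detect” could be a small wrapper that builds a vision.Image from the bytes, calls client.landmark_detection, and returns the description of each match. For large photo collections you would also want to cache results, since each call to a cloud service takes time and may be billed.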

The important thing is that you are using the right tools when creating these applications. For example, when real-time or almost real-time performance is needed for your application, the more complex approaches presented in our OpenVINO Notebooks may be preferable to the one presented here. Other times, you may have to combine multiple libraries and platforms.

I should also note that modern CPUs and GPUs — like Intel® Core™ processors, Intel® Xeon® processors, and Intel® Arc™ GPUs — make it easy to run and scale the same LDR application (or, for that matter, any other AI workload) wherever and however you need: from workstations to edge servers placed as close as possible to their users, or in the cloud when very large models or quantities of data are involved.

What matters is that you understand all the tools at your disposal, can recognize which one is best for the job at hand, and aren’t afraid to try! Start building your application with the tools that seem best, try it, deploy it wherever it could be most effective, and if you find a mistake, update and repeat the whole cycle.

That is, after all, what the hacker’s mindset is all about, isn’t it?

To learn more about AI concepts and how to apply them to real-world use cases, keep up with us at Intel AI Dev Team Adventures and check out our OpenVINO developer resources.

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
