AI and Geolocation: Evaluating State-of-the-Art Model StreetClip

Published in 𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨 · 8 min read · Sep 12, 2023

In the realm of geolocation, StreetClip’s AI prowess stands as a beacon of promise. Building upon our previous exploration in “Geolocation and AI with StreetClip,” it’s now time to rigorously evaluate its performance across a diverse array of images.

Before delving into the model validation, let’s take a moment to revisit what StreetClip is and what it can accomplish. With just a few lines of code, one can effortlessly create an application capable of classifying street view images and providing scores for the top ten most probable countries. To experience StreetClip in action, visit the demo.

For instance, when presented with an image from a geolocation challenge, StreetClip pinpointed the correct country with a high confidence score.

A single success says little about general behaviour, though, so it is important to check the model's performance across a diverse range of images.

Validation on 50k Street View Images

Dataset

Kaggle offers a dataset comprising 50,000 images from the Geoguessr game (link). This dataset aligns perfectly with StreetClip’s training on street view images.

Distribution of the images per country:


+--------------------+------+-------------+------+----------------------+-------+
| Albania            |   41 | Guam        |    8 | Palestine            |    46 |
| Andorra            |   13 | Guatemala   |   79 | Peru                 |   270 |
| Argentina          |  689 | Hungary     |  168 | Philippines          |   219 |
| Australia          | 1704 | Iceland     |   54 | Poland               |   863 |
| Austria            |  347 | India       |  160 | Portugal             |   242 |
| Bangladesh         |  106 | Indonesia   |  288 | Puerto Rico          |    46 |
| Belgium            |  219 | Ireland     |  290 | Romania              |   346 |
| Bermuda            |    3 | Israel      |  326 | Russia               |  1761 |
| Bhutan             |   20 | Italy       |  789 | Senegal              |    75 |
| Bolivia            |  116 | Japan       | 3840 | Serbia               |    62 |
| Botswana           |  144 | Jordan      |   85 | Singapore            |   707 |
| Brazil             | 2320 | Kenya       |  130 | Slovakia             |   108 |
| Bulgaria           |  217 | Kyrgyzstan  |   72 | Slovenia             |    66 |
| Cambodia           |  118 | Laos        |   61 | South Africa         |  1183 |
| Canada             | 1382 | Latvia      |  117 | South Korea          |   243 |
| Chile              |  326 | Lesotho     |   65 | Spain                |  1075 |
| China              |   13 | Lithuania   |  140 | Sri Lanka            |    85 |
| Colombia           |  251 | Luxembourg  |   25 | Sweden               |   726 |
| Croatia            |  129 | Madagascar  |   13 | Switzerland          |   173 |
| Czech Republic     |  257 | Malaysia    |  423 | Taiwan               |   547 |
| Denmark            |  198 | Malta       |   55 | Thailand             |   944 |
| Dominican Republic |   22 | Mexico      |  901 | Tunisia              |    87 |
| Ecuador            |   93 | Monaco      |    2 | Turkey               |   268 |
| Estonia            |   99 | Mongolia    |   83 | Uganda               |    55 |
| Finland            | 1049 | Montenegro  |   32 | Ukraine              |   114 |
| France             | 3573 | Netherlands |  579 | United Arab Emirates |    70 |
| Germany            |  698 | New Zealand |  557 | United Kingdom       |  2484 |
| Ghana              |  107 | Nigeria     |  123 | United States        | 12014 |
| Greece             |  248 | Norway      |  675 | Uruguay              |    57 |
| Greenland          |   11 | Pakistan    |   19 |                      |       |
+--------------------+------+-------------+------+----------------------+-------+

The images are organized into folders, each dedicated to a specific country. Let’s now explore how to validate the model on this dataset and derive pertinent metrics from this process.

Metrics

In terms of metrics, I opted to calculate:

Accuracy: the ratio of correctly predicted images to the total number of images.
Top-3 Accuracy: the ratio of times the correct country is predicted within the top three countries to the total number of images.
Accuracy After Filtering: accuracy computed only over the predictions whose top country's score is at least 50%, 60%, 70% or 80%. This step checks whether high-confidence predictions are indeed more accurate, at the cost of leaving lower-scoring images without a prediction.
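
To make the three definitions concrete, here is a minimal sketch on made-up data. The sample rows and the helper names (`accuracy`, `top3_accuracy`, `filtered_accuracy`) are illustrative assumptions and are not part of the validation code further down.

```python
# Toy predictions: (true country, top-3 predicted countries, top-1 score).
# These values are invented purely to illustrate the arithmetic.
samples = [
    ("France", ["France", "Belgium", "Spain"], 0.91),
    ("France", ["Belgium", "France", "Luxembourg"], 0.55),
    ("Spain", ["Spain", "Portugal", "Italy"], 0.62),
    ("Italy", ["Austria", "Slovenia", "Switzerland"], 0.48),
]

def accuracy(rows):
    """Share of rows whose top-1 prediction matches the true country."""
    return sum(true == top[0] for true, top, _ in rows) / len(rows)

def top3_accuracy(rows):
    """Share of rows whose true country appears among the top three."""
    return sum(true in top for true, top, _ in rows) / len(rows)

def filtered_accuracy(rows, threshold):
    """Accuracy over rows whose top-1 score reaches the threshold,
    together with the share of rows that were kept (coverage)."""
    kept = [row for row in rows if row[2] >= threshold]
    if not kept:
        return None, 0.0
    return accuracy(kept), len(kept) / len(rows)

print(accuracy(samples))                # 0.5
print(top3_accuracy(samples))           # 0.75
print(filtered_accuracy(samples, 0.6))  # (1.0, 0.5)
```

On this toy data, filtering at 60% raises accuracy from 0.5 to 1.0 but keeps only half of the images, which is the trade-off the filtered metric is meant to expose.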

Implementation

Before we proceed with code implementation for model validation, it’s prudent to establish a virtual environment for proper package separation.

virtualenv venv
source venv/bin/activate
pip install transformers torch Pillow tqdm
# Packages needed to generate the Sankey charts
pip install pysankey matplotlib pandas seaborn

The initial step involves importing the requisite modules, loading the StreetClip model, and defining a prediction function that returns the top three classified countries with their scores.

#Import relevant modules
from PIL import Image
import torch
from transformers import CLIPProcessor, CLIPModel
from tqdm import tqdm
import os

#Define the list of labels (92 countries)
list_countries = ['Albania', 'Andorra', 'Argentina', 'Australia', 'Austria', 'Bangladesh', 'Belgium', 'Bermuda', 'Bhutan', 'Bolivia', 'Botswana', 'Brazil', 'Bulgaria', 'Cambodia', 'Canada', 'Chile', 'China', 'Colombia', 'Croatia', 'Czech Republic', 'Denmark', 'Dominican Republic', 'Ecuador', 'Estonia', 'Finland', 'France', 'Germany', 'Ghana', 'Greece', 'Greenland', 'Guam', 'Guatemala', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Ireland', 'Israel', 'Italy', 'Japan', 'Jordan', 'Kenya', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lesotho', 'Lithuania', 'Luxembourg', 'Macedonia', 'Madagascar', 'Malaysia', 'Malta', 'Mexico', 'Monaco', 'Mongolia', 'Montenegro', 'Netherlands', 'New Zealand', 'Nigeria', 'Norway', 'Pakistan', 'Palestine', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Romania', 'Russia', 'Rwanda', 'Senegal', 'Serbia', 'Singapore', 'Slovakia', 'Slovenia', 'South Africa', 'South Korea', 'Spain', 'Sri Lanka', 'Swaziland', 'Sweden', 'Switzerland', 'Taiwan', 'Thailand', 'Tunisia', 'Turkey', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay']

#run model on GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#Load the model and the processor
model = CLIPModel.from_pretrained("geolocal/StreetCLIP")
model.to(device)
processor = CLIPProcessor.from_pretrained("geolocal/StreetCLIP")

#Define the prediction function that returns the top-3 countries with their scores
def prediction(inp):
    inputs = processor(text=list_countries, images=inp, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    #Softmax over the country labels turns the image-text logits into scores that sum to 1
    probs = outputs.logits_per_image.softmax(dim=1)
    #Return the indices (into list_countries) and the scores of the top-3 countries
    top = torch.topk(probs[0], 3)
    return [top.indices.tolist(), top.values.tolist()]
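
To see what the softmax and top-k steps do without loading the model, here is a pure-Python stand-in on invented logits. The `softmax` and `top_k` helpers, the labels and the logit values are assumptions for illustration only; the real function relies on the torch operations shown above.

```python
import math

def softmax(logits):
    """Turn raw logits into scores that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, k):
    """Return [indices, scores] of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [order, [scores[i] for i in order]]

# Hypothetical labels and logits, not real model output
toy_labels = ["France", "Spain", "Italy", "Japan"]
toy_logits = [2.0, 1.0, 0.5, -1.0]

indices, scores = top_k(softmax(toy_logits), 3)
print([toy_labels[i] for i in indices])  # ['France', 'Spain', 'Italy']
```

Because the scores sum to 1 over all candidate labels, a top score above 0.5 means the model prefers that country over all the others combined.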

The dataset’s file structure is as follows: ./dataset/[countryname]/[filename]. We’ll iterate through the images, country by country, and compute the designated metrics.

#Define metrics variable
thresholds = {'+50%': 0.5, '+60%': 0.6, '+70%': 0.7, '+80%': 0.8}
metrics = ['accuracy', 'top3-accuracy'] + list(thresholds)
stats = {metric: {'pos':0, 'total':0} for metric in metrics}
#Define a variable to compute a sankey chart afterwards
confusion_mat = {'true': [], 'predicted': []}

#For each country in list_countries, collect the image filenames
#from its folder (if the folder exists) and process them
for probe in list_countries:

    image_paths = []
    for root, dirs, files in os.walk('./dataset/'+probe):
        for file in files:
            if file.endswith('jpg'):
                image_paths.append(os.path.join(root, file))
    print(probe)
    for img in tqdm(image_paths):
        #Close the image file as soon as the prediction is done
        with Image.open(img) as image:
            indices, scores = prediction(image)
        top_countries = [list_countries[i] for i in indices]
        correct = int(top_countries[0] == probe)
        #Filtered accuracy: only count images whose top score reaches the threshold
        for name, threshold in thresholds.items():
            if scores[0] >= threshold:
                stats[name]['total'] += 1
                stats[name]['pos'] += correct
        confusion_mat['true'].append(probe)
        confusion_mat['predicted'].append(top_countries[0])
        stats['accuracy']['pos'] += correct
        stats['top3-accuracy']['pos'] += int(probe in top_countries)
        stats['accuracy']['total'] += 1
        stats['top3-accuracy']['total'] += 1
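
The loop only accumulates counts. Turning them into ratios, plus the share of images kept at each threshold (coverage), might look like the sketch below. The `summarize` helper and the example counts are illustrative assumptions, not the author's results.

```python
def summarize(stats, n_images):
    """Convert {'pos', 'total'} counters into accuracy and coverage ratios."""
    summary = {}
    for metric, counts in stats.items():
        total = counts['total']
        summary[metric] = {
            # Accuracy among the images counted for this metric
            'accuracy': counts['pos'] / total if total else None,
            # Share of all images that were counted for this metric
            'coverage': total / n_images if n_images else None,
        }
    return summary

# Made-up counters with the same shape as the stats dictionary
example_stats = {
    'accuracy': {'pos': 3, 'total': 4},
    '+70%': {'pos': 2, 'total': 2},
}
report = summarize(example_stats, example_stats['accuracy']['total'])
print(report['+70%'])  # {'accuracy': 1.0, 'coverage': 0.5}
```

With the real counters, the call would be `summarize(stats, stats['accuracy']['total'])`, since every processed image increments the `accuracy` total.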

Finally, here’s the code to produce the Sankey chart from the data stored in the confusion_mat variable. As you will see in the results below, the data had to be split per country to keep the charts readable.

#Those modules will be used to plot a sankey chart
from pySankey.sankey import sankey
import seaborn as sns
import matplotlib.pyplot as plt

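def generate_colors(num_colors):
    #"Set3" has 12 base colors; seaborn cycles through them when more are requested
    palette = sns.color_palette("Set3", num_colors)
    #Set the lightness of every color to 0.5 (the result is still an RGB tuple in [0, 1])
    rgb_colors = [sns.set_hls_values(color, l=.5) for color in palette]
    #Convert each RGB tuple to a hex string
    return ["#%02x%02x%02x" % (int(r * 255), int(g * 255), int(b * 255)) for r, g, b in rgb_colors]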

colors_dict = {country: color for country, color in zip(list_countries, generate_colors(len(list_countries)))}

sankey(confusion_mat['true'], confusion_mat['predicted'], aspect=20, colorDict=colors_dict, fontsize=12)
fig = plt.gcf()
fig.set_size_inches(20, 20)
fig.set_facecolor("w")
fig.savefig("sankey_chart.png", bbox_inches="tight", dpi=150)
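
The snippet above plots every country at once. One way to split the data per country, sketched here under the assumption that a chart should show only the images whose true label is a given country, is a small filter over the two lists. The helper name `rows_for_country` and the toy data are hypothetical.

```python
def rows_for_country(confusion, country):
    """Keep only the (true, predicted) pairs whose true label is `country`."""
    pairs = [
        (true, pred)
        for true, pred in zip(confusion['true'], confusion['predicted'])
        if true == country
    ]
    return {
        'true': [true for true, _ in pairs],
        'predicted': [pred for _, pred in pairs],
    }

# Toy data with the same shape as confusion_mat
toy_confusion = {
    'true': ['France', 'France', 'Spain', 'France'],
    'predicted': ['France', 'Belgium', 'Spain', 'Luxembourg'],
}
france = rows_for_country(toy_confusion, 'France')
print(france['predicted'])  # ['France', 'Belgium', 'Luxembourg']
```

The two returned lists could then be passed to `sankey` in place of the full `confusion_mat` lists to draw one chart per country.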

By adhering to these steps, we’ll gain valuable insights into the performance and accuracy of StreetClip in geolocating images.

Outcome of the validation

The validation results shed light on StreetClip’s robust performance in geolocating images.

Graph of the metrics:

[Figure: number of images used for each metric]

Sankey chart for a few countries

Charts of France, Spain and Italy:

[Figure: Sankey charts for France, Spain and Italy]

Same charts with top score greater than 80%:

[Figure: Sankey charts for France, Spain and Italy with top score over 80%]

Accuracy and top-3-accuracy per country

Additionally, we conducted individual accuracy and top-3 accuracy computations for each country to discern potential areas of under-performance:
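
A minimal sketch of how per-country accuracy could be derived from the true and predicted labels collected during validation. The helper `per_country_accuracy` and the toy data are assumptions. Per-country top-3 accuracy would additionally require storing the full top-3 list for each image, which the confusion_mat structure does not do.

```python
from collections import defaultdict

def per_country_accuracy(confusion):
    """Top-1 accuracy for each true country in the confusion data."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true, pred in zip(confusion['true'], confusion['predicted']):
        total[true] += 1
        correct[true] += int(true == pred)
    return {country: correct[country] / total[country] for country in total}

# Toy data with the same shape as confusion_mat
toy_confusion = {
    'true': ['Lithuania', 'Lithuania', 'Monaco', 'Lithuania', 'Lithuania'],
    'predicted': ['Estonia', 'Lithuania', 'Monaco', 'Poland', 'Estonia'],
}
by_country = per_country_accuracy(toy_confusion)
print(by_country)  # {'Lithuania': 0.25, 'Monaco': 1.0}
```

Sorting the resulting dictionary by value is a quick way to surface the countries that are most often misclassified.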

Analysis

1. Accuracy and Top-3 Accuracy:

The overall accuracy is about 74%, which is decent but not perfect. However, when we look at the top-3 accuracy, it’s much better. In more than 9 out of 10 cases, the correct prediction is in those top three countries.

2. Filtering Scores for Higher Accuracy:

When we only consider predictions with scores above 70%, the accuracy shoots up to over 92%, so the predictions that remain are very reliable. The trade-off is coverage: about 40% of the images fall below this threshold and therefore receive no location prediction.

3. Sankey Charts and Misclassifications:

The Sankey charts provide us with valuable insights. For example, France is sometimes mistaken for its neighboring countries such as Belgium, Luxembourg, Monaco, or Andorra. Similar situations occur with Spain and Portugal, or Italy with Austria and Slovenia.

4. Accuracy by Country:

Looking at the accuracy and top-3 accuracy for each country highlights which ones are often misclassified. Some countries with fewer images, like Monaco, Andorra, or Greenland, achieve perfect accuracy. On the other hand, countries such as Lithuania, Latvia, and Slovakia face challenges. For instance, Lithuania is frequently identified as Estonia or Poland when the confidence score is over 80%.
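
Findings such as "Lithuania is identified as Estonia or Poland" can also be read off the collected labels directly rather than from a chart. The sketch below counts the most frequent wrong predictions for one country. The helper `top_confusions` and the toy data are illustrative assumptions.

```python
from collections import Counter

def top_confusions(confusion, country, n=3):
    """Most common wrong predictions for images whose true label is `country`."""
    wrong = Counter(
        pred
        for true, pred in zip(confusion['true'], confusion['predicted'])
        if true == country and pred != country
    )
    return wrong.most_common(n)

# Toy data with the same shape as confusion_mat
toy_confusion = {
    'true': ['Lithuania', 'Lithuania', 'Lithuania', 'Lithuania', 'Spain'],
    'predicted': ['Estonia', 'Poland', 'Estonia', 'Lithuania', 'Portugal'],
}
print(top_confusions(toy_confusion, 'Lithuania'))  # [('Estonia', 2), ('Poland', 1)]
```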

Conclusion

When it comes to geolocation, StreetClip has proven itself as a powerful tool, demonstrating the potential of AI in accurately classifying images. Through meticulous evaluation, we’ve witnessed its robustness in pinpointing the correct countries, particularly when high-confidence predictions are prioritized.

The validation process, conducted on a dataset of 50,000 images from Geoguessr, has provided valuable insights into StreetClip’s performance. While the overall accuracy of around 74% is commendable, the top-3 accuracy of over 91% speaks volumes about the model’s proficiency. Filtering predictions with scores above specific thresholds further refines its precision, albeit with a trade-off in prediction coverage.

It’s worth noting that fine-tuning the model based on these insights could potentially yield even better results.

If this article has been informative, please show your support by clapping for this story or leaving a comment below.

#Geolocation #AI #AIModelBenchmark

PS: a last challenge for the road. Can you find in which country this picture was taken?